Volume 61, Number 7
OBSTETRICAL AND GYNECOLOGICAL SURVEY
Copyright © 2006 by Lippincott Williams & Wilkins

CME REVIEW ARTICLE 21

CHIEF EDITOR’S NOTE: This article is part of a series of continuing education activities in this Journal through which a total of 36 AMA/PRA category 1 credits™ can be earned in 2006. Instructions for how CME credits can be earned appear on the last page of the Table of Contents.

Clinical Proteomics: A Novel Diagnostic Tool for the New Biology of Preterm Labor, Part I: Proteomics Tools

Catalin S. Buhimschi, MD,* Carl P. Weiner, MD,† and Irina A. Buhimschi, MD*

*Assistant Professor, Department of Obstetrics, Gynecology and Reproductive Science, Yale University School of Medicine, New Haven, Connecticut; and †Professor and Chair, Department of Obstetrics and Gynecology, University of Kansas School of Medicine, Kansas City, Kansas

The molecular mechanisms regulating myometrial contractility and preterm premature rupture of the membranes leading to preterm birth are poorly understood. The completion of the human genome sequence led to the development of functional genomics and gene array technology to simultaneously identify candidate genes potentially involved in regulation of human parturition. However, the study of living systems can now be expanded past genomics based on the rationale that it is the protein products of the genes, not simply gene expression, that have effects and cause disturbances at the cellular level. Therefore, identification of disease biomarkers, followed by a description of their functional networks, has the potential to significantly aid the development of new strategies for the prediction, diagnosis, and prevention of preterm birth. Interest in mass spectrometry as a new clinical diagnostic tool has grown rapidly, and clinical mass spectrometry is poised to become an important medical field for the next century.
Target Audience: Obstetricians & Gynecologists, Family Physicians

Learning Objectives: After completion of this article, the reader should be able to state the general concept of proteomics, summarize the use of proteomics as a potential clinical tool as a biomarker of disease, and recall that proteomics can be a means for understanding mechanisms of disease states.

The authors have disclosed that they have no financial relationships with or interests in any commercial companies pertaining to this educational activity. The authors have disclosed that proteomic tools have not been approved by the U.S. Food and Drug Administration for diagnosis of human diseases and their application at this time remains for research purposes only. Lippincott Continuing Medical Education Institute, Inc. has identified and resolved all faculty conflicts of interest regarding this educational activity.

Reprint requests to: Catalin S. Buhimschi, MD, Department of Obstetrics, Gynecology & Reproductive Science, Yale University School of Medicine, 333 Cedar Street, LCI 804, P.O. Box 208063, New Haven, CT 06520-8063. E-mail: catalin.buhimschi@yale.edu.

Pregnancy is a special time during a woman’s reproductive life as a result of the unique physiology and the presence of a developing fetus. Despite an impressive amount of effort and extensive research, our knowledge of parturition and fetal physiology remains limited. Over the past century, scientists have exhaustively investigated “the timing of birth,” the development, physiology, and pathophysiology of the fetus, and its environment. Yet, our understanding of the biologic mechanisms that control the events initiating term or preterm delivery remains incomplete. As a direct consequence, we lack the therapeutic tools to block or circumvent the maladaptive process. Preterm delivery remains a major public health problem with lasting familial and societal repercussions (1).
Prevention strategies have failed and the prevalence of preterm birth in the United States rose to an unprecedented 12.3% in 2003 (2). Preterm birth is associated with almost 70% of neonatal deaths and up to 75% of neonatal morbidity (1). It is critical that we develop a working understanding of the highly controlled and synchronized biochemical mechanisms that occur in the myometrium as it interacts with the fetal–placental compartments. For many years, research in preterm labor has concentrated on identifying and localizing individual factors when, similar to other pregnancy-related disorders, preterm parturition involves complex molecular protein–protein interactions functioning in an interconnected cellular network regulated by receptors (3). Without a clear understanding of the intricate pregnancy microenvironment, the complexity of clinical management of preterm labor increases exponentially. In the absence of robust diagnostic tools, therapy cannot be provided or is delayed.

This article has 2 goals: first, to familiarize the general obstetrician–gynecologist with the concept of proteomics and second, to provide a general overview of the role of clinical proteomics in the identification of disease biomarkers and the generation of protein profiles that make early diagnosis possible and reveal fundamental mechanisms that should lead to the first targeted treatments of preterm labor.

INTRODUCTION TO GENERAL PROTEOMICS

Proteomics

Proteomics is the field of study that encompasses knowledge of the structure, function, and expression of all proteins in the biochemical or biologic context of an organism at a given moment (4). Developed in the postgenome era, the science of proteomics complements the genome initiative, which progressed during the 20th century from the original description of DNA to knowledge of the genes responsible for specific human diseases.
The ultimate goal of sequencing the human genome has become a reality (5). Tremendous advances in the field of genomics exploded during the sequencing of some 40,000 genes, paving the way for a new medical field, gene therapy (6,7). However, although genomics represents a significant advancement, the human genome fails to reflect the enormity and complexity of the human proteome. The concept of one gene:one protein has fallen by the wayside. The complex issue of posttranslational protein modification and variation in the sequence of amino acids can be addressed only by full knowledge of the proteome, not the genome. Unfortunately, mapping of the human proteome is an almost impossible goal to achieve, at least in the near future. Thus, it is appropriate to set more realistic, achievable goals.

Until now, scientists have tended to concentrate on accumulating information about the nature of proteins and their absolute and relative levels in cells or different biologic compartments (8–10). Such data can be useful, but information inherent to the broader definition of proteomics must also be obtained if the true promise of preventing prematurity and its major complications is to be realized (6). Advances in protein analysis have provided valuable perspectives on the posttranscriptional regulation of gene expression and subtle protein–protein interactions (Fig. 1) (11,12). For example, there is a poor correlation between the abundance of mRNA and the amount of protein in human tissues, suggesting that posttranscriptional regulation of gene expression is common (11). This adds significantly to the difficulty of understanding a complex dynamic proteome. Acquiring knowledge of the function of thousands of proteins is a challenge and the means to support such endeavors must be provided.

One response to new ideas or approaches is to claim they are not really new at all.
In this instance, the distinction between protein chemistry and proteomics may be difficult for the unfamiliar to comprehend. Proteomics is not the science of protein biochemistry. Protein chemistry emphasizes the importance of understanding protein structure and function, and involves work toward complete sequence analysis and a mathematical representation of how the structure enables function (4). In contrast, proteomics is the study of multiprotein systems and focuses on the interplay of multiple, distinct proteins and their roles as part of a larger network. The analysis is directed toward complex mixtures and identification is based not necessarily on complete sequencing, but rather on partial sequence analysis aided by a large database and matching tools. In this respect, proteomics is functional biology, whereas protein chemistry is structural biology. There are 2 major issues confronting the field of proteomics: first, its breadth, because the network of proteins is often far larger than anticipated; and second, its complexity, which exceeds that of sequencing genomes. To overcome these obstacles, new technologies are being constantly developed.

Fig. 1. The cellular proteome at a relay from the genetic information to cellular function.

Proteomic Tools

Proteomics tools include protein separation and/or identification of proteins in biologic samples coupled to computational algorithms that allow the extraction of the relevant information from the totality of data. In its initial format, proteomics relied on high-resolution, 2-dimensional gel electrophoresis with isoelectric focusing in SDS-PAGE gels (2D-PAGE). This technique was used to separate, identify, quantitate, and catalog a large number of individual proteins present in complex samples such as cerebrospinal fluid, plasma, seminal fluid, and amniotic fluid (13–16).
The first gel dimension allows separation according to protein charge by isoelectric focusing, whereas the second dimension allows separation by protein size. After separation, the proteins are visualized using gel staining procedures such as Coomassie blue, silver staining, or fluorescent tracers. Proteomics analysis using 2D-PAGE protein separation is often criticized because the image analysis needed to determine differential protein expression can be laborious as a result of gel-to-gel variations that confound the analytic process. To eliminate this weakness, fluorescence 2-dimensional difference gel electrophoresis (DIGE) was developed (17). This technique allows multiple samples to be coseparated and visualized on one 2D-PAGE gel by labeling with different fluorescent dyes (Cy2, Cy3, and/or Cy5). Up to 3 images are captured on the gel using the Cy2, Cy3, and/or Cy5 excitation wavelengths. The images are then merged and differences between them determined using image analysis software. To overcome the high cost of equipment as well as expendable supplies such as the fluorescent dyes, many academic institutions have implemented this technique in their protein core facilities.

Recent advances in mass spectrometry (MS) have allowed further refinements of the 2D-PAGE technique. Mass spectrometry now has the ability to identify and characterize picomole quantities of gel-separated proteins, making partial sequence analysis possible (18). The available instrumentation is highly sensitive, robust, and reliable for the analysis of peptides and integral proteins (19). Essentially, mass spectrometers consist of 3 parts: an ionization source, a mass analyzer, and an ion detector (Fig. 2) (20). The ionization source converts molecules into gas-phase ions, which are then separated by the mass analyzer and transferred to the ion detector.
The mass spectrometer does not actually measure the molecular mass of the sought proteins directly, but rather the mass-to-charge ratio (m/z value) of the resulting ions. In many cases, the ions encountered in mass spectrometry have just one charge (z = 1), so the m/z value is numerically equal to the molecular (ionic) mass in Daltons. The mass analyzer uses an electric or magnetic field to separate ions by their m/z ratios. Ions can also be separated by their time of flight (TOF), the time it takes to reach the detector. The rule of thumb is that the larger the m/z ratio of the ion, the longer it will take to reach the detector.

Fig. 2. Components of a MALDI–TOF mass spectrometer with the data output (mass spectrum) shown below.

Two ionization techniques are commonly used: electrospray ionization and matrix-assisted laser desorption/ionization (MALDI). Electrospray ionization creates ions by the application of an electric potential to a flowing liquid such as a solvent, causing the fluid to charge and subsequently spray very small droplets containing the analyte (20). The solvent is then removed, most frequently by heat, and multiply charged ions are formed before they reach the ion detector. In matrix-assisted laser desorption/ionization, sample molecules are bombarded with a laser beam to induce ionization. The sample is premixed with a highly energy-absorbing molecule (a matrix compound), which transforms the laser energy into the excitation energy of the sample. This process leads to subsequent ejection of the matrix compound and ions into the gas phase of the mass analyzer so that they can be detected by the ion detector.

The most commonly used instruments can be grouped into either single-stage mass spectrometers or tandem MS systems.
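The relationship between mass, charge, and flight time described above can be illustrated with a minimal sketch. It assumes protonated ions of the form [M + zH]^z+ and an idealized linear TOF analyzer in which every ion receives the same accelerating potential; the 20 kV voltage and 1 m flight tube are hypothetical illustration values and do not describe any particular instrument.

```python
import math

PROTON_MASS_DA = 1.007276          # mass of a proton, in daltons
DALTON_KG = 1.66053907e-27         # one dalton expressed in kilograms
ELEMENTARY_CHARGE_C = 1.602176634e-19


def mz_of_protonated_ion(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for an [M + zH]^z+ ion of a molecule with the given neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def flight_time_us(mz: float,
                   accel_voltage_v: float = 20_000.0,   # hypothetical value
                   tube_length_m: float = 1.0) -> float:  # hypothetical value
    """Idealized linear TOF: z*e*V = (1/2)*m*v^2, so t = L * sqrt(m / (2*z*e*V)).

    Returns the flight time in microseconds.
    """
    mass_per_charge = mz * DALTON_KG / ELEMENTARY_CHARGE_C  # kg per coulomb
    seconds = tube_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))
    return seconds * 1e6


if __name__ == "__main__":
    # A singly charged ion has m/z close to its mass; a doubly charged ion
    # of the same molecule appears at roughly half that value.
    for z in (1, 2):
        mz = mz_of_protonated_ion(1000.0, z)
        print(f"z={z}: m/z={mz:.3f}, flight time={flight_time_us(mz):.2f} us")
```

Because flight time scales with the square root of m/z in this idealized model, an ion with four times the m/z takes twice as long to reach the detector, which is the rule of thumb stated in the text. Real instruments add refinements (for example, delayed extraction or reflectrons) that this sketch ignores.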
Single-stage mass spectrometers, notably MALDI–TOF (matrix-assisted laser desorption/ionization time of flight), were used most frequently for large-scale protein identification from species with small or known genomes (18). Tandem MS instruments such as triple quadrupole, ion trap, and the more recent quadrupole TOF (Q-TOF) allow protein identification by sequence database searching (19). The high accuracy of the Q-TOF technology makes MALDI–Q-TOF configurations the best for de novo protein sequencing. However, although accurate technology is essential for novel protein discovery, greater automation, increased comprehensiveness, and friendlier technology are equally important for the rapid and accurate diagnosis of human diseases using proteomics.

The goal of making a rapid diagnosis with the least amount of sample manipulation led to the development of surface-enhanced laser desorption/ionization (SELDI). When used in conjunction with protein chip arrays (21), SELDI allows for the isolation and identification in complex biologic samples of peptides and proteins with specific properties. Protein chip array assays using SELDI–TOF–MS technology provide a valuable research tool as a result of the multidimensional nature of protein separation, which can be optimized for complex mixtures of proteins. Moreover, SELDI can detect and quantitate both proteins and posttranslationally modified forms of these proteins in a single assay (22). The enhanced separation is made possible by using a variety of active surfaces on the protein chip arrays. The various array surfaces interact differentially with constituents in the biologic sample based on their hydrophobicity (H4, H50 arrays), isoelectric point (SAX, WCX, CM10, Q10 arrays), metal affinity (IMAC arrays), or ability to bind to a specific antibody (PS10, PS20 arrays).
Briefly, a crude biologic sample is placed on a small area of the protein chip array (spot). Specific chemical interactions occur between the chip surface and the biomolecules in the sample, depending on the binding conditions and chip surface. The protein chip surface preferentially binds some specific protein structures while repelling others, thus allowing selection of a specific set of proteins that can be subsequently subjected to mass spectrometry for further identification. By varying the chip surfaces, washing conditions, incubation times, laser intensities, and energy-absorbing molecules, an almost infinite number of experimental conditions can be designed for the optimal separation of one protein or a group of proteins from all the others.

Such advances in proteomics would not be possible without the existence of large protein databases derived from organisms with known genomes (23,24). The ExPASy server (ExPASy Proteomics Server, www.expasy.ch) and SwissProt database are perhaps the best examples, providing links to other proteomics servers around the world and tools for further sequence analysis. The continuous updating and validation of these databases are critical, because newly discovered posttranslational modifications and variants are published almost daily. Previously sequenced proteins and 2D-PAGE catalogs also remain at the core of proteomic discovery. Several databases include mapping of biologic fluids such as plasma, urine, cerebrospinal fluid, or tissues such as heart, kidney, and breast to allow experiments completed in the laboratory to be complemented by virtual experiments performed using bioinformatics tools (25). For example, one way to identify proteins that have been separated on gels (one-dimensional or 2-dimensional) is to subject them to trypsin digestion. Each protein generates a unique combination of fragments (tryptic digest fragments) whose masses allow insight into the identity of the protein (3).
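The idea of matching tryptic fragment masses against a database can be sketched in a few lines. The example below is only an illustration under stated assumptions: trypsin is modeled with the common simplified rule (cleavage after lysine or arginine unless the next residue is proline), masses are neutral monoisotopic values with no modifications or missed cleavages, and the two database entries are hypothetical toy sequences, not real proteins. Production search engines use far more elaborate scoring.

```python
# Standard monoisotopic amino acid residue masses, in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER_MASS


def count_matches(observed_masses, sequence, tolerance_da=0.2):
    """Count observed masses that fall within tolerance of a theoretical fragment."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance_da=0.2):
    """Return the database entry whose in silico digest explains the most masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance_da),
    )


# Hypothetical toy database; the sequences are invented for illustration only.
TOY_DATABASE = {
    "toy_protein_1": "AAKGGRPLLKMM",
    "toy_protein_2": "GGKAAARLLLK",
}

# Simulate a measured fingerprint by digesting one entry in silico.
observed = [peptide_mass(p) for p in tryptic_digest(TOY_DATABASE["toy_protein_2"])]
print(best_match(observed, TOY_DATABASE))
```

The ranking here is a simple count of matched masses; the tolerance value of 0.2 Da is an arbitrary choice for the sketch and would in practice depend on instrument accuracy.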
This strategy for protein identification is known as peptide mass fingerprinting. Once there is a degree of confidence in the sought protein’s identity, more definite identification and confirmation can be achieved by de novo sequencing of tryptic fragments either by mass spectrometry or using more traditional techniques such as specific antiprotein antibodies.

The third essential tool of proteomics consists of computer software that can first, differentiate among thousands of proteins characteristic for a living system and second, accurately identify the sequence of amino acids corresponding to a protein in the database. With the aid of specialized algorithms, the software compares the data with the information in the database. The investigator can analyze the results and evaluate the quality of the data much faster than with manual or visual discrimination. Given the wealth of heterogeneous proteomic data and the numerous bioinformatics tools available, scientists frequently are faced with the dilemma of which computer tools to use. Proteomics 2D-PAGE software such as Melanie offers sophisticated state-of-the-art analysis for the identification, quantification, and matching of the gels. Most packages combine comprehensive, advanced statistical and classification capabilities as well as versatile search engines and reporting functions. Melanie, PD Quest, Phoretics (for 2-dimensional gel analysis), and Ciphergen Biomarker Patterns software (for SELDI data) are just a few of the software programs available.

APPLICATIONS OF PROTEOMICS

Proteomics Has 4 Principal Parts

Data Mining
Identification of proteins in a biologic sample. The goal is to catalog the proteome rather than to infer its composition based on gene expression.

Differential Protein Expression Profiling
The identification of multiple proteins in a biologic sample as a reflection of a particular state of the organism or cell (disease state).
Expression profiling is essentially a more specialized form of mining and involves a differential analysis in which the proteomes of 2 conditions of the biologic system are compared (eg, disease–nondisease).

Protein–Network Mapping
Seeks to identify how proteins interact with one another in living systems. In reality, it is this interaction that determines the function of a biologic system. Proteomics offers the unique opportunity to characterize complex networks using affinity-capture techniques coupled with analytic proteomics methods. Protein mapping provides the opportunity to assess the status of all participants in the pathway simultaneously.

Protein Modification Mapping
Identifying how and where proteins are modified. Characterization of posttranslational protein modification is one of the most ambitious goals of proteomics. Although several techniques are used to detect modified proteins (eg, antibodies for specific phosphorylated amino acid residues or fragments of proteins), the precise sites of modification remain largely unknown.

Over the last decade, proteome scientists have largely focused their attention on 2 major areas: expression proteomics, which seeks to quantify the up- or downregulation of proteins, and functional proteomics, which seeks to characterize protein activities, complex protein interactions, and signaling pathways (26). In the past, studies designed to investigate the expression of different proteins sought to first quantify and then compare the expression of proteins in abnormal versus normal clinical conditions. The ultimate goal was the recognition of a biomedical application; by this comparative approach, the newly identified proteins that differentiate the 2 conditions could be used as diagnostic biomarkers.
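The comparative approach described above, in which protein levels are quantified in 2 conditions and then compared, can be sketched as a simple fold-change screen. This is a minimal illustration, not a validated biomarker pipeline: the peak labels, intensity values, and the threshold of a 2-fold change (|log2 ratio| of at least 1) are hypothetical, and a real study would add normalization, replicate statistics, and correction for multiple comparisons.

```python
import math
from statistics import mean


def differential_peaks(disease, control, min_abs_log2_fc=1.0):
    """Flag peaks whose mean intensity differs between two conditions.

    `disease` and `control` map a peak label to a list of positive
    intensities, one per sample. Returns {peak: log2 fold change} for
    peaks present in both conditions that meet the threshold.
    """
    flagged = {}
    for peak in sorted(set(disease) & set(control)):
        log2_fc = math.log2(mean(disease[peak]) / mean(control[peak]))
        if abs(log2_fc) >= min_abs_log2_fc:
            flagged[peak] = log2_fc
    return flagged


# Hypothetical intensities for three peaks in two small groups of samples.
disease_group = {
    "m/z 3400": [8.0, 8.0],   # higher in disease
    "m/z 5100": [3.0, 3.0],   # unchanged
    "m/z 7200": [1.0, 1.0],   # lower in disease
}
control_group = {
    "m/z 3400": [2.0, 2.0],
    "m/z 5100": [3.0, 3.0],
    "m/z 7200": [4.0, 4.0],
}

for peak, log2_fc in differential_peaks(disease_group, control_group).items():
    direction = "up" if log2_fc > 0 else "down"
    print(f"{peak}: log2 fold change {log2_fc:+.1f} ({direction} in disease)")
```

Working on a log2 scale treats up- and downregulation symmetrically, so a 4-fold increase and a 4-fold decrease receive scores of equal magnitude and opposite sign.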
Today, an emerging field, clinical proteomics, seeks to apply the science of proteomics in the search for biomarkers and the generation of protein profiles that can rapidly predict, diagnose, and monitor treatment of human diseases, including preterm birth (14,27–30). In part II, we will detail the emerging role of proteomics in identifying the causes of spontaneous preterm labor and birth.

REFERENCES

1. Stoll BJ, Hansen NI, Adams-Chapman I, et al. National Institute of Child Health and Human Development Neonatal Research Network. Neurodevelopmental and growth impairment among extremely low-birth-weight infants with neonatal infection. JAMA 2004;292:2357–2365.
2. Hamilton BE, Martin JA, Sutton PD. Births: preliminary data for 2003. Natl Vital Stat Rep November 23, 2004;53(9).
3. Shankar R, Gude N, Cullinane F, et al. An emerging role for comprehensive proteome analysis in human pregnancy research. Reproduction 2005;129:685–696.
4. Lieber CD. Introduction to Proteomics. Tools for the New Biology. Totowa, NJ: Humana Press Inc, 2002.
5. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001;291:1304–1351.
6. Banks RE, Dunn MJ, Hochstrasser DF, et al. Proteomics: new perspectives, new biomedical opportunities. Lancet 2000;356:1749–1756.
7. Zenclussen AC, Zenclussen ML, Ritter T, et al. The use of gene therapy tools in reproductive immunology research. Curr Gene Ther 2005;5:459–466.
8. Farag K, Hassan I, Ledger WL. Prediction of preeclampsia: can it be achieved? Obstet Gynecol Surv 2004;59:464–482.
9. Park KH, Yoon BH, Shim SS, et al. Amniotic fluid tumor necrosis factor-alpha is a marker for the prediction of early-onset neonatal sepsis in preterm labor. Gynecol Obstet Invest 2004;58:84–90.
10. Lockwood CJ, Krikun G, Schatz F. Decidual cell-expressed tissue factor maintains hemostasis in human endometrium. Ann N Y Acad Sci 2001;943:77–88.
11. Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 1997;18:533–537.
12. Monti M, Orru S, Pagnozzi D, et al. Interaction proteomics. Biosci Rep 2005;25:45–56.
13. Wiederkehr F, Vonderschmitt DJ. [2-dimensional electrophoresis of cerebrospinal fluid in various neurological patients.] Schweiz Med Wochenschr 1985;115:368–373.
14. Chen JH, Chang YW, Yao CW, et al. Plasma proteome of severe acute respiratory syndrome analyzed by two-dimensional gel electrophoresis and mass spectrometry. Proc Natl Acad Sci U S A 2004;101:17039–17044.
15. Fung KY, Glode LM, Green S, et al. A comprehensive characterization of the peptide and protein constituents of human seminal fluid. Prostate 2004;61:171–181.
16. Vuadens F, Benay C, Crettaz D, et al. Identification of biologic markers of the premature rupture of fetal membranes: proteomic approach. Proteomics 2003;3:1521–1525.
17. Unlu M, Morgan ME, Minden JS. Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 1997;18:2071–2077.
18. Patterson SD, Aebersold R. Mass spectrometric approaches for the identification of gel-separated proteins. Electrophoresis 1995;16:1791–1814.
19. Gygi SP, Aebersold R. Mass spectrometry and proteomics. Curr Opin Chem Biol 2000;4:489–494.
20. Yates JR 3rd. Mass spectrometry. From genomics to proteomics. Trends Genet 2000;16:5–8.
21. Kuwata H, Yip TT, Yip CL, et al. Bactericidal domain of lactoferrin: detection, quantitation, and characterization of lactoferrin in serum by SELDI affinity mass spectrometry. Biochem Biophys Res Commun 1998;245:764–773.
22. Fung ET, Yip TT, Lomas L, et al. Classification of cancer types by measuring variants of host response proteins using SELDI serum assays. Int J Cancer 2005;115:783–789.
23. Hermjakob H, Apweiler R. The Proteomics Identifications Database (PRIDE) and the ProteomExchange Consortium: making proteomics data accessible. Expert Rev Proteomics 2006;3:1–3.
24. Lo SL, You T, Lin Q, et al. SPLASH: systematic proteomics laboratory analysis and storage hub. Proteomics 2006;6:1758–1769.
25. Hochstrasser DF. Proteome in perspective. Clin Chem Lab Med 1998;36:825–836.
26. Neubauer G, Gottschalk A, Fabrizio P, et al. Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc Natl Acad Sci U S A 1997;94:385–390.
27. Petricoin EF, Ardekani AM, Hitt BA. Use of proteomic pattern in serum to identify ovarian cancer. Lancet 2002;359:572–577.
28. Papadopoulos MC, Abel PM, Agranoff D, et al. A novel and accurate diagnostic test for human African trypanosomiasis. Lancet 2004;363:1358–1363.
29. Pucci-Minafra I, Fontana S, Cancemi P, et al. Proteomic patterns of cultured breast cancer cells and epithelial mammary cells. Ann N Y Acad Sci 2002;963:122–139.
30. Meehan KL, Holland JW, Dawkins HJ. Proteomic analysis of normal and malignant prostate tissue to identify novel proteins lost in cancer. Prostate 2002;50:54–63.