Analysis of the Human Serum Proteome
Dr. Timothy D. Veenstra
Director, Laboratory of Proteomics and Analytical Technologies and NCI-Frederick Biomedical Proteomics Program

Challenge Goal: Eliminate Suffering and Death Due to Cancer by 2015.
Dr. von Eschenbach, Director, National Cancer Institute

TRANSLATIONAL RESEARCH
• Take research from bench to bedside.
• Obligation to public health.
• Allow physicians to make better decisions in cancer management.

Three Keys to Translational Cancer Research
• Early Detection: If cancer is detected at an early stage, the five-year survival rate for most cancers is high, even using currently available treatments.
• Molecular Diagnostics: There is a need for biomarkers and novel diagnostic technologies that are more accurate and can detect early-stage cancers.
• Molecular Targeted Therapeutics: Implementation of new technologies in ongoing NCI-based clinical trials.

• Development of improved proteomics and bioinformatic tools for diagnostic medicine
• New Target Discovery (Global Proteomics)
• Signal Transduction Pathway Profiling (Targeted Proteomics)

The Importance of Early Detection of Ovarian Cancer
[Figure: bar charts of % 5-year survival and % stage distribution for stages I-IV, actual vs. with early detection.]
A SHIFT IN THE NUMBER OF PATIENTS DIAGNOSED AT AN EARLY STAGE WILL DRAMATICALLY AFFECT PATIENT SURVIVAL!

Current Status of Ovarian Cancer Screening
• CA 125: a high-molecular-weight glycoprotein.
• CA 125 is elevated in 83% of patients with ovarian cancer.
• False-negative rates of 40-50% for stage I disease.
• CA 125 cannot be detected in tissue sections from 20% of ovarian cancers; hence, the false-negative rate using CA 125 will never be lower than 20%.

Patterns of Proteomic Information in Serum
Hypothesis: Tissues are continuously perfused by serum; their histopathology may be reflected in serum proteomic “patterns.”
1. Signature proteins are products of the tumor-host microenvironment and are thereby unique to the tissue site and pathophysiological state.
2. These biomarkers are likely to be modified or cleaved “reporter” proteins/peptides that are produced/amplified at the tumor/host interface, are released, and partition to circulating carrier proteins.
[Figure: perfused tissue releasing a pathologic signature into serum, read out as a “proteomic” mass spectrum.]

CAN PROTEIN PROFILING IDENTIFY PROTEIN EXPRESSION PATTERNS DIAGNOSTIC OF INVASIVE EPITHELIAL OVARIAN CANCER?

Serum Proteomic Pattern Diagnostic Workflow
Serum → ProteinChip (surface chemistries: hydrophobic; PS-10 or PS-20; cation exchange; anion exchange; antibody-antigen; metal affinity; receptor-ligand; normal phase; DNA-protein) → SELDI-TOF MS → proteomic pattern (m/z) → pattern recognition → diagnosis

Application and Implementation of SELDI-QqTOF for Diagnostic Proteomics
WCX2 ProteinChip Array
• Ciphergen SELDI-TOF MS: widely accessible; extensive m/z range (5-300,000); low resolution (~100-200); low mass accuracy (~1000 ppm).
• ABI QSTAR Pulsar QqTOF MS: more specialized knowledge required?; limited m/z range? (5-12,000); higher resolution (>9000 at m/z 1500); high mass accuracy (better than 50 ppm, external calibration).

Bioinformatic Analysis for the Discovery of Diagnostic Patterns
Phase 1: Pattern Discovery. Training spectra from (a) unaffected samples and (b) cancer samples (m/z 1000-6000) are analyzed by a genetic algorithm plus self-organizing cluster analysis (“survival of the fittest”) to find the patterns that discriminate “a” from “b” in the training set. The best pattern becomes the lead diagnostic fingerprint.
Phase 2: Pattern Matching. Each test/validation sample is compared with the lead diagnostic fingerprint (from the training set) and classified as normal or cancer.

New Sample and Modeling Breakdown
Samples obtained from the National Ovarian Cancer Early Detection Program, Northwestern University (Director: Dr. David Fishman)
A. 84 training samples (28 unaffected and 56 ovarian cancer)
B. 87 blind testing samples (30 unaffected and 57 cancer)
C. 77 blind validation samples (37 unaffected and 40 cancer)
Total: 153 ovarian cancer; 95 unaffected

Metrics of “High Fitness” Models from QqTOF Data
A. Results Breakdown of Four Models with Highest Diagnostic Accuracy
Results: 100% sensitivity; 100% specificity
State                   Testing   Validation
Unaffected (normal)     31/31     37/37
Ovarian cancer          63/63     40/40
B. Key diagnostic features recognized within Models 1-4 (m/z): 1276.861, 818.480, 1144.796, 1001.654, 2374.244, 6352.723, 4260.403, 1255.593, 4292.900, 6548.771, 7046.018, 4377.854, 7060.121, 7060.121, 8602.237, 6004.417, 8605.678, 7096.922, 8664.385, 7060.121, 8706.065, 8540.536, 7202.716, 9870.938, 8605.678, 8605.678, 8706.065, 8709.548, 9367.113
[Figure: unaffected vs. ovarian cancer spectra for each model, zoomed on m/z 6900-7200 and 8500-8700.]
Conrads, T. P., Zhou, M., Petricoin, E., Liotta, L., and Veenstra, T. D., Expert Rev. Mol. Diagn., 3, 411-420.

BLINDED TEST RESULTS: Benign vs. Malignant (Spiral CT +)
Collaborators: Denise Ching, Kim Lyerly, Sam Wells, David Harpole; Duke U.
Pattern Recognition Methods #1, #2 and #3
Key ion features selected (m/z): 6851.505, 2378.046, 2371.398, 6675.697, 10070.302, 2210.224, 4914.232, 6854.245, 2620.747, 4471.636, 5086.187, 6649.053, 6854.456; 1028, 1035, 1050, 1289, 1980, 2080, 2210, 2212, 2365, 2366, 2485, 2589, 2897, 3158, 3435, 3538, 3763, 4062, 4071, 4307, 4315, 4482, 4491, 4559, 4643, 5138, 5139, 5800, 5861, 5879, 6414, 6432, 6629, 6646, 6660, 6852, 6978, 7834, 7835, 7908, 7922, 7923, 7935, 7953, 8329, 8330, 8601, 8617, 8619, 8634, 8913, 8931, 9120
Specificity and sensitivity values reported across Methods #1-#3: 69%, 71%, 85%, 98%, 89%, 95%
[Figure: representative spectra (relative intensity vs. m/z 1000-12,000) for benign, adenocarcinoma and squamous samples, with insets at m/z 2365-2375 and 6820-6880.]

Do we detect clinical biomarkers such as CA 125 or PSA in proteomic patterns using SELDI? Short answer: No.
Is this due to the sensitivity of the instrument? Short answer: No, it is a dynamic range issue. A SELDI-TOF can detect below 10⁻¹² mol/L.
Will a straight MALDI approach and high-resolution MS without specifically targeting PSA, for example, allow detection of these low-abundance biomarkers? Short answer: No (see above).
Are we trying to detect PSA and CA 125? Short answer: No.
Do we need better ways of diagnosing early-stage cancer beyond CA 125 and PSA? Short answer: Absolutely.
Are all of the steps necessary to make proteomic pattern diagnostics clinically useful being evaluated? Short answer: Absolutely.

Characterization of the Human Serum Proteome
22 PROTEINS COMPRISE 99% OF THE PROTEIN MASS IN SERUM!
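The 100% sensitivity and specificity quoted for the QqTOF models follow directly from the per-class counts in the validation table. As a minimal sketch (the helper names are hypothetical, not taken from the original analysis), sensitivity is the fraction of cancers called cancer and specificity the fraction of unaffected samples called unaffected:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a two-class (cancer vs. unaffected) blinded test."""
    tp: int  # cancer samples classified as cancer
    fn: int  # cancer samples classified as unaffected
    tn: int  # unaffected samples classified as unaffected
    fp: int  # unaffected samples classified as cancer

    @property
    def sensitivity(self) -> float:
        # TP / (TP + FN): fraction of true cancers that are detected
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # TN / (TN + FP): fraction of unaffected samples correctly cleared
        return self.tn / (self.tn + self.fp)


def from_counts(cancer_correct: int, cancer_total: int,
                unaffected_correct: int, unaffected_total: int) -> Confusion:
    """Build a confusion matrix from 'x/y correct' counts as reported on the slides."""
    return Confusion(
        tp=cancer_correct,
        fn=cancer_total - cancer_correct,
        tn=unaffected_correct,
        fp=unaffected_total - unaffected_correct,
    )


# Blind validation set from the results table: 40/40 cancer, 37/37 unaffected
validation = from_counts(40, 40, 37, 37)
print(f"validation: sensitivity {validation.sensitivity:.0%}, "
      f"specificity {validation.specificity:.0%}")

# Hypothetical: a single missed cancer out of 40 would lower sensitivity
one_miss = from_counts(39, 40, 37, 37)
print(f"one miss: sensitivity {one_miss.sensitivity:.1%}")
```

The same helper applies to the testing column (63/63 cancer, 31/31 unaffected).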
[Figure: pie chart of serum protein mass, segments labeled 90% and 10%.]

Human Serum Proteomic Investigation
Three tracks:
• Global serum proteome survey: Can we account for the presence of histopathologically related proteins/peptides in serum?
• Low molecular weight protein/peptide proteome: Can we deplete the high molecular weight fraction for more effective interrogation of the source of the diagnostic information?
• Investigation of peptides bound to highly abundant serum proteins: Is there histopathological content bound to the highly abundant carrier proteins, such as albumin?

Global Serum Proteome Survey
Serum → tryptic digest → ampholyte-free peptide IEF (20 fractions) → strong cation exchange (140 fractions) → analysis by LC/MS/MS
[Figure: SCX chromatogram, fluorescence and % salt gradient vs. time (0-96 min).]

GLOBAL ANALYSIS OF THE SERUM PROTEOME (bpp.nci.nih.gov)
• IEF: 473 proteins, 957 unique peptides
• IEF/SCX: 1143 proteins, 2071 unique peptides
• Total identified: 1446 unique proteins, 2649 unique peptides
Analysis of the Human Serum Proteome. King C. Chan, David A. Lucas, Denise Hise, Carl F. Schaefer, Zhen Xiao, George M. Janini, Kenneth H. Buetow, Haleem J. Issaq, Timothy D. Veenstra and Thomas P. Conrads. Clinical Proteomics (2004), in press.
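If the 1446-protein total is the union of the IEF and IEF/SCX datasets (an assumption; the slide does not say how the totals were merged), inclusion-exclusion gives the number of identifications shared by both fractionation schemes: 473 + 1143 − 1446 = 170 proteins and 957 + 2071 − 2649 = 379 peptides. A minimal sketch, with hypothetical accession strings for the set-based version:

```python
def shared_identifications(n_a: int, n_b: int, n_union: int) -> int:
    """|A ∩ B| from |A|, |B| and |A ∪ B| by inclusion-exclusion."""
    overlap = n_a + n_b - n_union
    if not 0 <= overlap <= min(n_a, n_b):
        raise ValueError("counts are inconsistent with a union of two sets")
    return overlap


# Counts from the global serum survey slide (union assumption noted above)
shared_proteins = shared_identifications(473, 1143, 1446)
shared_peptides = shared_identifications(957, 2071, 2649)
print(shared_proteins, shared_peptides)

# With real identification lists, compile the union directly with set operations.
# These accession strings are hypothetical placeholders.
ief = {"PROT_A", "PROT_B", "PROT_C"}
ief_scx = {"PROT_B", "PROT_C", "PROT_D", "PROT_E"}
combined = ief | ief_scx
print(len(combined), len(ief & ief_scx))
```

Working from the actual identifier lists is preferable when they are available, since it also reveals which proteins each fractionation step uniquely contributes.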
Analysis of Identified Human Serum Proteins
Molecular Function: binding activity 34.07%; enzyme activity 18.79%; signal transducer activity 10.37%; transporter activity 9.34%; transcription regulator activity 5.94%; defense/immunity protein activity 4.86%; structural molecule activity 4.75%; molecular function unknown 3.46%; enzyme regulator activity 2.86%; cell adhesion molecule activity 2.86%; motor activity 1.13%; chaperone activity 0.70%; apoptosis regulator activity 0.54%; antioxidant, toxin, surfactant, protein tagging, protein stabilization and cytoskeletal regulator activity 0.05% each.
Biological Processes: cellular process 20.67%; metabolism 19.43%; cell growth and/or maintenance 13.31%; cell communication 9.93%; response to external stimulus 7.76%; transport 6.93%; development 6.85%; response to stress 2.94%; biological process unknown 2.43%; cell cycle 2.36%; cell motility 1.99%; death 1.81%; circulation 0.91%; homeostasis 0.65%; behavior 0.54%; pathogenesis 0.47%; pregnancy 0.29%; excretion 0.29%; digestion 0.22%; extracellular matrix organization 0.15%; viral life cycle 0.04%; diuresis 0.04%.

Cellular Component of Human Serum Proteins
[Figure: GO cellular-component pie charts for the human serum proteome and the human proteome. Categories include membrane, nuclear, extracellular, mitochondrial, cytoplasmic, cytoskeletal, endoplasmic reticulum, Golgi, lysosomal, virion and unknown, summarized as intracellular, membrane and extracellular.]
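Percentages like those in the Gene Ontology breakdown are tallies of annotations per category. The sketch below assumes one convention: each GO term assigned to a protein counts once, so a protein with several terms contributes to several categories and the denominator is the number of annotations rather than the number of proteins. The slides do not say which convention was used, and the protein IDs are hypothetical.

```python
from collections import Counter


def category_percentages(annotations: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of all GO annotations that fall into each category.

    Each (protein, term) pair is counted once, so percentages sum to 100
    over annotations rather than over proteins.
    """
    counts = Counter(term for terms in annotations.values() for term in set(terms))
    total = sum(counts.values())
    # most_common() orders categories from largest to smallest
    return {term: 100.0 * n / total for term, n in counts.most_common()}


# Hypothetical molecular-function annotations for four serum proteins
annotations = {
    "PROT_A": ["binding activity"],
    "PROT_B": ["binding activity", "enzyme activity"],
    "PROT_C": ["signal transducer activity"],
    "PROT_D": ["binding activity"],
}

for term, pct in category_percentages(annotations).items():
    print(f"{term}: {pct:.2f}%")
```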
High Molecular Weight Protein Depletion by Ultrafiltration
Dilute raw serum 1:5 in 25 mM NH4HCO3, pH 8.2/20% acetonitrile → 30 kDa MWCO filter → centrifuge
Tirumalai, R. S., Chan, K. C., Prieto, D. A., Issaq, H. J., Conrads, T. P., and Veenstra, T. D., Mol. Cell. Proteomics (2003).

Depletion of High MW Serum Proteins by Ultrafiltration
[Figure: gel of molecular weight markers (M; 21.5, 14.4, 6.0 and 3.5 kDa), raw serum and LMW ultrafiltrate, with the HSA band marked.]

MALDI-TOF MS of Ultrafiltered Serum
[Figure: MALDI-TOF spectra (m/z 2500-15,000) of raw serum and of ultrafiltrates prepared with no acetonitrile and with 20% acetonitrile.]

High Molecular Weight Protein Depletion by Ultrafiltration
Dilute raw serum 1:5 in 25 mM NH4HCO3, pH 8.2/20% acetonitrile → 30 kDa MWCO filter → centrifuge → trypsin digest → SCX fractionation → µLC-MS/MS

880 Unique Peptides (341 Proteins) Identified from the Human Serum LOW MOLECULAR WEIGHT Fraction
Categories:
Hypothetical proteins
Enzymes
Circulating proteins
Coagulation and complement factors
Structural proteins, nuclear proteins, transcription factors, oncogene products, etc.
Transport and binding proteins
Protease inhibitors
Proteases
Channels, receptors, binding proteins
Cytokines, growth factors, hormones

GLOBAL ANALYSIS OF THE SERUM PROTEOME (www.bpp.nci.nih.gov)
Example database entry: protein, interferon γ; peptide, LKKYFNAG; Xcorr 2.48; charge 2; compartment, extracellular; function, defense/immunity.
[Figure: example MS/MS spectrum, relative abundance vs. m/z 250-1000.]
Combined dataset: 1674 unique proteins, 3441 unique peptides
• IEF: 473 proteins, 957 unique peptides
• IEF/SCX: 1143 proteins, 2071 unique peptides
• LMW: 308 proteins, 884 unique peptides
Site sections: Materials and Methods; Data Analysis (bpp-dev.nci.nih.gov)

Systems Analysis of Human Serum
Serum Proteomic Analysis
Three tracks:
• Global serum proteome survey: Can we account for the presence of disease and cellular process-related proteins in serum?
• Low molecular weight protein/peptide proteome: Can we deplete the high molecular weight fraction for more effective interrogation of the source of the diagnostic information?
• Investigation of peptides bound to highly abundant serum proteins: Is there histopathological content bound to the highly abundant carrier proteins, such as albumin?

Targeted Serum Proteomics
• Utilize the character of serum: is the presence of albumin such a detriment, or is it something exploitable?
• Can we target the proteomic study of serum for disease diagnosis as we would signal transduction pathways?
Diagnostic molecular sponges? Preliminary diagnostic studies are demonstrating that highly abundant HMW proteins actually carry bound diagnostic information.
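The database entry lists the peptide LKKYFNAG observed at charge 2. Its expected precursor m/z follows from the sequence; the sketch below assumes unmodified residues, monoisotopic masses and protons as the only charge carriers, and the function names are illustrative. It also turns the ~1000 ppm and 50 ppm mass-accuracy figures quoted for SELDI-TOF and QqTOF into m/z windows, to show how much more tightly a high-accuracy instrument constrains a match.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O for the free N- and C-termini
PROTON = 1.007276   # mass of a proton (charge carrier)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MONO[aa] for aa in seq) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_window(theoretical: float, ppm: float) -> tuple[float, float]:
    """m/z range accepted at a +/- ppm tolerance."""
    half_width = theoretical * ppm * 1e-6
    return theoretical - half_width, theoretical + half_width


precursor = mz(peptide_mass("LKKYFNAG"), 2)
print(f"LKKYFNAG [M+2H]2+ m/z = {precursor:.4f}")
for tol in (1000, 50):
    lo, hi = ppm_window(precursor, tol)
    print(f"+/-{tol} ppm window: {lo:.3f} to {hi:.3f}")
```

At this m/z, a 1000 ppm tolerance spans roughly ±0.47, whereas 50 ppm spans about ±0.024, which is why higher mass accuracy sharply reduces the number of candidate matches.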
[Figure: antibody immobilization. Antibody amine groups (Lys) are bound to Protein G and cross-linked with dimethyl pimelimidate (DMP).]
Bind antibody to Protein G → cross-link with DMP → incubate with serum → wash → elute → centrifuge through a 30 kDa MWCO filter → MALDI-TOF MS (diagnostic) or trypsin digest/µLC-MS/MS (discovery)

Serum interactionomics studies have been completed for: HSA via antibody capture; HSA via dye binding; apolipoprotein; transferrin; IgG; IgA; IgM.
Prostate-specific antigen (PSA) was detected bound to IgG and albumin but not in the global serum analysis. Using high-abundance proteins as sponges may increase the likelihood of detecting low-abundance proteins in serum or plasma.

Global Analysis of the Mouse Serum Proteome
Intact proteins → anion exchange and cation exchange fractionation → digest into peptides → strong cation exchange fractionation → LC/MS/MS → compile results
[Figure: intact-protein ion exchange chromatograms (absorbance vs. time, 0-90 min) and peptide SCX chromatograms (fluorescence and % salt gradient vs. time, 0-96 min).]

Gene Ontology of Mouse Serum Proteome: intracellular 47.5%; membrane 43.8%; extracellular 6.2%.

Global Analysis of the Mouse Serum Proteome: 5053 unique proteins, 11113 unique peptides.
Analysis and bioinformatic annotation are continuing, e.g., comparison of the mouse and human serum proteomes: Is the mouse a reasonable model for studying human cancers?

Cross Comparison of Mouse and Human Serum Proteome
                                          Human   Mouse
Total number of proteins identified        1674    5059
Proteins mapped to LocusLink               1317    4637
Human/mouse pairs with >90% similarity      165     166
Human/mouse pairs with >80% similarity      240     244
Human/mouse pairs with >70% similarity      385     401
Almost 30% of the human serum proteins mapped to LocusLink (385 of 1317) had a homolog with >70% sequence similarity that was identified within the mouse serum proteome.
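The “almost 30%” figure can be checked against the cross-comparison table: 385 of the 1317 human proteins mapped to LocusLink is 29.2%, whereas 385 of all 1674 identified proteins would be 23.0%. A quick check, assuming the human-column pair counts are the numerators (the helper name is hypothetical):

```python
def homolog_fraction(pairs: int, denominator: int) -> float:
    """Fraction of human proteins with a mouse homolog above a similarity threshold."""
    return pairs / denominator


MAPPED_HUMAN = 1317       # human serum proteins mapped to LocusLink
IDENTIFIED_HUMAN = 1674   # all human serum proteins identified
pairs_by_threshold = {90: 165, 80: 240, 70: 385}   # human/mouse pairs, human column

for threshold, pairs in pairs_by_threshold.items():
    print(f">{threshold}% similarity: "
          f"{homolog_fraction(pairs, MAPPED_HUMAN):.1%} of mapped, "
          f"{homolog_fraction(pairs, IDENTIFIED_HUMAN):.1%} of all identified")
```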
What About “One-Hit Wonders” in Biomarker Discovery?
Distribution of unique peptide identifiers per protein within the mouse cortical neuron proteome: 1 peptide, 45.6%; 2, 18.1%; 3, 10.3%; 4, 6.3%; 5, 4.4%; 6, 3.3%; 7, 2.6%; 8, 1.9%; 9, 1.3%; 10, 1.2%; >10, 5.0%.
In most global proteomic surveys and quantitative proteomic studies using ICAT, a large fraction of the proteins are identified by a single unique peptide.

Validation is a Key Component for Discovery-Driven Research
[Figure: ICAT (12C/13C9) quantitation of cyclin D1 via the peptide ACQEQIEALLESSLR (y2-y13 fragment ions annotated; elution at 81-85 min), with an actin panel.]
ICAT ratio (13C9/13C0) = 1.76; densitometric ratio = 2.41.

Interstitial Cystitis and Antiproliferative Factor
Interstitial cystitis (IC) is a debilitating, chronic, painful bladder disorder of unknown etiology from which approximately one million Americans suffer. Bladder epithelial cells from IC patients produce an antiproliferative factor (APF) that inhibits the proliferation of normal bladder epithelial cells in vitro and alters the production of specific growth factors. APF is a potential anti-bladder cancer agent; however, its identity is unknown.
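Nearly half (45.6%) of the cortical neuron proteins in the chart rest on a single peptide. The sketch below (hypothetical data and helper names; peptides shared by several proteins are ignored for simplicity) counts distinct peptides per protein and sets the one-hit proteins aside for targeted validation rather than dropping them, in line with the argument on these slides:

```python
from collections import defaultdict


def peptides_per_protein(psms: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group (peptide, protein) matches into distinct peptide sequences per protein."""
    groups: dict[str, set[str]] = defaultdict(set)
    for peptide, protein in psms:
        groups[protein].add(peptide)
    return dict(groups)


def split_by_evidence(groups: dict[str, set[str]],
                      min_peptides: int = 2) -> tuple[set[str], set[str]]:
    """Separate multi-peptide proteins from single-peptide ("one-hit") proteins.

    One-hit proteins are returned for follow-up validation, not discarded.
    """
    multi = {p for p, peps in groups.items() if len(peps) >= min_peptides}
    one_hit = set(groups) - multi
    return multi, one_hit


# Hypothetical matches; repeated spectra of the same peptide count only once
psms = [
    ("PEPTIDEA", "PROT_1"), ("PEPTIDEB", "PROT_1"), ("PEPTIDEA", "PROT_1"),
    ("PEPTIDEC", "PROT_2"),
    ("PEPTIDED", "PROT_3"), ("PEPTIDEE", "PROT_3"),
]
groups = peptides_per_protein(psms)
multi, one_hit = split_by_evidence(groups)
print(f"multi-peptide: {sorted(multi)}; one-hit (to validate): {sorted(one_hit)}")
print(f"one-hit fraction: {len(one_hit)}/{len(groups)}")
```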
Identification of Fraction with APF Activity
[Figure: chromatogram (relative abundance vs. retention time, 0-40 min) divided into fractions 1-6, and % 3H-thymidine incorporation for each fraction.]

MS of Fraction with APF Activity
[Figure: mass spectrum (m/z 600-2000) showing ions X and Y, with extracted ion chromatograms for m/z X and m/z Y.]

Identification of APF by de novo Sequencing
[Figure: MS/MS spectra with b3-b8 and y6 ions annotated; sequence shown as ABC-X-X-X-X-X-X-DEF.]

Antiproliferative Activity of APF Peptide and Glycosylated Derivatives
[Figure: percentage decrease in live cell count for native APF, native mock APF, synthetic APF and synthetic peptide alone.]
The APF peptide has 100% homology to a peptide within a known ligand receptor. VALIDATION!!!
Presence of APF in interstitial cystitis patients confirmed at the transcript level by Northern blotting. VALIDATION!!!

APF as a Biomarker for Interstitial Cystitis
• APF has 100% homology to a peptide within a known ligand receptor.
• APF is a single-peptide biomarker/effector for patients with IC.
• Had the “one-hit wonder” rule been followed, it would have been disregarded.

CONCLUSIONS
• We have used high-resolution MS and obtained 100% sensitivity and specificity for ovarian cancer diagnosis.
• Characterization of the human (1447 proteins identified) and mouse (~5000 proteins) serum proteomes demonstrates that proteins across all functional classes and cellular locations are present within serum.
• It is unlikely that antibody-based detection will provide reliable specificity in disease marker detection.
• A potential archive of histopathological information is bound to highly abundant serum carrier proteins.
• Just because a protein is identified by only a single peptide does not mean it should be ignored. After all, these studies directly identify peptides, not proteins.
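De novo sequencing, as in the APF identification, reads residues from the mass spacing between consecutive fragment ions in a b- or y-ion ladder. A minimal sketch, assuming singly charged, unmodified fragments and monoisotopic masses; the peptide is hypothetical, since the APF sequence is masked on the slide:

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(seq: str) -> list[float]:
    """Singly charged b1..b(n-1): N-terminal fragments, residue sum + proton."""
    ions, running = [], 0.0
    for aa in seq[:-1]:
        running += RESIDUE_MONO[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(seq: str) -> list[float]:
    """Singly charged y1..y(n-1): C-terminal fragments, residue sum + water + proton."""
    ions, running = [], WATER
    for aa in reversed(seq[1:]):
        running += RESIDUE_MONO[aa]
        ions.append(running + PROTON)
    return ions


def residues_matching(gap: float, tol: float = 0.02) -> list[str]:
    """Residues whose mass matches the spacing between adjacent ladder peaks."""
    return sorted(aa for aa, m in RESIDUE_MONO.items() if abs(m - gap) <= tol)


peptide = "GALFK"  # hypothetical sequence for illustration
b = b_ions(peptide)
# Read the internal sequence back from gaps between consecutive b ions
calls = [residues_matching(b[i + 1] - b[i]) for i in range(len(b) - 1)]
print(calls)
```

The second gap illustrates a limit of the approach: leucine and isoleucine have identical masses, so MS/MS alone cannot tell them apart, and K and Q (0.036 Da apart) are separable only at high mass accuracy.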
NCI BIOMEDICAL PROTEOMICS PROGRAM: SCIENTIFIC TEAM AND COLLABORATORS
Laboratory of Proteomics And Analytical Chemistry, SAIC-Frederick Mass Spectrometry Center: Thomas P. Conrads, Radha Tirumalai, Li-Rong Yu, Ming Zhou, Josip Blonder, Zhen Xiao, John Roman
Separations Technology Lab: Haleem Issaq, Head
Proteomic Patterns: Lance Liotta, Emmanuel Petricoin, Ben Hitt, Vincent Fusaro
Interstitial Cystitis: Chris Michedja
George Janini, King Chan, Stephen Fox
NMR Lab: Gwen Chmurny, Head; Que Van, John Klose
Aaron Lucas
Joseph Kates, Director RTP, SAIC-Frederick
David Goldstein, CCR, NCI
J. Carl Barrett, Director CCR, NCI