N-Glycopeptide Identification from CID Tandem Mass Spectra using Glycan Databases and False Discovery Rate Estimation

Kevin B. Chandler, Petr Pompach, Radoslav Goldman, Nathan J. Edwards
Georgetown University, Department of Biochemistry and Molecular & Cellular Biology, Washington, DC

Introduction and Background
• Over half of all proteins are glycosylated (this rate is higher for secreted, cell-surface and extracellular matrix proteins)
• Glycosylation mediates cell-cell & cell-matrix interactions
• N-linked glycosylation is enzyme-directed, occurs on Asn residues within the motif NXS/T (X ≠ Pro)
• Tandem mass spectrometry (MS/MS) is used to study protein glycosylation; however, there are few software tools to aid in processing and interpretation of large glycopeptide MS/MS datasets, and manual interpretation of datasets is time consuming

Hypothesis and Aims
• We hypothesize that N-glycopeptide MS/MS data interpretation can be automated by adopting an algorithm that uses (1) oxonium marker ions and intact peptide peak filters, (2) the sequence(s) of the protein(s) of interest and (3) mass-matching to publicly available glycan databases to match glycan-peptide pairs to glycopeptide MS/MS spectra.
• The aim of this research is to develop a novel software tool for rapid interpretation of glycopeptide MS/MS datasets to facilitate the study of glycoprotein microheterogeneity

GlycoPeptideSearch Software – Glycopeptide Discovery Workflow
[Figure: GlycoPeptideSearch Scheme. 11 x LC-MS/MS → 3288 MS/MS spectra → 2887 w/ glycan oxonium ion (204, 366) peaks → 317 w/ 2+ "peptide" peaks → 263 matched in GlycomeDB¹ → 53 distinct glycopeptides]

Test Dataset (Haptoglobin Glycopeptides)
• Proteolytic digest of Haptoglobin with trypsin and GluC
• Hydrophilic interaction liquid chromatography (HILIC) of glycopeptides
• Eleven glycopeptide fractions analyzed by nanoC18 RP LC-MS/MS using a Q-STAR Elite mass spectrometer
• IDA: Four most abundant ions with 20 sec exclusion
• 15,780 MS and 3,288 MS/MS spectra (msconvert)

Methods: GlycoPeptideSearch Software
• Automated in silico digestion (Trypsin, GluC) of user-submitted protein sequence & N-glycosylation site ID
• Fixed (carbamidomethylation) & variable modifications (Methionine oxidation) considered
• Spectra filtered for glycan oxonium ions & peptide + N-linked core fragments & mass-lookup in GlycomeDB (glycan database)
• Isotope Cluster Scoring performed on precursor ion
• Decoy (non-motif containing) peptides submitted to search to enable estimation of the False Discovery Rate
• Open format (XML) spectra input and Excel output

Summary of Haptoglobin Glycopeptides
[Table: Summary of Haptoglobin Glycopeptides]

Controlling Error Using Spectra Filters and Hit Filters
[Figure: Peptide Intensity Threshold. No. of accepted spectra vs. estimated FDR (%), for thresholds I = 1, 2, 3, 4, 5, 10, 20, 30]
[Figure: Peptide Fragments (#). No. of accepted spectra vs. estimated FDR (%), for F = 0, 2, 3, 4, with one point labeled "GPS"]
[Figure: Isotope Cluster Score. No. of accepted spectra vs. estimated FDR (%), for IC = 1, 2, 5, 10, 20, 50, 100, 200, 9999]
[Figure: FDR Determination. Target and decoy spectra hits for target peptides (with NXS/T) and decoy peptides (without motif)]
[Figure: Sample MS/MS Spectrum: Fraction 17, Scan 1407]

Results and Conclusion
• 52 glycan-peptide pairs matched 263 spectra (3.9% FDR).
• 52% (136) of filtered spectra matched a single glycopeptide pair (<0.2 Da); only 8 spectra matched glycopeptide pairs representing > 1 peptide.
• 27 distinct non-isobaric glycans at 4 sites were discovered, consistent with published reports².

Conclusion: Using characteristics of glycopeptide spectra including oxonium ions and intact peptide peaks, it is possible to automate glycopeptide CID MS/MS data interpretation with low false discovery.

References
1. Ranzinger, Herget, von der Lieth, Frank. Nucleic Acids Res. 39(Database issue):D373-376 (2011).
2. Fujimura, Shinohara, Tossot, Pang, Kurogochi, Saito, Arai, Sadilek, Murayama, Dell, Nishimura, Hakomori. Int. J. Cancer 122:39–49 (2008).
3. Goldberg, Sutton-Smith, Paulson, Dell. Proteomics 5:865-875 (2005).
4. Pompach, Chandler, Lan, Edwards, Goldman. J. Proteome Res. 11(3):1728-40 (2012).

Acknowledgements
Kevin B Chandler is supported by a Graduate Research Fellowship from the National Science Foundation. Nathan J Edwards is supported, in part, by NIH/NCI/CPTI grant CA126189. Radoslav Goldman is supported by NCI’s R01 CA115625 and CA135069.
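
Illustrative sketch: in silico digestion and N-glycosylation site ID
The Methods list in silico digestion (Trypsin, GluC), N-glycosylation site identification, and decoy (non-motif containing) peptides. The sketch below shows one way these steps could fit together. It is not the GlycoPeptideSearch code: the function names, the simplified cleavage rule (cut after K, R or E, no missed cleavages, no proline exception) and the toy sequence are all assumptions made for illustration. Sequons are located on the protein first, so a motif that straddles a cleavage point still marks its peptide as a target.

```python
import re

# N-glycosylation sequon NXS/T with X != Pro. The lookahead keeps matches
# one residue wide so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(protein):
    """Return 0-based positions of Asn residues inside an NXS/T (X != P) motif."""
    return [m.start() for m in SEQUON.finditer(protein)]


def digest(protein, cleave_after="KRE"):
    """Return (start, peptide) pairs, cutting after each residue in cleave_after.

    Assumption: simplified trypsin + GluC rule with no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def split_targets_decoys(protein):
    """Targets carry a sequon Asn; decoys are peptides without one."""
    sites = set(sequon_sites(protein))
    targets, decoys = [], []
    for start, pep in digest(protein):
        covered = range(start, start + len(pep))
        (targets if any(pos in sites for pos in covered) else decoys).append(pep)
    return targets, decoys


# Toy sequence (hypothetical, not haptoglobin).
toy = "MKANGTEPLNPSRAANKTW"
targets, decoys = split_targets_decoys(toy)
```

In the toy sequence, the Asn in "NPS" is rejected because X is Pro, while the Asn in "NKT" is kept even though cleavage after K separates it from the Thr.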
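
Illustrative sketch: oxonium ion spectrum filter
The workflow keeps only MS/MS spectra with glycan oxonium ion (204, 366) peaks. A minimal sketch of such a filter follows. The m/z values are the standard monoisotopic masses of the HexNAc and Hex-HexNAc oxonium ions; the m/z tolerance, the relative intensity cutoff and the choice to require both ions are assumptions, since the poster does not state them.

```python
# Monoisotopic m/z of singly charged oxonium marker ions.
OXONIUM_MZ = (204.0867, 366.1395)  # HexNAc+, Hex-HexNAc+


def has_oxonium_ions(peaks, tol=0.2, min_rel_intensity=0.05, require_all=True):
    """Check a peak list [(mz, intensity), ...] for oxonium marker ions.

    tol, min_rel_intensity and require_all are hypothetical settings.
    A marker counts as present if some peak lies within tol of its m/z and
    reaches min_rel_intensity times the base peak intensity.
    """
    if not peaks:
        return False
    base_peak = max(intensity for _, intensity in peaks)

    def found(target_mz):
        return any(
            abs(mz - target_mz) <= tol and intensity >= min_rel_intensity * base_peak
            for mz, intensity in peaks
        )

    results = [found(mz) for mz in OXONIUM_MZ]
    return all(results) if require_all else any(results)


# Toy peak lists (hypothetical values).
glyco_like = [(204.09, 120.0), (366.14, 80.0), (500.30, 1000.0)]
missing_366 = [(204.09, 120.0), (500.30, 1000.0)]
```

Setting require_all=False would accept a spectrum showing either marker, which trades specificity for sensitivity.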
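
Illustrative sketch: glycan-peptide mass matching
The Results report spectra matched to glycan-peptide pairs within 0.2 Da after mass-lookup in GlycomeDB. The sketch below shows the underlying arithmetic under stated assumptions: glycans are represented by monosaccharide compositions rather than GlycomeDB structures, the peptide names and masses are made up, and the function names are hypothetical. A glycan attached to a peptide contributes the sum of its residue masses (the free glycan mass minus one water).

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {"HexNAc": 203.0794, "Hex": 162.0528, "Fuc": 146.0579, "NeuAc": 291.0954}
PROTON = 1.00728  # mass of a proton (Da)


def neutral_mass(mz, charge):
    """Convert a precursor m/z and charge state to a neutral mass."""
    return (mz - PROTON) * charge


def glycan_residue_mass(composition):
    """Mass added to a peptide by a glycan with the given composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def match_pairs(precursor_mass, peptide_masses, glycans, tol=0.2):
    """Return (peptide, glycan) pairs whose summed mass is within tol Da."""
    hits = []
    for peptide, peptide_mass in peptide_masses.items():
        for glycan, composition in glycans.items():
            total = peptide_mass + glycan_residue_mass(composition)
            if abs(total - precursor_mass) <= tol:
                hits.append((peptide, glycan))
    return hits


# Toy inputs (hypothetical peptide masses; compositions stand in for database entries).
peptide_masses = {"PEPTIDE_A": 1000.5, "PEPTIDE_B": 1200.6}
glycans = {
    "HexNAc4Hex5NeuAc2": {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    "HexNAc4Hex5NeuAc1": {"HexNAc": 4, "Hex": 5, "NeuAc": 1},
}
hits = match_pairs(neutral_mass(1069.4314, 3), peptide_masses, glycans)
```

A precursor can match more than one pair when different peptide and glycan combinations sum to nearly the same mass, which is why the number of spectra with a single match is reported separately.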
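
Illustrative sketch: FDR estimation from decoy peptides
Decoy (non-motif containing) peptides are searched alongside target peptides so that decoy hits can be used to estimate the False Discovery Rate. The poster does not give the formula, so the sketch below assumes a common target-decoy estimate: decoy hits, scaled by the ratio of target to decoy search-space sizes, divided by target hits. The function name, the scaling and the toy counts are assumptions.

```python
def estimate_fdr(n_target_hits, n_decoy_hits, n_targets=1, n_decoys=1):
    """Estimate FDR as (scaled decoy hits) / (target hits).

    n_targets and n_decoys are the numbers of target and decoy peptides
    searched; their ratio rescales decoy hits when the two sets differ in
    size. This scaling is an assumption, not the poster's stated method.
    """
    if n_target_hits == 0:
        return 0.0
    expected_false = n_decoy_hits * (n_targets / n_decoys)
    return min(1.0, expected_false / n_target_hits)


# Toy counts (hypothetical).
fdr_equal_sets = estimate_fdr(200, 8)               # same number of targets and decoys
fdr_scaled = estimate_fdr(200, 8, n_targets=4, n_decoys=8)
```

Sweeping a filter threshold and recomputing this estimate at each setting gives the kind of accepted-spectra vs. estimated-FDR curve shown in the filter panels.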