Mining Clinical Proteomes for Post-Translational Modifications
David L. Tabb, Ph.D.
david.l.tabb@vanderbilt.edu

Overview
• Analytical chemistry to enhance PTM ID
• Database searching for PTMs
• Sequence tagging for blind PTM searches
• Problem 1: Localization ambiguity
• Problem 2: Higher FDR as a function of PTMs

Discovery Proteomics
[Workflow figure; labels: Protein Mixture; Peptide Mixture; Peptide Fractionation; Liquid Chromatography; Electrospray Ionization; High-Resolution Mass Spectrometry; Isolate Ions of Peptide; Collide Ions to Dissociate; Collect Fragments in Tandem MS]
Two types of measurements for each peptide: intact m/z (mass/charge) and a list of fragment m/zs.

Immobilized metal affinity columns
• Phosphopeptides bear affinity to metal. Iron, gallium, and titanium oxides have been used in columns.
• Conducting peptide capture rather than protein capture can lead to a proliferation of one-hit wonders.
• The enrichment from “IMAC” is not perfect, and acidic peptides may bind just as favorably.
http://www.sigmaaldrich.com, “Phosphopeptide Enrichment Kit”

pTyr antibodies
• Monoclonal antibodies were raised against phosphorylated Tyr.
• Technology propelled investigations of signals from pTyr, despite the relatively low frequency of this residue.
http://www.cellsignal.com/products/7902.html
Rush et al (2005) Nature Biotechnol. 23: 94-101.

Glycan columns
• Lectins are carbohydrate-binding proteins.
• Columns of immobilized lectins can enrich for proteins with specific sugars.
• Amidases may be used to label sugar sites.
Hirabayashi (2004) Glycoconjugate J. 21: 35-40.
Tateno et al (2007) Nature Protocols 2: 2529-2537.

Electron transfer dissociation (ETD)
• Charge difference draws positively-charged peptides to accept electrons from radicals.
• The peptide is cleaved between nitrogen and alpha carbon for a particular amino acid (c-z).
• This gentle process produces fragments without disrupting labile PTMs.
• Fragments are measured with high mass accuracy in Orbitrap mass analyzer.
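
The c-z cleavage described in the ETD bullets can be sketched in a few lines. This is only an illustration, not the method of any particular search engine: the peptide "ASTGEK", the function name cz_ions, and the 0-based mods mapping are hypothetical, and the sketch assumes singly charged fragments, monoisotopic masses, and the radical z• ("z+1") convention. It shows that a PTM retained on a residue shifts only the fragments that contain that residue.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276     # mass of the charge-carrying proton
HYDROGEN = 1.007825   # mass of a hydrogen atom
WATER = 18.010565
AMMONIA = 17.026549
PHOSPHO = 79.96633    # mass shift of phosphorylation (HPO3)


def cz_ions(peptide, mods=None):
    """Return (c_ions, z_ions) as lists of singly charged m/z values.

    mods maps a 0-based residue index to a mass shift in Da
    (a hypothetical convention used only for this sketch).
    c_ions[i-1] covers the first i residues; z_ions[i-1] the last i.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0)
              for i, aa in enumerate(peptide)]
    n = len(masses)
    c_ions, z_ions = [], []
    for i in range(1, n):
        prefix = sum(masses[:i])
        suffix = sum(masses[n - i:])
        # c ion: N-terminal fragment plus NH3, plus one proton for charge.
        c_ions.append(prefix + AMMONIA + PROTON)
        # z-dot ion: C-terminal fragment plus water, minus NH3, plus H,
        # plus one proton for charge.
        z_ions.append(suffix + WATER - AMMONIA + HYDROGEN + PROTON)
    return c_ions, z_ions


if __name__ == "__main__":
    plain_c, plain_z = cz_ions("ASTGEK")
    phos_c, phos_z = cz_ions("ASTGEK", {1: PHOSPHO})  # phospho on Ser
    for i, (a, b) in enumerate(zip(plain_c, phos_c), start=1):
        print(f"c{i}: {a:.4f} -> {b:.4f}")
    for i, (a, b) in enumerate(zip(plain_z, phos_z), start=1):
        print(f"z{i}: {a:.4f} -> {b:.4f}")
```

With the phosphate on Ser (position 2), c1 is unchanged while c2 through c5 carry the shift; on the z side only z5 contains the site.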

Database search overview
Eng et al (1994) J. Amer. Soc. Mass Spectrom. 5: 976-989.
Yates et al (1995) Anal. Chem. 67: 1426-1436.

Dynamic PTMs grow search space
Because multiple PTMs may be in each peptide, adding PTMs to a search creates an exponential cost. Here, three sites lead to eight PTM variants.
[Figure: PTM variants of a peptide from CASA1_BOVIN]

More PTMs, more candidates
[Figure panels: No PTMs; PTMs on M, S, T, R, K...]
Larger peptides gain disproportionately more comparisons when PTMs are in play.
Tabb, Anal. Chem. (2005) 77: 2464-2474.

Amino acids differ in frequency
Frequency (%) of each amino acid:
Ala 8.26    Gly 7.08    Pro 4.71
Arg 5.53    His 2.27    Ser 6.58
Asn 4.05    Ile 5.94    Thr 5.34
Asp 5.46    Leu 9.66    Trp 1.09
Cys 1.37    Lys 5.83    Tyr 2.92
Gln 3.93    Met 2.41    Val 6.87
Glu 6.74    Phe 3.86
Adding a mass shift for a common amino acid slows performance far more than for a rare amino acid.
Swiss-Prot release notes, Jan 9, 2015

Where can I find potential PTMs?
http://www.unimod.org

Search spaces and run-times
Precursor Tolerance  PTM Set       Trypsin Specificity  Variants ⱡ   Comparisons   Time (sec)
20 ppm               (none)        Unconstrained        1689446563   10503186797   1877
10 ppm               (none)        Semi                 340208357    1243965215    261
20 ppm               (none)        Semi                 340208357    2246604590    392
30 ppm               (none)        Semi                 340208357    2943698877    471
40 ppm               (none)        Semi                 340208357    3389096687    513
20 ppm               (none)        Fully                21393082     150628871     28
20 ppm               NtermQ-17     Fully                22592311     166565665     31
20 ppm               M+16          Fully                30951288     229212380     44
20 ppm               M+16, STY+80  Fully                476292461    986998364     265
20 ppm               M+16, STY+80  Semi                 9501671953   16133509368   4678
ⱡ number of candidates generated after PTM expansion and filtering on basis of mass, length, invalid residues, excess numbers of PTMs
20141202_LZ_FFPE_standard_1, 16320 HCD MS/MS scans, MyriMatch 2.1.138, eight Linux CPUs, RefSeq 62 human

How do comparisons scale?
[Figure panels: Trypsin Specificity; PTMs, fully tryptic]
Area of circle represents the number of comparisons between a decorated peptide and an MS/MS.

Sequence tagging uses spectra twice
Dasari et al. (2010) J. Proteome Res. 9: 1716-1726

How should we field mass shifts?
• Dynamic: expand sequences exponentially
• Preferred: combine specified mass shifts
• Mutant: substitute residues for each other
• Blind: allow any shift on any single residue
Dasari et al. (2011) Chem. Res. Tox. 24: 204-216

Blind searches yield mass shift tables
Now in IDPicker 3.1
Dasari, Chem. Res. Tox. (2011) 24: 204-216.
Sprung, Mol. Cell. Proteomics (2009) 8: 1988-1998.

Gaining expertise in blind interpretation
• Boring is more often correct than brilliant. +22 Da is sodium, not Asp → His.
• Blind PTMs are useful for finding patterns of mass shifts; do not put faith in individual peptide-spectrum matches.
• Unusual cleavages can appear as peptide-terminal mass shifts in blind searches.

Blind PTM workflow
Holman et al. Meth. Molec. Biology (2013) 1002: 167

Problem 1: Site localization
• Multiple PTM-decorations of a sequence may tie in score for a spectrum, or nearly tie.
• One can say this peptide and PTM explain the spectrum, but is position correct?
• Ascore defines a new score to estimate probability from differentiating fragments.
• Delta score techniques compare original DB search scores of variants to assess site error.
Beausoleil. Nature Biotechnology (2006) 24: 1285-1292.
Savitski. Mol. Cell. Proteomics (2011) 10: M110.003830.
Taus. J. Proteome Res. (2011) 10: 5354-62.

Which fragment ions move?
Shifts in PTM position cause changes for few fragments. These fragments take on special importance in localization. Absence of these fragments results in ambiguity.
[Figure: fragment ions for a peptide from CASA1_BOVIN]

Problem 2: FDR escalation
• Even though one controls global FDR, errors are not uniform throughout collection.
• More PTMs implies more ways for software to distort a sequence to force it to fit a spectrum.
• More modifiable sites implies more degrees of freedom for distortion.
• Examine empirical FDR rates for peptides containing different numbers of PTMs.
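
The last bullet can be sketched as a short calculation. The target and decoy counts below are copied from the "Real world data" slide that follows. The formula 2 × decoy / target is an assumption on my part: it is not stated on the slides, but it reproduces the percentages shown there. The function names are illustrative only.

```python
# Target and decoy peptide-variant counts per number of PTMs,
# copied from the "Real world data" slide: {ptm_count: (target, decoy)}.
COUNTS = {
    0: (58587, 363),
    1: (169416, 1046),
    2: (49911, 706),
    3: (9678, 321),
    4: (1265, 89),
}


def empirical_fdr(targets, decoys):
    """Estimate FDR as 2 * decoys / targets.

    Assumed convention: it matches the slide's percentages, but the
    slides do not say which target-decoy formula was used.
    """
    return 2.0 * decoys / targets


def fdr_by_ptm_count(counts):
    """Return {ptm_count: estimated FDR}, ordered by PTM count."""
    return {n: empirical_fdr(t, d) for n, (t, d) in sorted(counts.items())}


if __name__ == "__main__":
    for n, fdr in fdr_by_ptm_count(COUNTS).items():
        print(f"{n} PTMs/peptide: FDR {fdr:.1%}")
```

Stratifying this way makes the escalation visible: the classes with three and four PTMs carry far higher error rates than the collection as a whole.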

Real world data
Peptide Variants
PTMs/Peptide  Target   Decoy  FDR
0             58587    363    1.2%
1             169416   1046   1.2%
2             49911    706    2.8%
3             9678     321    6.6%
4             1265     89     14.1%
Search engine: MS-GF+ v9733
Modifications: [STY] 80, [M] 16, max PTMs=4
Data: 468 Q-Exactive LC-MS/MS Experiments
Site: Broad Institute, TCGA Breast Collection
PSM Filters: 0.01 PSM aggregate FDR, 2 spectra per peptide
Result: Identified 3,163,440 spectra to 291,382 peptide variants

Summary
• Recognizing PTMs through database search has been possible since 1995. It is the most common way that PTM inventories are built.
• Adding even a few PTMs to database search will greatly reduce its speed (and sensitivity).
• Blind search is appropriate only when you are determining which patterns of modification are present; it should not be your final search.
• Software may be of assistance, but your eyes and critical thinking are your biggest assets.
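
The speed cost in the second summary point comes from the exponential growth in PTM variants noted earlier (three sites lead to eight variants). A minimal sketch, assuming one dynamic mass shift per residue type (M+16 and STY+80, as in the run-time table) and an optional cap on PTMs per peptide; the peptides and function names here are hypothetical, and real search engines also filter candidates by precursor mass.

```python
from itertools import combinations

# Dynamic mass shifts by residue, as in the M+16 / STY+80 searches.
DYNAMIC_MODS = {"M": 16, "S": 80, "T": 80, "Y": 80}


def modifiable_sites(peptide, mods=DYNAMIC_MODS):
    """Return 0-based positions of residues that may carry a PTM."""
    return [i for i, aa in enumerate(peptide) if aa in mods]


def enumerate_variants(peptide, mods=DYNAMIC_MODS, max_ptms=None):
    """List every choice of modified sites, including the unmodified form.

    Each variant is a tuple of modified positions. Without a cap, a
    peptide with n modifiable sites yields 2**n variants.
    """
    sites = modifiable_sites(peptide, mods)
    limit = len(sites) if max_ptms is None else min(max_ptms, len(sites))
    variants = []
    for k in range(limit + 1):
        variants.extend(combinations(sites, k))
    return variants


def render(peptide, chosen, mods=DYNAMIC_MODS):
    """Write a variant with its mass shifts inline, e.g. AS[+80]AMTK."""
    return "".join(
        f"{aa}[+{mods[aa]}]" if i in chosen else aa
        for i, aa in enumerate(peptide)
    )


if __name__ == "__main__":
    peptide = "ASAMTK"  # hypothetical peptide with three modifiable sites
    for chosen in enumerate_variants(peptide):
        print(render(peptide, chosen))
    # A cap on PTMs per peptide limits, but does not remove, the growth.
    print(len(enumerate_variants("SSTTYY")),
          len(enumerate_variants("SSTTYY", max_ptms=4)))
```

Every variant is a separate candidate to score against each spectrum in the precursor window, which is why the comparison counts in the run-time table climb so quickly once STY+80 is added.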