How to identify peptides Gustavo de Souza IMM, OUS October 2013 Peptide or Proteins? Bottom-up Proteomics 2DE-based approach Peptide Mass Fingerprinting MALDI (Matrix Assisted Laser Desorption Ionization) Intensity Peptide Mass Fingerprinting m/z MS/MS Ion Mass Mass Mass Source Analyzer Analyzer Analyzer Collision cell Detector MS/MS 899.013 899.013 899.013 Fragmentation Nomenclature for peptide sequence-ions: Collision-Induced Dissociation (CID): MHnn+* + N2 --> b + y Electron Capture Dissociation (ECD): MHnn++ e- --> MHn(n-1)+· --> c + z· Fragmentation y 7 O y 6 y 5 R 2 O y 4 R 4 O H N HN 2 b 1 b 2 R 3 O b 4 R 8 H N OH N H O b 3 y 1 R 6 N H O y 2 H N N H R 1 y 3 R 5 N H O b 5 R 7 b 6 Roepstorff-Fohlmann-Biemann-Nomenclature O b 7 Fragmentation 12 aa … … b ions y ions MS/MS of a peptide LG_y2_13 #11793 RT: 84.81 AV: 1 NL: 3.57E5 T: ITMS + c ESI d w Full ms2 735.93@35.00 [ 190.00-1485.00] 100 y8 P y++13 95 VPTVDVSVVDLTVK 90 85 y10 80 75 70 65 60 55 50 45 y6 y9 40 b5 35 y12 y11 30 y4 25 y5 y7 20 b3 15 b6 y3 b4 10 5 0 200 b7 400 500 600 700 P y13 b9 y2 300 b10 b8 800 m/z 900 b11 1000 1100 b12 1200 b13 1300 1400 How to Identify MS/MS Stenn and Mann, 2004. Peptide Sequence Tags Autocorrelation Probability based match Submitting to Search How identification happen? Your data Step 1: which theoretical peptides has the same mass of the observed ion? Step 2: From those, which one have the most similar fragmentation pattern? Protein database (fasta) x x x High mass accuracy – what is it good for? All theoretical tryptic peptide masses from human IPI database Example Tryptic HSP-70 peptide: ELEEIVQPIISK, mass 1396.7813 Da Instrument LTQ QSTAR QSTAR LTQ-FT LTQ-FT LTQ-FT Mass Accuracy 500 20 ppm 10 ppm 2 ppm 1 ppm 0.5 ppm Calibration Ext. Ext. Int. Ext. Ext-SIM Int. 52 33 11 9 3 # of tryptic peptides for m/z 344 1396.7813 Defining the “Search Space” The “Search Space” 1/2 1 4/5 5/6 5 6 1/2/3 2/3 3/4 3 2 4 2 mcl 1 mcl 0 mcl 2/3/4 3/4/5 4/5/6 1/2 1 2/3 3 2 4 5 4/5 3/4 5/6 6 1 3 2 4 5 6 Importance of Search Space Size Search tool does not identify a peptide. It only reports the statiscally most suitable theoretical sequence related with the experimental data. If you increase the size of the database too much, or the size of the search space, false-positive rates also increase. Defining FDRs Steen and Mann, 2004 MOWSE Chance that two peptides with different sequences but approximate Mr and sharing MS/MS similarities. More variables inserted during search Higher chance to get random events Higher MOWSE score threshold Parameters that can modify the MOWSE calculation: -Database size; -MMD (measured mass deviation); -Number of PTMs choosen; -Data quality. Example of MMD issue Mycoplasma sp. sample (Munich 2006): -Database had ~ 700 entries; -Data accuracy had 0.7ppm average; -MMD used during search: 3 ppm. Probability Based Mowse Score Ions score is -10*Log(P), where P is the probability that the observed match is a random event. Individual ions scores > 7 indicate identity or extensive homology (p<0.05). Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits. Strategies to Visualize FDRs Peng et al (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Prot Res 2, 43-50. Reversed database sequence False positive identification using reversed database HSP-70 tryptic peptide (forward) (reverse) K ELEEIVQPIISK K SIIPQVIEELEK Peptide 1396.7813 Da Mr 1396.7813 Da Mascot checks both peptides Theoretical y series Theoretical y series y1 147.1 147.1 y2 234.1 276.2 y3 347.2 389.2 .... .... .... y11 1267.7 1309.7 Expected ions from reversed hit should not correlate with oberved ions on experiment Typical Result All peptides Mascot 160 140 Mascot Score 120 100 80 60 40 20 0 5 7 9 11 13 15 Seq lenght 17 19 21 23 25 How to Validate the Data Are there any Reversed hit protein with 2 peptides above MOWSE score? -No: All proteins identified with 2 peptides score higher than p<0.05 are good -Yes: Repeat mascot search with more stringent parameters. What about 1-hit wonders? (Proteins identified with only 1 peptide) How to Validate the Data All peptides Mascot 160 140 Mascot Score 120 100 80 60 40 20 0 5 7 9 11 13 15 17 19 21 Seq lenght Basically, the idea is to ”play around” with the statistics to make your result more reliable. 23 25 Take home message 1. Data quality (mass accuracy) and a well-defined search space are key for reliable peptide identification 2. Reliable identification is an interplay between asking enough without asking too much (careful when trying to get “as many IDs as I can”!) PTMs Gustavo de Souza IMM, OUS October 2013 PTMs in biology PTMs in biology Complexity of Protein Samples in Eukaryotes Modifications are specific to a group of amino acids What difference to expect at MS level? Larsen MR et al, 2006. Defining the “Search Space” PTM abundance in a cell Number of Peptides Total peptides in a sample Modified peptides Abundance level Differences from 10e2 to 10e4 PTM abundance in a cell Stable vs. Labile PTMs Larsen MR et al, 2006. Neutral loss Boersema PJ et al, 2009. Identifying Labile PTMs Larsen MR et al, 2006. HCD fragmentation Larsen MR et al, 2006. Status of PTM coverage Lemeer and Heck, 2009. Status of PTM coverage Derouiche A et al, 2012. Take home message - Depending on PTM, identification can be very easy or very hard - Dependent on stability under fragmentation and abundance in the sample - ID improvement was mostly defined by instrumentation improvements (sensitivity etc)