Proteomics Informatics – Protein characterization I: post-translational modifications (Week 10)

Post-translational modification
• Biologically important post-translational modifications (phosphorylation, acetylation, glycosylation, etc.)
• Modifications introduced on purpose during sample preparation (alkylation, iTRAQ, TMT, etc.)
• Side products of sample preparation (oxidation, deamidation, carbamylation, formylation, etc.)

Post-translational modification
[Figure] Mann and Jensen, Nature Biotech. 21, 255 (2003)

Phosphorylation examples
Peptide FICVTPTTCSNTIDLPMSPR, unmodified and phosphorylated at S18 (pS18) or at T5 (pT5). Each row gives the singly charged b ion ending at that residue and the y ion starting at it. Cysteines carry a residue mass of 160.03 Da (carbamidomethylated). Only the fragments that contain the phosphorylated residue shift by 80 Da, so comparing shifted and unshifted ions locates the phosphate.

Pos  Res  b unmod.   y unmod.   b pS18     y pS18     b pT5      y pT5
 1   F    -          -          -          -          -          -
 2   I    261.1556   2163.024   261.1556   2243.024   261.1556   2243.024
 3   C    421.1862   2049.94    421.1862   2129.94    421.1862   2129.94
 4   V    520.2546   1889.909   520.2546   1969.909   520.2546   1969.909
 5   T    621.3022   1790.841   621.3022   1870.841   701.3022   1870.841
 6   P    718.3549   1689.793   718.3549   1769.793   798.3549   1689.793
 7   T    819.4025   1592.741   819.4025   1672.741   899.4025   1592.741
 8   T    920.4502   1491.693   920.4502   1571.693   1000.45    1491.693
 9   C    1080.481   1390.645   1080.481   1470.645   1160.481   1390.645
10   S    1167.513   1230.615   1167.513   1310.615   1247.513   1230.615
11   N    1281.556   1143.583   1281.556   1223.583   1361.556   1143.583
12   T    1382.603   1029.54    1382.603   1109.54    1462.603   1029.54
13   I    1495.687   928.4923   1495.687   1008.492   1575.687   928.4923
14   D    1610.714   815.4083   1610.714   895.4083   1690.714   815.4083
15   L    1723.798   700.3814   1723.798   780.3814   1803.798   700.3814
16   P    1820.851   587.2974   1820.851   667.2974   1900.851   587.2974
17   M    1951.891   490.2447   1951.891   570.2446   2031.891   490.2447
18   S    2038.923   359.2042   2118.923   439.2042   2118.923   359.2042
19   P    2135.976   272.1722   2215.976   272.1722   2215.976   272.1722
20   R    -          175.1195   -          175.1195   -          175.1195

Potential modifications
[Figure]
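The b/y values in the phosphorylation tables follow from simple arithmetic: a b ion is the sum of the N-terminal residue masses plus a proton, and a y ion is the sum of the C-terminal residue masses plus water and a proton. The sketch below illustrates this; it is not the software used to build the slides. Assumptions: standard monoisotopic residue masses, every Cys carbamidomethylated (160.03065 Da), one phosphate added as HPO3 (79.96633 Da), and the hypothetical helper names `fragment_table` and `fmt`. Unmodified values agree with the table to within 0.01. The table appears to use a nominal 80 Da phosphate shift, so the phosphorylated fragments computed here come out about 0.03 lower.

```python
# Sketch: singly charged b/y ions for a peptide, optionally with one phosphosite.
# Assumptions (not from the slides): monoisotopic residue masses, every Cys
# carbamidomethylated, phosphate added as HPO3 = 79.96633 Da.
PROTON = 1.007276
WATER = 18.010565
PHOSPHO = 79.96633

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065,  # Cys + carbamidomethyl
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}


def fragment_table(peptide, phospho_site=None):
    """Rows (position, residue, b, y) laid out like the tables above.

    b is the b ion ending at `position`; y is the y ion starting at it.
    b1 and the full-length ions are None, mirroring the tables.
    `phospho_site` is a 1-based position that receives the phosphate.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    if phospho_site is not None:
        if not 1 <= phospho_site <= n:
            raise ValueError("phospho_site outside the peptide")
        masses[phospho_site - 1] += PHOSPHO
    rows = []
    for pos in range(1, n + 1):
        b = sum(masses[:pos]) + PROTON if 1 < pos < n else None
        y = sum(masses[pos - 1:]) + WATER + PROTON if pos > 1 else None
        rows.append((pos, peptide[pos - 1], b, y))
    return rows


def fmt(value):
    """Right-aligned m/z, or '-' where the table has no entry."""
    return f"{value:10.4f}" if value is not None else f"{'-':>10}"


if __name__ == "__main__":
    peptide = "FICVTPTTCSNTIDLPMSPR"
    plain = fragment_table(peptide)
    ps18 = fragment_table(peptide, phospho_site=18)
    for (pos, aa, b, y), (_, _, pb, py) in zip(plain, ps18):
        print(f"{pos:>2} {aa} {fmt(b)} {fmt(y)} {fmt(pb)} {fmt(py)}")
```

Passing `phospho_site=5` instead gives the pT5 columns in the same way. Only the rows whose fragments contain the modified residue change, which is exactly the evidence used for site localization.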
Enrichment Strategies for the Detection of Phosphorylated Peptides
[Figure: elution of unphosphorylated, singly phosphorylated and multiply phosphorylated peptides]
• Hydrophilic Interaction Chromatography (HILIC)
• Phosphopeptides elute later than their unphosphorylated counterparts
• Stationary phase is hydrophilic
• Mobile phase is hydrophobic (high organic content)

Enrichment Strategies for the Detection of Phosphorylated Peptides
[Figure: SCX chromatogram, time (min); neutral peptides elute before basic peptides]
• Strong Cation Exchange Chromatography (SCX)
• Stationary phase is negatively charged
• Mobile phase is a buffer of increasing pH; a peptide elutes once it becomes neutral
• Neutral peptides elute earlier, e.g. XXpSXXXXXR/K (the negatively charged phosphate offsets a positive charge)
• Positively charged peptides elute later, e.g. XXXXHXXXXR/K (the His adds a positive charge)

Several strategies are often combined.

Loss of the phosphate group
[Figure]

Localization of modifications
[Figure series: probability of identification and of localization vs number of fragment ions (0 to 25), for phosphorylation with m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. Curves: phosphopeptide identification (ID), localization with d_min = 3, d_min = 2, d_min = 1, and d = 1*]
Here d_min is the minimum distance between candidate modification sites:
• d_min ≥ 3 for 47% of human tryptic peptides
• d_min = 2 for 33% of human tryptic peptides
• d_min = 1 for 20% of human tryptic peptides

Localization of modifications: peptide with two possible modification sites
[Figure: MS/MS spectrum (intensity vs m/z) with fragment ions matched to each candidate assignment]
Which assignment does the data support? 1, 1 or 2, or 1 and 2?

Visualization of evidence for localization
[Figure: AAYYQK, fragment-ion evidence for each candidate site]

Estimation of global false localization rate using decoy sites
By counting how often the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency.
[Figure: false localization frequency (0 to 0.02) vs amino acid frequency (0 to 0.15), with Y marked]

How much can we trust a single localization assignment?
If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, we can estimate the probability of obtaining a given score by chance for a given peptide sequence and MS/MS spectrum assignment. With S_1^m the measured score of assignment 1 and F_2(S_1) the distribution of S_1 when assignment 2 is correct:

$$p_1 = \frac{\int_{S_1^m}^{\infty} F_2(S_1)\,dS_1}{\int_{0}^{\infty} F_2(S_1)\,dS_1}$$

Is it a mixture or not?
Conversely, if we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, F_1(S_2), we can estimate the same probability for the measured score S_2^m:

$$p_2 = \frac{\int_{S_2^m}^{\infty} F_1(S_2)\,dS_2}{\int_{0}^{\infty} F_1(S_2)\,dS_2}$$
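The two tail probabilities above can be estimated empirically once null scores are available: the fraction of null scores at least as large as the measured score is the discrete analogue of the ratio of integrals. Below is a minimal sketch under that assumption. How the null scores are generated is left open. The helper names `empirical_tail_pvalue` and `localization_call` and the threshold `p_th` are hypothetical, and the four-way call is one plausible way to map p_1 and p_2 onto the answers "1", "2", "1 or 2" and "1 and 2".

```python
def empirical_tail_pvalue(observed, null_scores):
    """Discrete analogue of int_{S^m}^inf F(S) dS / int_0^inf F(S) dS:
    the fraction of null scores at least as large as the observed score."""
    if not null_scores:
        raise ValueError("null_scores must not be empty")
    return sum(1 for s in null_scores if s >= observed) / len(null_scores)


def localization_call(p1, p2, p_th):
    """Hypothetical decision for a peptide with two candidate sites.

    p1 small: assignment 1 scores too well to be explained by site 2 alone.
    p2 small: assignment 2 scores too well to be explained by site 1 alone.
    """
    evidence_1 = p1 < p_th
    evidence_2 = p2 < p_th
    if evidence_1 and evidence_2:
        return "1 and 2"  # both supported: a mixture of positional isomers
    if evidence_1:
        return "1"
    if evidence_2:
        return "2"
    return "1 or 2"  # the spectrum cannot distinguish the sites


if __name__ == "__main__":
    # Hypothetical null samples: scores of assignment 1 on spectra where 2 is
    # correct, and scores of assignment 2 on spectra where 1 is correct.
    null_s1_given_2 = [2.0, 3.5, 1.2, 4.1, 2.8, 3.3, 0.9, 2.2, 3.0, 1.7]
    null_s2_given_1 = [1.0, 2.5, 3.9, 2.1, 4.8, 3.6, 2.9, 1.4, 3.1, 2.6]
    p1 = empirical_tail_pvalue(6.0, null_s1_given_2)  # measured S_1^m = 6.0
    p2 = empirical_tail_pvalue(3.0, null_s2_given_1)  # measured S_2^m = 3.0
    print(p1, p2, localization_call(p1, p2, p_th=0.05))  # 0.0 0.4 1
```

With a finite null sample the estimate can be exactly zero, as for p1 here. Reporting (k + 1)/(n + 1) instead of k/n is a common way to avoid that.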
Comparing p_1 and p_2 with a threshold p_th gives four possible answers:
• p_1 < p_th and p_2 < p_th: 1 and 2
• p_1 < p_th and p_2 ≥ p_th: 1
• p_1 ≥ p_th and p_2 < p_th: 2
• p_1 ≥ p_th and p_2 ≥ p_th: 1 or 2

Top down / bottom up
[Figure: intensity vs mass/charge for top down and bottom up analysis]

Charge distribution
[Figure: top down ions carry many charges (27+ to 31+); bottom-up peptides carry few (1+ to 4+)]

Isotope distribution
[Figure: top down vs bottom up isotope distributions; examples at m = 1878 Da and m = 1035 Da]

Fragmentation
[Figure: top down vs bottom up]

Alternative Splicing
[Figure: exons 1, 2 and 3; top down vs bottom up]

Correlations between modifications
[Figure: top down vs bottom up]

The Nucleosome Core Complex
[Figure: H3 (with the H3 'tail'), H4, H2A, H2B] Luger et al., Nature, 389, 251-260, 1997

The N-terminal Tails of Histone H3 and H4
H3 1-ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE-50
H4 1-SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYE-52
[Figure: modification marks above each sequence] P: phosphorylation; M: methylation (mono-, di-, or trimethylation); Ac: acetylation

The Histone Code Hypothesis
Specific post-translational modifications (PTMs) of the N-terminal tails of histones function as a scaffold for binding of protein factors, leading to transcriptional activation or inactivation.
Jenuwein, T., Allis, C.D., Science, 293, 2001

Interdependence of Modifications is lost in Standard Mass Spectrometry Analysis
[Figure: H3 tail carrying methylation (M), phosphorylation (P) and acetylation (Ac) marks, and the peptides it is cut into]
H3 1-ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE-50
• TKQTAR (3-8)
• KSTGGKAPR (9-17)
• KQLATKAAR (18-26)
• KSAPATGGVKKPHR (27-40)
• YRPGTVALRE (41-50)
Each modification ends up on a separate peptide, so the information about which modifications occurred together on the same molecule is lost.

Histone Proteins are a Highly Complex Mixture of a Single Protein…
[Figure: many copies of the H3 N-terminal sequence ARTKQTARKSTGAKAPRKQLASKAARKSAPATGGIKKPHRFRPGTVALRE, each carrying a different combination of methylation (M) and acetylation (Ac) marks]
……………… and many, many more!
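Why digestion loses the interdependence of modifications can be shown with a toy calculation. Two different mixtures of intact molecules can give identical per-site occupancies, which is all a peptide-level (bottom-up) measurement sees when the sites fall on different peptides, while their co-occurrence, visible only on the intact molecule (top down), differs. This is a minimal sketch. The mark names "K4me3" and "K27me3" (residues on the separate peptides TKQTAR and KSAPATGGVKKPHR), the 50/50 abundances and the helper names are all hypothetical.

```python
from collections import defaultdict


def site_occupancy(proteoforms):
    """Bottom-up view: fraction of molecules carrying each mark.

    `proteoforms` maps a frozenset of marks to a relative abundance.
    """
    total = sum(proteoforms.values())
    occupancy = defaultdict(float)
    for marks, abundance in proteoforms.items():
        for mark in marks:
            occupancy[mark] += abundance / total
    return dict(occupancy)


def co_occurrence(proteoforms, mark_a, mark_b):
    """Top-down view: fraction of molecules carrying both marks at once."""
    total = sum(proteoforms.values())
    both = sum(a for marks, a in proteoforms.items()
               if mark_a in marks and mark_b in marks)
    return both / total


# Hypothetical mixtures: marks always together vs never together.
mixture_a = {frozenset({"K4me3", "K27me3"}): 0.5, frozenset(): 0.5}
mixture_b = {frozenset({"K4me3"}): 0.5, frozenset({"K27me3"}): 0.5}

if __name__ == "__main__":
    print(site_occupancy(mixture_a), site_occupancy(mixture_b))  # same occupancies
    print(co_occurrence(mixture_a, "K4me3", "K27me3"))  # 0.5
    print(co_occurrence(mixture_b, "K4me3", "K27me3"))  # 0.0
```

Both mixtures show each mark at 50% occupancy, so per-peptide measurements cannot tell them apart. Only a measurement on the intact molecule separates "always together" from "never together".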
Protocol
Instruments: LTQ-ETD/PTR and LTQ-FTMS
Glu-C generated N-terminal H3 peptide (residues 1-50), with lysines K4, K9, K14, K18, K23, K27, K36 and K37 marked
[Figure: precursor charge states +7 to +12, with m/z labels 544.9, 546.3, 547.6, 549.1, 550.4 and 551.9]
• Isolate m/z ± 0.5 Da
• 60 ms ETD
• ~3 min acquisition (+10 charge state)
[Figure: ETD MS/MS spectrum, fragment peaks from m/z 245.2 to 1937.8, with spacings of Δ 1.4 marked]

Group '4': 4 Acetyl Groups
[Figure: ETD spectrum (relative abundance vs m/z, 400 to 2000) annotated with c2 to c17 and z2 to z16 ions (* marks additional peaks), and three candidate placements of methyl (M) and acetyl (Ac) marks on ARTKQTARKSTGAKAPRKQLASKAARKSAPATGGIKKPHRFRPGTVALRE]

Group '5': 5 Acetyl Groups
[Figure: ETD spectrum (relative abundance vs m/z, 400 to 2000) annotated with c2 to c17 and z2 to z17 ions (* marks additional peaks); K4 assigned as trimethyl; assignment with methyl (M) and five acetyl (Ac) marks on ARTKQTARKSTGAKAPRKQLASKAARKSAPATGGIKKPHRFRPGTVALRE]
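Reading these spectra depends on converting between m/z and neutral mass for highly charged ions. A mass difference is divided by the charge, and trimethylation and acetylation (the "K4: trimethyl" annotation) differ by only 0.036 Da. The sketch below does that arithmetic with standard monoisotopic values. The helper names are hypothetical, and treating m/z 546.3 as a 10+ ion is only an illustration read off the figure labels.

```python
# Sketch: m/z <-> neutral mass for multiply charged ions.
# Assumed constants: proton mass and monoisotopic modification masses.
PROTON = 1.007276
METHYL = 14.015650     # CH2
ACETYL = 42.010565     # C2H2O
TRIMETHYL = 42.046950  # 3 x CH2


def neutral_mass(mz, z):
    """Neutral monoisotopic mass of an [M + zH]z+ ion observed at m/z."""
    return z * (mz - PROTON)


def delta_mz(delta_mass, z):
    """m/z spacing produced by a mass difference at charge state z."""
    return delta_mass / z


if __name__ == "__main__":
    z = 10
    print(f"m/z 546.3 at {z}+ -> {neutral_mass(546.3, z):.2f} Da")
    for name, mass in [("methyl", METHYL), ("acetyl", ACETYL), ("trimethyl", TRIMETHYL)]:
        print(f"{name:>9}: {mass:.6f} Da -> {delta_mz(mass, z):.4f} m/z at {z}+")
    diff = TRIMETHYL - ACETYL
    print(f"trimethyl - acetyl = {diff:.6f} Da -> {delta_mz(diff, z):.5f} m/z at {z}+")
```

At 10+, one methyl group corresponds to a spacing of about 1.40 m/z. The trimethyl/acetyl difference shrinks to about 0.0036 m/z, which at m/z near 546 calls for a resolving power of roughly 150,000.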