Lecture 10: Interpretation of Mass Spectra
A. Peptide Mass Fingerprinting
B. MS/MS sequencing
Oct 2010 SDMBT

General workflow for proteomic analysis
Sample → sample preparation → protein mixture → sample separation and visualisation (comparative analysis) → digestion → peptides → mass spectrometry → MS data → database search → protein identification

Peptide Mass Fingerprinting (PMF)
A protein separated on a 2D gel is digested with trypsin, and the digest is analysed experimentally by MALDI-TOF. Separately, every known protein in a database is digested virtually (in silico) with trypsin. The experimental peptide masses are then matched against the virtual ones. Trypsin cuts on the C-terminal side of lysine (K) and arginine (R), so the set of peptide masses it produces is characteristic of each protein.

Recall: tryptic digest of β-casein. Major peaks at m/z 646, 742, 748, 780, 830 and 2186. (Source: King's College London (Pierce))

Virtual peptide digest
1. Obtain the amino acid sequence from GenBank: http://www.ncbi.nlm.nih.gov/entrez/
2. Convert it to FASTA format: http://bioinformatics.org/sms2/genbank_fasta.html
3. Run a virtual tryptic digest with the tools at http://www.expasy.org/
4. Compare the results of the virtual tryptic digest with the experimental peaks in the MALDI-TOF spectrum of the tryptic digest (major peaks at 646, 742, 748, 780, 830 and 2186).

Peptide mass fingerprinting (PMF)
Peptide masses are matched against theoretical digests of the proteins in a database, and the matches are ranked by the number of matching peptides. Confidence in the identification comes from:
• a large gap in the number of matching peptides between the 1st- and 2nd-ranked proteins
• good coverage of the 1st-ranked protein by the experimental results

Variables for the database search:
• choice of database (public or private)
• species of origin
• molecular weight and pI range
• enzyme used for the digest
• modifications (reduction, alkylation, phosphorylation)
• mass tolerance

PMF using MS-FIT: http://prospector.ucsf.edu/
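The virtual digest and the ranking step can be illustrated with a short sketch. It is not how ExPASy or MS-FIT are implemented. It makes several assumptions:
• residue masses come from the residue-mass table in this lecture;
• [M+H]+ is taken as the sum of residue masses + water + proton (about +19);
• no missed cleavages and no modifications are allowed;
• trypsin cuts after K/R except before P (a common convention);
• the 0.5 Da tolerance, the protein names and the sequences are hypothetical.

```python
# Minimal PMF sketch: virtual tryptic digest, [M+H]+ masses, ranking by matches.
# Assumptions: no missed cleavages, no modifications, singly charged peptides.

WATER = 18.0106   # H2O added back to the sum of residue masses
PROTON = 1.0073   # charge-carrying proton of [M+H]+

RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "I": 113.08, "L": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}


def tryptic_digest(seq):
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing peptide that does not end in K/R
        peptides.append(seq[start:])
    return peptides


def peptide_mh(peptide):
    """[M+H]+ of a peptide: sum of residue masses + water + proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def rank_proteins(peaks, proteins, tol=0.5):
    """Return (name, matched peaks, sequence coverage), most matches first."""
    results = []
    for name, seq in proteins.items():
        theoretical = [(pep, peptide_mh(pep)) for pep in tryptic_digest(seq)]
        # Number of experimental peaks explained by some virtual peptide
        n_matched = sum(
            1 for peak in peaks
            if any(abs(peak - mh) <= tol for _, mh in theoretical)
        )
        # Residues covered by virtual peptides that match some peak
        covered = sum(
            len(pep) for pep, mh in theoretical
            if any(abs(peak - mh) <= tol for peak in peaks)
        )
        results.append((name, n_matched, covered / len(seq)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical usage: peaks generated from protein_A's own virtual digest
proteins = {"protein_A": "LSQEDPDYGIRAGCAGKLLK", "protein_B": "MWVTFISLLR"}
peaks = [peptide_mh(p) for p in tryptic_digest(proteins["protein_A"])]
ranking = rank_proteins(peaks, proteins)
print(ranking)
```

A real search engine also scores chance matches, missed cleavages and modifications. Counting matched peaks and coverage is just the simplest version of the two confidence criteria listed above.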
In the MS-FIT search form you choose the database, the enzyme, the mass tolerance and the modifications, and you enter the experimental peak list.

PMF results for the tryptic digest of β-casein
The matches are the same protein (β-casein) from 4 similar species. Does this agree with the protein's position on the 2D gel? Note: you do not need to match all the peaks, or cover the whole protein, to identify it.

Limitations of PMF
• The method assumes that the databases are complete, but only some organisms' genomes have been completely sequenced, so high-confidence matches might not be available. However, homology between organisms often still allows good results.
• PMF gives no information about the amino acid sequence, only the identity of the protein. The amino acid sequence shown in the PMF results is only the predicted sequence based on the virtual digest.
• A database search is only as good as the database and the input data. For example, MALDI spectra often contain peaks from trypsin autolysis and keratin degradation (Promega).

Peptide Mass Fingerprinting (PMF): what if the MS data are too noisy?
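Two of the search variables, the mass tolerance and the modifications, reduce to simple arithmetic. The sketch below is hypothetical and does not reproduce MS-FIT's interface. The modification labels are invented, but the mass shifts are standard monoisotopic values: carbamidomethyl Cys after iodoacetamide alkylation, Met oxidation, and phosphorylation. The tolerance can be given either in Da or in ppm.

```python
# Sketch of tolerance checking and fixed mass shifts for modifications.
# Modification labels are hypothetical; mass shifts are monoisotopic values in Da.

MOD_SHIFT = {
    "carbamidomethyl_C": 57.0215,  # Cys alkylated with iodoacetamide
    "oxidation_M": 15.9949,        # Met oxidation (+O)
    "phospho_STY": 79.9663,        # phosphorylation (+HPO3) on Ser/Thr/Tyr
}


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol, unit="Da"):
    """True if observed lies within tol (given in Da or ppm) of theoretical."""
    if unit == "Da":
        return abs(observed - theoretical) <= tol
    if unit == "ppm":
        return abs(ppm_error(observed, theoretical)) <= tol
    raise ValueError(f"unknown tolerance unit: {unit}")


def modified_mass(base_mass, mods):
    """Add each modification's shift (name -> number of sites) to a mass."""
    return base_mass + sum(MOD_SHIFT[name] * n for name, n in mods.items())


print(ppm_error(1000.05, 1000.00))                   # about 50 ppm
print(within_tolerance(1000.05, 1000.00, 0.1))        # inside a 0.1 Da window
print(within_tolerance(1000.05, 1000.00, 20, "ppm"))  # outside a 20 ppm window
print(modified_mass(1000.00, {"phospho_STY": 1}))
```

A ppm tolerance scales with mass, so the same setting is tighter in Da for small peptides than for large ones. That is one reason search tools let you choose the unit.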
[Figure: real-world MS data (L&T Inc)]

Exercise: identify this protein.

MS/MS sequencing
Fragmenting a peptide breaks its backbone at different points. By comparing MS/MS spectra, the possible amino acid sequences can in principle be worked out by hand (see the manual examples that follow). The sequences can also be matched against databases to determine both the identity and the sequence of proteins (covered after the manual examples). This adds another layer of certainty to the identification of the peptide, and hence of the protein.

Tryptic peptides in MS/MS
• Proteins are digested into peptides by trypsin, which cuts on the C-terminal side of R/K.
• The C-terminal residue of a tryptic peptide is therefore arginine (R) or lysine (K), apart from the peptide at the protein's own C-terminus.
• By convention, the N-terminus is written on the left.
• All tryptic peptides have a similar structure, because they are all produced by trypsin.
• When ionised (e.g. by ESI), tryptic peptides usually carry a 2+ charge, one at each end of the peptide.

MS/MS fragmentation can break the peptide backbone in 6 ways. By convention, fragments that contain the N-terminus are called a, b and c ions, and fragments that contain the C-terminus are called x, y and z ions.
IMPORTANT: although 6 ways are possible, b and y ions are generally the most common. In general it is not always possible to predict which types of ions will be produced.
Exercise: explain how the ionised peptide breaks up.
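The 2+ charge matters because a mass spectrometer measures m/z, not mass. The sketch below assumes protonation is the only source of charge. It converts a neutral monoisotopic mass M into the m/z of [M+zH]z+. The mass used is a hypothetical value for a tryptic peptide of about 1.3 kDa.

```python
# Sketch: m/z of a protonated peptide ion [M + zH]z+ from its neutral mass M.
# Assumption: the charge comes only from added protons.

PROTON = 1.00728  # mass of a proton, Da


def mz(neutral_mass, charge):
    """Add z protons to the neutral mass, then divide by z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


M = 1291.59  # hypothetical neutral monoisotopic mass, Da
print(mz(M, 1))  # [M+H]+, the form usually seen in MALDI
print(mz(M, 2))  # [M+2H]2+, the usual form of a tryptic peptide in ESI
```

The doubly charged ion appears at roughly half the m/z of the singly charged one, so the same peptide can show up at quite different positions in MALDI and ESI spectra.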
In theory 8 y ions and 8 b ions are possible (a peptide of n residues can give n − 1 of each). In practice not all of them are observed, and which ones appear cannot be predicted. The N-terminus is on the left-hand side and the C-terminus on the right-hand side.
• y ions (C-terminal fragments): sum of residue masses + 19
• b ions (N-terminal fragments): sum of residue masses + 1

MS/MS sequencing: reading the y ions
y ions contain the C-terminus. The difference between consecutive y ions equals the residue mass of one amino acid (see the table of residue masses below).
Observed y ions (m/z): y2 246.2, y3 303.3, y4 374.2, y5 477.0, y6 534.3, y7 605.3
• y3 − y2 = 57.1 → Gly (G)
• y4 − y3 = 70.9 → Ala (A)
• y5 − y4 = 102.8 → Cys (C)
• y6 − y5 = 57.3 → Gly (G)
• y7 − y6 = 71.0 → Ala (A)
Reading from y7 down to y2 therefore gives …AGCAG…-CO2H.

Residue masses of amino acids
Residue mass = molecular weight of the amino acid − 18 (2 × H + 1 × O, the water lost when a peptide bond forms).
Note: some residues have very similar masses. Leucine and isoleucine are identical (113.08 Da), and glutamine and lysine differ by only 0.03 Da.

Letter  Name             Residue mass (Da)
G       glycine          57.02
A       alanine          71.04
S       serine           87.03
P       proline          97.05
V       valine           99.07
T       threonine        101.05
C       cysteine         103.01
I       isoleucine       113.08
L       leucine          113.08
N       asparagine       114.04
D       aspartic acid    115.03
Q       glutamine        128.06
K       lysine           128.09
E       glutamic acid    129.04
M       methionine       131.04
H       histidine        137.06
F       phenylalanine    147.07
R       arginine         156.10
Y       tyrosine         163.06
W       tryptophan       186.08
(N.S. Weld)

MS/MS sequencing: reading the b ions
b ions contain the N-terminus.
Observed b ions (m/z): b2 170.9, b3 242.0, b4 299.2, b5 402.0, b6 472.5, b7 530.2, b8 600.7
• b3 − b2 = 71.1 → Ala (A)
• b4 − b3 = 57.2 → Gly (G)
• b5 − b4 = 102.8 → Cys (C)
• b6 − b5 = 70.5 → Ala (A)
• b7 − b6 = 57.7 → Gly (G)
• b8 − b7 = 70.5 → Ala (A)
Reading from b2 up to b8 therefore gives NH2-…AGCAGA….

MS/MS sequencing: combining the results
• from the y ions: …AGCAG…-CO2H
• from the b ions: NH2-…AGCAGA…
• partial sequence: NH2-…AGCAGA…-CO2H
To do this you need to know how to interpret the spectrum. Which peaks are y ions and which are b ions? Which peak is y2, which is y3, and so on? The amino acids at the beginning and end of the peptide are also difficult to determine.

MS/MS sequencing: useful numbers and hints for MS/MS spectra
(n = number of residues in the fragment)
• a ions: sum of n residue masses − 27
• b ions: sum of n residue masses + 1
• c ions: sum of n residue masses + 18 (the b ion + 17, i.e. + NH3)
• x ions: sum of n residue masses + 45
• y ions: sum of n residue masses + 19
• z ions: sum of n residue masses + 2

MS/MS sequencing: where do these numbers come from?
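The manual ladder reading above can be sketched in a few lines. The function below is a hypothetical helper, not part of any tool. It matches each gap between consecutive peaks to the nearest residue mass in the table. It assumes the peaks have already been assigned to a single ion series, and that a 1.0 Da window is wide enough for these low-resolution values. As noted above, I and L cannot be told apart this way, and Q and K can be confused.

```python
# Sketch: read a partial sequence from a b- or y-ion ladder.
# Assumption: peaks belong to one ion series; a 1.0 Da window (hypothetical).

RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "I": 113.08, "L": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}


def read_ladder(peaks, tol=1.0):
    """One residue per gap between consecutive peaks, in ascending-mass order."""
    peaks = sorted(peaks)
    residues = []
    for lower, upper in zip(peaks, peaks[1:]):
        gap = upper - lower
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        # "?" marks a gap that matches no single residue within tol
        residues.append(best if abs(RESIDUE_MASS[best] - gap) <= tol else "?")
    return residues


y_ions = [246.2, 303.3, 374.2, 477.0, 534.3, 605.3]
b_ions = [170.9, 242.0, 299.2, 402.0, 472.5, 530.2, 600.7]

# Larger y ions extend towards the N-terminus, so reverse to write N -> C.
y_seq = "".join(reversed(read_ladder(y_ions)))
# Larger b ions extend towards the C-terminus, so the order is already N -> C.
b_seq = "".join(read_ladder(b_ions))
print(y_seq, b_seq)
```

The reversal for the y series is the step that is easiest to get wrong by hand. Each larger y ion adds a residue on its N-terminal side.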
[Figure: structures of a protonated peptide and its b1 ion]
By definition, residue mass = molecular weight of the amino acid − 18 (2 × H + 1 × O). Compared with the residue mass, a b ion carries 1 extra hydrogen, so b = sum of residue masses + 1.

[Figure: structure of the y2 ion containing Gly and Lys]
For the y2 ion: residue mass of Gly + residue mass of Lys + 2 × H + (1 × H + 1 × O) = sum of residue masses + 19. The 19 is made up of an H on the N-terminal end of the fragment, OH at the C-terminus, and the ionising proton: 1 + 17 + 1.

Exercise: draw the a, b, c and x, y, z ions of this dipeptide and calculate their m/z ratios.
[Figure: dipeptide structure]

Exercise: draw the a, b, c and x, y, z ions of this tripeptide and calculate their m/z ratios.
[Figure: tripeptide structure]

MS/MS sequencing: matching against databases
Experimentally, the peptide is ionised by MALDI or ESI and then fragmented. Separately, candidate peptides are fragmented virtually, and the experimental fragments are matched against the virtual ones.

Example: a peptide from human catalase, LSQEDPDYGIR.
In Protein Prospector's MS-Product (http://prospector.ucsf.edu/), paste in the amino acid sequence to get all the predicted a, b and y ions, and so on.

MS/MS data → amino acid sequence → protein identification
For example, suppose the MS/MS spectrum of a peptide of mass 1292.61 has peaks at 1179.53, 1092.50, 964.44, 835.39, 720.37, 623.31, 508.29, 345.22, 288.20 and 175.12.
• The first number (1292.61) must be the mass of the peptide + 1, i.e. [M+H]+.
• In ESI-MS a tryptic peptide is usually 2+, so the ion observed is actually [M+2H]2+.
Output: the protein is identified. Each fragment is identified as a y or b ion, so the user does not have to assign the peaks or work out the residue masses.

A more complex example: MS/MS of a peptide of mass 1217.58, with peaks at 1088.54, 975.46, 847.40, 746.35, 631.32, 457.28, 358.21, 243.13, 300.16 and 371.19. The protein is yeast alcohol dehydrogenase. One y ion, and all but 3 of the b ions, were deliberately left out. The protein could still be identified even though the information was incomplete, and all peaks were identified as y or b ions.
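The peak assignment that MS-Product automates can be sketched for the catalase example. This is not Protein Prospector's code. It assumes singly charged fragments, the residue masses from the table, b = sum of residue masses + proton, y = sum of residue masses + water + proton, and a hypothetical 0.05 Da tolerance. Each observed peak is labelled with the closest predicted b or y ion.

```python
# Sketch: predict b and y ions of a candidate peptide and label observed peaks.
# Assumptions: singly charged fragments, table residue masses, 0.05 Da window.

WATER = 18.0106
PROTON = 1.0073

RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "I": 113.08, "L": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}


def predict_b_y(peptide):
    """b_i: first i residues + proton; y_i: last i residues + water + proton."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):  # n residues give n - 1 ions of each type
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def assign_peaks(peaks, peptide, tol=0.05):
    """Map each observed peak to the closest predicted b/y ion, or None."""
    predicted = predict_b_y(peptide)
    assignment = {}
    for peak in peaks:
        label = min(predicted, key=lambda k: abs(predicted[k] - peak))
        assignment[peak] = label if abs(predicted[label] - peak) <= tol else None
    return assignment


peaks = [1179.53, 1092.50, 964.44, 835.39, 720.37,
         623.31, 508.29, 345.22, 288.20, 175.12]
labels = assign_peaks(peaks, "LSQEDPDYGIR")
for peak in peaks:
    print(peak, labels[peak])
```

With these values every peak in the list falls within 0.05 Da of a y ion (y10 down to y1) of LSQEDPDYGIR. A database search engine repeats this comparison for many candidate peptides and scores how well each one explains the spectrum.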