Protein Sequencing and Identification by Mass Spectrometry

Motivation
• Proteins are the working units of the cell
  – The number of genes found is far smaller than the number of expressed protein forms
  – Proteins are directly related to cell processes and diseases
[Figure: DNA (~30,000 human genes; SNPs) → mRNA (>100,000 RNA messages; alternative splicing) → protein (>1,000,000 distinct protein forms; post-translational modification)]

Breaking Protein into Peptides and Peptides into Fragment Ions
• Proteases, e.g. trypsin, break a protein into peptides.
• A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece.
• The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones.
• The mass spectrometer measures the mass/charge ratio of an ion.

Peptide Fragmentation
[Figure: collision-induced dissociation of a protonated (H+) peptide. Prefix fragment: H...-HN-CH(Ri-1)-CO; suffix fragment: NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH]
• Peptides tend to fragment along the backbone.
• Fragments can also lose neutral chemical groups like NH3 and H2O.

[Figure: Ideal Mass Spectrum]
[Figure: Real Mass Spectrum]

Mass Spectra
[Figure: mass spectrum on a mass axis starting at 0, with peaks labeled by the residues G, V, D, L, K; mass differences 57 Da = ‘G’ and 99 Da = ‘V’; one peak marked with an H2O loss]
• The peaks in the mass spectrum:
  – Prefix and suffix fragments.
  – Fragments with neutral losses (-H2O, -NH3).
  – Noise and missing peaks.
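The peak types listed above can be illustrated with a minimal sketch that enumerates the prefix and suffix fragment masses of a peptide, plus their neutral-loss variants. It rests on several assumptions that are not in the slides: nominal (integer) residue masses, the example peptide GVDLK (chosen because those residue letters appear in the figure labels), and a simplified model that ignores terminal groups and the charge-carrying proton when summing fragment masses. The function names are hypothetical.

```python
# Nominal (integer) residue masses in Da; a simplifying assumption.
RESIDUE_MASS = {"G": 57, "V": 99, "D": 115, "L": 113, "K": 128}
WATER = 18    # neutral loss of H2O
AMMONIA = 17  # neutral loss of NH3


def prefix_masses(peptide):
    """Summed residue masses of every proper prefix of the peptide."""
    masses, total = [], 0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def suffix_masses(peptide):
    """Summed residue masses of every proper suffix of the peptide."""
    return prefix_masses(peptide[::-1])


def ideal_spectrum(peptide, neutral_losses=(WATER, AMMONIA)):
    """Prefix and suffix masses plus their neutral-loss variants, sorted."""
    base = set(prefix_masses(peptide)) | set(suffix_masses(peptide))
    lossy = {m - loss for m in base for loss in neutral_losses if m - loss > 0}
    return sorted(base | lossy)


def mz(neutral_mass, charge, proton=1.00728):
    """Mass/charge ratio of an ion carrying `charge` extra protons."""
    return (neutral_mass + charge * proton) / charge


if __name__ == "__main__":
    print(prefix_masses("GVDLK"))  # [57, 156, 271, 384]
    print(suffix_masses("GVDLK"))  # [128, 241, 356, 455]
    print(ideal_spectrum("GVDLK"))
```

A real spectrum differs from this ideal list in exactly the ways the last bullet names: some of these masses are missing, and noise peaks appear that correspond to no fragment.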
Protein Identification with MS/MS
[Figure: peptide identification from an MS/MS spectrum (intensity vs. mass), with peaks labeled G, V, D, L, K]

Protein Identification by Tandem Mass Spectrometry
[Figure: spectrum from the MS/MS instrument, relative abundance (0-100) vs. m/z (0-2000). Scan header: S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6 T: + c d Full ms2 638.00 [ 165.00 - 1925.00]. Labeled peaks (m/z): 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6]
From spectrum to sequence:
• Database search
  – Sequest
• De novo interpretation
  – Sherenga

De novo Peptide Sequencing
[Figure: the same MS/MS spectrum (S#: 1708), to be interpreted directly as a sequence]

Building Spectrum Graph
• How to create vertices (from masses)
• How to create edges (from mass differences)
• How to score paths
• How to find best path

Edges of Spectrum Graph
• Two vertices with mass difference corresponding to an amino acid A:
  – Connect with an edge labeled by A
• Gap edges for di- and tri-peptides

References
• Jones NC, Pevzner PA. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
• Dancik V, Addona TA, Clauser KR, Vath JE, Pevzner PA. “De novo peptide sequencing via tandem mass spectrometry.” J Comput Biol. 1999 Fall-Winter;6(3-4):327-42.
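The four questions under Building Spectrum Graph, together with the edge rule under Edges of Spectrum Graph, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Sherenga algorithm: vertices are integer masses assumed to be prefix masses, edges use nominal residue masses (I/L and K/Q are indistinguishable at this resolution, so only L and K are listed), each peak scores 1, and gap edges for di- and tri-peptides are omitted. The function names and the input masses are hypothetical.

```python
# Nominal residue masses in Da (assumption: integer resolution).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def build_spectrum_graph(masses, tol=0):
    """Vertices are masses; an edge (u, v, aa) joins two masses whose
    difference matches the mass of amino acid aa within `tol`."""
    vertices = sorted(set(masses))
    edges = []
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            for aa, m in RESIDUE_MASS.items():
                if abs((v - u) - m) <= tol:
                    edges.append((u, v, aa))
    return vertices, edges


def best_path(vertices, edges, score, source, sink):
    """Highest-scoring source-to-sink path, found by dynamic programming.
    Returns (total score, sequence) or None if the sink is unreachable."""
    incoming = {}
    for u, v, aa in edges:
        incoming.setdefault(v, []).append((u, aa))
    best = {source: (score.get(source, 0), "")}
    for v in vertices:  # ascending mass is a topological order
        for u, aa in incoming.get(v, []):
            if u in best:
                cand = (best[u][0] + score.get(v, 0), best[u][1] + aa)
                if v not in best or cand[0] > best[v][0]:
                    best[v] = cand
    return best.get(sink)


# Toy input: prefix masses of GVDLK, the parent mass 512, and a noise peak at 200.
masses = [0, 57, 156, 200, 271, 384, 512]
vertices, edges = build_spectrum_graph(masses)
scores = {m: 1 for m in masses if m != 0}
result = best_path(vertices, edges, scores, source=0, sink=512)
print(result)  # (5, 'GVDLK')
```

The toy input shows why path scoring matters: 156 Da is both G+V and a single R, so the graph also contains the path R-D-L-K. That path visits one fewer peak and scores lower, so the path through the 57 Da vertex wins.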