Protein Sequencing and Mass Spectrometry

Sample Preparation
• Enzymatic digestion (trypsin) + fractionation
• Single-stage MS (LC-MS): 1 MS spectrum / second
• Tandem MS: secondary fragmentation of the ionized parent peptide

The peptide backbone
The peptide backbone breaks to form fragments with characteristic masses.
    H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...OH
    (N-terminus on the left, C-terminus on the right; R_{i-1}, R_i, R_{i+1} are the side chains of amino-acid residues i-1, i, i+1)
• Ionization: the peptide picks up a proton (H+), giving the ionized parent peptide.
• Fragment ion generation: the backbone breaks at a peptide bond (between CO and NH), e.g.
    H+ H...-HN-CH(R_{i-1})-CO | NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...OH
  The piece that keeps the charge is observed as an ionized peptide fragment.

Tandem MS for Peptide ID
Fragment masses for the peptide SGFLEEDELK:
    Residue   S     G     F     L     E     E     D     E     L     K
    b ions    88    145   292   405   534   663   778   907   1020  1166
    y ions    1166  1080  1022  875   762   633   504   389   260   147
[Figure: MS/MS spectrum, % intensity (0 to 100) vs. m/z (0 to 1000), with the doubly charged parent ion [M+2H]2+ marked]

Peak Assignment
[Figure: the same spectrum with peaks labeled y2 to y9, b3 to b9, and [M+2H]2+]
• Peak assignment implies sequence (residue tag) reconstruction!

Database Searching for Peptide ID
• For every peptide from a database
  – Generate a hypothetical spectrum
  – Compute a correlation between the hypothetical and the observed (experimental) spectrum
  – Choose the best
• Database searching is very powerful and is the de facto standard for MS.
  – Sequest, Mascot, and many others

Spectra: the real story
• Noise peaks
• Ions, not prefixes & suffixes
• Mass-to-charge ratio, not mass
  – Multiply charged ions
• Isotope patterns, not single peaks

Peptide fragmentation possibilities (ion types)
[Figure: cleavage sites along -HN-CH(R_i)-CO-NH-CH(R'_{i+1})-CO-NH-, labeled with N-terminal ions a_i, b_i, c_i, d_{i+1}, b_{i+1} and C-terminal ions x_{n-i}, y_{n-i}, z_{n-i}, v_{n-i}, w_{n-i}, y_{n-i-1}; grouped as low-energy and high-energy fragments]

Ion types and offsets
• P = prefix residue mass
• S = suffix residue mass
• b-ions = P + 1
• y-ions = S + 19
• a-ions = P - 27

Mass-to-charge ratio
• The x-axis is (M+Z)/Z
  – Z=1 implies that the peak is at M+1
  – Z=2 implies that the peak is at (M+2)/2
• M=1000, Z=2: the peak position is at 501
  – Suppose you see a peak at 501. Is the mass 500, or is it 1000?

Spectral Graph
• Each prefix residue mass (PRM) corresponds to a node.
• Two nodes are connected by an edge if the mass difference is a residue mass.
  – Example: nodes 87 and 144 are joined by an edge labeled G, since 144 - 87 = 57 is the residue mass of glycine.
• A path in the graph is a de novo interpretation of the spectrum.

• Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
• Spectral graph:
  – Each node u defines a putative prefix residue mass M(u).
  – (u,v) ∈ E if M(v) - M(u) is the residue mass of an amino acid (tag) or 0.
  – Paths in the spectral graph correspond to an interpretation.
[Figure: candidate PRMs 0, 87, 144, 146, 273, 275, 332, 401 on a mass axis; the path 0 → 87 → 144 → 273 → 401 spells S, G, E, K]

Re-defining de novo interpretation
• Find a subset of nodes in the spectral graph such that:
  – 0 and M are included;
  – each peak contributes at most one node (interpretation) (*);
  – each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass);
  – an appropriate objective function (e.g., the number of peaks interpreted) is maximized.

Two problems
• Too many nodes.
  – Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem).
  – Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct.
    We need to find a path that uses each peak only once (algorithmic problem): the two PRMs generated by the same peak form a "forbidden pair", and a path may contain at most one of them.
  – In general, the forbidden pairs problem is NP-hard.

However...
• The b, y ions have a special non-interleaving property.
• Consider pairs (b1, y1), (b2, y2):
  – If b1 < b2, then y1 > y2.

Non-Intersecting Forbidden pairs
[Figure: the S, G, E, K example on a PRM axis from 0 to 400]
• If we consider only b, y ions, 'forbidden' node pairs are non-intersecting.
• The de novo problem can then be solved efficiently using a dynamic programming technique.

The forbidden pairs method
• There may be many paths that avoid forbidden pairs.
• We choose a path that maximizes an objective function,
  – e.g., the number of peaks interpreted.
• Sort the PRMs according to increasing mass values.
• For each node u, f(u) denotes its forbidden partner (the other PRM generated by the same peak).
• Let m(u) denote the mass value of the PRM.

D.P. for forbidden pairs
• Consider all pairs u, v with m[u] ≤ M/2 and m[v] > M/2.
• Define S(u,v) as the best score of a pair of paths 0 → u and v → M that together avoid forbidden pairs.
• Is it sufficient to compute S(u,v) for all u, v?
• Note that the best interpretation is given by
  $$\max_{(u,v) \in E} S(u,v)$$
• Note that we have one of two cases:
  1. Either u < f(v) (and f(u) > v),
  2. Or u > f(v) (and f(u) < v).
• Case u > f(v): extend u, and do not touch f(v):
  $$S(u,v) = \max_{u' :\, (u',u) \in E,\ u' \neq f(v)} S(u',v) + 1$$
• Case u < f(v): symmetrically, extend v, and do not touch f(u):
  $$S(u,v) = \max_{v' :\, (v,v') \in E,\ v' \neq f(u)} S(u,v') + 1$$

The complete algorithm
    S[0,M] = 0          /* base case: both paths empty */
    for all u           /* increasing mass values from 0 to M/2 */
      for all v         /* decreasing mass values from M to M/2 */
        if (u > f[v])
          S[u,v] = max { S[w,v] : (w,u) ∈ E, w ≠ f(v) } + 1
        else if (u < f[v])
          S[u,v] = max { S[u,w] : (v,w) ∈ E, w ≠ f(u) } + 1
        if ((u,v) ∈ E)
          /* maxI is the score of the best interpretation */
          maxI = max {maxI, S[u,v]}
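The nominal offsets b = P + 1 and y = S + 19 also explain where forbidden pairs come from and why b/y pairs cannot interleave. Assuming singly charged fragments and writing M for the total residue mass (so S = M - P), a peak at mass m has exactly two candidate PRMs:

$$
\begin{aligned}
P_b(m) &= m - 1 && \text{(peak read as a b-ion, } m = P + 1\text{)}\\
P_y(m) &= M - (m - 19) = M + 19 - m && \text{(peak read as a y-ion, } m = S + 19\text{)}\\
P_b(m) + P_y(m) &= M + 18
\end{aligned}
$$

Hence, for two peaks with $m_1 < m_2$ we get $P_b(m_1) < P_b(m_2)$ but $P_y(m_1) > P_y(m_2)$: the forbidden pairs are nested around $(M+18)/2$ and never cross, which is the non-interleaving property the dynamic program relies on. With M = 401, for example, the peak m = 88 yields the pair (87, 332).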
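The b/y ladder in the SGFLEEDELK table and the database-search loop can be sketched in a few lines of Python. This is a minimal sketch, not Sequest or Mascot: it assumes standard monoisotopic residue masses (so the slides' nominal offsets +1 and +19 become the proton mass and the water-plus-proton mass), singly charged fragments, and a simple shared-peak count standing in for a real correlation score. The function names, the tolerance, and the candidate list are hypothetical.

```python
# Sketch: theoretical b/y ion ladders and a toy database search.
# Assumptions (not from the slides): monoisotopic residue masses, singly
# charged fragments, and a shared-peak count instead of a real correlation.

PROTON = 1.00728   # Da
WATER = 18.01056   # Da
RESIDUE_MASS = {   # monoisotopic residue masses (Da), only residues used here
    "G": 57.02146, "S": 87.03203, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def b_ions(peptide):
    """b_i = prefix residue mass P + proton (the slides' P + 1), i = 1..n-1."""
    ions, prefix = [], 0.0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        ions.append(prefix + PROTON)
    return ions


def y_ions(peptide):
    """y_k = suffix residue mass S + water + proton (the slides' S + 19), k = 1..n-1."""
    ions, suffix = [], 0.0
    for aa in reversed(peptide[1:]):
        suffix += RESIDUE_MASS[aa]
        ions.append(suffix + WATER + PROTON)
    return ions


def mz(mass, charge):
    """Peak position (M + z * proton) / z of a neutral mass M at charge z."""
    return (mass + charge * PROTON) / charge


def shared_peaks(peptide, observed, tol=0.5):
    """Number of observed peaks within tol of some theoretical b or y ion."""
    theoretical = b_ions(peptide) + y_ions(peptide)
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in observed)


def best_match(candidates, observed, tol=0.5):
    """Database search: score every candidate peptide and keep the best."""
    return max(candidates, key=lambda pep: shared_peaks(pep, observed, tol))


if __name__ == "__main__":
    print([round(m) for m in b_ions("SGFLEEDELK")])  # b1..b9
    print([round(m) for m in y_ions("SGFLEEDELK")])  # y1..y9
    print(mz(1000.0, 2))                             # about 501
    # Hypothetical spectrum: y2-y7 and b3-b5 of SGFLEEDELK plus two noise peaks.
    observed = [260, 389, 504, 633, 762, 875, 292, 405, 534, 311, 702]
    print(best_match(["SGFLEEDELK", "SGFLEDEELK", "GGGGGK"], observed))
```

Rounded, b1 to b9 and y1 to y7 match the table; the high-mass y values can differ from it by one unit because of rounding. The swapped candidate SGFLEDEELK still explains 8 of the 11 peaks against 9 for the true peptide, which shows why the scoring function matters so much in practice.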
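The complete algorithm can be turned into a small runnable Python sketch. It builds the spectral graph from a toy spectrum, reading every peak either as a b-ion (PRM m - 1) or as a y-ion (PRM M + 19 - m) so that the two readings form a forbidden pair, then fills S[u,v] in the same loop order as the pseudocode and follows back-pointers to recover the sequence. Assumptions not in the slides: nominal integer masses, only b/y interpretations, a known total residue mass M, distinct PRMs, forbidden pairs that straddle M/2, edges only for residue-mass differences (the 0-mass edges are omitted), and a sentinel partner of -1 for nodes 0 and M. The function names are hypothetical.

```python
# Sketch of the spectral graph plus the forbidden-pairs dynamic program.
# Assumptions (not from the slides): nominal integer masses, b/y readings only,
# known total residue mass M, distinct PRMs, each pair straddling M/2.

NEG = float("-inf")

# Nominal residue masses -> one-letter code (113 is L or I, 128 is K or Q).
RESIDUE = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
           113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
           137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def spectral_graph(peaks, M):
    """PRM nodes, forbidden-partner map f, and predecessor/successor lists."""
    f = {0: -1, M: -1}              # -1 is a sentinel: 0 and M have no partner
    for m in peaks:
        pb, py = m - 1, M + 19 - m  # read as b (m = P + 1) or as y (m = M - P + 19)
        f[pb], f[py] = py, pb
    nodes = sorted(f)
    pred = {u: [w for w in nodes if u - w in RESIDUE] for u in nodes}
    succ = {u: [w for w in nodes if w - u in RESIDUE] for u in nodes}
    return nodes, f, pred, succ


def forbidden_pairs_dp(peaks, M):
    """Return (number of peaks interpreted, sequence) of the best path."""
    nodes, f, pred, succ = spectral_graph(peaks, M)
    left = [u for u in nodes if 2 * u <= M]            # increasing: 0 .. M/2
    right = [v for v in reversed(nodes) if 2 * v > M]  # decreasing: M .. M/2
    S, back = {}, {}
    for u in left:
        for v in right:
            if (u, v) == (0, M):
                S[u, v] = 0                            # base case: both paths empty
                continue
            if u > f[v]:    # extend u; the left path must avoid f(v)
                cands = [(S[w, v], "u", w) for w in pred[u] if w != f[v]]
            elif u < f[v]:  # extend v; the right path must avoid f(u)
                cands = [(S[u, w], "v", w) for w in succ[v] if w != f[u]]
            else:           # u and v come from the same peak: infeasible
                cands = []
            best = max(cands, default=(NEG, None, None))
            S[u, v], back[u, v] = best[0] + 1, best[1:]
    # Close the path with an edge (u, v) in E, as in the maxI step.
    score, u, v = max(((S[u, v], u, v) for u in left for v in right
                       if v - u in RESIDUE), default=(NEG, 0, M))
    if score == NEG:
        return NEG, ""
    lpath, rpath = [], []
    while (u, v) != (0, M):         # follow back-pointers to the base case
        side, w = back[u, v]
        if side == "u":
            lpath.append(u)
            u = w
        else:
            rpath.append(v)
            v = w
    path = [0] + lpath[::-1] + rpath + [M]
    return score, "".join(RESIDUE[b - a] for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # b1 = 88 and b2 = 145 of SGEK plus its y1 = 147; M = 87 + 57 + 129 + 128 = 401.
    print(forbidden_pairs_dp([88, 145, 147], 401))
```

For these three peaks the candidate PRMs are 0, 87, 144, 146, 273, 275, 332, 401, and the call returns (3, 'SGEK'): the path 0 → 87 → 144 → 273 → 401 interprets all three peaks, while states that would use both members of a forbidden pair, such as 87 and 332, score -inf.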