CSE182-L7
Protein sequencing and Mass Spectrometry
Fa 05
CSE182
Announcements
• Midterm 1: Nov 1, in class.
• Assignment 2: Online, due October 20.
Trivia Quiz
• What research won the Nobel prize in
Chemistry in 2004?
• In 2002?
How are Proteins Sequenced? Mass
Spec 101:
Nobel Citation 2002
Mass Spectrometry
Sample Preparation
Enzymatic Digestion (Trypsin) + Fractionation
Single Stage MS
Mass
Spectrometry
LC-MS: 1 MS spectrum / second
Tandem MS
Secondary Fragmentation
Ionized parent peptide
The peptide backbone
The peptide backbone breaks to form
fragments with characteristic masses.
N-terminus  H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
(R_{i-1}, R_i, R_{i+1}: side chains of AA residues i-1, i, i+1)
Ionization
The peptide backbone breaks to form fragments with characteristic masses.
Ionized parent peptide (carrying H+):
N-terminus  H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
Fragment ion generation
The peptide backbone breaks to form fragments with characteristic masses.
Ionized peptide fragment (carrying H+), after the backbone breaks between residue i-1 and residue i:
N-terminus  H...-HN-CH(R_{i-1})-CO  |  NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
Tandem MS for Peptide ID
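Peptide SGFLEEDELK. Each row is one residue: the b ion is the prefix ending at that residue, the y ion is the suffix starting at it.

b ion   residue   y ion
   88      S       1166
  145      G       1080
  292      F       1022
  405      L        875
  534      E        762
  663      E        633
  778      D        504
  907      E        389
 1020      L        260
 1166      K        147

[Figure: tandem MS spectrum, % intensity (0 to 100) versus m/z (250 to 1000), with the [M+2H]2+ peak and the b and y ions marked.]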
Peak Assignment
Peak assignment implies sequence (residue tag) reconstruction!
[Figure: the same b/y ion table and spectrum for SGFLEEDELK, % intensity (0 to 100) versus m/z (250 to 1000), with peaks labeled b3 to b9, y2 to y9, and [M+2H]2+.]
Database Searching for peptide ID
• For every peptide from a database
– Generate a hypothetical spectrum
– Compute a correlation between hypothetical and
experimental spectra
– Choose the best
• Database searching is very powerful and is the de
facto standard for MS.
– Sequest, Mascot, and many others
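The search loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Sequest or Mascot score: it uses nominal integer residue masses, only b- and y-ions (with the +1 and +19 offsets given later in these slides), and a shared-peak count as a simple stand-in for the correlation. The names `theoretical_spectrum`, `shared_peak_count`, and `best_match` are hypothetical.

```python
# Nominal (integer) residue masses; I/L share 113 and K/Q share 128.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
                "C": 103, "L": 113, "I": 113, "N": 114, "D": 115, "K": 128,
                "Q": 128, "E": 129, "M": 131, "H": 137, "F": 147, "R": 156,
                "Y": 163, "W": 186}


def theoretical_spectrum(peptide):
    """Hypothetical spectrum: b- and y-ion masses for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks = set()
    prefix = 0
    for m in masses[:-1]:
        prefix += m
        peaks.add(prefix + 1)            # b-ion = P + 1
        peaks.add(total - prefix + 19)   # y-ion = S + 19
    return peaks


def shared_peak_count(observed, peptide, tol=0):
    """Number of observed peaks explained by the peptide (assumed score)."""
    theo = theoretical_spectrum(peptide)
    return sum(1 for p in observed if any(abs(p - t) <= tol for t in theo))


def best_match(observed, database):
    """Score every database peptide and choose the best."""
    return max(database, key=lambda pep: shared_peak_count(observed, pep))


if __name__ == "__main__":
    spectrum = [88, 145, 147, 276]
    print(best_match(spectrum, ["KEGS", "GSEK", "SGEK"]))
```

A real scorer would also account for peak intensities, mass tolerance, and charge state, which this sketch ignores.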
Spectra: the real story
• Noise Peaks
• Ions, not prefixes & suffixes
• Mass to charge ratio, and not mass
– Multiply charged ions
• Isotope patterns, not single peaks
Peptide fragmentation possibilities
(ion types)
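[Figure: backbone segment -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH- with its cleavage sites. N-terminal fragments: a_i, b_i, c_i, d_{i+1}, b_{i+1}. C-terminal fragments: x_{n-i}, y_{n-i}, z_{n-i}, v_{n-i}, w_{n-i}, y_{n-i-1}. The figure distinguishes low energy fragments from high energy fragments.]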
Ion types, and offsets
• P = prefix residue mass
• S = Suffix residue mass
• b-ions = P+1
• y-ions = S+19
• a-ions = P-27
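As a worked example of these offsets, the sketch below lists the a-, b-, and y-ion masses for every cleavage of the peptide SGFLEEDELK. It assumes nominal integer residue masses, so a value can differ by 1 Da from one computed with exact masses and then rounded. The function name `ion_ladders` is hypothetical.

```python
# Nominal (integer) residue masses for the residues used here.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
                "C": 103, "L": 113, "I": 113, "N": 114, "D": 115, "K": 128,
                "Q": 128, "E": 129, "M": 131, "H": 137, "F": 147, "R": 156,
                "Y": 163, "W": 186}


def ion_ladders(peptide):
    """Return a-, b-, y-ion masses, one entry per backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ladders = {"a": [], "b": [], "y": []}
    prefix = 0
    for m in masses[:-1]:
        prefix += m                  # P: prefix residue mass
        suffix = total - prefix      # S: suffix residue mass
        ladders["a"].append(prefix - 27)
        ladders["b"].append(prefix + 1)
        ladders["y"].append(suffix + 19)
    return ladders


if __name__ == "__main__":
    for ion_type, values in ion_ladders("SGFLEEDELK").items():
        print(ion_type, values)
```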
Mass-Charge ratio
• The X-axis is (M+Z)/Z
– Z=1 implies that peak is at M+1
– Z=2 implies that peak is at (M+2)/2
• M=1000, Z=2, peak position is at 501
– Suppose you see a peak at 501. Is the mass 500, or is it
1000?
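The ambiguity can be made concrete by inverting the X-axis formula: a peak at position x is consistent with M = Z*x - Z for each charge Z. A minimal sketch (the function name `candidate_masses` and the `max_charge` cutoff are assumptions for illustration):

```python
def candidate_masses(peak_position, max_charge=3):
    """Masses M consistent with a peak at (M + Z) / Z, for Z = 1..max_charge."""
    return {z: z * peak_position - z for z in range(1, max_charge + 1)}


if __name__ == "__main__":
    # A peak at 501 is explained by M=500 (Z=1) or M=1000 (Z=2), among others.
    print(candidate_masses(501))
```

The peak position alone cannot separate these candidates; extra evidence such as the isotope spacing is needed.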
Isotopic peaks
• Ex: Consider peptide SAM
• Mass = 308.12802
• You should see: a single peak at 308.13
• Instead, you see: several peaks, from 308.13 up to 310.13
Isotopes
• C-12 is the most common. Suppose C-13 occurs with
probability 1%
• EX: SAM
– Composition: C11 H22 N3 O5 S1
• What is the probability that you will see a single C-13?
• $\binom{11}{1} \times 0.01 \times (0.99)^{10}$
• Note that C, S, O, N all have isotopes. Can you compute the isotopic distribution?
All atoms have isotopes
• Isotopes of atoms
– O-16,18; C-12,13; S-32,34; ...
– Each isotope has a frequency of occurrence
• If a molecule (peptide) has a single copy of C-13, that will
shift its peak by 1 Da
• With multiple copies of a peptide, we have a distribution of
intensities over a range of masses (Isotopic profile).
• How can you compute the isotopic profile of a peak?
Isotope Calculation
• Denote:
– $N_C$ : number of carbon atoms in the peptide
– $p_C$ : probability of occurrence of C-13 (~1%)
– Then
  $\Pr[\text{Peak at } M] = \binom{N_C}{0}\, p_C^{0}\, (1-p_C)^{N_C}$
  $\Pr[\text{Peak at } M+1] = \binom{N_C}{1}\, p_C^{1}\, (1-p_C)^{N_C-1}$
[Figure: isotopic profiles for $N_C = 50$ and $N_C = 200$, with the +1 peak marked.]
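The binomial terms above are easy to evaluate numerically. The sketch below considers carbon only and assumes a C-13 probability of exactly 1%, as on the slide; the function name `carbon_peak_prob` is hypothetical.

```python
from math import comb


def carbon_peak_prob(n_carbon, k, p_c13=0.01):
    """Pr[peak at M + k] when only carbon isotopes are considered."""
    return comb(n_carbon, k) * p_c13**k * (1 - p_c13) ** (n_carbon - k)


if __name__ == "__main__":
    # 11 carbons (peptide SAM), then the two sizes shown in the figure.
    for n in (11, 50, 200):
        print(n, [round(carbon_peak_prob(n, k), 4) for k in range(3)])
```

With 200 carbons the M+1 peak is taller than the monoisotopic peak at M, which is why large peptides do not show their tallest peak at M.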
Isotope Calculation Example
• Suppose we consider Nitrogen, and Carbon
• $N_N$: number of Nitrogen atoms
• $p_N$: probability of occurrence of N-15
• Pr(peak at M)
• Pr(peak at M+1)?
• Pr(peak at M+2)?

$\Pr[\text{Peak at } M] = \binom{N_C}{0} p_C^{0} (1-p_C)^{N_C} \binom{N_N}{0} p_N^{0} (1-p_N)^{N_N}$

$\Pr[\text{Peak at } M+1] = \binom{N_C}{1} p_C^{1} (1-p_C)^{N_C-1} \binom{N_N}{0} p_N^{0} (1-p_N)^{N_N} + \binom{N_C}{0} p_C^{0} (1-p_C)^{N_C} \binom{N_N}{1} p_N^{1} (1-p_N)^{N_N-1}$
How do we generalize? How can we handle Oxygen (O-16,18)?
General isotope computation
• Definition:
– Let $p_{i,a}$ be the abundance of the isotope of atom a with mass i Da above the least mass
– Ex: $p_{0,C}$: abundance of C-12, $p_{2,O}$: abundance of O-18, etc.
• Characteristic polynomial ($N_a$ is the number of atoms of type a)
  $\phi(x) = \prod_a \left(p_{0,a} + p_{1,a}\,x + p_{2,a}\,x^2 + \cdots\right)^{N_a}$
• Prob{M+i}: coefficient of $x^i$ in $\phi(x)$ (a binomial convolution)
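The characteristic polynomial can be expanded by repeated polynomial multiplication (convolution). The sketch below does this in plain Python for the composition of SAM given earlier. The abundance values are approximate natural abundances supplied here as assumptions (they are not from the slides, and rare isotopes such as S-36 are ignored); `poly_mul`, `isotopic_profile`, and `max_shift` are hypothetical names.

```python
# Approximate natural abundances; list index = extra Da above the lightest isotope.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (a convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def isotopic_profile(composition, max_shift=5):
    """Coefficients of x^0..x^max_shift of phi(x): Prob{M}, Prob{M+1}, ..."""
    profile = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            # Truncating keeps the low-order coefficients exact.
            profile = poly_mul(profile, ABUNDANCE[element])[: max_shift + 1]
    return profile


if __name__ == "__main__":
    sam = {"C": 11, "H": 22, "N": 3, "O": 5, "S": 1}
    for shift, prob in enumerate(isotopic_profile(sam)):
        print(f"M+{shift}: {prob:.4f}")
```

Multiplying one atom at a time is the simplest form; repeated squaring or an FFT would be faster for large molecules.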
Isotopic Profile Application
• In DxMS, hydrogen atoms are exchanged with deuterium
• The rate of exchange indicates how buried the peptide is (in folded state)
• Consider the observed characteristic polynomials of the isotope profile, $\phi_{t_1}$, $\phi_{t_2}$, at various time points. Then
  $\phi_{t_2}(x) = \phi_{t_1}(x)\,\left(p_{0,H} + p_{1,H}\,x\right)^{N_H}$
• The estimates of $p_{1,H}$ can be obtained by a deconvolution
• Such estimates at various time points should give the rate of incorporation of Deuterium, and therefore, the accessibility.
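One simple way to estimate $p_{1,H}$ without a full deconvolution is to compare the mean mass shift of the two profiles: if $p_{0,H} + p_{1,H} = 1$, multiplying by $(p_{0,H} + p_{1,H}x)^{N_H}$ raises the mean shift by exactly $N_H\,p_{1,H}$. The sketch below uses that moment-based shortcut as a stand-in for the deconvolution named on the slide; it assumes noise-free profiles and a known $N_H$, and the function names are hypothetical.

```python
def mean_shift(profile):
    """Mean of a profile given as [Prob{M}, Prob{M+1}, ...] (normalized here)."""
    total = sum(profile)
    return sum(i * p for i, p in enumerate(profile)) / total


def estimate_p1_h(profile_t1, profile_t2, n_h):
    """Moment-based estimate of the deuterium incorporation probability."""
    return (mean_shift(profile_t2) - mean_shift(profile_t1)) / n_h


if __name__ == "__main__":
    t1 = [0.8, 0.2]
    # t1 convolved with (0.7 + 0.3 x)^2, i.e. N_H = 2 and p_1,H = 0.3
    t2 = [0.392, 0.434, 0.156, 0.018]
    print(estimate_p1_h(t1, t2, n_h=2))
```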

Quiz
• How can you determine the charge on a peptide?
– Difference between the first and second isotope peak is 1/Z
– Proposal:
  • Given a mass, predict a composition, and the isotopic profile
  • Do a ‘goodness of fit’ test to isolate the peaks corresponding to the isotope
  • Compute the difference
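The last step of the proposal is a one-line computation once the first two isotope peaks have been isolated. A minimal sketch, assuming the two peak positions are already known (the isolation step itself is not shown, and the function name is hypothetical):

```python
def charge_from_isotope_spacing(first_peak_mz, second_peak_mz):
    """Isotope peaks are 1/Z apart on the m/z axis, so Z is about 1/spacing."""
    spacing = second_peak_mz - first_peak_mz
    if spacing <= 0:
        raise ValueError("second peak must lie above the first")
    return round(1.0 / spacing)


if __name__ == "__main__":
    print(charge_from_isotope_spacing(501.0, 501.5))   # spacing 0.5
    print(charge_from_isotope_spacing(400.0, 400.333)) # spacing about 1/3
```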
Tandem MS summary
• The basics of peptide ID using tandem MS are
simple.
– Correlate experimental with theoretical spectra
• In practice, there might be many confounding
problems.
• A toolkit that resolves some of these problems
will be useful.
MS Quiz:
• Why aren’t all tandem MS peaks of the same
intensity?
• Do the intensities for a peptide vary from
spectrum to spectrum?
De novo interpretation of mass spectra
• The so-called de novo algorithms focus exclusively
on the D module.
• There is no database (I/F).
• Limited scoring and validation
Computing possible prefixes
• We know the parent mass M=401.
• Consider a mass value 88
• Assume that it is a b-ion, or a y-ion
• If b-ion, it corresponds to a prefix of the peptide with residue mass 88-1 = 87.
• If y-ion, y=M-P+19.
– Therefore the prefix has mass
  • P=M-y+19= 401-88+19=332
• Compute all possible Prefix Residue Masses (PRM) for all ions.
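The two cases translate directly into code. The sketch below lists both putative PRMs for every peak, using the spectrum and parent mass from this example; the function name `prefix_residue_masses` is hypothetical.

```python
def prefix_residue_masses(peaks, parent_mass):
    """For each peak, return (peak, PRM if b-ion, PRM if y-ion)."""
    return [(p, p - 1, parent_mass - p + 19) for p in peaks]


if __name__ == "__main__":
    for peak, prm_b, prm_y in prefix_residue_masses([88, 145, 147, 276], 401):
        print(f"peak {peak}: b -> {prm_b}, y -> {prm_y}")
```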
Putative Prefix Masses
• Only a subset of the prefix
masses are correct.
• The correct mass values form a ladder of amino-acid residues

Prefix masses for M=401:

Peak   PRM if b-ion   PRM if y-ion
 88         87            332
145        144            275
147        146            273
276        275            144

Ladder: 0 -S- 87 -G- 144 -E- 273 -K- 401
Spectral Graph
[Figure: nodes 87 and 144 joined by an edge labeled G.]
• Each prefix residue mass
(PRM) corresponds to a
node.
• Two nodes are connected
by an edge if the mass
difference is a residue
mass.
• A path in the graph is a de
novo interpretation of the
spectrum
Spectral Graph
• Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
• Spectral graph:
– Each node u defines a putative prefix residue mass M(u).
– (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
– Paths in the spectral graph correspond to an interpretation.
[Figure: spectral graph on a mass axis (0 to 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
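A small sketch of the spectral graph for this example is given below. It assumes nominal integer residue masses, merges equal PRMs into one node (which covers the "or 0" case), and enumerates every path from 0 to M without yet enforcing that each peak is used at most once. The names `spectrum_graph` and `interpretations` are hypothetical.

```python
# Nominal residue mass -> one-letter code (113 could also be I, 128 also Q).
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def spectrum_graph(peaks, parent_mass):
    """Nodes are putative PRMs; (u, v) is an edge if v - u is a residue mass."""
    nodes = {0, parent_mass}
    for p in peaks:
        nodes.update((p - 1, parent_mass - p + 19))  # b-ion and y-ion readings
    nodes = sorted(m for m in nodes if 0 <= m <= parent_mass)
    edges = {u: [v for v in nodes if (v - u) in RESIDUES] for u in nodes}
    return nodes, edges


def interpretations(peaks, parent_mass):
    """Spell out every path from 0 to the parent mass as a residue string."""
    _, edges = spectrum_graph(peaks, parent_mass)
    found = []

    def walk(u, seq):
        if u == parent_mass:
            found.append(seq)
            return
        for v in edges[u]:
            walk(v, seq + RESIDUES[v - u])

    walk(0, "")
    return found


if __name__ == "__main__":
    print(spectrum_graph([88, 145, 147, 276], 401)[0])
    print(interpretations([88, 145, 147, 276], 401))
```

Even this tiny spectrum admits more than one path, which is why an objective function is needed to choose among interpretations.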
Re-defining de novo interpretation
• Find a subset of nodes in spectral graph s.t.
– 0, M are included
– Each peak contributes at most one node (interpretation) (*)
– Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass)
– An appropriate objective function (ex: the number of peaks interpreted) is maximized
[Figure: spectral graph on a mass axis (0 to 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K; inset: edge 87 -G- 144.]
Two problems
• Too many nodes.
– Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem)
– Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
– In general, the forbidden pairs problem is NP-hard
[Figure: spectral graph on a mass axis (0 to 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
However,..
• The b,y ions have a special non-interleaving
property
• Consider pairs (b1,y1), (b2,y2)
– If (b1 < b2), then y1 > y2
Non-Intersecting Forbidden pairs
• If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting.
• The de novo problem can be solved efficiently using a dynamic programming technique.
[Figure: mass axis (0 to 400) with nodes including 0, 87, and 332, edges labeled S, G, E, K, and the forbidden pairs marked.]
The forbidden pairs method
• There may be many paths that avoid forbidden
pairs.
• We choose a path that maximizes an objective
function,
– EX: the number of peaks interpreted
The forbidden pairs method
• Sort the PRMs according to increasing mass values.
• For each node u, f(u) denotes its forbidden partner (the other PRM generated by the same peak).
• Let m(u) denote the mass value of the PRM.
[Figure: mass axis (0 to 400) showing a node u and its forbidden partner f(u).]
D.P. for forbidden pairs
• Consider all pairs u,v
– m[u] <= M/2, m[v] > M/2
• Define S(u,v) as the best score of a forbidden-pair path from 0->u and v->M
• Is it sufficient to compute S(u,v) for all u,v?
[Figure: mass axis (0 to 400) with a node u in the lower half and a node v in the upper half.]
D.P. for forbidden pairs
• Note that the best interpretation is given by
  $\max_{(u,v) \in E} S(u,v)$
[Figure: mass axis (0 to 400) with nodes u and v joined by an edge.]
D.P. for forbidden pairs
• Note that we have one of two cases.
  1. Either u < f(v) (and f(u) > v)
  2. Or, u > f(v) (and f(u) < v)
• Case 1.
– Extend u, do not touch f(v)
  $S(u,v) = \max_{u' :\, (u,u') \in E,\ u' \neq f(v)} \left( S(u',v) + 1 \right)$
[Figure: mass axis (0 to 400) showing u, v, and f(u).]
The complete algorithm
/* maxI is the score of the best interpretation */
for all u /* increasing mass values from 0 to M/2 */
  for all v /* decreasing mass values from M to M/2 */
    if (u > f[v])
      S[u,v] = max over w with (w,u) in E, w != f(v) of ( S[w,v] + 1 )
    else if (u < f[v])
      S[u,v] = max over w with (v,w) in E, w != f(u) of ( S[u,w] + 1 )
    if (u,v) in E
      maxI = max {maxI, S[u,v]}
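The pseudocode can be turned into a small runnable sketch. The version below makes several assumptions that are not on the slides: nominal integer masses, only b/y ions, every peak giving the pair (p-1, M-p+19), one point scored per node used rather than per peak, and pairs that fall outside (0, M) simply dropped. Index i counts left-chain nodes outward from 0 and j counts right-chain nodes inward from M, so "u > f[v]" becomes i > j. The function name `forbidden_pairs_path` is hypothetical.

```python
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def forbidden_pairs_path(peaks, M):
    """Best path 0 -> M using at most one node of every forbidden pair."""
    pairs = set()
    for p in peaks:
        lo, hi = sorted((p - 1, M - p + 19))   # always lo + hi = M + 18
        if 0 < lo < hi < M:
            pairs.add((lo, hi))
    pairs = sorted(pairs)                  # lo ascending, so hi descending
    x = [0] + [lo for lo, _ in pairs]      # left chain grows up from 0
    y = [M] + [hi for _, hi in pairs]      # right chain grows down from M
    n = len(x)
    NEG = float("-inf")

    def edge(a, b):
        return (b - a) in RESIDUES

    # S[i][j]: best score of paths 0 -> x[i] and y[j] -> M with no pair reused.
    S = [[NEG] * n for _ in range(n)]
    S[0][0] = 0
    back = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                   # x[i], y[i] are a forbidden pair
            if i > j:                      # x[i] is newest: extend left chain
                cands = [(S[k][j] + 1, (k, j)) for k in range(i)
                         if (k != j or k == 0) and edge(x[k], x[i])]
            else:                          # y[j] is newest: extend right chain
                cands = [(S[i][k] + 1, (i, k)) for k in range(j)
                         if (k != i or k == 0) and edge(y[j], y[k])]
            score, prev = max(cands, default=(NEG, None))
            if score > NEG:
                S[i][j], back[(i, j)] = score, prev

    # Close the path with an edge between the two chains.
    finals = [(S[i][j], (i, j)) for i in range(n) for j in range(n)
              if S[i][j] > NEG and edge(x[i], y[j])]
    if not finals:
        return None
    best, state = max(finals)
    left, right = [], []
    while state != (0, 0):
        i, j = state
        if i > j:
            left.append(x[i])
        else:
            right.append(y[j])
        state = back[state]
    path = [0] + left[::-1] + right + [M]
    sequence = "".join(RESIDUES[b - a] for a, b in zip(path, path[1:]))
    return best, path, sequence


if __name__ == "__main__":
    print(forbidden_pairs_path([88, 145, 147, 276], 401))
```

As written, each table entry scans all earlier nodes, so the sketch runs in cubic time in the number of pairs; a production implementation would also score nodes by peak evidence rather than counting them.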
De Novo: Second issue
• Given only b,y ions, a forbidden pairs path will solve the
problem.
• However, recall that there are MANY other ion types.
– Typical length of peptide: 15
– Typical # peaks? 50-150?
– #b/y ions?
– Most ions are “Other”
  • a ions, neutral losses, isotopic peaks….
De novo: Weighting nodes in Spectrum Graph
• Factors determining if the ion is b or y
– Intensity
– Support ions
– Isotopic peaks (InsPecT’)
De novo: Weighting nodes
• A probabilistic network to model support ions (Pepnovo)
De Novo Interpretation Summary
• The main challenge is to separate b/y ions from
everything else (weighting nodes), and to separate
the prefix ions from the suffix ions (Forbidden
Pairs).
• As always, the abstract idea must be
supplemented with many details.
– Noise peaks, incomplete fragmentation
– In reality, a PRM is first scored on its likelihood of being correct, and
the forbidden pair method is applied subsequently.