PPTX - HUPO Proteomics Standards Initiative

Modification Site Localization
•Why is this a problem?
•Calculating localization reliability
•Ways of representing reliability
•Modification ambiguity
PTM Analysis: An Exploding Field
•Large-scale PTM characterization studies are now common
•Phosphorylation
•O-GlcNAcylation
•Acetylation
•…
•Database search engines can identify modified peptides and report
a measure of reliability for peptide IDs
•Peptide Level: p-value; e-value
•Dataset Level: FDR
•Most search engines do not assess modification site assignment
reliability.
•No standard FLR calculation method
Search Engine Performance for Site Assignment
•Database search engines are optimized for peptide identification
•Optimal parameters for discriminating between correct and random
answers are not the same as those for site identification
•More peaks may be needed for site assignment
•Reliability of modified peptide identifications is higher than that of PTM site
assignments
•What most search engines do:
•Report site consistent with data
•May be more than one site equally consistent with the data
•No information about how reliable site assignment is
Bradshaw et al. J Mass Spectrom (2010) 45 10 1095-1097
There are Mistakes In The Literature
•There are several large-scale PTM datasets where site assignment was
‘by manual verification’.
•Did authors carefully look at 1000+ spectra?
•Results from publications are used to populate other databases
Phosphosite
SwissProt
Evidence for Serine 486 Phosphorylation
•Spectrum from publication reporting unambiguous assignment of serine
4 (serine 487) phosphorylation.
Annotated spectra associated with publications are useful!
Why I highlighted this example
•I found this modification site in my own data in 2006
SwissProt Entry of this protein in 2006
Site Assignment Scoring Methods (1)
Probability of randomly observing a given peak
•A-Score (Gygi)
•PTM Score (Mann)
•Probability calculation based on unit mass measurement and assuming
all masses equally possible at random:
•e.g. if considering 4 peaks per 100 Da, then probability of random
match of a given peak is 4%
•A-score is a number; PTM score reports a probability
•How valid are these assumptions?
•Nominal mass may be appropriate for poor mass accuracy ion trap
data, but not for high mass accuracy data
•Could adjust probability calculation to more mass ‘bins’
•All masses are not equally probable; e.g. for b ions:
•201 – EA, LS, IS, TV
•202 – NS
•203 – MA, CV, TT
•204 – Not possible
•205 – FG, CT
•206 – Not possible
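Both assumptions questioned on this slide can be checked with a few lines of Python. This is a minimal sketch, not code from A-Score or PTM Score: the function names are hypothetical, only singly charged b2 ions of unmodified residues are considered, and nominal (integer) residue masses are used.

```python
from itertools import combinations_with_replacement

# Nominal (integer) residue masses of the 20 standard, unmodified amino acids.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def random_match_probability(peaks_per_window, window_da=100):
    """Chance of matching one peak at random if every unit mass is equally likely."""
    return peaks_per_window / window_da


def b2_compositions(nominal_mz):
    """Residue pairs (order ignored) whose singly charged b2 ion has this nominal m/z."""
    target = nominal_mz - 1  # b ion = sum of residue masses + one proton
    pairs = combinations_with_replacement(sorted(NOMINAL_RESIDUE_MASS), 2)
    return [a + b for a, b in pairs
            if NOMINAL_RESIDUE_MASS[a] + NOMINAL_RESIDUE_MASS[b] == target]


print(random_match_probability(4))  # 4 peaks considered per 100 Da
for mz in range(201, 207):
    print(mz, b2_compositions(mz) or "not possible")
```

Pairs are reported once with their letters in alphabetical order, so "AE" stands for both EA and AE.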
Site Assignment Scoring Methods (2)
Score/probability difference
•Compare search engine probabilities for peptide IDs with different site
assignments
•Mascot Delta Score
•SLIP Score
e.g. Top scoring assignment:
E-value: 1E-5
Next best site assignment:
E-value 1E-4; SLIP score=10
Next best site assignment:
E-value 1E-3; SLIP score=20
Advantages:
•Can be calculated as part of database search
•Accounts for variation of probability of observing different masses
•If search engine makes use of mass accuracy, score will adjust to data of
different mass accuracy
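The E-value example on this slide is consistent with scoring each site assignment as -10·log10(E-value) and reporting the difference between the top assignment and an alternative. The sketch below reproduces those numbers under that assumption. `score_difference` is a hypothetical helper and is not taken from Protein Prospector or Mascot.

```python
import math


def score_difference(best_evalue, alternative_evalue):
    """Difference in -10*log10(E-value) between the best and an alternative site.

    Assumption: this log-scaled difference is what the slide's example reports.
    """
    best_score = -10 * math.log10(best_evalue)
    alternative_score = -10 * math.log10(alternative_evalue)
    return best_score - alternative_score


# Values from the slide: top assignment 1E-5, alternatives 1E-4 and 1E-3.
for alternative in (1e-4, 1e-3):
    print(alternative, round(score_difference(1e-5, alternative), 3))
```

On this scale a difference of 10 means the alternative assignment has a tenfold worse E-value than the top assignment.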
Assessing Reliability of Site Localization Scoring
•Data from 180 synthetic phosphopeptides
•Tested with wide range of fragmentation data (CID, HCD, ETD, MSA…)
•Comparison of Mascot Delta Score to A-score
•SLIP Score in Protein Prospector
•PhosphoRS used different set of synthetic phosphopeptides
Savitski et al. Mol Cell Proteomics (2011) M110.003830
SLIP Score vs A-Score vs MD-Score
Dataset: QTOF Micro CID data of 180 synthetic phosphopeptides
•Modification sites known
Data searched by Mascot: 2174 correct spectra matches
Data searched by PP: 2334 correct spectra matches

                 SLIP Score   A-Score   MD-Score
Site IDs         2053         1584      1840
Incorrect Sites  130          138       201
FLR              6.3%         8.7%      10.9%
Ambiguous        220          590       334

1 Site Possible: 164
Baker et al. Mol Cell Proteomics (2011) M111.008078
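The FLR figures in this comparison follow from dividing the number of incorrect sites by the number of site IDs, reading the three values in each group in the order SLIP Score, A-Score, MD-Score. A minimal check, with a hypothetical function name:

```python
def false_localization_rate(incorrect_sites, site_ids):
    """FLR as the fraction of reported site assignments that are wrong."""
    return incorrect_sites / site_ids


# (site IDs, incorrect sites) as listed in the comparison.
results = {
    "SLIP Score": (2053, 130),
    "A-Score": (1584, 138),
    "MD-Score": (1840, 201),
}

for name, (site_ids, incorrect) in results.items():
    print(f"{name}: FLR = {false_localization_rate(incorrect, site_ids):.1%}")
```

Ambiguous results are excluded from the denominator here, so a method can lower its FLR by declaring more sites ambiguous. The FLR should therefore be read together with the number of site IDs.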
Decoy Sites for Estimating PEP (Local FLR)
•Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos
(IT-CID): 70,000 phosphopeptide spectra identified
•Altered Batch-Tag to allow for phosphorylation of Pro and Glu
•Filtered results to only phosphopeptide IDs containing one S, T or Y
•Modification site known
[Figure: Local FLR (0.00 to 0.40) versus SLIP Score (0 to 20; axis text also reads "PP SiteScore"); series: Phospho E, Phospho P]
•Local FLR: SLIP score of 6 = 95% correct
•Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.
•Similar score threshold appropriate for ion trap CID and quadrupole CID data
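Because the filtered peptides contain a single S, T or Y, any phosphorylation reported on Pro or Glu is a known wrong localization, so the fraction of such assignments in a score bin estimates the local FLR at that score. The sketch below illustrates the bookkeeping only. The function name, the bin width, and the toy input are hypothetical and are not the study's data.

```python
from collections import defaultdict

DECOY_RESIDUES = ("P", "E")  # residues allowed as decoy phosphorylation sites


def local_flr_by_bin(assignments, bin_width=2):
    """Fraction of decoy-site assignments in each score bin.

    assignments: iterable of (score, residue) pairs, one per spectrum, where
    residue is the amino acid the search engine placed the phosphate on.
    """
    totals = defaultdict(int)
    decoys = defaultdict(int)
    for score, residue in assignments:
        bin_start = int(score // bin_width) * bin_width
        totals[bin_start] += 1
        if residue in DECOY_RESIDUES:
            decoys[bin_start] += 1
    return {b: decoys[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy input, for illustration only.
toy = [(1, "P"), (1.5, "S"), (6, "S"), (7, "T"), (6.5, "E"), (7.9, "Y")]
print(local_flr_by_bin(toy))
```

With real data the bins need enough spectra for the fraction to be stable, so the bin width is a trade-off between resolution along the score axis and noise.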
Representing Ambiguity
VATVSVLATR – Singly phosphorylated
Phospho@5=3
Best site assignment with associated score. No information as to which is second
best site.
Example software: A-Score; Mascot Delta Score; SLIP Score
Phospho@3|5
Indicating inability to differentiate between two sites, either due to no
information, or confidence below a defined threshold
Example software: SLIP Score; VML Score
VAT(0.1)VS(0.89)VLAT(0.01)R
Probabilities for all potential site assignments within peptide are reported
Example software: PTM Score / MaxQuant; PhosphoRS
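The per-residue probability notation can be converted to the position-based form with a short parser. This is a sketch assuming the annotation always follows the pattern shown above, a residue letter optionally followed by a probability in parentheses. `parse_site_probabilities` is a hypothetical name, not part of MaxQuant or PhosphoRS.

```python
import re


def parse_site_probabilities(annotated):
    """Split e.g. 'VAT(0.1)VS(0.89)R' into the bare sequence and {position: probability}.

    Positions are 1-based, matching the Phospho@5 style used on this slide.
    """
    sequence = []
    probabilities = {}
    for residue, prob in re.findall(r"([A-Z])(?:\(([0-9.]+)\))?", annotated):
        sequence.append(residue)
        if prob:  # empty string when the residue carries no probability
            probabilities[len(sequence)] = float(prob)
    return "".join(sequence), probabilities


sequence, probs = parse_site_probabilities("VAT(0.1)VS(0.89)VLAT(0.01)R")
best_site = max(probs, key=probs.get)
print(sequence, probs, f"Phospho@{best_site}")
```

Reducing the full probability list to a single best site discards the information about the second-best site, which is the trade-off between the first and third representations on this slide.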
Representing Ambiguity
VATVSVLATR – Doubly phosphorylated
Phospho@3=12; Phospho@5=3
Best site assignments with associated scores. Separate score calculated for each
site assignment. Score is in comparison to best assignment not containing a
particular modification site; i.e. @3 is relative to when residues 5 and 9 are
modified.
Phospho@3=12; Phospho@5|9
One site has confidence measure; other site does not.
VAT(0.95)VS(0.9)VLAT(0.15)R
Probabilities are combination probabilities for one of the two modifications.
Site-Level or Peptide-Level Assessment for Localization Reliability
All current software reports reliability for individual site localizations, but
software could in theory calculate a reliability for the combination of
modifications reported:
e.g. VAT(0.95)VS(0.9)VLAT(0.15)R
Could be reported as VAT(phospho)VS(phospho)VLATR with probability
0.95 × 0.9 ≈ 0.86
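The multiplication in this example treats the two site probabilities as independent. A minimal sketch of that calculation, with a hypothetical function name:

```python
import math


def combination_probability(site_probabilities):
    """Probability that every listed site is correct, assuming independence."""
    return math.prod(site_probabilities)


# The two sites reported as modified in the example: T3 (0.95) and S5 (0.9).
print(combination_probability([0.95, 0.9]))
```

The combined value is never higher than the weakest individual site probability, so a peptide-level measure is always at least as strict as the site-level one.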
Modification Ambiguity
•Some modifications are isobaric
•Acetyl vs Trimethyl; Phospho vs Sulfo; Ser->Thr vs Methyl
•Some combinations of modifications are isobaric/isomeric with a single
modification
•Methyl + Methyl vs Dimethyl
•Carbamidomethyl + Carbamidomethyl vs GlyGly (ubiquitin)
•Carbamidomethyl + methyl vs propionamide (acrylamide)
•Acetyl + K+/Ca2+ adduct vs phospho
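Whether two of these alternatives can be separated by precursor mass depends on the mass accuracy of the instrument. The sketch below compares monoisotopic mass shifts (Unimod values rounded to six decimals) for the pairs listed above, leaving out the adduct case. The function names, the 1500 Da peptide mass, and the tolerances are illustrative assumptions.

```python
# Monoisotopic mass shifts in Da (Unimod values, rounded to six decimals).
MONO_DELTA = {
    "Acetyl": 42.010565,
    "Trimethyl": 42.046950,
    "Phospho": 79.966331,
    "Sulfo": 79.956815,
    "Methyl": 14.015650,
    "Dimethyl": 28.031300,
    "Carbamidomethyl": 57.021464,
    "GlyGly": 114.042927,
    "Propionamide": 71.037114,
}


def delta_da(mods_a, mods_b):
    """Absolute mass difference between two sets of modifications."""
    mass_a = sum(MONO_DELTA[m] for m in mods_a)
    mass_b = sum(MONO_DELTA[m] for m in mods_b)
    return abs(mass_a - mass_b)


def distinguishable_by_mass(mods_a, mods_b, peptide_mass_da, tolerance_ppm):
    """True if the mass difference exceeds the precursor tolerance."""
    return delta_da(mods_a, mods_b) / peptide_mass_da * 1e6 > tolerance_ppm


pairs = [
    (["Acetyl"], ["Trimethyl"]),
    (["Phospho"], ["Sulfo"]),
    (["Methyl", "Methyl"], ["Dimethyl"]),
    (["Carbamidomethyl", "Carbamidomethyl"], ["GlyGly"]),
    (["Carbamidomethyl", "Methyl"], ["Propionamide"]),
]
for a, b in pairs:
    print(a, b, round(delta_da(a, b), 6),
          distinguishable_by_mass(a, b, peptide_mass_da=1500.0, tolerance_ppm=10))
```

Pairs with the same elemental composition, such as two carbamidomethyl groups and GlyGly, cannot be separated by precursor mass at any accuracy. Only fragment ions that place the mass on specific residues can tell them apart.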
Modification Ambiguity
•Many published site localization tools were written specifically for
phosphorylation, so they will not work for other PTMs.
•Site localization scoring based on search engine results should work for all
modifications
•SLIP score; Mascot Delta score; VML score
•However, they will only be meaningful if the competing modification alternatives
were considered in the initial database search
•If carbamidomethyl modification of lysines or N-termini in addition to
cysteines was not considered, then two carbamidomethyl modifications may
not be considered as an alternative to ubiquitination.
•Knowing which modifications were considered in the search is relevant to
evaluating site localization reliability
PTMs in Crosslinked Peptides
For crosslinked peptides, ambiguity may be between peptides:
CAMKER
TMAKER
Oxidation could be on methionine in either peptide.
What is an Acceptable FLR?
•2012 iPRG study involved identification of modified peptides
•Participants were asked to return results with 1% FDR at PSM level
•They were asked to indicate for which peptides they thought PTM site
assignments were reliable
•Modified peptides were spiked in, so correct site localizations were known
What was the reliability of the results reported?
[Figure: bar chart of Spiked Peptide PSM FLR (%) by participant ID; axis 0 to 18]