Modification Site Localization
•Why is this a problem?
•Calculating localization reliability
•Ways of representing reliability
•Modification ambiguity

PTM Analysis: An Exploding Field
•Large-scale PTM characterization studies are now common
  •Phosphorylation
  •O-GlcNAcylation
  •Acetylation
  •…
•Database search engines can identify modified peptides and report a measure of reliability for peptide IDs
  •Peptide Level: p-value; e-value
  •Dataset Level: FDR
•Most search engines do not assess modification site assignment reliability.
•No standard false localization rate (FLR) calculation method

Search Engine Performance for Site Assignment
•Database search engines are optimized for peptide identification
  •Optimal parameters for discriminating between correct and random answers are not the same as for site identification
  •More peaks may be needed for site assignment
•Reliability of modified peptide identifications is higher than that of PTM site assignments
•What most search engines do:
  •Report site consistent with data
  •May be more than one site equally consistent with the data
  •No information about how reliable site assignment is
Bradshaw et al. J Mass Spectrom (2010) 45 10 1095-1097

There are Mistakes In The Literature
•There are several large-scale PTM datasets where site assignment was ‘by manual verification’.
  •Did authors carefully look at 1000+ spectra?
•Results from publications are used to populate other databases
Phosphosite; SwissProt

Evidence for Serine 486 Phosphorylation
•Spectrum from publication reporting unambiguous assignment of serine 4 (serine 487) phosphorylation.
Annotated spectra associated with publications are useful!

Why I highlighted this example
•I found this modification site in my own data in 2006
SwissProt Entry of this protein in 2006

Site Assignment Scoring Methods (1)
Probability of randomly observing a given peak
•A-Score (Gygi)
•PTM Score (Mann)
•Probability calculation based on unit mass measurement and assuming all masses equally possible at random:
  •e.g. if considering 4 peaks per 100 Da, then probability of random match of a given peak is 4%
•A-score is a number; PTM score reports a probability
•How valid are these assumptions?
  •Nominal mass may be appropriate for poor mass accuracy ion trap data, but not for high mass accuracy data
  •Could adjust probability calculation to more mass ‘bins’
  •All masses are not equally probable; e.g. for b ions:
    •201 – EA, LS, IS, TV
    •202 – NS
    •203 – MA, CV, TT
    •204 – Not possible
    •205 – FG, CT
    •206 – Not possible

Site Assignment Scoring Methods (2)
Score/probability difference
•Compare search engine probabilities for peptide IDs with different site assignments
  •Mascot Delta Score
  •SLIP Score
e.g.
Top scoring assignment: E-value 1E-5
Next best site assignment: E-value 1E-4; SLIP score=10
Next best site assignment: E-value 1E-3; SLIP score=20
Advantages:
•Can be calculated as part of database search
•Accounts for variation of probability of observing different masses
•If search engine makes use of mass accuracy, score will adjust to data of different mass accuracy

Assessing Reliability of Site Localization Scoring
•Data from 180 synthetic phosphopeptides
  •Tested with wide range of fragmentation data (CID, HCD, ETD, MSA…)
  •Comparison of Mascot Delta Score to A-score
•SLIP Score in Protein Prospector
•PhosphoRS used different set of synthetic phosphopeptides
Savitski et al. Mol Cell Proteomics (2011) M110.003830

SLIP Score vs A-Score vs MD-Score
Dataset: QTOF Micro CID Data of 180 synthetic phosphopeptides¹
•Modification sites known
Data Searched by Mascot: 2174 correct spectra matches
Data Searched by PP: 2334 correct spectra matches

                   SLIP Score   A-Score   MD-Score
Site IDs           2053         1584      1840
Incorrect Sites    130          138       201
FLR                6.3%         8.7%      10.9%
Ambiguous          220          590       334

1 Site Possible: 164
Baker et al. Mol Cell Proteomics (2011) M111.008078

Decoy Sites for Estimating PEP (Local FLR)
•Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified
•Altered Batch-Tag to allow for phosphorylation of Pro and Glu
•Filtered results to only phosphopeptide IDs containing one S, T or Y
  •Modification site known
[Figure: Local FLR (y-axis, 0.00 to 0.40) against SLIP Score (x-axis, 0 to 20); series: Phospho E, Phospho P]
•Local FLR: SLIP score of 6 = 95% correct
•Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.
•Similar score threshold appropriate for ion trap CID and quadrupole CID data

Representing Ambiguity
VATVSVLATR – Singly phosphorylated

Phospho@5=3
Best site assignment with associated score. No information as to which is second best site.
Example software: A-Score; Mascot Delta Score; SLIP Score

Phospho@3|5
Indicating inability to differentiate between two sites, either due to no information, or confidence below a defined threshold
Example software: SLIP Score; VML Score

VAT(0.1)VS(0.89)VLAT(0.01)R
Probabilities for all potential site assignments within peptide are reported
Example software: PTM Score / MaxQuant; PhosphoRS

Representing Ambiguity
VATVSVLATR – Doubly phosphorylated

Phospho@3=12; Phospho@5=3
Best site assignments with associated scores. Separate score calculated for each site assignment. Score is in comparison to best assignment not containing a particular modification site; i.e. @3 is relative to when residues 5 and 9 are modified.

Phospho@3=12; Phospho@5|9
One site has confidence measure; other site does not.

VAT(0.95)VS(0.9)VLAT(0.15)R
Probabilities are combination probabilities for one of the two modifications.
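
The score-difference example (E-values 1E-5 and 1E-4 giving a SLIP score of 10) and the Phospho@5=3 and Phospho@3|5 notations can be tied together in a short sketch. This is an illustration only. The formula 10 x log10(E_alternative / E_best) is inferred from the worked numbers in these slides, not taken from Protein Prospector itself. The default threshold of 6 is the value quoted above for ion trap CID data. The function names slip_score and site_notation are hypothetical.

```python
import math


def slip_score(best_evalue, alt_evalue):
    """Score difference between the best and an alternative site assignment.

    Assumption: 10 * log10(E_alt / E_best), which reproduces the slide
    example (1E-5 vs 1E-4 gives 10; 1E-5 vs 1E-3 gives 20).
    """
    return 10.0 * (math.log10(alt_evalue) - math.log10(best_evalue))


def site_notation(mod, evalues, threshold=6.0):
    """Format a site assignment in the 'Mod@site=score' or 'Mod@a|b' style.

    evalues maps each candidate residue position to the E-value of the
    peptide match with the modification placed at that position.
    threshold is the assumed minimum score for reporting a single site.
    """
    ranked = sorted(evalues.items(), key=lambda item: item[1])
    best_pos, best_e = ranked[0]
    if len(ranked) == 1:
        # Only one possible site, so there is nothing to compare against.
        return f"{mod}@{best_pos}"
    # Every site scoring within the threshold of the best is indistinguishable.
    ambiguous = [pos for pos, e in ranked if slip_score(best_e, e) < threshold]
    if len(ambiguous) > 1:
        return f"{mod}@" + "|".join(str(p) for p in sorted(ambiguous))
    # Single confident site: score is relative to the next best assignment.
    score = slip_score(best_e, ranked[1][1])
    return f"{mod}@{best_pos}={score:.0f}"


print(site_notation("Phospho", {3: 1e-4, 5: 1e-5, 9: 1e-3}))  # Phospho@5=10
print(site_notation("Phospho", {3: 2e-5, 5: 1e-5}))           # Phospho@3|5
```

In the second call the two placements differ by a score of only about 3, below the threshold, so the sketch reports both residues rather than committing to one.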
Site-Level or Peptide-Level Assessment for Localization Reliability
All current software reports reliability for individual site localizations, but software could in theory calculate a reliability for the combination of modifications reported:
e.g. VAT(0.95)VS(0.9)VLAT(0.15)R
Could be reported as VAT(phospho)VS(phospho)VLATR with probability (0.95 × 0.9 =) 0.86

Modification Ambiguity
•Some modifications are isobaric
  •Acetyl vs Trimethyl; Phospho vs Sulfo; Ser->Thr vs Methyl
•Some combinations of modifications are isobaric/isomeric with a single modification
  •Methyl + Methyl vs Dimethyl
  •Carbamidomethyl + Carbamidomethyl vs GlyGly (ubiquitin)
  •Carbamidomethyl + methyl vs propionamide (acrylamide)
  •Acetyl + K+/Ca2+ adduct vs phospho

Modification Ambiguity
•Much of the published site localization software was written specifically for phospho, so will not work for other PTMs.
•Site localization scoring based on search engine results should work for all modifications
  •SLIP score; Mascot Delta score; VML score
•However, these scores will only be meaningful if the competing modification alternatives were considered in the initial database search
  •If carbamidomethyl modification of lysines or N-termini in addition to cysteines was not considered, then two carbamidomethyl modifications may not be considered as an alternative to ubiquitination.
•Knowledge of the modifications considered is relevant to evaluating site localization reliability

PTMs in Crosslinked Peptides
For crosslinked peptides, ambiguity may be between peptides:
CAMKER
TMAKER
Oxidation could be on methionine in either peptide.

What is an Acceptable FLR?
•2012 iPRG study involved identification of modified peptides
•Participants were asked to return results with 1% FDR at PSM level
•They were asked to indicate for which peptides they thought PTM site assignments were reliable
•Modified peptides were spiked in, so correct site localizations were known

What was reliability of results reported?
[Figure: Spiked Peptide PSM FLR (%) (axis 0 to 18) by participant ID; labels on the plot include <1%, 1%, 1-2%, 5%, <5%, 10% and <30%]
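
When the correct sites are known, as with the synthetic and spiked-in peptides in these studies, the FLR is the fraction of reported site assignments that are wrong. The sketch below is a minimal illustration of that arithmetic, checked against the Site IDs and Incorrect Sites counts in the SLIP Score vs A-Score vs MD-Score comparison. The function name false_localization_rate is hypothetical.

```python
def false_localization_rate(site_ids, incorrect_sites):
    """Fraction of reported site assignments that are incorrect."""
    if site_ids <= 0:
        raise ValueError("site_ids must be positive")
    return incorrect_sites / site_ids


# (Site IDs, Incorrect Sites) copied from the comparison table in the slides.
counts = {
    "SLIP Score": (2053, 130),
    "A-Score": (1584, 138),
    "MD-Score": (1840, 201),
}

for method, (site_ids, incorrect) in counts.items():
    flr = false_localization_rate(site_ids, incorrect)
    print(f"{method}: FLR = {100 * flr:.1f}%")
# SLIP Score: FLR = 6.3%
# A-Score: FLR = 8.7%
# MD-Score: FLR = 10.9%
```

The same calculation applies to the iPRG results: counting the spiked peptides whose reported site was wrong, divided by the number of spiked peptides reported with a site assignment.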