Proteomics of Tumor Extracellular Matrix
The “Matrisome Project”
Karl Clauser, Proteomics Platform of the Broad Institute
Alexandra Naba, Koch Institute for Cancer Research @ MIT – Hynes Lab

The ECM is a prominent part of the tumor microenvironment
[Figure: tumor microenvironment schematic. Labels: tumor cells, normal epithelial cells, basement membrane, immune cells (macrophages, lymphocytes…), fibroblasts, ECM, lymphatic vessels, blood vessels (endothelial cells, pericytes). Adapted from Joyce JA. and Pollard JW., Nature Reviews Cancer (2009)]

ECM gene expression is dysregulated in tumors
[Figure: Fibronectin / Smooth Muscle Actin staining. Panels: normal pancreas (normal pancreatic islet, arteriole) and tumor]
Marked up-regulation of fibronectin-rich matrix around vascularized RIPTAg pancreatic tumors
Astrof S. et al., MCB (2004)

ECM organization is often altered in tumors
[Figure: GPR56, TGM2 and merge (blue: DAPI – nuclei) in non-metastatic melanoma and metastatic melanoma. Lei Xu]

The Extracellular Matrix: Not Just Pretty Fibrils!
[Figure: signaling schematic. Labels: ECM, GF, integrins, GF receptors, focal adhesion (Talin, Paxillin, Vinculin, FAK, Src…), survival, migration, proliferation, morphology]

Why study the tumor matrisome?
The ECM provides biophysical and biochemical cues promoting cell growth, invasion and metastasis.
Goals:
1. What is the source of the tumor ECM (tumor or stroma)?
2. What are the changes in the matrisome during tumor progression? Invasion? Angiogenic switch? Metastatic dissemination?
3. Can ECM proteins serve as a prognostic/diagnostic tool?

The extracellular matrix
Matrisome ~ 500-700 proteins
Structural proteins (fibrils):
- Collagens (30 genes)
- Elastin
Proteoglycans:
- Glycosaminoglycans covalently bound to a core protein (Aggrecan, Decorin, Perlecan, Versican…)
- Hyaluronan
Adhesive glycoproteins:
- Fibronectin
- Laminins (12 genes)
- Thrombospondins (5 genes)
- Vitronectin, and many more…
ECM-associated proteins:
- ECM modification enzymes (LOX, TG2…)
- Matrix Metalloproteinases
- Growth Factors (VEGF, TGFb…)

Why use a proteomics-based approach?
Correlation of the protein expression with the mRNA data?
The insolubility of the ECM proteins: an advantage!

Enrichment of ECM proteins from normal tissue
Purification steps:
- Tissue: mechanical lysis
- Chemical lysis (high salt buffer)
- Membrane protein solubilization (DOC, NP-40)
- Cytoskeletal protein solubilization (SDS)
- Insoluble fraction = ECM-enriched fraction
[Figure: Western blot of the fractions (lanes C, N, M, CS and the insoluble fraction), molecular weight markers at 180, 120, 83, 55, 49, 38 and 16 kDa. Proteins probed: Collagen VI (ECM), Laminin (ECM), Integrin b1 (PM), Tf Receptor (PM), Tubulin (Cytoskeleton), Actin (Cytoskeleton), GAPDH (Cytosol), Histones (Nucleus). ECM proteins: 8-fold enrichment]

The proteomics workflow
- Solubilize in 8M urea
- Deglycosylation: PNGase-F, 2M urea
- Digestion to peptides: Lys-C, trypsin, 2M urea
- Separation by peptide pI: Agilent 3100 Offgel Electrophoresis
- Desalting: reversed phase
- LC-MS/MS: Thermo LTQ-Orbitrap
- Identification of peptides and proteins: Spectrum Mill

LC/MS/MS
[Figure: LC trace (intensity vs. retention time, min) used for quantitation; MS and 8 MS/MS spectra (relative abundance vs. m/z) used for identification]
1 cycle: 3 sec.
1 MS scan; 8 most abundant peptides; 2nd MS: 8 MS/MS (peptide sequence)
Database search parameters

Content of the lung matrisome - unfractionated
[Figure: three pie charts by sub-cellular location: Mass Spec Intensity, Number of Peptides, Number of Proteins. ECM share: 77.85% of intensity, 59.41% of peptides (1291), 29.59% of proteins (50). The remaining slice percentages cannot be matched to their categories in the extracted text.]
Protein sub-cellular location classified from literature knowledge and GO annotation

The lung matrisome
Basement membrane components: Col4a1, Col4a2, Col4a3, Col4a4, Lama2, Lama3, Lama4, Lama5, Lamb2, Lamb3, Lamc1, Lamc2, Nidogen-1, Perlecan
Collagens: Col1a1, Col1a2, Col3a1, Col5a1, Col5a2, Col6a1, Col6a2, Col6a3, Col12a1, Col29a1
Others: Agrin, Decorin, Dermatopontin, Efemp-1, Elastin, Emilin-1, Fibrinogen alpha, Fibrinogen beta, Fibrinogen gamma, Fibrillin-1, Fibronectin, Fibulin-5, Mfap-4, Nephronectin, Periostin, Prolargin, Tinagl1, Vitronectin, von Willebrand Factor
Proteoglycans: Biglycan, Lumican, Mimecan
Enzymes: LOXL1, TG2
Growth Factors: LTBP1, TGFbi

Off Gel Electrophoresis: principle
[Figure: IPG gel strip with pH gradient; peptides A and B focus at pI (A) and pI (B)]
Capacity: 50-100 ug total peptide
Separation into 12 fractions, pI 3-10
Each fraction is analyzed by LC/MS/MS
On average ~ 1800 proteins identified (6 times more than without OGE)

OGE fractionation: normal lung
[Figure: overlap of distinct peptides in fractions (# distinct peptides per OGE fraction, stacked by the number of fractions in which each peptide was found, 1 to 11 frxns) and pI resolution (median pI vs. OGE fraction #). Values shown on the slide: 5557 (unseparated sample), 9178, 83% (1 frxn)]

The lung matrisome before/after OGE
Before OGE, ECM share: 77.85% of Mass Spec Intensity (non-ECM 22.15%), 59.41% of peptides (1291 peptides; non-ECM 40.59%), 29.59% of proteins (50 prot.; non-ECM 70.41%)
After OGE: 105 ECM prot. (9.74%; non-ECM 90.26%) and 2328 ECM peptides, each ~ 2x more than before OGE. The slide also shows the pairs 27.81% / 72.19% and 26.94% / 73.06%; which pair belongs to the intensity chart and which to the peptide chart cannot be recovered from the extracted text.

Coverage and Sensitivity Improvements from OGE
Legend: increased coverage; 1 peptide before OGE; undetected before OGE; additional ECM proteins detected after OGE (the color coding of individual proteins is lost in the extracted text)
Lung matrisome after OGE
Basement membrane components: Col4a1, Col4a2, Col4a3, Col4a4, Lama1, Lama2, Lama3, Lama4, Lama5, Lamb2, Lamb3, Lamc1, Lamc2, Nidogen-1, Perlecan
Collagens: Col1a1, Col1a2, Col3a1, Col5a1, a2, a3, a6, Col6a1, a2, a3, Col7a1, Col9a2, Col12a1, Col14a1, Col15a1, Col16a1, Col18a1, Col23a1, Col24a1, Col27a1, Col28a1, Col29a1
Others: Agrin, ECM-1, Elastin, Emilin-1, Emilin-2, Fras-1, Frem-1, Frem-2, Fibrinogen a, Fibrinogen b, Fibrinogen g, Fibrillin-1, Fibronectin, Fibulin-1, -2, Fibulin-5, Hemicentin-1, Hemicentin-2, Mfap-2, Mfap-4, Multimerin-1, -2, Nephronectin, Papilin, Periostin, Prolargin, Prg2, SPARC, Spondin-1, Thrombospondin-1, Thrombospondin-4, Tinagl1, Tenascin-X, Vitronectin, von Willebrand Factor
Proteoglycans: Asporin, Biglycan, Decorin, Dermatopontin, Lumican, Mimecan
Enzymes: LOX, LOXL1, LOXL2, LOXL3, TGM2, ADAM10, ADAMTS9, ADAMTS17, ADAMTSL1, ADAMTSL4, ADAMTSL5, MMP9, MMP19, TIMP3, Pcolce, Plod-1, -3
Growth Factors: LTBP1, LTBP2, LTBP4, TGFbi

A very reproducible approach
The comparison of 2 samples processed in parallel leads to > 90% identity.
The difference between the matrisomes of 2 different organs represents less than 10% of the proteins identified.
Comparison of the lung and colon matrisome: identification of proteins exclusively present in the lung and participating in the TGFb regulation axis
- LTBP2: binds TGFb family member
- Thrombospondin-1: the TSP1 knock-out mice get pneumonia that can be ameliorated by TGFb activation

The complex domain structures of ECM proteins
[Figure: domain structures of Fibronectin, Fibrillin-1, LTBP-1 and Thrombospondin-1. Hynes RO., Science (2009)]

Focusing on ECM proteins - Help from Bioinformatics?
Knowledge-based annotation: we will miss unknown proteins
Combination of the two approaches: OGE proteomics data, Complete Matrisome
85 domains found in proteins involved in: cell adhesion, GF, enz.
(InterPro IDs)
Extraction of the proteins that contain at least one of these domains
We will miss unknown proteins that have no domains!

Annotations of the protein sub-cellular location
The GO annotations for cellular compartment are unsatisfactory and can be inconsistent for mouse vs. human:
Tgm2, Protein-glutamine gamma-glutamyltransferase 2 (human): mitochondrion|mitochondrion|plasma membrane|plasma membrane|
Tgm2, Protein-glutamine gamma-glutamyltransferase 2 (mouse): proteinaceous extracellular matrix|cytosol|membrane|
Lamb2, laminin, beta 2 (human): extracellular region|basal lamina|extracellular space|nucleus|cytoplasm|endoplasmic reticulum|laminin-11 complex|
Lamb2, laminin, beta 2 (mouse): basement membrane|basement membrane|
Fbn1, fibrillin 1 (human): microfibril|microfibril|extracellular region|basement membrane|extracellular space|
Fbn1, fibrillin 1 (mouse): microfibril|extracellular region|proteinaceous extracellular matrix|
EMILIN1 (human): extracellular region|proteinaceous extracellular matrix|extracellular space|nucleus|nucleolus|centrosome|
Emilin1 (mouse): extracellular region|proteinaceous extracellular matrix|extracellular space|

Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course of tumor progression? Invasion? Angiogenic switch? Metastatic dissemination? How different is the ECM from metastases when compared to the ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the invasiveness of a tumor? Can ECM serve as a diagnostic / prognostic tool in clinics?

Of mouse or man?
SC injection of A375 human melanoma cells into “NSG” mouse (NOD/SCID/IL2Rγ)
Tumor collection -- Tumor ECM preparation -- Proteomics pipeline
Proteins secreted by the tumor cells: human sequence
Proteins secreted by the stromal cells: mouse sequence

VTN - Vitronectin is secreted by the stroma
A375-1A: 10 peptides detected: 0 human only, 8 mouse only, 2 shared
[Figure: human vs. mouse sequence alignment with detected peptides]

Emilin-1 is predominantly secreted by the tumor
A375-1A: 37 peptides detected: 20 human only, 9 mouse only, 8 shared
MS intensity (H/M) = 6.3
[Figure: human vs. mouse sequence alignment with detected peptides]

BGN - Biglycan is predominantly secreted by the stroma
A375-1A: 11 peptides detected: 2 mouse only, 1 human only, 8 shared
MS intensity (M/H) = 460
[Figure: mouse vs. human sequence alignment with detected peptides]

Challenges in MS Quantitation of Tumor vs. Stroma Secretion
A375-1A: 16 peptides detected: 10 mouse only, 5 human only, 1 shared
[Figure: mouse vs. human sequence alignment with detected peptides]
- Use all distinguishing peptides: # peptides 10M, 5H; MS intensity (M/H) 1.6
- Skip cleavage site alterations and unpaired peptides, use only similar pairs with same charge: # peptides 3M, 3H; MS intensity (M/H) 0.7

Protein Grouping for Species in Xenograft Tumors
[Figure: a protein group containing a human subgroup (subgroup 1) and a mouse subgroup (subgroup 2), with human, mouse and shared peptides]
Subgroup specific - ON
1. total MS1 intensity for only human
2. total MS1 intensity for only mouse
Subgroup specific - OFF
1. total MS1 intensity for human + shared
2. total MS1 intensity for mouse + shared

Of mouse or man?
Tumor Tumor Both Only More Similar Indistinguishable Stroma More Stroma Only COL11A1 COL11A2 COL19A1 COL27A1 COL4A4 EMILIN2 LAMA3 LAMA5 LTBP1 LTBP1 LTBP3 MMP14 PCOLCE PLOD1 PLOD2 TIMP1 Bgn Col12a1 Col16a1 Col1a1 Col1a2 Col2a1 Col3a1 Col5a2 Col5a3 Fbn1 Fgb Fn1 Ltbp4 Lum Postn No shared peptides No shared peptides EMILIN1 HSPG2 LAMA4 LAMB1 LAMC1 LOXL2 PLOD3 TGFBI TINAGL1 Adamtsl1 Col4a1 Col4a2 Col4a5 Col6a1 Col6a2 Col6a3 Lamb2 Loxl3 Mfap2 Nid1 Prelp Tgm2 Tnc Tnxb COL13A1 Col13a1 COL18A1 Col18a1 COL22A1 Col22a1 ECM1 Ecm1 VWA1 Vwa1 Col4a3bp Efemp2 MFAP1 Sparc Thbs4 Timp3 Vcan Aspn Lama2 Col10a1 Lamb3 Col13a1 Loxl1 Col14a1 Ltbp2 Col15a1 Mfap4 Col23a1 Mfap5 Col28a1 Mmp19 Col4a3 Nid2 Col5a1 Ogn Col7a1 Thbs1 Col9a1 Tnn Col9a2 Vtn Col9a3 Vwa5a Dcn Vwf Dpt E330026B02Rik Efemp1 Eln Fbln1 Fbln2 Fbn2 Fga Fgfbp3 Fgg More: >5x Gm7455 Similar: -5 to 5x HMCN2 Of mouse or man? Fibrillar Basement Collagens Membrane (1,2,3,5,11) HMCN2 HSPG2 Lama2 LAMA3 LAMA4 LAMA5 LAMB1 Lamb2 Lamb3 LAMC1 Nid1 Nid2 Col4a1 Col4a2 Col4a3 Col4a3bp COL4A4 Col4a5 Col15a1 COL18A1 Col18a1 Col1a1 Col1a2 Col2a1 Col3a1 Col5a1 Col5a2 Col5a3 COL11A1 COL11A2 More: >5x Similar: -5 to 5x Matricellular Hemostatic ECM ECM ECM Proteo Modifying Growth Proteins Glycoproteins glycans Enzymes Factors Others Other Collagens Col6a1 Col6a2 Col6a3 Col7a1 Col8a1 Col9a1 Col9a2 Col9a3 Col10a1 Col12a1 COL13A1 Col13a1 Col14a1 Col16a1 COL19A1 COL22A1 Col22a1 Col23a1 COL27A1 Col28a1 E330026B02Rik Gm7455 Fga Fgb Fgg Vtn VWA1 Vwa1 Vwa5a Vwf Sparc Thbs1 Thbs4 Tnc Tnn Tnxb Tumor Only Tumor More Both Similar Aspn Bgn Dcn Dpt Lum Ogn Vcan Adamtsl1 Loxl1 LOXL2 Loxl3 MMP14 Mmp19 PCOLCE PLOD1 PLOD2 PLOD3 Tgm2 TIMP1 Timp3 Indistinguishable Fgfbp3 LTBP1 LTBP1 Ltbp2 LTBP3 Ltbp4 TGFBI Stroma More ECM1 Ecm1 Efemp1 Efemp2 Eln EMILIN1 EMILIN2 Fbln1 Fbln2 Fbn1 Fbn2 Fn1 MFAP1 Mfap2 Mfap4 Mfap5 Postn Prelp TINAGL1 Stroma Only Can we identify trends? 
Basement membrane: produced by a combination of tumor/stroma
Predominantly produced by the tumor:
- ECM modifying enzymes
- Laminins
- Growth Factors
Predominantly produced by the stromal cells:
- Proteoglycans
- Most Collagens

Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course of tumor progression? Invasion? Angiogenic switch? Metastatic dissemination? How different is the ECM from metastases when compared to the ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the invasiveness of a tumor? Can ECM serve as a diagnostic / prognostic tool in clinics?

Ongoing work…
Comparing metastatic and non-metastatic tumors:
Limitations of the A375 xenograft model:
- Subcutaneous injection
- Metastatic only by tail vein injection
- What would be the control normal matrisome?
Xu L. et al., PNAS (2006)

Mouse model of mammary carcinoma: MMTV-PyMT
- 4 wks: premalignant mammary gland, no tumor palpable
- 8 wks: palpable tumor (1 wk pp)
- 15 wks: late stage tumor, metastatic tumor
A transgenic mouse strain that expresses the polyoma middle T oncogene (PyMT) under the mouse mammary tumor virus promoter (MMTV) in the mammary gland. Carcinomas develop in the mammary gland and mimic human disease stages.
In parallel, mammary glands of age-matched WT FVB mice are collected.
Guy CT. et al., MCB (1992); Lin EY. et al., (2003)

The MMTV-PyMT mRNA signature
Molecular expression profiling of tumors initiated by transgenic overexpression of polyoma middle T antigen (PyMT) targeted to the mouse mammary gland.
Procollagen type I, α2; Procollagen type III, α1; Biglycan; Fibrinogen-like; Nidogen-1; LOXL; MMP-2; Laminin B1 subunit 1; Collagen; Procollagen type XI, α1; Syndecan-2
And: Procollagen type IV, α2, α3 and α6, procollagen type XV, lumican, nephronectin, MMP-14
How well are the array data reflected in protein-level changes?
Desai KV. et al., PNAS (2002); Qiu TH. et al., Cancer Res (2004)

Acknowledgment
• Richard Hynes Lab: Alexandra Naba, John Lamar, Hui Liu
• Proteomics Platform: Steve Carr, Jake Jaffe
• Bioinformatics Core Facility (KI): Charlie Whittaker
TMEN (Tumor Microenvironment Network, NCI), U54-CA126515

Xenotransplant model of mammary carcinoma
MDA-MB-231: human mammary carcinoma cell line
LM2: highly metastatic derivative [Massagué Lab]
Orthotopic injection in the mammary fat pad [John Lamar]
Comparison of the primary tumor matrisome to the “normal” mammary gland matrisome
Identification of the ECM proteins synthesized by the tumor-associated stroma and not by the normal stroma.
Manipulation of gene expression: validation

Characterization of the ECM changes at the angiogenic switch
Model system: RIP-Tag mouse (time points: 9 wks, 12 wks)
A transgenic mouse strain that expresses the simian virus 40 large T antigen (TAg) under the rat insulin II promoter (RIP) in the β-cells of the pancreatic islets. Carcinomas develop in the pancreatic islets and progress through characteristic stages.
Human disease: insulinoma (malignant only in very rare cases)
Folkman, J. et al. (1989) Nature 339

Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course of tumor progression? Invasion? Angiogenic switch? Metastatic dissemination? How different is the ECM from metastases when compared to the ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the invasiveness of a tumor? Can ECM serve as a diagnostic / prognostic tool in clinics?

Can we correlate changes in the matrisome to the invasiveness/aggressiveness of a tumor?
Collaboration with MGH: colon cancer samples +/- liver metastasis, patient history
Can ECM proteins serve as a prognostic/diagnostic tool?
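The human/mouse bookkeeping from the xenograft slides ("Subgroup specific ON/OFF" and the "More: >5x, Similar: -5 to 5x" legend) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Spectrum Mill implementation: the `Peptide` record, the function names and the example intensities are hypothetical, and the rules used here for "Only" (no distinguishing peptide from the other species) and "Indistinguishable" (shared peptides only) are guesses at how the slide's column labels were assigned.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    """Hypothetical record for one detected peptide in a protein group."""
    sequence: str
    in_human: bool    # sequence matches the human (tumor) protein
    in_mouse: bool    # sequence matches the mouse (stroma) protein
    intensity: float  # MS1 intensity


def species_totals(peptides, subgroup_specific=True):
    """Return (human_total, mouse_total) MS1 intensity.

    subgroup_specific=True  -> only species-distinguishing peptides count ("ON").
    subgroup_specific=False -> shared peptides are added to both totals ("OFF").
    """
    human = sum(p.intensity for p in peptides if p.in_human and not p.in_mouse)
    mouse = sum(p.intensity for p in peptides if p.in_mouse and not p.in_human)
    if not subgroup_specific:
        shared = sum(p.intensity for p in peptides if p.in_human and p.in_mouse)
        human += shared
        mouse += shared
    return human, mouse


def classify_origin(peptides, fold=5.0):
    """Assign a protein to one of the slide's columns (assumed decision rules)."""
    human, mouse = species_totals(peptides, subgroup_specific=True)
    if human == 0 and mouse == 0:
        return "Indistinguishable"  # assumption: only shared peptides were seen
    if mouse == 0:
        return "Tumor Only"
    if human == 0:
        return "Stroma Only"
    ratio = human / mouse
    if ratio > fold:
        return "Tumor More"
    if ratio < 1.0 / fold:
        return "Stroma More"
    return "Both Similar"


# Made-up intensities chosen so that H/M = 6.3, as on the Emilin-1 slide.
example = [
    Peptide("AAA", True, False, 630.0),
    Peptide("BBB", False, True, 100.0),
    Peptide("CCC", True, True, 50.0),
]
print(classify_origin(example))
```

As the "Challenges in MS Quantitation" slide points out, the ratio depends strongly on which distinguishing peptides are allowed into the sums, so any fixed threshold of this kind should be read with that caveat in mind.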
ABRF Proteome Informatics Research Group
iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization
ABRF 2010, Sacramento, CA, March 22, 2010

A Challenging Problem
4/7 DSAIPVEsDtDDEGAPR
3/7 DSAIPVESDtDDEGAPR
[Figure: MS/MS spectrum; labels: P(m/z), -H3PO4, 879]
14/21 said they can identify the peptide but cannot localize the site

Solution
• Not fun to do by hand!
• Software available that evaluates ‘site-determining ions’
– Generate per residue localization scores
– Examples: Ascore, PTM Score (MaxQuant), pFind, PhosphoScore, etc.

Study Goals
1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories
2. Characterize the underlying reasons why result sets differ
3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource

Study Design
• Use a common dataset
• Use a common sequence database
• Allow participants to use the bioinformatic tools and methods of their choosing
• Use a common reporting template
• Fix the identification confidence (1% FDR)
• Require an indication of phosphosite ambiguity per spectrum
• Ignore protein inference – for now

Soliciting Participants and Logistics
Study advertised on the ABRF website and listserv, Molecular and Cellular Proteomics blogsite, GenomeWeb and by direct invitation from iPRG members
[Diagram labels: Participant, iPRG members, Questions / Answers, “Anonymizer”]
1. Email participation request to ‘iPRGxxxx@gmail.com’
2. Send official study letter with instructions
3. All further communication (e.g., questions, submission) through ‘iPRGxxx.anonymous@gmail.com’

Study Materials and Instructions to Participants
• 1 Orbitrap XL dataset (3 files)
– RAW, mzML, mzXML, MGF, pkl or dta
– conversions by ProteoWizard
• 1 FASTA file (SwissProt human seq’s. v57.1)
• 1 template (Excel)
• 1 on-line survey (Survey Monkey)
1. Analyze the dataset
2. Report the phosphopeptide spectrum matches in the provided template
3. Complete an on-line survey
4. Attach a 1-2 page description of your methodology

Reporting Template
ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis
Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to 'anonymous.iPRG2010@gmail.com' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.
REQUIRED FIELDS
– File: Name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03).
– Spectrum Identifier: Identifiers should be unique scan numbers from data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).
– Precursor m/z: Precursor m/z as submitted to search engine.
– Precursor Charge: Precursor charge reported by search engine.
– Peptide Sequence: Use lowercase s, t or y (e.g. SLsGSsPCPK) OR a trailing symbol (e.g. SLS#GS#PCPK) OR a string in parentheses (e.g. SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.
– Accession(s): Protein identifier(s) from Fasta file. Use multiple values if peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.
– Num. Phospho sites: Total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.
– Peptide Identification Certainty: Is this match above 1% FDR identification threshold (Y|N)? 'Y' indicates this match is BETTER than the confidence threshold. 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list.
– Phosphosite Localization Certainty: Are ALL phosphosites unambiguously localized (Y|N)? Indicate 'Y' if ALL phosphorylations have been confidently localized, 'N' if one or more have not.
– Peptide Identification Score: Peptide identification score reported by search engine (e.g., E-value, p-value, probability, Mascot score, etc.).
Example rows:
File | Spectrum Identifier | Precursor m/z | Precursor Charge | Peptide Sequence | Accession(s) | Num. Phospho sites | Peptide Identification Certainty | Phosphosite Localization Certainty | Peptide Identification Score
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:908 | 558.7576 | 2 | qGsPVAAGAPAK | Q9NZI8 | 1 | Y | Y | 0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04 | Scan:2017 | 710.82233 | 2 | TsPDPSPVSAAPSK | Q13469 | 1 | Y | N | 45.41
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:683 | 692.28891 | 2 | _APQTS(ph)S(ph)SPPPVR_ | Q8IYB3 | 2 | Y | N | 30.09
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:4832 | 775.3548 | 2 | SQtPPGVAtPPIPK | Q15648 | 2 | Y | N | 31.79
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | SLsGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | sLSGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0915611

Participation
• 59 requests / 32 submissions (54% return), 2 retractions, + 7 iPRG members and 1 guest
[Figure: participant demographics pie and bar charts: resource lab status (conduct both core functions and non-core lab research; core only; non-core research lab), ABRF membership (n=33; member, non-member), primary job function (bioinformatician/developer, director/manager, mass spectrometrist, lab scientist, other), type of lab (academic, biotech/pharma/industry, contract research org, government, other), proteomics experience (1-2 years, 3-4 years, 5-10 years, >10 years, unanswered), location (Asia, Australia/New Zealand, Europe, North America). The percentages cannot be matched to their categories in the extracted text.]

Software Tools Used
[Figure: bar charts of the number of participants using each tool for peptide identification and for phosphosite localization. Legible tool names include Mascot, SEQUEST, X!Tandem, OMSSA, Spectrum Mill, MyriMatch, InsPecT, pFind, Pview, MaxQuant, Scaffold, PeptideProphet, SpectraST and in-house software for identification, and Ascore, MaxQuant, PhosphoScore, Phosphinator, NNScore, Spectrum Mill and in-house software for localization.]

The SCX/IMAC Enrichment Approach for Phosphoproteomics
Sample: 7.5 x 10^7 human K562 chronic myelogenous leukemia cells, 4 mg lysate
Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.
Lysis: 8M urea, 75 mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
SCX: PolyLC - Polysulfoethyl A 9.4 mm x 200 mm, elute: 0-105 mM KCl, 30% Acn
IMAC: Sigma - PhosSelect Fe IMAC beads, bind: 40% Acn, 0.1% formic acid, elute: 500 mM K2HPO4 pH 7
MS/MS: Thermo Fisher Orbitrap XL, high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in LTQ, exclude +1 and precursors w/ unassigned charges, 20 s exclusion time, precursor mass error +/- 10 ppm

Preliminary Analysis of SCX Fractions and Dataset Selection
[Figure: four panels per SCX fraction (fr # 2-12): # spectra by precursor charge (z2, z3, z4); fraction overlap of distinct peptides (found in 1 to 6 frxns); % distinct peptides by solution charge (-1SC to 6SC); % distinct peptides by # phosphosites (1P, 2P, 3P)]
Fractions selected:
- Frxn 3: multi-phosphosites
- Frxn 4: single phospho, single basic
- Frxn 12: multi-basic residues (RHK)

Notes on Analysis
• Identification and localization were analyzed separately
• All non-phosphopeptide IDs removed
• Reported confidence indicators were used as filters
– Id = Is peptide spectrum match above 1% FDR? (Y|N)
– Loc = Are ALL phosphosites unambiguously assigned?
(Y|N)
• Mods indicated on S|T|Y are assumed phos, others ignored
– Unique peptide comparison: SLsm(ox)DSQVPVYSPSIDLK and SLSm(ox)DSQVPVYSPsIDLK both count as SLSMDSQVPVYSPSIDLK
– Phosphopeptide comparison: SLsm(ox)DSQVPVYSPSIDLK and SLSm(ox)DSQVPVYSPsIDLK count as SLsMDSQVPVYSPSIDLK and SLSMDSQVPVYSPsIDLK

From 30,000 Ft.
[Figure: bar chart per participant of # spectra Id Yes, # spectra Loc Yes and # unique peptides UC Id Yes. Values marked on the chart: 3,571, 2,084 and 1,623. Below the chart, a y/n grid per participant: precursor masses adjusted, N-term acetyl, used > 1 search engine, precursor mass filtering, solution charge filtering, localization software, followed by codes for the peptide ID software and the localization software. The per-participant entries cannot be realigned from the extracted text.]
Peptide ID software: Ma=Mascot, Ms=MsInspect, Mu=Multiple, Mu*=Multiple + Spec Lib, Om=OMSSA, Pf=pFind, Pv=Pview, Sc=Scaffold, Se=SEQUEST, Sm=Spectrum Mill, Xt=X!Tandem
Localization software: As=Ascore, Ih=In-house, In=InsPecT, Mq=MaxQuant, Nn=NNScore, Ph=Phosphinator, Pl=Phosphate Localization Score, Ps=PhosphoScore, Sm=Spectrum Mill

Quotes From Participants (grading?)
• “It is hard to see how the results of this analysis will be assessed. How will the accuracy of individual methods be determined?
It is always easy to find more matches but less easy to determine whether they are credible.”
• “The most challenging part of this study is phosphate localization. But, the data in this study is not a gold standard. So, it's hard to judge which method works.”
• “Well-designed study. I am looking forward to the results, and I am honestly wondering how the study designer comes up with a "model answer" (or maybe there won't be one?).”
• “How sure are the authors of phosphosite localisation? Have synthetic peptides been made to validate many of the peptides?”

Relative Performance: Identification By Fraction
[Figure: bar charts per participant of # spectra Id Yes (all fractions, Frxn 3, Frxn 4, Frxn 12) and # unique peptides UC Id Yes (all fractions, Frxn 3, Frxn 4, Frxn 12)]
Performance was not equivalent across the 3 fractions for all participants.
Some participants saw more unique peptides than others.

How Much Did Phosphopeptide Identifications (Spectra and Unique Peptides) Vary?
[Figure: number of spectra (ID=Y) and number of unique peptides across participants for Fr 3, Fr 4 and Fr 12]
Unique Phosphopeptide CVs: Fr3=74%, Fr4=31%, Fr12=84%*

One Participant Wonders
Frxn 3 – most multiple phos per peptide
Frxn 4 – most phosphopeptides
Frxn 12 – highest precursor charges
[Figure: per fraction, bar chart per participant of # spectra Id Yes shared vs. # spectra Id Yes unique to participant]
Gray means: number of spectra where fewer than 2 people agreed on the Id
85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
20814: ?, Frxn 12 >> Frxn 3,4
77114, 77115: merged multiple scans, so can’t be compared with other 33

On Average, How Similar Are the Sets of Confidently Identified Unique Peptides?
Pairwise Comparisons of Unique Peptides - All Fractions
[Figure: bar chart per participant (n=35, sorted by # spectra Id=Y) of the median % overlap with the other participants, split into overlap, unique to participant and unique to other. Median overlap: 57% ± 14]

Unique Peptide %Overlap
[Table: 35 x 35 matrix of pairwise unique peptide % overlap between participants, shown once in descending order of # of total spectra Id=Y (n=35) and once in a second ordering. The individual cell values are flattened and truncated in the extracted text and are not reproduced here.]
73.2 66.8 70 70.4 56.2 56.8 54.6 59.5 67.3 64.2 67.1 67.5 58.3 57.7 55.7 60.7 79.4 70 68 56.6 53.8 54 52.8 57.9 46.5 50.1 49.5 56.1 47.9 47.4 45.7 57.3 45 47.1 45.5 58.1 44.1 46.7 47.7 54.1 47.4 50.7 48.4 52.3 42.1 43.5 41.4 51.7 40.9 42.7 41 49.2 39.6 39.7 38.1 36.5 27.3 28.9 27.7 35.5 24.7 26.3 25.1 21.4 18.8 18.8 17.7 17.7 12.4 13.3 12.7 12.1 14.7 13.9 13.7 56365 67.3 68.8 64.4 65.5 63.1 61.6 68.8 61.3 61.5 59.1 59.2 58.1 64.7 57.5 58.9 57.3 64.6 57.7 66.2 56.5 59.4 68.6 56.8 55.5 56.7 56.8 51.4 51 47.6 37.3 35.7 21.5 18.1 12.4 14941 20441v 40816i 71.5 62.1 64.1 65.4 63.6 70.2 75.9 63.5 62.2 82.5 61.1 64 74.6 60.2 61.8 70.2 59.9 61.7 60.6 62.6 71.9 78.4 58.2 61.1 69.2 58.1 60.6 72.5 60 59.2 59.2 58.1 64.7 61.7 57.2 61.7 58.1 57.2 58.1 75.6 59.1 57.2 73.2 55.5 57.7 79.8 59.2 56 52.4 59 74 69.3 55.9 57 53.9 58.5 67.7 73.5 54.9 57.9 55.1 55.5 55.5 47.2 52.5 56.5 44.3 47.9 57.1 43.7 57.2 58 45.5 53.7 57.4 44.8 45 51.1 39.4 45.8 49.7 38.3 44.3 55.1 37.6 44.2 46.4 25.5 31.5 36.3 22.8 30 35 17.4 18.5 19.6 11.4 15.5 17.8 15.8 12.4 11.4 13800 69.2 63.1 71.7 74.6 69.7 69.4 60.1 71.3 66.3 69.4 57.5 75.6 59.1 57.2 67.6 85.4 52.3 69 54.1 68.5 52.3 46.6 43.5 42.8 45.7 45.4 39.3 38.8 37.3 25.5 23.5 16.8 11.8 14.3 53706 73.8 66.7 69.5 76 71.2 65.2 62.5 75.1 68.1 66 58.9 73.2 55.5 57.7 67.6 69.3 54.2 64.6 56.3 71.4 53 45.6 44.5 42.8 43.4 47.8 41 39.7 37.7 26.5 24.3 17.4 12.2 15 86010 29850v 92536i 68.7 61.3 67.7 61.7 70.2 61.4 73.6 58.4 67.9 77.4 58.7 69.5 70.4 57.3 63.5 69.6 58 81.7 58.1 70.4 59.5 73.2 56.2 67.3 66.8 56.8 64.2 70 54.6 67.1 57.3 64.6 57.7 79.8 52.4 69.3 59.2 59 55.9 56 74 57 85.4 52.3 69 69.3 54.2 64.6 50.9 70 50.9 53.1 70 53.1 52.7 68.2 55.4 70.3 52.6 65 52.7 54 50.3 46 59.8 44.9 42.8 62.2 45.2 42 63.5 42.9 44.4 67.1 45.8 43.9 51.5 46.3 37.8 53.4 39.2 37.6 58.7 38.4 36 50.2 36.5 24.7 41.1 25.9 22.3 39.6 23.4 16.4 20.6 17.8 11.1 19.7 11.8 14.7 10.9 12.7 20814 64 70.3 59.5 60.1 58.7 59.4 67.5 58.3 57.7 55.7 66.2 53.9 58.5 67.7 54.1 56.3 52.7 68.2 55.4 53.9 56.7 
59.2 54.6 56.5 57.3 54.8 51.4 53.6 47.5 37.2 37.5 20.9 19.3 11.4 22730 68.9 63.2 66.6 75.2 69.5 64.7 60.7 79.4 70 68 56.5 73.5 54.9 57.9 68.5 71.4 70.3 52.6 65 53.9 51 42.9 43.8 41.6 40.8 44.9 39.3 38.7 35.9 25 22.7 17.2 11.4 14.2 20109 60.5 60.3 57.6 60.6 56.5 53.6 56.6 53.8 54 52.8 59.4 55.1 55.5 55.5 52.3 53 52.7 54 50.3 56.7 51 48.6 45 46.1 45.5 44 45.5 41.3 38.6 29.1 27.5 19.1 19.2 14 71263 52.7 58.7 52.5 50.3 49.9 49.9 57.9 46.5 50.1 49.5 68.6 47.2 52.5 56.5 46.6 45.6 46 59.8 44.9 59.2 42.9 48.6 52.5 56.1 61 52.6 49.1 53.1 44.8 41.2 42.4 20 21.7 10.6 47587 61963v 91943i 52 50.7 49.8 53.9 55.7 59.7 48.5 47.8 50.7 49.4 48 47 48.1 48.3 46.6 48.7 47.4 50.8 56.1 57.3 58.1 47.9 45 44.1 47.4 47.1 46.7 45.7 45.5 47.7 56.8 55.5 56.7 44.3 43.7 45.5 47.9 57.2 53.7 57.1 58 57.4 43.5 42.8 45.7 44.5 42.8 43.4 42.8 42 44.4 62.2 63.5 67.1 45.2 42.9 45.8 54.6 56.5 57.3 43.8 41.6 40.8 45 46.1 45.5 52.5 56.1 61 55.4 56.5 55.4 60.6 56.5 60.6 45.5 44.6 45.6 48.5 50.3 50.6 44.9 48 51.8 42.6 49.7 46.2 44.7 40.7 42 36.6 40.9 45.5 21.7 20.1 19 21.2 19.7 22.6 8.4 8.8 9.2 74637 53.2 53.6 48.4 49.6 49.5 49.4 54.1 47.4 50.7 48.4 56.8 44.8 45 51.1 45.4 47.8 43.9 51.5 46.3 54.8 44.9 44 52.6 45.5 44.6 45.6 42.2 43.3 38.5 33.4 32 21 17.7 9.3 65211 46.4 52.6 43.2 43.7 43.1 43.5 52.3 42.1 43.5 41.4 51.4 39.4 45.8 49.7 39.3 41 37.8 53.4 39.2 51.4 39.3 45.5 49.1 48.5 50.3 50.6 42.2 43.7 49.3 39.3 39.3 20.7 22.4 9.2 18621 97219i 44.5 43.9 50.7 49.2 42.4 41 42.4 42 42 41.7 41.5 40.2 51.7 49.2 40.9 39.6 42.7 39.7 41 38.1 51 47.6 38.3 37.6 44.3 44.2 55.1 46.4 38.8 37.3 39.7 37.7 37.6 36 58.7 50.2 38.4 36.5 53.6 47.5 38.7 35.9 41.3 38.6 53.1 44.8 44.9 42.6 48 49.7 51.8 46.2 43.3 38.5 43.7 49.3 37.7 37.7 36.8 31.3 42.5 33.6 18.2 18.6 21.7 17.3 9.3 9 Descending similarity (median % overlap) 15769 30.6 36.1 29.1 28.4 28.6 28.7 36.5 27.3 28.9 27.7 37.3 25.5 31.5 36.3 25.5 26.5 24.7 41.1 25.9 37.2 25 29.1 41.2 44.7 40.7 42 33.4 39.3 36.8 31.3 41.4 17.1 24 5.9 77114 28.5 33.7 26.2 25.6 26.2 26.3 35.5 24.7 
26.3 25.1 35.7 22.8 30 35 23.5 24.3 22.3 39.6 23.4 37.5 22.7 27.5 42.4 36.6 40.9 45.5 32 39.3 42.5 33.6 41.4 16.5 50 5.8 66514 20 20.6 18.4 19.2 18.8 18.8 21.4 18.8 18.8 17.7 21.5 17.4 18.5 19.6 16.8 17.4 16.4 20.6 17.8 20.9 17.2 19.1 20 21.7 20.1 19 21 20.7 18.2 18.6 17.1 16.5 10.1 4 77115 14.4 16.9 13 12.8 13.3 13.3 17.7 12.4 13.3 12.7 18.1 11.4 15.5 17.8 11.8 12.2 11.1 19.7 11.8 19.3 11.4 19.2 21.7 21.2 19.7 22.6 17.7 22.4 21.7 17.3 24 50 10.1 3.9 85246 14.2 13.3 14.6 15.8 16 12.9 12.1 14.7 13.9 13.7 12.4 15.8 12.4 11.4 14.3 15 14.7 10.9 12.7 11.4 14.2 14 10.6 8.4 8.8 9.2 9.3 9.2 9.3 9 5.9 5.8 4 3.9 Descending similarity (median % overlap) 13867 50308i 20899i 87133 45682 66398 63103 84940v 870484i 870486i 56365 14941 20441v 40816i 13800 53706 86010 29850v 92536i 20814 22730 20109 71263 47587 61963v 91943i 74637 65211 18621 97219i 15769 77114 66514 77115 85246 Descending # of total spectra Id=Y Proteome Informatics Research Group A B R F Subset of Participants Used for Localization Analysis Proteome Informatics Research Group 8000 # spectra Id Yes # spectra Loc Yes 7000 # spectra 6000 5000 4000 3000 2000 1000 14941 87133 22730 86010 13800 84940v 20899i 53706 92536i 870486i 45682 870484i 85246 13867 20441v 40816i 20109 50308i 29850v 56365 66398 91943i 47587 71263 65211 63103 97219i 20814 61963v 18621 74637 15769 77114 66514 77115 0 35 22 RF 1 0 1 A0 F 1 CM 0 M 8000 # spectra Id Yes # spectra Loc Yes 7000 5000 4000 3000 2000 1000 18621 61963v 97219i 71263 47587 91943i 56365 50308i 20109 20441v 13867 45682 870486i 92536i 53706 20899i 84940v 13800 86010 22730 87133 0 14941 # spectra 6000 0 1 F R M C A Excluded 0% localization 100% localization FDR - very high? 
R = Replicate submission; M = Merged spectra; C = Categorization errors; A = Loc=Y only when no possible ambiguity

Wide Range in Willingness to be Certain of Localization
[Figure: bars of # spectra Id Yes and # spectra Loc Yes per participant for Frxn 3, Frxn 4, and Frxn 12.]
[Figure: Fraction of Confidently Identified Spectra (Id=Y) Marked Fully Localized (Loc=Y), per fraction. Median fraction of confident spectra marked Loc=Y: Fr3 = 48%, Fr4 = 67%, Fr12 = 65%.]

On Average, How Similar Are the Sets of Confidently Identified and Localized Phosphopeptides (Id=Y, Loc=Y)?
Pairwise Comparisons of Phosphopeptides - All Fractions
[Figure: stacked bars per participant showing overlap, unique to participant, and unique to other; y-axis: median % overlap; x-axis: participant, sorted by # spectra Id=Y. Median overlap: 38% ± 8, n=22.]

Phosphopeptide %Overlap
[Figure: 22 × 22 matrix of pairwise phosphopeptide % overlap, n=22, shown twice: by num. spectra Id=Y (descending # of total spectra Id=Y), and by median %overlap (descending similarity).]
Participants, in descending order of # spectra Id=Y: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621
Participants, in descending order of similarity (median % overlap): 22730, 13800, 50308i, 870486i, 53706, 92536i, 20441v, 13867, 45682, 20899i, 84940v, 71263, 47587, 86010, 61963v, 87133, 18621, 14941, 56365, 91943i, 20109, 97219i

Relatedness of Participants by Overlap
[Figure: two panels, n=35 and n=22.]

If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?
[Figure: Frxn 4, subset of 472 spectra for which 20/22 participants all agree on identity; x-axis: % participants indicating localization Yes (100% down to 10%, plus NPA = no possibility of ambiguity); y-axis: % of spectra.]

What Fraction of the Time Do They Agree On Localization(s)?
8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12)
[Figure: bar of # spectra split into # N loc all partic, no ambiguity, # Y loc 2-22 partic, and # Y loc 1 partic; bar labels: 798, 498, 836, 5918.]
5918/8050 spectra with > 2/22 Loc Yes and Site Ambiguity Possible
[Figure: pie of the 5918 Y loc spectra: 100% partic agree (4685, 79%), 67-99% partic agree, < 67% partic agree; the two smaller slices are labeled 670, 11% and 563, 10%.]
For all of the participants that agree on identity when
• site ambiguity is possible (#S,T,Y > # phos)
• >2 participants mark Loc=Y
For 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites.

Which Participants are More Likely to Disagree on Localization?
[Figure: % of spectra in minority localization choice per participant, three panels. # Spectra with Loc Agreement 50.1-99.9%: Frxn 3: 154, Frxn 4: 498, Frxn 12: 227.]
The participants who are the most willing to localize are more likely to disagree with the majority view.

Quotes From Participants
• “This study has started a dialogue of accurate phosphate identification and localization. Perhaps, the results of this study will point out the inadequacies of current informatics methods in identification and localization of phosphates.”
• “Great study choice!
I learned a lot about available software, limitations, etc!”

Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4
YY: Y – identification, Y – localization
YN: Y – identification, N – localization
NS: N – identification, but top sequence same as consensus
ND: N – identification, and top sequence different than consensus

Room for Improvement in ID Certainty Thresholds
[Figure: stacked bars of # spectra per participant (35 participants) in three panels: Frxn 4 – most phosphopeptides; Frxn 12 – highest precursor charges; Frxn 3 – most multiple phos per peptide. Series: #DN Diff Id No, #SN Same Id No, #DY Diff Id Yes, #SY Same Id Yes, #Y1P Id Yes single.]

Preliminary Conclusions
1. Wide range of spectra marked confidently identified
2. Wide range of spectra marked confidently localized
3. Lack of a uniform method for calculating and reporting ambiguity made it hard to compare results from some participants (13 of 35 were only partially included)
4. Some participants succeeded without localization software, but most at least used some measure of ambiguity
5. Typically, very few identifications were unique to any one participant
6. Unique peptide assignments were roughly 57% identical (n=35)
7. Confidently localized phosphopeptide assignments were roughly 38% identical (n=22)
8. Participants that performed well often shared the highest similarity with other participants
9. Participants did not hesitate to mark peptides with ambiguous phosphosite localizations (57% of identified spectra on average)
10. When participants agree on the identification, phosphosite ambiguity is possible, and participants consider localization possible, they unanimously agree on the localization(s) for 79% of the spectra

iPRG Membership
• Manor Askenazi - Dana-Farber Cancer Institute
• Karl Clauser - Broad Institute of MIT and Harvard
• Lennart Martens (incoming chair) - Ghent University, Belgium
• W. Hayes McDonald - Vanderbilt University
• Paul A Rudnick (outgoing chair) – NIST
• Karen Meyer-Arendt (outgoing member) - University of Colorado
• *Brian C. Searle (outgoing member) - Proteome Software, Inc.
• *William S. Lane (outgoing member) - Harvard University
• *Jeffrey A Kowalak (EB Liaison) (outgoing) – NIMH
• Eric Deutsch (incoming member) – Institute for Systems Biology
• Nuno Bandeira (incoming member) – UCSD
• Robert Chalkley (incoming member) – UCSF
*Founding member

Acknowledgements
• Philipp Mertins, The Broad Institute – All wet lab work and an analysis
• Steve Gygi, Harvard Medical School – Test datasets
• Matthew Chambers, Vanderbilt University Medical Center – Data format conversions (ProteoWizard)
• Steve Stein and Yuri Mirokhin, NIST – A K562 phosphopeptide spectral library
• Renee Robinson, Harvard University – “The Anonymizer”

Selected Survey Responses
• Do you think this type of study is useful?
– Yes 33 (100%)
• How difficult do you think this study was?
– Easy 4
– Challenging 17
– Just right 12
• Based on this study, would you consider participating in future ABRF studies?
– Yes 33 (100%)
• Have you participated in previous ABRF studies?
– No 14 (42%)
– Yes 19 (58%)

Survey cont.
• Before this study, how confident were you of your ability to identify and rank phosphopeptides including assessing phosphorylation site localization?
• Now, after completing the study, how confident are you of your ability to identify and rank phosphopeptides including assessing specific phosphorylation sites?
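
Backup: sketch of a pairwise % overlap calculation
The overlap figures quoted in the Preliminary Conclusions (roughly 57% for unique peptides, roughly 38% for localized phosphopeptides) come from pairwise comparisons between participants. The slides do not state the exact formula. The sketch below assumes that % overlap is the number of shared assignments divided by the union of the two participants' assignments, which would be consistent with the overlap / unique to participant / unique to other bars and with the symmetric overlap matrices. The function names, participant labels, and peptide strings are hypothetical and are not study data.

```python
from itertools import combinations
from statistics import median


def percent_overlap(a, b):
    """Shared items as a percentage of the union of two sets (assumed definition)."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def median_overlap_per_participant(assignments):
    """Map each participant to the median of its pairwise overlaps with all others."""
    pairwise = {name: [] for name in assignments}
    for p, q in combinations(assignments, 2):
        pct = percent_overlap(assignments[p], assignments[q])
        pairwise[p].append(pct)  # the overlap is symmetric, so record it for both
        pairwise[q].append(pct)
    return {name: median(values) for name, values in pairwise.items() if values}


# Toy input: hypothetical participants and peptide strings, not study results.
toy = {
    "P1": {"AAK", "CCR", "DDK", "EEK"},
    "P2": {"AAK", "CCR", "DDK", "FFR"},
    "P3": {"AAK", "GGK"},
}
medians = median_overlap_per_participant(toy)
for name, value in medians.items():
    print(f"{name}: median overlap {value:.1f}%")
```

Sorting participants by the resulting medians would give an ordering analogous to the "descending similarity (median % overlap)" arrangement, under the same assumption about how overlap is defined.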