clauser_NIH_April_2010

Proteomics of Tumor Extracellular Matrix
The “Matrisome Project”
Karl Clauser
Proteomics Platform of the Broad Institute
Alexandra Naba
Koch Institute for Cancer Research @ MIT – Hynes Lab
The ECM is a prominent part of the tumor microenvironment
Tumor cells
Normal epithelial cells
Basement membrane
Immune cells:
macrophages, lymphocytes…
Fibroblasts
ECM
Lymphatic vessels
Blood vessels
endothelial
cells, pericytes
Adapted from Joyce JA. and Pollard JW., Nature Review Cancer (2009)
The ECM gene expression is dysregulated in tumors
Fibronectin / Smooth Muscle Actin
Normal
pancreatic
islet
arteriole
Normal Pancreas
Tumor
 Marked up-regulation of fibronectin-rich matrix around
vascularized RIPTAg pancreatic tumors
Astrof S. et al., MCB (2004)
ECM organization is often altered in tumors
GPR56
TGM2
Merge
(blue: DAPI – nuclei)
Non-metastatic
melanoma
Metastatic
melanoma
Lei Xu
The Extracellular Matrix: Not Just Pretty Fibrils!
ECM
GF
GF
Integrins
GF receptors
Focal Adhesion
Talin, Paxillin,
Vinculin, FAK, Src…
Survival
Migration
Proliferation
Morphology
Why study the tumor matrisome?
 The ECM provides biophysical and biochemical cues promoting cell
growth, invasion and metastasis.
 Goals:
1. What is the source of the tumor ECM (tumor or stroma)?
2. What are the changes in the matrisome during tumor progression?
 Invasion?
 Angiogenic switch?
 Metastatic dissemination?
3. Can ECM proteins serve as prognostic/diagnostic tools?
The extracellular matrix
Structural proteins (fibrils):
Proteoglycans:
-Glycosaminoglycans covalently bound
to a core protein: (Aggrecan, Decorin,
Perlecan, Versican…)
- Hyaluronan
-Collagens (30 genes)
-Elastin
Matrisome
~ 500-700 proteins
Adhesive glycoproteins:
ECM associated proteins:
- Fibronectin
-ECM modification enzymes (LOX, TG2…)
- Laminins (12 genes)
-Matrix Metalloproteinases
-Thrombospondins (5 genes)
-Growth Factors (VEGF, TGFb…)
-Vitronectin, and many more…
Why use a proteomics-based approach?
 Correlation of the protein expression with the mRNA
data?
 The insolubility of the ECM proteins: an advantage!
Enrichment of ECM protein from normal tissue
Purification steps:
1. Tissue: mechanical lysis
2. Chemical lysis (high-salt buffer)
3. Membrane protein solubilization (DOC, NP-40)
4. Cytoskeletal protein solubilization (SDS)
5. Insoluble fraction = ECM-enriched fraction
[Figure: Western blot of the fractions (lanes C, N, M, CS; markers 180, 120, 83, 55, 49, 38, 16 kDa) probed for Collagen VI (ECM), Laminin (ECM), Integrin b1 (PM), Tf Receptor (PM), Tubulin (Cytoskeleton), Actin (Cytoskeleton), GAPDH (Cytosol), Histones (Nucleus)]
 ECM proteins: 8-fold enrichment
The proteomics workflow
1. Solubilize in 8M urea
2. PNGase-F: deglycosylation (2M urea)
3. Lys-C, trypsin: digestion to peptides (2M urea)
4. Agilent 3100 Offgel Electrophoresis: separation by peptide pI
5. Reversed phase: desalting
6. Thermo LTQ-Orbitrap: LC-MS/MS
7. Spectrum Mill: identification of peptides and proteins
LC/MS/MS
[Figure: LC trace (intensity vs. retention time, min) used for quantitation; MS and MS/MS spectra (relative abundance vs. m/z) used for identification]
1 cycle: 3 sec.
1 MS scan, then 8 MS/MS scans of the 8 most abundant peptides
8 MS/MS: peptide sequence
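The "1 MS scan, 8 most abundant peptides" cycle can be sketched as a simple top-N selection. This is a minimal illustration, not the instrument's actual acquisition logic: the function name `select_precursors` and the `(m/z, intensity)` peak tuples are assumptions made for the example.

```python
import heapq

def select_precursors(ms1_peaks, n=8):
    """Pick the n most intense MS1 peaks for MS/MS.

    ms1_peaks: list of (mz, intensity) tuples from one MS scan (hypothetical input).
    Returns the selected peaks, most intense first.
    """
    return heapq.nlargest(n, ms1_peaks, key=lambda peak: peak[1])

# Toy MS scan with 20 peaks of increasing intensity
scan = [(400.0 + i, float(i)) for i in range(20)]
top8 = select_precursors(scan)
print([mz for mz, _ in top8])
```

A real instrument method also applies charge-state rules and dynamic exclusion, which this sketch leaves out.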
Database search parameters
Content of the lung matrisome - unfractionated
Mass Spec Intensity
Number of Peptides
0.08%
2.07%
Number of Proteins
0.64%
3.55%
8.92%
0.30%
17.81%
15.38%
9.79%
29.59%
(50)
0.33%
8.51%
0.66%
77.85%
4.05%
59.41%
(1291)
19.53%
10.65%
2.39%
2.35%
4.83%
Protein sub-cellular location classified from
literature knowledge and GO annotation
7.10%
5.92%
8.28%
The lung matrisome
Basement membrane
Collagens
components
Col4a1
Col1a1
Col4a2
Col1a2
Col4a3
Col3a1
Col4a4
Col5a1
Col5a2
Lama2
Col6a1
Lama3
Col6a2
Lama4
Col6a3
Lama5
Col12a1
Lamb2
Col29a1
Lamb3
Lamc1
Lamc2
Nidogen-1
Perlecan
Others
Proteoglycans
Enzymes
Agrin
Decorin
Dermatopontin
Efemp-1
Elastin
Emilin-1
Fibrinogen, alpha
Fibrinogen, beta
Fibrinogen, gamma
Fibrillin-1
Fibronectin
Fibulin-5
Mfap-4
Nephronectin
Periostin
Prolargin
Tinagl1
Vitronectin
von Willebrand Factor
Biglycan
Lumican
Mimecan
LOXL1
TG2
Growth
Factors
LTBP1
TGFbi
Off Gel Electrophoresis: principle
IPG gel strip
Pi (A)
pH gradient
Pi (B)
 Capacity 50-100ug total peptide
 Separation into 12 fractions, pI 3-10
 Each fraction is analyzed by LC/MS/MS
 On average ~ 1800 proteins identified (6 times more than without OGE)
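As a rough illustration of the fractionation, a peptide's pI can be mapped to one of the 12 OGE wells. The sketch assumes a linear pH 3-10 gradient split into equal-width fractions, which is a simplification; the function name `oge_fraction` is hypothetical.

```python
def oge_fraction(pi, n_fractions=12, ph_min=3.0, ph_max=10.0):
    """Return the 1-based OGE fraction expected to collect a peptide of a given pI.

    Assumes a linear pH gradient divided into equal-width wells (an idealization).
    """
    if not ph_min <= pi <= ph_max:
        raise ValueError("pI outside the range of the IPG strip")
    index = int((pi - ph_min) * n_fractions / (ph_max - ph_min))
    # A pI exactly at ph_max falls into the last well
    return min(index, n_fractions - 1) + 1

print(oge_fraction(4.0), oge_fraction(6.5), oge_fraction(10.0))
```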
OGE fractionation: normal lung
[Figure, panel 1: Overlap of Distinct Peptides in Fractions. # distinct peptides (y-axis, 0 to 3000) per OGE fr # (x-axis), stacked by the number of fractions in which a peptide was found (1 frxn to 11 frxns), shown alongside the unseparated sample. Values shown on the slide: 5557; 9178; 83%.]
[Figure, panel 2: pI resolution. pI median (y-axis, 3 to 10) per OGE fr # (x-axis).]
The lung matrisome before/after OGE
Mass Spec Intensity
Number of Peptides
Number of Proteins
Before OGE
22.15%
40.59%
29.59%
50 prot.
59.41%
1291 peptides
77.85%
70.41%
~ 2x more
~ 2x more
After OGE
9.74%
105 prot.
27.81%
26.94%
2328 peptides
72.19%
73.06%
ECM
90.26%
Non-ECM
Coverage and Sensitivity Improvements from OGE
Increased coverage
1 peptide
before OGE
Undetected before OGE
Lung matrisome after OGE
Basement membrane
components
Collagens
Others
Proteoglycans
Enzymes
Growth
Factors
Col4a1
Col1a1
Agrin
Mfap-2
Asporin
LOX
LTBP1
Col4a2
Col1a2
ECM-1
Mfap-4
Biglycan
LOXL1
LTBP2
Col4a3
Col3a1
Elastin
Multimerin-1, -2
Decorin
LOXL2
LTBP4
Col4a4
Col5a1, a2, a3, a6
Emilin-1
Nephronectin
Dermatopontin
LOXL3
TGFbi
Lama1
Col6a1, a2, a3
Emilin-2
Papilin
Lumican
TGM2
Lama2
Col7a1
Fras-1
Periostin
Mimecan
ADAM10
Lama3
Col9a2
Frem-1
Prolargin
ADAMTS9
Lama4
Col12a1
Frem-2
Prg2
ADAMTS17
Lama5
Col14a1
Fibrinogen, a
SPARC
ADAMTSL1
Lamb2
Col15a1
Fibrinogen, b
Spondin-1
ADAMTSL4
Lamb3
Col16a1
Fibrinogen, g
Thrombospondin-1
ADAMTSL5
Lamc1
Col18a1
Fibrillin-1
Thrombospondin-4
MMP9
Lamc2
Col23a1
Fibronectin
Tinagl1
MMP19
Nidogen-1
Col24a1
Fibulin-1, -2
Tenascin-X
TIMP3
Perlecan
Col27a1
Fibulin-5
Vitronectin
Pcolce
Col28a1
Hemicentin-1, von Willebrand Factor
Col29a1
Hemicentin-2
Additional ECM proteins detected after OGE
Plod-1, -3
A very reproducible approach
 The comparison of 2 samples processed in parallel leads to
> 90% identity
 The difference between the matrisome of 2 different organs
represents less than 10% of the proteins identified.
 Comparison of the lung and colon matrisome:
 Identification of proteins exclusively present in the lung and participating
in the TGFb regulation axis
 LTBP2: binds TGFb family member
 Thrombospondin-1: the TSP1 knock-out mice get pneumonia that can
be ameliorated by TGFb activation
The complex domain structures of ECM proteins
Fibronectin
S S
Fibrillin-1
LTBP-1
Thrombospondin-1
Hynes RO., Science (2009)
Focusing on ECM proteins - Help from Bioinformatics?
Knowledge-based
Annotation
We will miss unknown
proteins
Combination of the
two approaches:
OGE proteomics
data
Complete Matrisome
85 domains
found in
proteins
involved in :
cell adhesion,
GF, enz.
(InterPro IDs)
Extraction of the
proteins that contain
at least one of these domains
We will miss unknown
proteins that have no
domains!
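The domain-based extraction can be sketched as a set intersection between each protein's annotated domains and the list of matrisome-defining domains. The protein names and InterPro-style IDs below are placeholders invented for the example, not the 85 domains used in the project.

```python
def select_by_domain(protein_domains, matrisome_domains):
    """Return proteins that carry at least one matrisome-defining domain.

    protein_domains: dict mapping protein name -> list of domain IDs (hypothetical input).
    matrisome_domains: set of domain IDs considered diagnostic of ECM proteins.
    """
    return sorted(
        protein
        for protein, domains in protein_domains.items()
        if set(domains) & matrisome_domains
    )

# Placeholder IDs, for illustration only
proteome = {
    "ProtA": ["IPR_X1", "IPR_Y"],
    "ProtB": ["IPR_Z"],
    "ProtC": [],  # no annotated domain: missed by this approach
}
print(select_by_domain(proteome, {"IPR_X1", "IPR_X2"}))
```

As the slide notes, a protein with no annotated domain (like `ProtC` here) is never selected, which is why the domain approach is combined with the knowledge-based annotation.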
Annotations of the protein sub-cellular location
 The GO annotations for cellular compartment are unsatisfactory
 Can be inconsistent for mouse vs. human

Tgm2 Protein-glutamine gamma-glutamyltransferase 2 (human)
mitochondrion|mitochondrion|plasma membrane|plasma membrane|

Tgm2 Protein-glutamine gamma-glutamyltransferase 2 (mouse)
proteinaceous extracellular matrix|cytosol|membrane|

Lamb2 laminin, beta 2 (human)
extracellular region|basal lamina|extracellular space|nucleus|cytoplasm|endoplasmic reticulum|laminin-11 complex|

Lamb2 laminin, beta 2 (mouse)
basement membrane|basement membrane|

Fbn1 fibrillin 1 (human)
microfibril|microfibril|extracellular region|basement membrane|extracellular space|

Fbn1 fibrillin 1 (mouse)
microfibril|extracellular region|proteinaceous extracellular matrix|

EMILIN1 (human)
extracellular region|proteinaceous extracellular matrix|extracellular space|nucleus|nucleolus|centrosome|

Emilin1 (mouse)
extracellular region|proteinaceous extracellular matrix|extracellular space|
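The mouse vs. human inconsistency can be made explicit by parsing the pipe-delimited GO strings into sets and comparing them. A minimal sketch using the Tgm2 strings above; the function names are chosen for the example.

```python
def parse_go(field):
    """Split a pipe-delimited GO cellular-compartment string into a set of terms."""
    return {term.strip() for term in field.split("|") if term.strip()}

def compare_annotations(human, mouse):
    """Report which compartment terms are shared or species-specific."""
    h, m = parse_go(human), parse_go(mouse)
    return {
        "shared": sorted(h & m),
        "human_only": sorted(h - m),
        "mouse_only": sorted(m - h),
    }

tgm2_human = "mitochondrion|mitochondrion|plasma membrane|plasma membrane|"
tgm2_mouse = "proteinaceous extracellular matrix|cytosol|membrane|"
print(compare_annotations(tgm2_human, tgm2_mouse))
```

For Tgm2 the two species share no compartment term at all, and only the mouse entry places the protein in the extracellular matrix.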
Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course
of tumor progression?
 Invasion?
 Angiogenic switch?
 Metastatic dissemination
 How different is the ECM of the metastases when compared to the
ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the
invasiveness of a tumor?
 Can ECM serve as a diagnostic / prognostic tool in
clinics?
Of mouse or man?
SC Injection of A375
Human Melanoma Cells
“NSG” mouse
NOD/SCID/IL2Rγ
Tumor Collection
--Tumor ECM preparation
Proteomics pipeline
Proteins secreted by the tumor cells:
human sequence
Proteins secreted by the stromal cells:
mouse sequence
VTN - Vitronectin is secreted by the stroma
A375-1A
[Figure: detected peptides labeled by species, Human vs. Mouse]
10 peptides detected
0 human only
8 mouse only
2 shared
Emilin-1 is predominantly secreted by the tumor
A375-1A
37 peptides detected
20 human only
9 mouse only
8 shared
[Figure: detected peptides labeled by species, Human vs. Mouse]
MS intensity (H/M) = 6.3
BGN - Biglycan is Predominantly Secreted by the Stroma
A375-1A
11 peptides detected
2 mouse only
1 human only
8 shared
[Figure: detected peptides labeled by species, Mouse vs. Human]
MS intensity (M/H) = 460
Challenges in MS Quantitation of Tumor vs. Stroma Secretion
16 peptides detected
10 mouse only
5 human only
1 shared
A375-1A
[Figure: detected peptides labeled by species, Mouse vs. Human]
Use all distinguishing peptides: # peptides 10M, 5H; MS intensity (M/H) = 1.6
Use only similar pairs with same charge (skip cleavage site alterations and unpaired peptides): # peptides 3M, 3H; MS intensity (M/H) = 0.7
0.7
Protein Grouping for Species in Xenograft Tumors
protein group
shared
shared
human
mouse
human
mouse
human
subgroup 1
subgroup 2
Subgroup specific - ON
1. total MS1 intensity for only human
2. total MS1 intensity for only mouse
Subgroup specific - OFF
1. total MS1 intensity for human + shared
2. total MS1 intensity for mouse + shared
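The two "subgroup specific" modes can be sketched as two ways of summing MS1 intensities over a protein group. This is an illustration of the bookkeeping only, not Spectrum Mill's implementation; the function name and the `(species, intensity)` peptide tuples are assumptions.

```python
def species_intensity(peptides, subgroup_specific=True):
    """Sum MS1 intensity per species for one protein group.

    peptides: list of (species, intensity), species in {"human", "mouse", "shared"}.
    subgroup_specific=True  -> count only species-distinguishing peptides (ON).
    subgroup_specific=False -> add shared peptides to both species (OFF).
    Returns (human_total, mouse_total).
    """
    totals = {"human": 0.0, "mouse": 0.0, "shared": 0.0}
    for species, intensity in peptides:
        totals[species] += intensity
    if subgroup_specific:
        return totals["human"], totals["mouse"]
    return totals["human"] + totals["shared"], totals["mouse"] + totals["shared"]

# Toy intensities, for illustration only
group = [("human", 6.0), ("human", 4.0), ("mouse", 2.0), ("shared", 8.0)]
print(species_intensity(group, subgroup_specific=True))
print(species_intensity(group, subgroup_specific=False))
```

With the shared peptides included (OFF), the human/mouse ratio is pulled toward 1, so the ON mode is the one that reflects species of origin.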
Of mouse or man?
Columns: Tumor Only | Tumor More | Both Similar | Indistinguishable | Stroma More | Stroma Only
COL11A1
COL11A2
COL19A1
COL27A1
COL4A4
EMILIN2
LAMA3
LAMA5
LTBP1
LTBP1
LTBP3
MMP14
PCOLCE
PLOD1
PLOD2
TIMP1
Bgn
Col12a1
Col16a1
Col1a1
Col1a2
Col2a1
Col3a1
Col5a2
Col5a3
Fbn1
Fgb
Fn1
Ltbp4
Lum
Postn
No shared peptides
No shared peptides
EMILIN1
HSPG2
LAMA4
LAMB1
LAMC1
LOXL2
PLOD3
TGFBI
TINAGL1
Adamtsl1
Col4a1
Col4a2
Col4a5
Col6a1
Col6a2
Col6a3
Lamb2
Loxl3
Mfap2
Nid1
Prelp
Tgm2
Tnc
Tnxb
COL13A1
Col13a1
COL18A1
Col18a1
COL22A1
Col22a1
ECM1
Ecm1
VWA1
Vwa1
Col4a3bp
Efemp2
MFAP1
Sparc
Thbs4
Timp3
Vcan
Aspn
Lama2
Col10a1
Lamb3
Col13a1
Loxl1
Col14a1
Ltbp2
Col15a1
Mfap4
Col23a1
Mfap5
Col28a1
Mmp19
Col4a3
Nid2
Col5a1
Ogn
Col7a1
Thbs1
Col9a1
Tnn
Col9a2
Vtn
Col9a3
Vwa5a
Dcn
Vwf
Dpt
E330026B02Rik
Efemp1
Eln
Fbln1
Fbln2
Fbn2
Fga
Fgfbp3
Fgg
More: >5x
Gm7455
Similar: -5 to 5x
HMCN2
Of mouse or man?
Fibrillar
Basement Collagens
Membrane (1,2,3,5,11)
HMCN2
HSPG2
Lama2
LAMA3
LAMA4
LAMA5
LAMB1
Lamb2
Lamb3
LAMC1
Nid1
Nid2
Col4a1
Col4a2
Col4a3
Col4a3bp
COL4A4
Col4a5
Col15a1
COL18A1
Col18a1
Col1a1
Col1a2
Col2a1
Col3a1
Col5a1
Col5a2
Col5a3
COL11A1
COL11A2
More: >5x
Similar: -5 to 5x
Matricellular Hemostatic
ECM
ECM
ECM
Proteo Modifying Growth
Proteins
Glycoproteins glycans Enzymes Factors Others
Other
Collagens
Col6a1
Col6a2
Col6a3
Col7a1
Col8a1
Col9a1
Col9a2
Col9a3
Col10a1
Col12a1
COL13A1
Col13a1
Col14a1
Col16a1
COL19A1
COL22A1
Col22a1
Col23a1
COL27A1
Col28a1
E330026B02Rik
Gm7455
Fga
Fgb
Fgg
Vtn
VWA1
Vwa1
Vwa5a
Vwf
Sparc
Thbs1
Thbs4
Tnc
Tnn
Tnxb
Tumor
Only
Tumor
More
Both
Similar
Aspn
Bgn
Dcn
Dpt
Lum
Ogn
Vcan
Adamtsl1
Loxl1
LOXL2
Loxl3
MMP14
Mmp19
PCOLCE
PLOD1
PLOD2
PLOD3
Tgm2
TIMP1
Timp3
Indistinguishable
Fgfbp3
LTBP1
LTBP1
Ltbp2
LTBP3
Ltbp4
TGFBI
Stroma
More
ECM1
Ecm1
Efemp1
Efemp2
Eln
EMILIN1
EMILIN2
Fbln1
Fbln2
Fbn1
Fbn2
Fn1
MFAP1
Mfap2
Mfap4
Mfap5
Postn
Prelp
TINAGL1
Stroma Only
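The column assignment can be sketched from the legend ("More: >5x", "Similar: -5 to 5x"). The rules for the "Only" and "Indistinguishable" columns are assumptions made for this example (no intensity from the other species, and shared peptides only, respectively); the function name is hypothetical.

```python
def classify_origin(human, mouse, shared_only=False, fold=5.0):
    """Assign a protein to a tumor/stroma column from species-specific MS intensities.

    human: summed intensity of human-only (tumor) peptides.
    mouse: summed intensity of mouse-only (stroma) peptides.
    shared_only: True if only peptides common to both species were detected (assumed rule).
    """
    if shared_only or (human == 0 and mouse == 0):
        return "Indistinguishable"
    if mouse == 0:
        return "Tumor Only"
    if human == 0:
        return "Stroma Only"
    ratio = human / mouse
    if ratio > fold:
        return "Tumor More"
    if ratio < 1.0 / fold:
        return "Stroma More"
    return "Both Similar"

print(classify_origin(6.3, 1.0))    # Emilin-1, MS intensity (H/M) = 6.3
print(classify_origin(1.0, 460.0))  # Biglycan, MS intensity (M/H) = 460
```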
Can we identify trends?
 Basement membrane produced by combination
of tumor/stroma
 Predominantly produced by the tumor
 ECM modifying enzymes
 Laminins
 Growth Factors
 Predominantly produced by the stromal cells
 Proteoglycans
 Most Collagens
Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course
of tumor progression?
 Invasion?
 Angiogenic switch?
 Metastatic dissemination
 How different is the ECM of the metastases when compared to the
ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the
invasiveness of a tumor?
 Can ECM serve as a diagnostic / prognostic tool in
clinics?
Ongoing work…
 Comparing metastatic and non-metastatic tumors:
 Limitation of the A375 xenograft model:
 Subcutaneous injection
 Metastatic only by tail vein injection
 What would be the control normal matrisome?
Xu L. et al., PNAS (2006)
Mouse model of mammary carcinoma: MMTV-PyMT
4 wks
Premalignant mammary gland
No tumor palpable



8 wks
15 wks
Palpable tumor
(1wk pp)
Late Stage Metastatic
tumor
tumor
A transgenic mouse strain that expresses the polyoma middle T oncogene (PyMT) under the
mouse mammary tumor virus promoter (MMTV) in the mammary gland.
Carcinomas develop in the mammary gland and mimic human disease stages.
In parallel, mammary glands of age-matched WT FVB mice are collected.
Guy CT. et al., MCB (1992)
Lin EY. et al., (2003)
The MMTV-PyMT mRNA signature
Molecular expression profiling of tumors initiated by transgenic overexpression of
polyoma middle T antigen (PyMT) targeted to the mouse mammary gland.
Procollagen type I, 2
Procollagen type III, 1
Biglycan
Fibrinogen-like
Nidogen-1
LOXL
MMP-2
Laminin B1 subunit 1
Collagen
Procollagen type XI, 1
Syndecan-2
And:
Procollagen type IV, 2, 3 and 6, procollagen type XV, lumican, nephronectin, MMP-14
 How well do the array data reflect in protein-level changes?
Desai KV. et al., PNAS (2002)
Qiu TH. et al., Cancer Res (2004)
Acknowledgment
• Richard Hynes Lab
• Proteomics Platform
Alexandra Naba
John Lamar
Hui Liu
• Bioinformatics Core Facility (KI)
Charlie Whittaker
TMEN (TUMOR MICROENVIRONMENT NETWORK NCI)
Steve Carr
Jake Jaffe
U54-CA126515
Xenotransplant model of mammary carcinoma
 MDA-MB-231: Human mammary carcinoma cell line
 LM2: Highly metastatic derivative [Massagué Lab]
 Orthotopic injection in the mammary fat pad [John Lamar]
 Comparison of the primary tumor matrisome to the “normal”
mammary gland matrisome
Identification of the ECM proteins synthesized by the tumor-associated stroma and not by the normal stroma.
Manipulation of gene expression: validation
Characterization of the ECM changes at the angiogenic switch
 Model system: RIP-Tag mouse
9 wks
12 wks
 A transgenic mouse strain that expresses the simian virus 40 large T antigen (TAg)
under the rat insulin II promoter (RIP) in the pancreatic islet β-cells.
 Carcinomas develop in the pancreatic islets and progress through characteristic
stages.
 Human disease: Insulinoma (malignant only in very rare cases)
Folkman, J. et al. (1989) Nature 339
Characterization of the tumor matrisome
1. Understanding the origin of tumor ECM
2. Can we observe changes in the matrisome during the course
of tumor progression?
 Invasion?
 Angiogenic switch?
 Metastatic dissemination
 How different is the ECM of the metastases when compared to the
ECM of the primary tumor?
3. Can we correlate changes in the matrisome to the
invasiveness of a tumor?
 Can ECM serve as a diagnostic / prognostic tool in
clinics?
Can we correlate changes in the matrisome to the
invasiveness/aggressiveness of a tumor?
 Collaboration with MGH:
 Colon cancer sample +/- Liver metastasis
 Patient history
 Can ECM proteins serve as prognostic/diagnostic tool?
A B
R F
Proteome Informatics
Research Group
iPRG: Informatic Evaluation of
Phosphopeptide Identification and
Phosphosite Localization
ABRF 2010, Sacramento, CA
March 22, 2010
A Challenging Problem
4/7 DSAIPVEsDtDDEGAPR
3/7 DSAIPVESDtDDEGAPR
P(m/z) -H3PO4
879
14/21 said they can identify the peptide but cannot localize the site
Solution
• Not fun to do by hand!
• Software is available that evaluates ‘site-determining ions’
– Generate per residue localization scores
– Examples: Ascore, PTM Score (MaxQuant), pFind,
PhosphoScore, etc.
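To make ‘site-determining ions’ concrete: they are the fragment ions whose m/z differs between two candidate phosphosite assignments. The sketch below compares singly charged b and y ions for two hypothetical single-phospho placements (S8 vs. T10) on the peptide from the previous slide. It is not how Ascore or any of the listed tools is implemented; the function names are assumptions, and standard monoisotopic masses are used.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633   # HPO3
PROTON = 1.007276
WATER = 18.010565

def fragment_mz(peptide, site):
    """Singly charged b/y ion m/z values with one phosphate at 1-based position `site`."""
    masses = [MONO[aa] + (PHOSPHO if i == site else 0.0)
              for i, aa in enumerate(peptide, start=1)]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON
    return ions

def site_determining_ions(peptide, site_a, site_b, tol=0.01):
    """Ions whose m/z differs between the two candidate localizations."""
    a = fragment_mz(peptide, site_a)
    b = fragment_mz(peptide, site_b)
    return sorted(label for label in a if abs(a[label] - b[label]) > tol)

print(site_determining_ions("DSAIPVESDTDDEGAPR", 8, 10))
```

Only four of the 32 b/y ions distinguish the two placements here, which is why localization can fail even when the peptide itself is confidently identified.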
Study Goals
1. Evaluate the consistency of reporting
phosphopeptide identifications and
phosphosite localization across laboratories
2. Characterize the underlying reasons why
result sets differ
3. Produce a benchmark phosphopeptide
dataset, spectral library and analysis resource
Study Design
• Use a common dataset
• Use a common sequence database
• Allow participants to use the bioinformatic tools
and methods of their choosing
• Use a common reporting template
• Fix the identification confidence (1% FDR)
• Require an indication of phosphosite ambiguity
per spectrum
• Ignore protein inference – for now
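The study fixed the confidence level at 1% FDR but left the method to each participant. One common way to obtain such a threshold is target-decoy counting; the sketch below assumes that approach, with hypothetical `(score, is_decoy)` tuples where a higher score is better. It is not a description of what any participant actually did.

```python
def threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff at which decoys/targets stays <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Returns None if no cutoff satisfies the FDR limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best

# Toy example: 150 target hits, then a mix of decoys and targets
psms = [(250 - i, False) for i in range(150)]
psms += [(100, True), (99, False), (98, True), (97, True)]
print(threshold_at_fdr(psms))
```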
Soliciting Participants and
Logistics
Study advertised on the ABRF website and listserv, Molecular and Cellular Proteomics blogsite,
GenomeWeb and by direct invitation from iPRG members
1. Email participation
request to
‘iPRGxxxx@gmail.com’
Participant
2. Send official study letter
with instructions
iPRG members
Questions / Answers
3. All further communication (e.g.,
questions, submission) through
‘iPRGxxx.anonymous@gmail.com’
“Anonymizer”
Study Materials and Instructions
to Participants
• 1 Orbitrap XL dataset (3
files)
– RAW, mzML, mzXML, MGF,
pkl or dta – conversions by
ProteoWizard
• 1 FASTA file (SwissProt
human seq’s. v57.1)
• 1 template (Excel)
• 1 on-line survey (Survey
Monkey)
1. Analyze the dataset
2. Report the phosphopeptide
spectrum matches in the
provided template
3. Complete an on-line survey
4. Attach a 1-2 page description
of your methodology
Reporting Template
ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis
Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to
report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide
Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to
'anonymous.iPRG2010@gmail.com' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey --------------------->
REQUIRED FIELDS
File: Name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03)
Spectrum Identifier: Identifiers should be unique scan numbers from data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).
Precursor m/z: Precursor m/z as submitted to search engine
Precursor Charge: Precursor charge reported by search engine
Peptide Sequence: Use lowercase s, t or y (e.g. SLsGSsPCPK) OR a trailing symbol (e.g. SLS#GS#PCPK) OR a string in parentheses (e.g. SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.
Accession(s): Protein identifier(s) from Fasta file. Use multiple values if peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.
Num. Phospho sites: Total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.
Peptide Identification Certainty: 'Y' indicates this match is BETTER than the confidence threshold. 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list. Is this match above 1% FDR identification threshold (Y|N)?
Phosphosite Localization Certainty: Indicate 'Y' if ALL phosphorylations have been confidently localized. 'N' if one or more have not. Are ALL phosphosites unambiguously localized (Y|N)?
Peptide Identification Score: Peptide identification score reported by search engine (e.g., E-value, p-value, probability, Mascot score, etc.)

File | Spectrum Identifier | Precursor m/z | Precursor Charge | Peptide Sequence | Accession(s) | Num. Phospho sites | Peptide Identification Certainty | Phosphosite Localization Certainty | Peptide Identification Score
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:908 | 558.7576 | 2 | qGsPVAAGAPAK | Q9NZI8 | 1 | Y | Y | 0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04 | Scan:2017 | 710.82233 | 2 | TsPDPSPVSAAPSK | Q13469 | 1 | Y | N | 45.41
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:683 | 692.28891 | 2 | _APQTS(ph)S(ph)SPPPVR_ | Q8IYB3 | 2 | Y | N | 30.09
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:4832 | 775.3548 | 2 | SQtPPGVAtPPIPK | Q15648 | 2 | Y | N | 31.79
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | SLsGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | sLSGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0915611
• 59 requests / 32 submissions (54% return)
 2 retractions
 + 7 iPRG members and 1 guest
Resource Lab Status
Conduct both core
functions and non-core lab research
3%
Membership (n=33)
39%
43%
Core only
15%
45%
55%
Non-core research lab
ABRF Member
Non-member
Primary Job Function
3%
Type of Lab
6%
6%
Bioinformatician/Developer
18%
6%
Academic
9%
Director/Manager
12%
Biotech/Pharma/Industry
58%
9%
Mass Spectrometrist
Contract Research Org
73%
Lab Scientist
Other
Government
Other
Proteomics Experience
Location
20
6%
Asia
15%
Australia/New
Zealand
15
9%
70%
Europe
North America
10
5
0
1-2 years
3-4 years
5-10 years
>10 years
Unanswered
Software Tools Used
[Figure: two bar charts of tool usage counts, Peptide Identification (y-axis 0 to 16) and Phosphosite Localization (y-axis 0 to 6). Tool names recoverable from the axis labels include Mascot, SEQUEST, X!Tandem, OMSSA, Spectrum Mill, MyriMatch, pFind, InsPecT, Scaffold, PeptideProphet, ProteinProphet, SpectraST, Pview, Peptizer, PepARML, MSPepSearch, OpenMS/TOPP, Ascore, MaxQuant, PhosphoScore, Phosphinator, NNScore and in-house tools.]
The SCX/IMAC Enrichment Approach for Phosphoproteomics
Sample: 7.5x10^7 human K562 chronic myelogenous leukemia cells, 4 mg lysate
Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.
Lysis: 8M urea, 75mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
SCX: PolyLC - Polysulfoethyl A 9.4 mm X 200mm, elute: 0-105mM KCl, 30% Acn.
IMAC: Sigma - PhosSelect Fe IMAC beads, bind: 40% Acn, 0.1% formic acid, elute: 500 mM K2HPO4 pH 7
MS/MS: Thermo Fisher Orbitrap XL, high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in LTQ, exclude +1
and precursors w/ unassigned charges, 20s exclusion time, precursor mass error +/- 10 ppm
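The precursor rules in the MS/MS line (exclude +1 and unassigned charges, +/- 10 ppm mass error) can be expressed as a small filter. This is a sketch for illustration: the function names are assumptions, and combining the two rules in one function is a convenience of the example, since on the instrument the charge rule applies at acquisition time.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def keep_precursor(charge, observed_mz, theoretical_mz, tol_ppm=10.0):
    """Apply the charge rule and the ppm tolerance to one precursor.

    charge: assigned charge state, or None if the charge could not be assigned.
    """
    if charge is None or charge < 2:
        return False  # unassigned or +1 precursors are excluded
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# The theoretical m/z below is a made-up value for illustration
print(round(ppm_error(558.7576, 558.7570), 2))
print(keep_precursor(2, 558.7576, 558.7570))
```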
Preliminary Analysis of SCX Fractions and Dataset Selection
[Figure: four panels plotted against SCX fr # (2 to 12):
 # spectra by precursor z (z2, z3, z4)
 # distinct peptides by fraction overlap (1 to 6 frxns)
 % distinct peptides by solution charge (-1SC to 6SC)
 % distinct peptides by # phosphosites (1P, 2P, 3P)]
Preliminary Analysis of SCX Fractions and Dataset Selection
Frxn 3: multi-phosphosites
Frxn 4: single phospho, single basic
Frxn 12: multi-basic residues (RHK)
[Figure: # spectra by precursor z (z2, z3, z4), % distinct peptides by solution charge (-1SC to 6SC) and % distinct peptides by # phosphosites (1P, 2P, 3P), plotted against SCX fr # (2 to 12)]
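The "solution charge" used to explain the SCX elution order can be estimated from the sequence alone. The sketch assumes the usual approximation at acidic SCX pH: +1 for the N-terminus, +1 per basic residue (R, H, K) and -1 per phosphate, with phosphosites written as lowercase s, t or y. Both the rule and the function name are assumptions of the example.

```python
def solution_charge(seq):
    """Approximate net charge of a peptide in acidic SCX buffer.

    seq: peptide with phosphorylated residues in lowercase (s, t, y).
    Assumes +1 (N-terminus) +1 per R/H/K and -1 per phosphate.
    """
    basic = sum(seq.upper().count(residue) for residue in "RHK")
    phospho = sum(seq.count(residue) for residue in "sty")
    return 1 + basic - phospho

print(solution_charge("TsPDPSPVSAAPSK"))   # single phospho, single basic
print(solution_charge("SQtPPGVAtPPIPK"))   # two phosphosites, single basic
```

Under this approximation, multiply phosphorylated peptides carry less positive charge and elute early, while peptides with several basic residues elute late, consistent with the fraction annotations.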
Notes on Analysis
• Identification and localization were analyzed separately
• All non-phosphopeptide IDs removed
• Reported confidence indicators were used as filters
– Id = Is peptide spectrum match above 1% FDR? (Y|N)
– Loc = Are ALL phosphosites unambiguously assigned? (Y|N)
• Mods indicated on S|T|Y are assumed phos, others ignored
– Unique peptide comparison
SLsm(ox)DSQVPVYSPSIDLK
SLSm(ox)DSQVPVYSPsIDLK
SLSMDSQVPVYSPSIDLK
– Phosphopeptide comparison
SLsm(ox)DSQVPVYSPSIDLK
SLSm(ox)DSQVPVYSPsIDLK
SLsMDSQVPVYSPSIDLK
SLSMDSQVPVYSPsIDLK
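The two comparison levels can be sketched as two normalization keys. The example assumes phosphosites are written in lowercase and other modifications as parenthesised strings, as in the sequences above; the function names are chosen for the illustration.

```python
import re

def _strip_parenthesised(seq):
    """Remove parenthesised modification strings such as '(ox)'."""
    return re.sub(r"\([^)]*\)", "", seq)

def unique_peptide_key(seq):
    """Key for the unique peptide comparison: bare uppercase sequence."""
    return _strip_parenthesised(seq).upper()

def phosphopeptide_key(seq):
    """Key for the phosphopeptide comparison: keep lowercase s/t/y, drop other mods."""
    bare = _strip_parenthesised(seq)
    return "".join(ch if ch in "sty" else ch.upper() for ch in bare)

for entry in ("SLsm(ox)DSQVPVYSPSIDLK", "SLSm(ox)DSQVPVYSPsIDLK"):
    print(unique_peptide_key(entry), phosphopeptide_key(entry))
```

Both entries collapse to one unique peptide but remain two distinct phosphopeptides.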
From 30,000 Ft.
[Figure: bar chart per participant (x-axis: participant ID; y-axis: 0 to 8000) of # spectra Id Yes, # spectra Loc Yes and # unique Peptides UC ID Yes. Values called out on the slide: 3,571; 2,084; 1,623.]
[Table under the chart, one y/n entry per participant: Pre masses adjusted; N term acetyl; Used > 1 search engine; Pre mass filtering; Solution charge filtering; Localization software. Two further rows give the Peptide ID Software and Localization software abbreviation for each participant.]
Ma=Mascot, Ms=MsInspect, Mu=Multiple, Mu*=Multiple + Spec Lib, Om=OMSSA, Pf=pFind, Pv=Pview, Sc=Scaffold, Se=SEQUEST, Sm=Spectrum Mill, Xt=X!Tandem
As=Ascore, Ih=In-house, In=InsPecT, Mq=MaxQuant, Nn=NNScore, Ph=Phosphinator, Pl=Phosphate Localization Score, Ps=PhosphoScore, Sm=Spectrum Mill
Quotes From Participants
(grading?)
• “It is hard to see how the results of this analysis will be assessed.
How will the accuracy of individual methods be determined? It is
always easy to find more matches but less easy to determine
whether they are credible.”
• “The most challenging part of this study is phosphate localization.
But, the data in this study is not a gold standard. So, it's hard to
judge which method works.”
• “Well-designed study. I am looking forward to the results, and I am
honestly wondering how the study designer comes up with a
"model answer" (or maybe there won't be one?).”
• “How sure are the authors of phosphosite localisation? Have
synthetic peptides been made to validate many of the peptides?”
Relative Performance: Identification By Fraction
[Figure, first panel: # spectra Id Yes per participant, total and split by Frxn 3, Frxn 4 and Frxn 12 (x-axis: participant ID; y-axis: 0 to 4000)]
Performance was not equivalent across the 3 fractions for all participants.
[Figure, second panel: # unique peptides UC Id Yes per participant, total and split by Frxn 3, Frxn 4 and Frxn 12 (x-axis: participant ID; y-axis: 0 to 4000)]
Some participants saw more unique peptides than others.
How Much Did Phosphopeptide Identifications (Spectra and Unique Peptides) Vary?
[Figure: Num. of Spectra or Unique Peptides (y-axis, 0 to 4000) for # Spectra ID=Y and # Unique Peptides in Fr 3, Fr 4 and Fr 12]
Unique Phosphopeptide CVs: Fr3=74%, Fr4=31%, Fr12=84%*
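The CVs quoted for each fraction are coefficients of variation across participants (standard deviation divided by mean). A minimal sketch, assuming the sample standard deviation is used (the slide does not say which) and with made-up counts:

```python
import statistics

def cv_percent(counts):
    """Coefficient of variation, in percent, of per-participant counts.

    Uses the sample standard deviation (an assumption of this example).
    """
    return 100.0 * statistics.stdev(counts) / statistics.mean(counts)

# Hypothetical unique-peptide counts from three participants
print(cv_percent([90, 100, 110]))
```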
One Participant Wonders
[Figure: three panels, one per fraction, of # spectra Id Yes per participant (x-axis: participant ID), split into Shared and Unique to Participant]
Frxn 3 – most multiple phos per peptide
Frxn 4 – most phosphopeptides
Frxn 12 – highest precursor charges
Gray means: number of spectra where < 2 people agreed on the Id
85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
20814: ?, Frxn 12 >> Frxn 3,4
77114, 77115: merged multiple scans, so can’t be compared with other 33
On Average, How Similar Are the Sets of Confidently Identified Unique Peptides?
Pairwise Comparisons of Unique Peptides - All Fractions
[Figure: median % overlap with the other participants (y-axis, 0 to 120) per participant (x-axis, sorted by # spectra Id=Y); stacked bars: overlap, unique to participant, unique to other; n=35]
Median overlap: 57% ± 14
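A pairwise overlap of this kind can be computed from each participant's set of unique peptides. The slide does not define the percentage; the sketch assumes overlap / (overlap + unique to participant + unique to other), i.e. the Jaccard index, which matches the three legend categories. Participant names and peptides are placeholders.

```python
from statistics import median

def percent_overlap(a, b):
    """Shared peptides as a percentage of the union of two sets (Jaccard index)."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0

def median_overlap(peptide_sets):
    """Median pairwise % overlap of each participant against all others."""
    return {
        name: median(
            percent_overlap(peptides, other)
            for other_name, other in peptide_sets.items()
            if other_name != name
        )
        for name, peptides in peptide_sets.items()
    }

# Placeholder peptide sets for three hypothetical participants
sets = {"P1": {"A", "B", "C", "D"}, "P2": {"A", "B", "C"}, "P3": {"A"}}
print(percent_overlap(sets["P1"], sets["P2"]))
print(median_overlap(sets)["P1"])
```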
Unique Peptide %Overlap
14941
14941
87133
22730
86010
13800
84940v
20899i
53706
92536i
870486i
45682
870484i
85246
13867
20441v
40816i
20109
50308i
29850v
56365
66398
91943i
47587
71263
65211
63103
97219i
20814
61963v
18621
74637
15769
77114
66514
77115
82.5
73.5
79.8
75.6
78.4
75.9
73.2
69.3
72.5
74.6
69.2
15.8
71.5
61.7
57.2
55.1
65.4
52.4
59.2
70.2
45.5
44.3
47.2
39.4
60.6
37.6
53.9
43.7
38.3
44.8
25.5
22.8
17.4
11.4
87133
82.5
75.2
77.4
74.6
80.2
76.7
76
69.5
71.3
81.3
73.4
15.8
79
61.1
64
60.6
71.7
58.7
65.5
70.6
47
49.4
50.3
43.7
67.6
42
60.1
48
42.4
49.6
28.4
25.6
19.2
12.8
22730
73.5
75.2
70.3
68.5
79.4
66.6
71.4
65
68
69.5
70
14.2
68.9
54.9
57.9
51
63.2
52.6
56.5
64.7
40.8
43.8
42.9
39.3
60.7
35.9
53.9
41.6
38.7
44.9
25
22.7
17.2
11.4
86010
79.8
77.4
70.3
85.4
73.2
73.6
69.3
70
70
70.4
66.8
14.7
68.7
59.2
56
52.7
61.7
50.9
57.3
69.6
44.4
42.8
46
37.8
58.1
36
52.7
42
37.6
43.9
24.7
22.3
16.4
11.1
13800 84940v 20899i
75.6
78.4
75.9
74.6
80.2
76.7
68.5
79.4
66.6
85.4
73.2
73.6
71.3
71.7
71.3
70.7
71.7
70.7
67.6
75.1
69.5
69
67.3
67.9
69.4
67.5
70.8
69.7
73.6
73.2
66.3
69.5
68.9
14.3
14.7
14.6
69.2
74.2
72.9
59.1
58.2
63.5
57.2
61.1
62.2
52.3
53.8
57.6
63.1
67
70.3
52.3
56.2
58.4
57.5
61.3
64.4
69.4
67.9
72.2
45.7
44.1
50.7
43.5
47.9
48.5
46.6
46.5
52.5
39.3
42.1
43.2
60.1
65
65.9
37.3
39.6
41
54.1
58.3
59.5
42.8
45
47.8
38.8
40.9
42.4
45.4
47.4
48.4
25.5
27.3
29.1
23.5
24.7
26.2
16.8
18.8
18.4
11.8
12.4
13
53706 92536i 870486i
73.2
69.3
72.5
76
69.5
71.3
71.4
65
68
69.3
70
70
67.6
69
69.4
75.1
67.3
67.5
69.5
67.9
70.8
64.6
66
64.6
67.1
66
67.1
71.2
63.5
66.4
68.1
64.2
90.9
15
12.7
13.7
73.8
67.7
68.4
55.5
55.9
60
57.7
57
59.2
53
50.3
52.8
66.7
61.4
66.9
54.2
53.1
54.6
58.9
57.7
59.1
65.2
81.7
69
43.4
45.8
47.7
44.5
45.2
45.7
45.6
44.9
49.5
41
39.2
41.4
62.5
59.5
62.3
37.7
36.5
38.1
56.3
55.4
55.7
42.8
42.9
45.5
39.7
38.4
41
47.8
46.3
48.4
26.5
25.9
27.7
24.3
23.4
25.1
17.4
17.8
17.7
12.2
11.8
12.7
45682 870484i
74.6
69.2
81.3
73.4
69.5
70
70.4
66.8
69.7
66.3
73.6
69.5
73.2
68.9
71.2
68.1
63.5
64.2
66.4
90.9
68.7
68.7
16
13.9
73.8
71
60.2
58.1
61.8
60.6
56.5
54
69.4
69.7
57.3
56.8
63.1
61.5
65
66.1
46.6
46.7
48.1
47.4
49.9
50.1
43.1
43.5
65.1
64.6
41.7
39.7
58.7
57.7
48.3
47.1
42
42.7
49.5
50.7
28.6
28.9
26.2
26.3
18.8
18.8
13.3
13.3
85246
15.8
15.8
14.2
14.7
14.3
14.7
14.6
15
12.7
13.7
16
13.9
14.2
12.4
11.4
14
13.3
10.9
12.4
12.9
9.2
8.4
10.6
9.2
12.1
9
11.4
8.8
9.3
9.3
5.9
5.8
4
3.9
13867 20441v 40816i
71.5
61.7
57.2
79
61.1
64
68.9
54.9
57.9
68.7
59.2
56
69.2
59.1
57.2
74.2
58.2
61.1
72.9
63.5
62.2
73.8
55.5
57.7
67.7
55.9
57
68.4
60
59.2
73.8
60.2
61.8
71
58.1
60.6
14.2
12.4
11.4
62.1
64.1
62.1
58.1
64.1
58.1
60.5
55.5
55.5
73.5
63.6
70.2
61.3
59
74
67.3
58.1
64.7
69.9
59.9
61.7
49.8
53.7
57.4
52
47.9
57.1
52.7
52.5
56.5
46.4
45.8
49.7
71.3
62.6
71.9
43.9
44.2
46.4
64
58.5
67.7
50.7
57.2
58
44.5
44.3
55.1
53.2
45
51.1
30.6
31.5
36.3
28.5
30
35
20
18.5
19.6
14.4
15.5
17.8
20109 50308i 29850v
55.1
65.4
52.4
60.6
71.7
58.7
51
63.2
52.6
52.7
61.7
50.9
52.3
63.1
52.3
53.8
67
56.2
57.6
70.3
58.4
53
66.7
54.2
50.3
61.4
53.1
52.8
66.9
54.6
56.5
69.4
57.3
54
69.7
56.8
14
13.3
10.9
60.5
73.5
61.3
55.5
63.6
59
55.5
70.2
74
60.3
54
60.3
70.2
54
70.2
59.4
68.8
64.6
53.6
67.1
58
45.5
59.7
67.1
45
53.9
62.2
48.6
58.7
59.8
45.5
52.6
53.4
56.6
75.9
70.4
38.6
49.2
50.2
56.7
70.3
68.2
46.1
55.7
63.5
41.3
50.7
58.7
44
53.6
51.5
29.1
36.1
41.1
27.5
33.7
39.6
19.1
20.6
20.6
19.2
16.9
19.7
56365
59.2
65.5
56.5
57.3
57.5
61.3
64.4
58.9
57.7
59.1
63.1
61.5
12.4
67.3
58.1
64.7
59.4
68.8
64.6
61.6
56.7
56.8
68.6
51.4
68.8
47.6
66.2
55.5
51
56.8
37.3
35.7
21.5
18.1
66398 91943i
70.2
45.5
70.6
47
64.7
40.8
69.6
44.4
69.4
45.7
67.9
44.1
72.2
50.7
65.2
43.4
81.7
45.8
69
47.7
65
46.6
66.1
46.7
12.9
9.2
69.9
49.8
59.9
53.7
61.7
57.4
53.6
45.5
67.1
59.7
58
67.1
61.6
56.7
50.8
50.8
48.7
56.5
49.9
61
43.5
50.6
65
58.1
40.2
46.2
59.4
57.3
47.4
60.6
41.5
51.8
49.4
45.6
28.7
42
26.3
45.5
18.8
19
13.3
22.6
47587
44.3
49.4
43.8
42.8
43.5
47.9
48.5
44.5
45.2
45.7
48.1
47.4
8.4
52
47.9
57.1
45
53.9
62.2
56.8
48.7
56.5
52.5
48.5
56.1
42.6
54.6
55.4
44.9
45.5
44.7
36.6
21.7
21.2
71263
47.2
50.3
42.9
46
46.6
46.5
52.5
45.6
44.9
49.5
49.9
50.1
10.6
52.7
52.5
56.5
48.6
58.7
59.8
68.6
49.9
61
52.5
49.1
57.9
44.8
59.2
56.1
53.1
52.6
41.2
42.4
20
21.7
65211
39.4
43.7
39.3
37.8
39.3
42.1
43.2
41
39.2
41.4
43.1
43.5
9.2
46.4
45.8
49.7
45.5
52.6
53.4
51.4
43.5
50.6
48.5
49.1
52.3
49.3
51.4
50.3
43.7
42.2
39.3
39.3
20.7
22.4
63103 97219i
60.6
37.6
67.6
42
60.7
35.9
58.1
36
60.1
37.3
65
39.6
65.9
41
62.5
37.7
59.5
36.5
62.3
38.1
65.1
41.7
64.6
39.7
12.1
9
71.3
43.9
62.6
44.2
71.9
46.4
56.6
38.6
75.9
49.2
70.4
50.2
68.8
47.6
65
40.2
58.1
46.2
56.1
42.6
57.9
44.8
52.3
49.3
49.2
49.2
67.5
47.5
57.3
49.7
51.7
37.7
54.1
38.5
36.5
31.3
35.5
33.6
21.4
18.6
17.7
17.3
20814 61963v
53.9
43.7
60.1
48
53.9
41.6
52.7
42
54.1
42.8
58.3
45
59.5
47.8
56.3
42.8
55.4
42.9
55.7
45.5
58.7
48.3
57.7
47.1
11.4
8.8
64
50.7
58.5
57.2
67.7
58
56.7
46.1
70.3
55.7
68.2
63.5
66.2
55.5
59.4
47.4
57.3
60.6
54.6
55.4
59.2
56.1
51.4
50.3
67.5
57.3
47.5
49.7
56.5
56.5
53.6
48
54.8
44.6
37.2
40.7
37.5
40.9
20.9
20.1
19.3
19.7
18621
38.3
42.4
38.7
37.6
38.8
40.9
42.4
39.7
38.4
41
42
42.7
9.3
44.5
44.3
55.1
41.3
50.7
58.7
51
41.5
51.8
44.9
53.1
43.7
51.7
37.7
53.6
48
43.3
36.8
42.5
18.2
21.7
74637
44.8
49.6
44.9
43.9
45.4
47.4
48.4
47.8
46.3
48.4
49.5
50.7
9.3
53.2
45
51.1
44
53.6
51.5
56.8
49.4
45.6
45.5
52.6
42.2
54.1
38.5
54.8
44.6
43.3
33.4
32
21
17.7
15769
25.5
28.4
25
24.7
25.5
27.3
29.1
26.5
25.9
27.7
28.6
28.9
5.9
30.6
31.5
36.3
29.1
36.1
41.1
37.3
28.7
42
44.7
41.2
39.3
36.5
31.3
37.2
40.7
36.8
33.4
41.4
17.1
24
77114
22.8
25.6
22.7
22.3
23.5
24.7
26.2
24.3
23.4
25.1
26.2
26.3
5.8
28.5
30
35
27.5
33.7
39.6
35.7
26.3
45.5
36.6
42.4
39.3
35.5
33.6
37.5
40.9
42.5
32
41.4
16.5
50
66514
17.4
19.2
17.2
16.4
16.8
18.8
18.4
17.4
17.8
17.7
18.8
18.8
4
20
18.5
19.6
19.1
20.6
20.6
21.5
18.8
19
21.7
20
20.7
21.4
18.6
20.9
20.1
18.2
21
17.1
16.5
77115
11.4
12.8
11.4
11.1
11.8
12.4
13
12.2
11.8
12.7
13.3
13.3
3.9
14.4
15.5
17.8
19.2
16.9
19.7
18.1
13.3
22.6
21.2
21.7
22.4
17.7
17.3
19.3
19.7
21.7
17.7
24
50
10.1
10.1
Descending # of total spectra Id=Y
n=35
13867 50308i 20899i
73.5
72.9
73.5
70.3
72.9
70.3
79
71.7
76.7
73.8
69.4
73.2
69.9
67.1
72.2
71.3
75.9
65.9
74.2
67
70.7
71
69.7
68.9
68.4
66.9
70.8
67.3
68.8
64.4
71.5
65.4
75.9
62.1
63.6
63.5
64.1
70.2
62.2
69.2
63.1
71.7
73.8
66.7
69.5
68.7
61.7
73.6
61.3
70.2
58.4
67.7
61.4
67.9
64
70.3
59.5
68.9
63.2
66.6
60.5
60.3
57.6
52.7
58.7
52.5
52
53.9
48.5
50.7
55.7
47.8
49.8
59.7
50.7
53.2
53.6
48.4
46.4
52.6
43.2
44.5
50.7
42.4
43.9
49.2
41
30.6
36.1
29.1
28.5
33.7
26.2
20
20.6
18.4
14.4
16.9
13
14.2
13.3
14.6
87133
79
71.7
76.7
81.3
70.6
67.6
80.2
73.4
71.3
65.5
82.5
61.1
64
74.6
76
77.4
58.7
69.5
60.1
75.2
60.6
50.3
49.4
48
47
49.6
43.7
42.4
42
28.4
25.6
19.2
12.8
15.8
45682
73.8
69.4
73.2
81.3
65
65.1
73.6
68.7
66.4
63.1
74.6
60.2
61.8
69.7
71.2
70.4
57.3
63.5
58.7
69.5
56.5
49.9
48.1
48.3
46.6
49.5
43.1
42
41.7
28.6
26.2
18.8
13.3
16
66398
69.9
67.1
72.2
70.6
65
65
67.9
66.1
69
61.6
70.2
59.9
61.7
69.4
65.2
69.6
58
81.7
59.4
64.7
53.6
49.9
48.7
47.4
50.8
49.4
43.5
41.5
40.2
28.7
26.3
18.8
13.3
12.9
63103 84940v 870484i 870486i
71.3
74.2
71
68.4
75.9
67
69.7
66.9
65.9
70.7
68.9
70.8
67.6
80.2
73.4
71.3
65.1
73.6
68.7
66.4
65
67.9
66.1
69
65
64.6
62.3
65
69.5
67.5
64.6
69.5
90.9
62.3
67.5
90.9
68.8
61.3
61.5
59.1
60.6
78.4
69.2
72.5
62.6
58.2
58.1
60
71.9
61.1
60.6
59.2
60.1
71.3
66.3
69.4
62.5
75.1
68.1
66
58.1
73.2
66.8
70
70.4
56.2
56.8
54.6
59.5
67.3
64.2
67.1
67.5
58.3
57.7
55.7
60.7
79.4
70
68
56.6
53.8
54
52.8
57.9
46.5
50.1
49.5
56.1
47.9
47.4
45.7
57.3
45
47.1
45.5
58.1
44.1
46.7
47.7
54.1
47.4
50.7
48.4
52.3
42.1
43.5
41.4
51.7
40.9
42.7
41
49.2
39.6
39.7
38.1
36.5
27.3
28.9
27.7
35.5
24.7
26.3
25.1
21.4
18.8
18.8
17.7
17.7
12.4
13.3
12.7
12.1
14.7
13.9
13.7
56365
67.3
68.8
64.4
65.5
63.1
61.6
68.8
61.3
61.5
59.1
59.2
58.1
64.7
57.5
58.9
57.3
64.6
57.7
66.2
56.5
59.4
68.6
56.8
55.5
56.7
56.8
51.4
51
47.6
37.3
35.7
21.5
18.1
12.4
14941 20441v 40816i
71.5
62.1
64.1
65.4
63.6
70.2
75.9
63.5
62.2
82.5
61.1
64
74.6
60.2
61.8
70.2
59.9
61.7
60.6
62.6
71.9
78.4
58.2
61.1
69.2
58.1
60.6
72.5
60
59.2
59.2
58.1
64.7
61.7
57.2
61.7
58.1
57.2
58.1
75.6
59.1
57.2
73.2
55.5
57.7
79.8
59.2
56
52.4
59
74
69.3
55.9
57
53.9
58.5
67.7
73.5
54.9
57.9
55.1
55.5
55.5
47.2
52.5
56.5
44.3
47.9
57.1
43.7
57.2
58
45.5
53.7
57.4
44.8
45
51.1
39.4
45.8
49.7
38.3
44.3
55.1
37.6
44.2
46.4
25.5
31.5
36.3
22.8
30
35
17.4
18.5
19.6
11.4
15.5
17.8
15.8
12.4
11.4
13800
69.2
63.1
71.7
74.6
69.7
69.4
60.1
71.3
66.3
69.4
57.5
75.6
59.1
57.2
67.6
85.4
52.3
69
54.1
68.5
52.3
46.6
43.5
42.8
45.7
45.4
39.3
38.8
37.3
25.5
23.5
16.8
11.8
14.3
53706
73.8
66.7
69.5
76
71.2
65.2
62.5
75.1
68.1
66
58.9
73.2
55.5
57.7
67.6
69.3
54.2
64.6
56.3
71.4
53
45.6
44.5
42.8
43.4
47.8
41
39.7
37.7
26.5
24.3
17.4
12.2
15
86010 29850v 92536i
68.7
61.3
67.7
61.7
70.2
61.4
73.6
58.4
67.9
77.4
58.7
69.5
70.4
57.3
63.5
69.6
58
81.7
58.1
70.4
59.5
73.2
56.2
67.3
66.8
56.8
64.2
70
54.6
67.1
57.3
64.6
57.7
79.8
52.4
69.3
59.2
59
55.9
56
74
57
85.4
52.3
69
69.3
54.2
64.6
50.9
70
50.9
53.1
70
53.1
52.7
68.2
55.4
70.3
52.6
65
52.7
54
50.3
46
59.8
44.9
42.8
62.2
45.2
42
63.5
42.9
44.4
67.1
45.8
43.9
51.5
46.3
37.8
53.4
39.2
37.6
58.7
38.4
36
50.2
36.5
24.7
41.1
25.9
22.3
39.6
23.4
16.4
20.6
17.8
11.1
19.7
11.8
14.7
10.9
12.7
20814
64
70.3
59.5
60.1
58.7
59.4
67.5
58.3
57.7
55.7
66.2
53.9
58.5
67.7
54.1
56.3
52.7
68.2
55.4
53.9
56.7
59.2
54.6
56.5
57.3
54.8
51.4
53.6
47.5
37.2
37.5
20.9
19.3
11.4
22730
68.9
63.2
66.6
75.2
69.5
64.7
60.7
79.4
70
68
56.5
73.5
54.9
57.9
68.5
71.4
70.3
52.6
65
53.9
51
42.9
43.8
41.6
40.8
44.9
39.3
38.7
35.9
25
22.7
17.2
11.4
14.2
20109
60.5
60.3
57.6
60.6
56.5
53.6
56.6
53.8
54
52.8
59.4
55.1
55.5
55.5
52.3
53
52.7
54
50.3
56.7
51
48.6
45
46.1
45.5
44
45.5
41.3
38.6
29.1
27.5
19.1
19.2
14
71263
52.7
58.7
52.5
50.3
49.9
49.9
57.9
46.5
50.1
49.5
68.6
47.2
52.5
56.5
46.6
45.6
46
59.8
44.9
59.2
42.9
48.6
52.5
56.1
61
52.6
49.1
53.1
44.8
41.2
42.4
20
21.7
10.6
47587 61963v 91943i
52
50.7
49.8
53.9
55.7
59.7
48.5
47.8
50.7
49.4
48
47
48.1
48.3
46.6
48.7
47.4
50.8
56.1
57.3
58.1
47.9
45
44.1
47.4
47.1
46.7
45.7
45.5
47.7
56.8
55.5
56.7
44.3
43.7
45.5
47.9
57.2
53.7
57.1
58
57.4
43.5
42.8
45.7
44.5
42.8
43.4
42.8
42
44.4
62.2
63.5
67.1
45.2
42.9
45.8
54.6
56.5
57.3
43.8
41.6
40.8
45
46.1
45.5
52.5
56.1
61
55.4
56.5
55.4
60.6
56.5
60.6
45.5
44.6
45.6
48.5
50.3
50.6
44.9
48
51.8
42.6
49.7
46.2
44.7
40.7
42
36.6
40.9
45.5
21.7
20.1
19
21.2
19.7
22.6
8.4
8.8
9.2
74637
53.2
53.6
48.4
49.6
49.5
49.4
54.1
47.4
50.7
48.4
56.8
44.8
45
51.1
45.4
47.8
43.9
51.5
46.3
54.8
44.9
44
52.6
45.5
44.6
45.6
42.2
43.3
38.5
33.4
32
21
17.7
9.3
65211
46.4
52.6
43.2
43.7
43.1
43.5
52.3
42.1
43.5
41.4
51.4
39.4
45.8
49.7
39.3
41
37.8
53.4
39.2
51.4
39.3
45.5
49.1
48.5
50.3
50.6
42.2
43.7
49.3
39.3
39.3
20.7
22.4
9.2
18621 97219i
44.5
43.9
50.7
49.2
42.4
41
42.4
42
42
41.7
41.5
40.2
51.7
49.2
40.9
39.6
42.7
39.7
41
38.1
51
47.6
38.3
37.6
44.3
44.2
55.1
46.4
38.8
37.3
39.7
37.7
37.6
36
58.7
50.2
38.4
36.5
53.6
47.5
38.7
35.9
41.3
38.6
53.1
44.8
44.9
42.6
48
49.7
51.8
46.2
43.3
38.5
43.7
49.3
37.7
37.7
36.8
31.3
42.5
33.6
18.2
18.6
21.7
17.3
9.3
9
Descending similarity (median % overlap)
15769
30.6
36.1
29.1
28.4
28.6
28.7
36.5
27.3
28.9
27.7
37.3
25.5
31.5
36.3
25.5
26.5
24.7
41.1
25.9
37.2
25
29.1
41.2
44.7
40.7
42
33.4
39.3
36.8
31.3
41.4
17.1
24
5.9
77114
28.5
33.7
26.2
25.6
26.2
26.3
35.5
24.7
26.3
25.1
35.7
22.8
30
35
23.5
24.3
22.3
39.6
23.4
37.5
22.7
27.5
42.4
36.6
40.9
45.5
32
39.3
42.5
33.6
41.4
16.5
50
5.8
66514
20
20.6
18.4
19.2
18.8
18.8
21.4
18.8
18.8
17.7
21.5
17.4
18.5
19.6
16.8
17.4
16.4
20.6
17.8
20.9
17.2
19.1
20
21.7
20.1
19
21
20.7
18.2
18.6
17.1
16.5
10.1
4
77115
14.4
16.9
13
12.8
13.3
13.3
17.7
12.4
13.3
12.7
18.1
11.4
15.5
17.8
11.8
12.2
11.1
19.7
11.8
19.3
11.4
19.2
21.7
21.2
19.7
22.6
17.7
22.4
21.7
17.3
24
50
10.1
3.9
85246
14.2
13.3
14.6
15.8
16
12.9
12.1
14.7
13.9
13.7
12.4
15.8
12.4
11.4
14.3
15
14.7
10.9
12.7
11.4
14.2
14
10.6
8.4
8.8
9.2
9.3
9.2
9.3
9
5.9
5.8
4
3.9
Descending similarity (median % overlap)
13867
50308i
20899i
87133
45682
66398
63103
84940v
870484i
870486i
56365
14941
20441v
40816i
13800
53706
86010
29850v
92536i
20814
22730
20109
71263
47587
61963v
91943i
74637
65211
18621
97219i
15769
77114
66514
77115
85246
Descending # of total spectra Id=Y
Proteome Informatics
Research Group
A B
R F
Subset of Participants Used for Localization Analysis
[Figure: two bar charts of # spectra per participant (y-axis 0 to 8000), showing # spectra Id Yes and # spectra Loc Yes. First panel: all 35 participants. Second panel: the 22 participants used for localization analysis.]
Participants used for localization analysis (n=22): 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621
Excluded participants are flagged with one or more reason codes:
0: 0% localization
1: 100% localization
F: FDR - very high?
R: Replicate submission
M: Merged spectra
C: Categorization Errors
A: Y Loc only when no possible ambiguity
Wide Range in Willingness to be Certain of Localization
[Figure: bar chart of # spectra per participant (y-axis 0 to 4000), showing # spectra Id Yes and # spectra Loc Yes for Frxn 3, Frxn 4, and Frxn 12.]
[Figure: Fraction of Confidently Identified Spectra (Id=Y) Marked Fully Localized (Loc=Y), per participant, for Fr3, Fr4, and Fr12 (y-axis: Fraction of Spectra, 0.0 to 1.2).]
Median Fraction of Confident Spectra Marked Loc=Y:
Fr3 = 48%
Fr4 = 67%
Fr12 = 65%
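The per-participant fraction and its median can be sketched as follows. This is a minimal illustration, assuming each participant's submission is reduced to one (Id=Y, Loc=Y) pair of booleans per spectrum; the function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import median

def loc_fraction(calls):
    """Fraction of confidently identified spectra (Id=Y) that are also Loc=Y.

    calls: iterable of (id_yes, loc_yes) booleans, one pair per spectrum.
    Returns None when the participant has no Id=Y spectra.
    """
    localized = [loc_yes for id_yes, loc_yes in calls if id_yes]
    if not localized:
        return None
    return sum(localized) / len(localized)

def median_loc_fraction(per_participant):
    """Median of the per-participant fractions, skipping empty submissions."""
    fractions = [loc_fraction(calls) for calls in per_participant.values()]
    return median(f for f in fractions if f is not None)

# Hypothetical toy submissions for three participants in one fraction.
toy = {
    "P1": [(True, True), (True, False), (False, False), (True, True)],
    "P2": [(True, True), (True, True)],
    "P3": [(True, False), (True, False), (True, True), (True, False)],
}
print(median_loc_fraction(toy))
```

Spectra with Id=N are ignored in the denominator, so a participant who identifies few spectra but localizes all of them still scores 1.0.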
On Average, How Similar Are the Sets of Confidently Identified and Localized Phosphopeptides (Id=Y, Loc=Y)?
Pairwise Comparisons of Phosphopeptides - All Fractions
[Figure: stacked bar chart per participant (n=22, sorted by # spectra Id=Y) showing overlap, unique_to_participant, and unique_to_other; y-axis 0 to 120.]
Median % overlap: 38% ± 8 (n=22)
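A pairwise comparison of this kind can be sketched as below. The slides do not state the exact formula, so this sketch assumes that % overlap is the shared set divided by the union of the two sets, which makes overlap, unique_to_participant, and unique_to_other sum to 100%. The function names and participant labels are hypothetical; for phosphopeptides, each set element would presumably combine the sequence with its site positions.

```python
from statistics import median

def overlap_breakdown(own, other):
    """Return (overlap, unique_to_participant, unique_to_other) as % of the union."""
    union = own | other
    if not union:
        return (0.0, 0.0, 0.0)
    scale = 100.0 / len(union)
    return (
        len(own & other) * scale,
        len(own - other) * scale,
        len(other - own) * scale,
    )

def median_overlap_per_participant(peptides):
    """For each participant, the median % overlap against every other participant."""
    result = {}
    for name, own in peptides.items():
        overlaps = [
            overlap_breakdown(own, other)[0]
            for other_name, other in peptides.items()
            if other_name != name
        ]
        result[name] = median(overlaps)
    return result

# Hypothetical toy peptide sets for three participants.
toy_sets = {
    "P1": {"A", "B", "C", "D"},
    "P2": {"B", "C", "D", "E"},
    "P3": {"A", "B"},
}
print(median_overlap_per_participant(toy_sets))
```

Sorting participants by the returned medians would give a "descending similarity" ordering under this assumed definition.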
Phosphopeptide %Overlap
Pairwise % overlap between participants (n=22). Rows and columns are in descending order of total spectra Id=Y; column numbers refer to the row numbers.
 #  ID           1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22
 1  14941        -  35.8  31.8  28.5  33.7    31  35.2  29.8  31.3  36.6  32.2  33.8  26.6  19.5  33.4  26.8  20.9  27.2  27.3  16.1    24    24
 2  87133     35.8     -  37.7  33.5  39.4  32.5  35.9  36.7  35.4  38.1  39.5  39.4  34.8  25.2  41.6  28.1  30.8  38.4    38  13.3  36.2  30.5
 3  22730     31.8  37.7     -    51  58.3  47.8  45.8  56.8  53.1  51.5  50.2  51.5  45.4  23.4  54.4  31.8  27.3  38.5  38.9   8.7    38  30.7
 4  86010     28.5  33.5    51     -  66.4  33.1  44.1  52.6  51.6  45.5  47.9  40.2  46.6  18.2  39.2  24.7  22.6  30.4  34.9   6.4  31.8  24.7
 5  13800     33.7  39.4  58.3  66.4     -  41.6  49.1  55.4  58.8  52.7  52.6  47.7  49.2  23.2  49.9  30.9  29.1  37.9  42.5   8.5  37.9    31
 6  84940v      31  32.5  47.8  33.1  41.6     -  40.6  40.5  39.8  44.9  39.1  50.8  33.5  19.3  48.9  45.2  27.4  37.9  35.1  11.1  33.6  29.7
 7  20899i    35.2  35.9  45.8  44.1  49.1  40.6     -  45.9  45.9  48.6  49.4  44.3  40.2  20.3  45.1  31.2  25.5    34  35.6  10.4  31.8  29.1
 8  53706     29.8  36.7  56.8  52.6  55.4  40.5  45.9     -  50.8  46.2  50.2  48.1  45.9  22.3  48.4  29.9  25.4  36.8  38.1     8  36.6  28.8
 9  92536i    31.3  35.4  53.1  51.6  58.8  39.8  45.9  50.8     -    49  45.7  44.5  44.1  22.4    45  29.8  27.3  36.7  37.9   7.7  34.9  28.7
10  870486i   36.6  38.1  51.5  45.5  52.7  44.9  48.6  46.2    49     -  45.9  46.9  42.3  23.1  50.1    32  28.5  39.6  39.6  10.7  36.3  30.7
11  45682     32.2  39.5  50.2  47.9  52.6  39.1  49.4  50.2  45.7  45.9     -  44.8  42.5  20.4  45.4  29.4  25.4  36.6  37.6   8.9  37.1  32.8
12  13867     33.8  39.4  51.5  40.2  47.7  50.8  44.3  48.1  44.5  46.9  44.8     -  41.6  24.8  54.3  36.5  28.9  40.3  41.2  11.3  38.6  31.8
13  20441v    26.6  34.8  45.4  46.6  49.2  33.5  40.2  45.9  44.1  42.3  42.5  41.6     -  26.3  43.6    28  28.4  38.5  42.6   8.8  47.3  29.6
14  20109     19.5  25.2  23.4  18.2  23.2  19.3  20.3  22.3  22.4  23.1  20.4  24.8  26.3     -  28.6    22  25.1  30.9  31.8  14.2  32.6    20
15  50308i    33.4  41.6  54.4  39.2  49.9  48.9  45.1  48.4    45  50.1  45.4  54.3  43.6  28.6     -  38.6  37.5  46.1  48.4  13.4  44.2  36.8
16  56365     26.8  28.1  31.8  24.7  30.9  45.2  31.2  29.9  29.8    32  29.4  36.5    28    22  38.6     -    26    34  38.7  13.6  30.7  26.9
17  91943i    20.9  30.8  27.3  22.6  29.1  27.4  25.5  25.4  27.3  28.5  25.4  28.9  28.4  25.1  37.5    26     -  39.4  39.8  10.5  36.1  31.1
18  47587     27.2  38.4  38.5  30.4  37.9  37.9    34  36.8  36.7  39.6  36.6  40.3  38.5  30.9  46.1    34  39.4     -  46.7  12.7  47.3  35.4
19  71263     27.3    38  38.9  34.9  42.5  35.1  35.6  38.1  37.9  39.6  37.6  41.2  42.6  31.8  48.4  38.7  39.8  46.7     -  12.2  45.2  37.5
20  97219i    16.1  13.3   8.7   6.4   8.5  11.1  10.4     8   7.7  10.7   8.9  11.3   8.8  14.2  13.4  13.6  10.5  12.7  12.2     -  12.3  11.2
21  61963v      24  36.2    38  31.8  37.9  33.6  31.8  36.6  34.9  36.3  37.1  38.6  47.3  32.6  44.2  30.7  36.1  47.3  45.2  12.3     -  33.4
22  18621       24  30.5  30.7  24.7    31  29.7  29.1  28.8  28.7  30.7  32.8  31.8  29.6    20  36.8  26.9  31.1  35.4  37.5  11.2  33.4     -
[Figure: the same matrix shown as a heatmap with participants in descending order of similarity (median % overlap): 22730, 13800, 50308i, 870486i, 53706, 92536i, 20441v, 13867, 45682, 20899i, 84940v, 71263, 47587, 86010, 61963v, 87133, 18621, 14941, 56365, 91943i, 20109, 97219i (n=22).]
Relatedness of Participants by Overlap
[Figure: two panels, one for n=35 and one for n=22.]
If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?
Frxn 4: subset of 472 spectra for which 20/22 participants all agree on identity.
[Figure: histogram of % of spectra (y-axis, 0.0% to 10.0%) versus % participants indicating localization Yes (x-axis: 100%, 85%, 70%, 55%, 40%, 25%, 10%, NPA).]
NPA: No possibility of ambiguity
What Fraction of the Time Do They Agree On Localization(s)?
8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12)
[Figure: stacked bar of # spectra (x-axis 0 to 7000) split into four categories: # N loc all partic; no ambiguity; #Y loc 1 partic; # Y loc 2-22 partic. Segment values: 798, 498, 836, and 5918 (# Y loc 2-22 partic).]
5918/8050 spectra with > 2/22 Loc Yes and Site Ambiguity Possible
[Figure: pie chart of the 5918 spectra by localization agreement, with slices 100% partic agree (4685, 79%), 67-99% partic agree, and < 67% partic agree. The two remaining slice values are 670 (11%) and 563 (10%).]
For all of the participants that agree on identity when
• site ambiguity is possible (#S,T,Y > # phos)
• >2 participants mark Loc=Y
For 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites.
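The agreement binning described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: a localization is represented as a tuple of phosphosite positions, agreement is the share of Loc=Y participants who chose the most common localization, "67-99%" is read as at least two thirds but not unanimous, and the default of three localizers reflects the ">2 participants" condition. All names are hypothetical.

```python
from collections import Counter

def site_ambiguity_possible(sequence, n_phos):
    """True when the peptide has more S/T/Y residues than phosphates (#S,T,Y > # phos)."""
    candidate_sites = sum(1 for residue in sequence if residue in "STY")
    return candidate_sites > n_phos

def agreement_bin(localizations, min_localizers=3):
    """Bin one spectrum by how many Loc=Y participants chose the same sites.

    localizations: list of site tuples, one per participant who marked Loc=Y.
    Returns None when fewer than min_localizers participants localized.
    """
    n = len(localizations)
    if n < min_localizers:
        return None
    top = Counter(localizations).most_common(1)[0][1]
    if top == n:
        return "100%"
    if 3 * top >= 2 * n:  # at least two thirds agree (assumed bin boundary)
        return "67-99%"
    return "<67%"

# Hypothetical calls for one singly phosphorylated peptide.
print(site_ambiguity_possible("GSPTK", 1))
print(agreement_bin([(2,), (2,), (4,)]))
```

Counting the spectra that fall in each bin, restricted to those where `site_ambiguity_possible` holds, would give a breakdown of the same shape as the pie chart.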
Which Participants are More Likely to Disagree on Localization?
[Figure: three bar charts (Frxn 3, Frxn 4, Frxn 12) of % of spectra in minority localization choice per participant (n=22).]
# Spectra with Loc Agreement 50.1-99.9%
Frxn 3: 154
Frxn 4: 498
Frxn 12: 227
The participants who are the most willing to localize are more likely to disagree with the majority view.
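One way to compute a per-participant "% of spectra in minority localization choice" is sketched below. The slides do not give the exact rule, so this assumes the majority view is the single most common localization among Loc=Y participants, that spectra with a tie for first place are skipped, and that the denominator is the number of counted spectra the participant localized. The names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def minority_rates(spectra, min_localizers=3):
    """Percent of a participant's localized spectra that differ from the majority.

    spectra: dict spectrum_id -> dict participant -> site tuple (Loc=Y calls only).
    """
    minority = defaultdict(int)
    total = defaultdict(int)
    for calls in spectra.values():
        if len(calls) < min_localizers:
            continue
        ranked = Counter(calls.values()).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie for first place: no majority view (assumption)
        majority = ranked[0][0]
        for participant, sites in calls.items():
            total[participant] += 1
            if sites != majority:
                minority[participant] += 1
    return {p: 100.0 * minority[p] / total[p] for p in total}

# Hypothetical toy calls; s3 is skipped because only two participants localized.
toy_spectra = {
    "s1": {"P1": (3,), "P2": (3,), "P3": (5,)},
    "s2": {"P1": (7,), "P2": (7,), "P3": (7,)},
    "s3": {"P1": (2,), "P3": (4,)},
    "s4": {"P1": (1,), "P2": (9,), "P3": (9,)},
}
print(minority_rates(toy_spectra))
```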
Quotes From Participants
• “This study has started a dialogue of accurate phosphate identification and localization. Perhaps, the results of this study will point out the inadequacies of current informatics methods in identification and localization of phosphates.”
• “Great study choice! I learned a lot about available software, limitations, etc!”
Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4
YY: Y – identification, Y – localization
YN: Y – identification, N – localization
NS: N – identification, but top sequence same as consensus
ND: N – identification, and top sequence different than consensus
Room for Improvement in ID Certainty Thresholds
[Figure: three stacked bar charts of # spectra per participant (35 participants), one per fraction:]
Frxn 4 – most phosphopeptides
Frxn 12 – highest precursor charges
Frxn 3 – most multiple phos per peptide
Legend:
#DN Diff Id No
#SN Same Id No
#DY Diff Id Yes
#SY Same Id Yes
#Y1P Id Yes single
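The legend categories can be illustrated with a small sketch. The slides do not define how the consensus is formed, so this assumes the consensus sequence for a spectrum is the most common top sequence among participants who marked Id=Y, and that Y1P means the participant was the only one to mark Id=Y. The function name and toy data are hypothetical.

```python
from collections import Counter

def categorize(calls):
    """Assign SY / DY / SN / DN / Y1P to each participant for one spectrum.

    calls: dict participant -> (top_sequence, id_yes).
    Returns an empty dict when no participant marked Id=Y.
    """
    confident = [seq for seq, id_yes in calls.values() if id_yes]
    if not confident:
        return {}
    # Ties are broken by first occurrence, which is arbitrary in this sketch.
    consensus = Counter(confident).most_common(1)[0][0]
    single = len(confident) == 1
    categories = {}
    for participant, (seq, id_yes) in calls.items():
        same = seq == consensus
        if id_yes:
            categories[participant] = "Y1P" if single else ("SY" if same else "DY")
        else:
            categories[participant] = "SN" if same else "DN"
    return categories

# Hypothetical toy calls for one spectrum.
toy_calls = {
    "P1": ("PEPTIDEK", True),
    "P2": ("PEPTIDEK", True),
    "P3": ("PEPTLDEK", True),
    "P4": ("PEPTIDEK", False),
    "P5": ("AAAK", False),
}
print(categorize(toy_calls))
```

A large SN count for a participant would suggest a certainty threshold that rejects sequences other participants accept, which is the kind of room for improvement the slide title points at.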
Preliminary Conclusions
1. Wide range of spectra marked confidently identified
2. Wide range of spectra marked confidently localized
3. Lack of a uniform method for calculating and reporting ambiguity made it hard to compare results from some participants (13 of 35 were only partially included)
4. Some participants succeeded without localization software, but most used at least some measure of ambiguity
5. Typically, very few identifications were unique to any one participant
6. Unique peptide assignments were roughly 57% identical (n=35)
7. Confidently localized phosphopeptide assignments were roughly 38% identical (n=22)
8. Participants that performed well often shared the highest similarity with other participants
9. Participants did not hesitate to mark peptides with ambiguous phosphosite localizations (57% of identified spectra on average)
10. When participants agree on the identification, phosphosite ambiguity is possible, and more than two participants mark Loc=Y, the participants who mark Loc=Y unanimously agree on the localization(s) for 79% of the spectra
iPRG Membership
• Manor Askenazi - Dana-Farber Cancer Institute
• Karl Clauser - Broad Institute of MIT and Harvard
• Lennart Martens (incoming chair) - Ghent University, Belgium
• W. Hayes McDonald - Vanderbilt University
• Paul A Rudnick (outgoing chair) – NIST
• Karen Meyer-Arendt (outgoing member) - University of Colorado
• *Brian C. Searle (outgoing member) - Proteome Software, Inc.
• *William S. Lane (outgoing member) - Harvard University
• *Jeffrey A Kowalak (EB Liaison) (outgoing) – NIMH
• Eric Deutsch (incoming member) – Institute for Systems Biology
• Nuno Bandeira (incoming member) – UCSD
• Robert Chalkley (incoming member) – UCSF
*Founding member
Acknowledgements
• Philipp Mertins, The Broad Institute
– All wet lab work and an analysis
• Steve Gygi, Harvard Medical School
– Test datasets
• Matthew Chambers, Vanderbilt University Medical Center
– Data format conversions (ProteoWizard)
• Steve Stein and Yuri Mirokhin, NIST
– A K562 phosphopeptide spectral library
• Renee Robinson, Harvard University
– “The Anonymizer”
Selected Survey Responses
• Do you think this type of study is useful?
– Yes 33 (100%)
• How difficult do you think this study was?
– Easy 4
– Challenging 17
– Just right 12
• Based on this study, would you consider participating in future ABRF studies?
– Yes 33 (100%)
• Have you participated in previous ABRF studies?
– No 14 (42%)
– Yes 19 (58%)
Survey cont.
• Before this study, how confident were you of your ability to identify and rank phosphopeptides including assessing phosphorylation site localization?
• Now, after completing the study, how confident are you of your ability to identify and rank phosphopeptides including assessing specific phosphorylation sites?