Jagtap et al. Optimal and robust analysis of high-mass-accuracy Orbitrap datasets.
Index for Supplementary Files :
Supplement_S1toS12.doc
S1 : Additional descriptive statistics from salivary dataset.
S2 : Representative spectra from MaxQuant processed and ReAdW4Mascot2 processed
ProteinPilot searches from small salivary dataset.
S3 : Summary of fractions in salivary dataset.
S4 : Workflow for analyzing the human salivary dataset and results from ProteinPilot
analysis.
S5 : Effect of ProteoMiner treatment and mass accuracy on predicted modifications on
identified peptides.
S6 : Scaffold results for normalized spectral counts.
S7 : Gene Ontology (GO) analysis of the whole salivary proteome.
S8 : Descriptive statistics for IPRG Phosphoproteome dataset.
Identification at protein level, peptide level and spectral level for phosphoproteome dataset.
Mass Accuracy plots, Cumulative Mass Accuracy plots and Distribution of peptide scores for
Phosphoproteome dataset.
S9 : Descriptive statistics for Rat SILAC dataset.
Identification at protein level, peptide level and spectral level for Rat SILAC dataset. Mass
Accuracy plots, Cumulative Mass Accuracy plots and Distribution of peptide scores for Rat
SILAC dataset.
S10 : Tranche hyperlinks for the data.
S11. Materials and Methods
S12: Protocol for converting .RAW files from LTQ/Orbitrap to High mass accuracy .MGF
files for ProteinPilot search.
Supplement Text S1 to S11 :
S1. Additional descriptive statistics from salivary dataset.
A) Summary of dataset: Mascot generic format (.MGF) peaklists were generated from .RAW
files using ReAdW4Mascot2 [Ref S1] (see 1A in the ‘Dataset Key’ column). In an alternative
workflow, the "Quant" module from MaxQuant was used to generate .MSM files, which were
then converted to .MGF format (see 1B in the ‘Dataset Key’ column). The .MGF files
generated by both conversion tools were searched using ProteinPilot v 4.0 against the Human IPI
database (Dataset 1). The dataset was generated using an LTQ/Orbitrap mass spectrometer [Ref S2].
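As a rough illustration of what a Mascot generic format peaklist contains, the sketch below counts spectra by their BEGIN IONS / END IONS blocks and reads each precursor m/z from the PEPMASS line. The function name `read_mgf_precursors` and the example record are hypothetical and are not part of the authors' workflow.

```python
from typing import Iterable, List, Tuple


def read_mgf_precursors(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (precursor m/z, charge) for each BEGIN IONS ... END IONS block."""
    spectra = []
    in_block = False
    mz, charge = None, ""
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block, mz, charge = True, None, ""
        elif line == "END IONS":
            if in_block and mz is not None:
                spectra.append((mz, charge))
            in_block = False
        elif in_block and line.startswith("PEPMASS="):
            # PEPMASS may carry an optional intensity after the m/z value.
            mz = float(line.split("=", 1)[1].split()[0])
        elif in_block and line.startswith("CHARGE="):
            charge = line.split("=", 1)[1]
    return spectra


# Made-up record for illustration only (not taken from the dataset).
example = """BEGIN IONS
TITLE=hypothetical spectrum
PEPMASS=785.8421 12345.0
CHARGE=2+
175.119 100.0
END IONS
""".splitlines()

parsed = read_mgf_precursors(example)
print(len(parsed), parsed)
```

The length of the returned list corresponds to the kind of figure reported in a "Number of spectra" column.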
Dataset # (a):            Dataset 1
Description:              Human whole saliva (b)
Sample preparation:       2D fractionated and ProteoMiner treated
# of raw files:           20
MS acquisition mode:      Centroid
Number of spectra:        88,308
Dataset Key    Peaklist generation (e)
1A             ReAdW4Mascot2
1B             MaxQuant "Quant" module
a All searches were conducted using ProteinPilot.
b Subset of data from Bandhakavi et al 2009 [Ref S2].
e MaxQuant “Quant” processed peaklists reflect high mass accuracy.
S1 B) Identification at peptide level and spectral level for salivary dataset.
The small salivary dataset (Dataset 1) was processed with ReAdW or MaxQuant and then
searched with sub-ppm instrument settings using ProteinPilot. Identifications were at the 5% local
FDR threshold at the distinct peptide level (a) and spectral level (b).
S1 C) Cumulative Mass Accuracy plots from ProteinPilot searches of MaxQuant processed
and ReAdW processed peaklist.
Cumulative distribution of percent of precursors identified by ProteinPilot has been plotted
against precursor Delta ppm. Spectra identified from ProteinPilot searches using MaxQuant
processed peaklist are represented with a dark line. Spectra identified from ProteinPilot searches
using ReAdW processed peaklist are represented with a grey line.
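The quantity on the x-axis of these plots can be sketched as follows: the precursor error in ppm is the difference between observed and theoretical m/z divided by the theoretical m/z, and the cumulative curve is the percent of identified precursors whose absolute error falls within each threshold. The function names and the error values are illustrative assumptions, not data from the study.

```python
from typing import List, Sequence


def delta_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def cumulative_percent(abs_errors_ppm: Sequence[float],
                       thresholds: Sequence[float]) -> List[float]:
    """Percent of identified precursors with |error| <= each threshold."""
    n = len(abs_errors_ppm)
    return [100.0 * sum(e <= t for e in abs_errors_ppm) / n for t in thresholds]


# Hypothetical errors in ppm, invented for illustration.
errors = [0.2, -0.5, 1.1, -3.0]
curve = cumulative_percent([abs(e) for e in errors], [1.0, 2.0, 5.0])
print(curve)
```

A peaklist with better precursor mass accuracy would show a curve that reaches 100% at a smaller ppm threshold.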
Supplement S2. Representative spectra from MaxQuant processed and ReAdW4Mascot2
processed ProteinPilot searches from the small salivary dataset (Dataset 1). The Spectral tab
from ProteinPilot was used to show fragmentation evidence for MaxQuant processed and
unprocessed spectra. In the following pages, the top half of each page shows the spectrum generated
from ReAdW and the bottom half shows the spectrum generated from MaxQuant processing. The text
panel also shows the spectrum number (from the .group file of the dataset), theoretical m/z value (in
Da), precursor m/z value (in Da), charge state, delta mass (in Da), best peptide sequence and its
annotation, modification (if any), Peptide Conf and Sc values, and the Protein Rank in the
ProteinPilot .group file.
Fig S2a
Fig S2b
Fig S2c
Fig S2d
Fig S2e
Fig S2f
Fig S2g
Fig S2h
---------------------------------------------------------------------------------------------------------------------
Supplement S3 : Summary of whole salivary dataset.
A) Datasets were generated using an LTQ/Orbitrap mass spectrometer. MSM peaklists were
generated from .RAW files using the "Quant" module from MaxQuant and were then
converted to .MGF format. The .MGF files were searched using ProteinPilot v 4.0 against the
Human database. Note that Dataset 1 from Fig 1B is a subset of this dataset.
Dataset # (a):            Dataset (b)
Description:              Human whole saliva
Sample preparation:       Native sample, 2D fractionated, 3D fractionated or ProteoMiner treated
# of raw files:           200
MS acquisition mode:      Centroid
Number of spectra:        988,974
Peaklist generation (c):  MaxQuant "Quant" module
a All searches were conducted using ProteinPilot.
b Data from Bandhakavi et al 2009 [Ref S2] along with additional fractions.
c MaxQuant “Quant” processed peaklists reflect high mass accuracy.
B) Summary of whole salivary dataset fractions.
Sample fractionation   ProteoMiner treatment   Number of fractions   MS/MS spectra
2D (a)                 No                      20                    75271
2D (a)                 Yes (Library-1) (d)     20                    87469
3D (b)                 No                      41                    250553
3D (b)                 Yes (Library-1) (d)     57                    224079
2D (c)                 Yes (Library-2) (e)     42                    235072
3D (b)                 Yes (Library-2) (e)     20                    116530
a Salt fractionated
b IEF separated and salt-fractionated
c IEF separated
d ProteoMiner™ Library-1
e ProteoMiner™ Library-2
---------------------------------------------------------------------------------------------------------------------
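As a quick consistency check on the fraction summary in S3 B, the sketch below adds up the per-group fraction counts and MS/MS spectra. It assumes one .RAW file per fraction, which the source does not state explicitly; the variable names are illustrative.

```python
# (fractionation, ProteoMiner treatment, number of fractions, MS/MS spectra)
fractions = [
    ("2D", "No", 20, 75271),
    ("2D", "Yes (Library-1)", 20, 87469),
    ("3D", "No", 41, 250553),
    ("3D", "Yes (Library-1)", 57, 224079),
    ("2D", "Yes (Library-2)", 42, 235072),
    ("3D", "Yes (Library-2)", 20, 116530),
]

total_fractions = sum(row[2] for row in fractions)
total_spectra = sum(row[3] for row in fractions)
print(total_fractions, total_spectra)
```

The totals agree with the 200 raw files and 988,974 spectra reported in S3 A.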
Supplement S4.
A) Workflow for analyzing the human salivary dataset
Peaklists in MSM format were generated from the 200 .RAW files of the LTQ/Orbitrap human
salivary dataset using the "Quant" module from MaxQuant.
In the workflow that used ProteinPilot, the MSM format peaklists were converted to .mgf format
and searched using ProteinPilot v 4.0 against the Human IPI database. The PDST tool was used to
analyze outputs from representative fraction searches to compare the effect of ProteoMiner treatment
on predicted modifications.
In a subsequent workflow that used MaxQuant, the MSM files were searched against the Human IPI
database using Mascot v2.2. The Mascot search .dat files were then used to generate a
proteingroups.txt file using the MaxQuant “Identify” module. This proteingroups.txt output was
used to parse out information about Gene Ontology categories. Representative fraction searches
were used to compare the effect of ProteoMiner treatment on protein abundance.
S4 B) Human whole salivary dataset.
Summary of ProteinPilot and MaxQuant results for the large (988,974 MS/MS spectra) salivary
dataset (Dataset 2 in Table S3 A).
Search Workflow   MS/MS Identified   Peptide Sequences Identified   Total proteins identified (a)
ProteinPilot      82980              13162                          2224
MaxQuant          55337              10757                          2131
a Proteins identified at 1% FDR with at least 1 peptide at 1% FDR
ProteinPilot Results for whole salivary dataset :
988,974 spectra from a MaxQuant .MSM peaklist input file were searched using ProteinPilot.
Identification statistics from the FDR Single Table Summary showed 82,980 spectral matches at
1% global FDR and 2224 protein identifications at 1% global FDR (S4B). Supporting FDR
reports contain plots of error rates at all thresholds for both global and local error rate
calculations and ROC plots, which show absolute numbers of correct versus incorrect answers
(S4C). A robust and comprehensive list of proteins (Supplement S18) was generated after
analysis of whole saliva by combining MaxQuant’s ability to accurately process acquired peaks,
and ProteinPilot’s ability to search multiple modifications and perform robust protein reporting.
MaxQuant results for whole salivary dataset : The whole salivary dataset was processed with
the MaxQuant workflow (with Mascot) (Supplement S4A) and results were compared to
ProteinPilot results. The results were processed by MaxQuant’s “Identify” module, which
generated a list of proteins (Proteingroups.txt; Supplement S19) at 1% protein and peptide FDR
thresholds. From a total of 988,974 spectra, 55,337 spectra were matched at 1% global FDR and
2131 proteins were identified at 1% global FDR with a minimum of one peptide at 1% global
FDR (Table S4B). The average ppm error for the dataset was 0.56 with an SD of 0.86 ppm. From
the 2131 proteins inferred from the MaxQuant workflow, 1956 (91.8%) were also inferred from
the ProteinPilot workflow.
Proteins were grouped into cellular component, biological processes and molecular function
categories upon Gene Ontology (GO) analysis. MaxQuant grouped proteins by molecular weight
(Supplement S7).
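The figures quoted above can be related to one another with a few lines of arithmetic. The sketch below uses only numbers stated in S4 B; the variable names are illustrative.

```python
total_spectra = 988_974
results = {
    "ProteinPilot": {"spectra": 82_980, "proteins": 2224},
    "MaxQuant": {"spectra": 55_337, "proteins": 2131},
}

# Percent of all MS/MS spectra matched at 1% global FDR by each workflow.
rates = {name: 100.0 * r["spectra"] / total_spectra for name, r in results.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of spectra matched")

# Share of MaxQuant-inferred proteins that ProteinPilot also inferred.
shared_proteins = 1956
overlap = 100.0 * shared_proteins / results["MaxQuant"]["proteins"]
print(f"{overlap:.1f}% of MaxQuant proteins also inferred by ProteinPilot")
```

The last line reproduces the 91.8% overlap stated in the text.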
---------------------------------------------------------------------------------------------------------------------
S4 C) Single Table Summary of the FDR analysis output from ProteinPilot search.
---------------------------------------------------------------------------------------------------------------------
Supplement S4 D) Identification statistics of the FDR analysis output from ProteinPilot search:
protein (a), distinct peptide (b) and spectral level (c).
ProteinPilot generates a post-search FDR analysis output using the ‘Proteomics System Performance
Evaluation Pipeline’ (PSPEP). FDR results are tabulated and plotted at the spectral, distinct peptide
and protein levels. The PSPEP method uses non-linear fitting to calculate a local, or
instantaneous, FDR that measures the error rate of the last protein in the list of proteins, as
opposed to the global FDR, which estimates the error rate for the entire protein list.
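The distinction between global and local FDR can be sketched on a score-sorted hit list from a concatenated target plus decoy search. This is a simplified stand-in and not the PSPEP implementation: the global estimate uses the common 2 x decoys / total convention, and the local estimate uses a trailing window where PSPEP uses non-linear curve fitting. Function names, the window size and the hit list are assumptions.

```python
from typing import List


def global_fdr(is_decoy: List[bool]) -> List[float]:
    """Cumulative FDR estimate for the list truncated at each rank.

    Assumes a concatenated target+decoy search, so false hits among all
    accepted hits are estimated as 2 x (number of decoy hits).
    """
    out, decoys = [], 0
    for rank, decoy in enumerate(is_decoy, start=1):
        decoys += decoy
        out.append(min(1.0, 2.0 * decoys / rank))
    return out


def local_fdr(is_decoy: List[bool], window: int = 4) -> List[float]:
    """Error rate among the last `window` hits up to each rank (crude local estimate)."""
    out = []
    for i in range(len(is_decoy)):
        chunk = is_decoy[max(0, i - window + 1): i + 1]
        out.append(min(1.0, 2.0 * sum(chunk) / len(chunk)))
    return out


# Hypothetical hit list sorted best-first; True marks a decoy hit.
hits = [False, False, False, False, False, False, True, False, True, True]
print(global_fdr(hits)[-1], local_fdr(hits)[-1])
```

In this toy list the global estimate for the whole list is still moderate while the local estimate at the bottom of the list is saturated, which is why a local threshold cuts the list earlier than a global one.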
Supplement S5. Effect of ProteoMiner treatment on spectral utilization and predicted
modifications on identified peptides.
Whole saliva was treated with hexapeptide libraries (ProteoMiner™), which reduced the
relative amounts of highly abundant proteins while increasing the relative
amounts of low-abundance proteins. Although the mechanism of action is still under debate,
sample treatment with hexapeptide beads has resulted in detection of a much larger number of
proteins from various biological samples, as compared to non-treated samples. To analyze the
effect of this dynamic range compression (DRC) on the relative ranking of the most frequently
predicted modifications, ProteinPilot’s ability to predict multiple peptide modifications was
used in conjunction with the post-search PDST tool.
The effect of ProteoMiner treatment on enrichment of modified peptides was evaluated using
ProteinPilot. Results from 20 SCX HPLC fractions (untreated) were compared against 20 SCX
HPLC fractions of ProteoMiner treated, salt-fractionated sample.
Table S5 A) ProteinPilot results from twenty fractions of either ProteoMiner treated or untreated whole
saliva were compared. MaxQuant processed and ReAdW processed peaklists were used to compare the
effect of high mass accuracy on PTM identification. The ProteinPilot result outputs were used for
subsequent ProteinPilot Descriptive Statistics Template (PDST) analysis.
                                                MaxQuant processed                 ReAdW processed
                                                Untreated   ProteoMiner treated    Untreated   ProteoMiner treated
Number of MS/MS spectra                         75271       87469                  75271       87469
Spectral level 5% local FDR                     5869        10557                  5389        10249
Protein level 5% local FDR                      267         716                    218         562
Percent of spectra with specified protein
and peptide confidence                          6.80%       10.80%                 6.00%       9.90%
% Modified peptides                             37.30%      37.50%                 38.60%      37.10%
An increase in spectral utilization in ProteoMiner treated sample was observed (Bandhakavi et
al 2009) when compared to the untreated sample (Table S5A). The percent of modified peptides
(37.5%) from treated sample was similar to untreated sample (37.3%). The type and ranking of
the predicted modifications in the untreated sample was assessed with PDST output and
compared to results from the treated sample. PDST generated a list of twenty most frequent
modifications (predicted amino acid features) in the dataset (Table S5B).
Table S5B) Peptides identified at 5% local FDR were compared for their most frequent single features
using PDST.
                                               MaxQuant processed                            ReAdW processed
Feature                          Exact Delta   Untreated  Rank  ProteoMiner treated  Rank    Untreated  Rank  ProteoMiner treated  Rank
Methyl(E)                        14.0157       386        1     896                  2       316        2     883                  2
Oxidation(M)                     15.9949       384        2     1136                 1       362        1     1054                 1
Deamidated(N)                    0.984         239        3     175                  6       171        3     134                  7
Gln->pyroGlu@N-term              -17.027       136        4     153                  7       120        4     145                  6
Methyl(K)                        14.0157       100        5     315                  3       98         5     291                  3
Methyl(D)                        14.0157       94         6     218                  5       85         6     226                  4
Methyl(R)                        14.0157       89         7     226                  4       74         7     181                  5
Oxidation(W)                     15.9949       78         8                                  67         8
Dioxidation(C)                   31.9898       67         9     20                   19      66         9     24                   14
Cys->Dha(C)                      -33.988       58         10    20                   18      59         10    23                   15
Protein Terminal Acetyl@N-term   42.0106       29         14    91                   8       18         16    60                   9
Dehydrated(D)                    -18.011       26         15    87                   9       28         13    72                   8
Delta:H(4)C(2)(K)                28.0313       17         19    52                   10      15         18    51                   10
Oxidation(Y)                     15.9949       26         16    35                   13      18         17    43                   11
Ammonia-loss(N)                  -17.027       24         17    14                   22      19         15
Deamidated(Q)                    0.984         51         11    40                   11      36         11    23                   16
Delta:H(2)C(2)(H)                26.0157       36         13    13                   24      36         12    15                   21
Rows whose column positions could not be recovered (feature, exact delta, count and rank values in extracted order):
Dioxidation(W)                   31.9898       42, 12, 24, 14, 14, 23
Cation:Na(E)                     21.9819       10, 25
Pro->pyro-Glu(P)                 13.9793       11, 24
Cation:Na(D)                     21.9819       12, 22
Methyl(H)                        14.0157
Dehydrated(T)                    -18.011
Glu->pyroGlu@N-term              -18.011
Formyl(K)                        27.9949
Values not attributable to a single row: 17, 9, 25, 20, 20, 12, 23, 26, 13, 18, 13, 23, 13, 21, 11, 24, 13, 22, 28, 14, 14, 20, 14, 21, 15, 19, 15, 22
The modification list was ranked by the number of features in the untreated sample. Glutamate
methylation and methionine oxidation were top-ranked in both untreated and treated samples.
Comparison between the datasets showed that the ranking of intermediate and lower-ranked
modifications changed, starting at the 8th-ranked modification. The relative ranking of most predicted
modifications changed after ProteoMiner treatment. This observation is noteworthy, even for the
study of native salivary samples, and makes a case for using both treated and untreated samples
to identify more PTMs.
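The rank changes described above can be made explicit by subtracting the untreated rank from the treated rank for each feature. The sketch below uses the MaxQuant processed ranks of Table S5B, restricted to the features whose untreated and treated ranks are both legible; the cutoff of five rank positions is an arbitrary illustrative choice, not a criterion used by the authors.

```python
# (feature, rank in untreated, rank in ProteoMiner treated), MaxQuant processed columns
ranks = [
    ("Methyl(E)", 1, 2),
    ("Oxidation(M)", 2, 1),
    ("Deamidated(N)", 3, 6),
    ("Gln->pyroGlu@N-term", 4, 7),
    ("Methyl(K)", 5, 3),
    ("Methyl(D)", 6, 5),
    ("Methyl(R)", 7, 4),
    ("Dioxidation(C)", 9, 19),
    ("Cys->Dha(C)", 10, 18),
    ("Protein Terminal Acetyl@N-term", 14, 8),
    ("Dehydrated(D)", 15, 9),
]

# Positive shift: the feature ranks lower (relatively less frequent) after treatment.
shifts = {name: treated - untreated for name, untreated, treated in ranks}
moved_down = [name for name, shift in shifts.items() if shift >= 5]
moved_up = [name for name, shift in shifts.items() if shift <= -5]
print("moved down:", moved_down)
print("moved up:", moved_up)
```

The two top-ranked features only swap places, while the larger shifts occur among the features ranked 9th and below in the untreated sample.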
Effect of mass accuracy on PTM identification :
MaxQuant processed and ReAdW processed peaklists were used to compare the effect of high
mass accuracy on PTM identification. The MaxQuant processed peaklist yields more peptide
identifications, so it is not surprising that the number of modifications increases
in proportion to spectral identifications. In other words, the average number of PTM
identifications increases by about as much as the improvement in protein and spectral
identifications due to high mass accuracy (11-12%). There are a few exceptions to this
observation, such as Deamidation, Gln->pyroGlu@N-term, Oxidation(W), Dioxidation(C) and
Cys->Dha(C). The possible reasons for these exceptions will have to be investigated further.
Supplement S6
Scaffold results for normalized spectral counts.
In order to compare the relative abundance of proteins in untreated samples to ProteoMiner
treated sample, we used the Mascot generated .dat files from the MaxQuant workflow in
Scaffold analysis. Scaffold was used to normalize the spectral counts in these samples.
For this analysis, we used results from all fractions from untreated sample (61 fractions) and
compared them against all fractions of ProteoMiner treated sample (75 fractions). The
normalized spectral counts were used to rank proteins relatively in the samples. The
corresponding normalized counts in ProteoMiner treated sample showed proteins that are
enriched after treatment (Figure a). Alpha-actinin-1, Nucleobindin-2, Carbonic anhydrase VI and
stratifin are some of the proteins that are enriched after ProteoMiner treatment (Figure a). The
complete list of proteins and their normalized spectral quantitative values can be found in
Supplement S23. As a measure of depletion of abundant proteins due to ProteoMiner treatment,
out of the 25 most abundant proteins in untreated samples, only 4 proteins are observed to be in
the 25 most abundant proteins list in ProteoMiner treated sample (Supplement S23). In other
words, 21 out of 25 most abundant proteins from untreated sample are depleted due to
ProteoMiner treatment.
When normalized spectral counts were used to relatively rank the proteins in ProteoMiner
treated sample, then the corresponding normalized counts in untreated sample show proteins that
are depleted after treatment (Figure b). Proteins such as amylase alpha 1A, Immunoglobulin
kappa constant, Mucin-5B, Lipocalin-1, Zinc-alpha-2-glycoprotein and cystatin A are depleted
after ProteoMiner treatment (Figure b). As a measure of enrichment of low-abundance proteins
due to ProteoMiner treatment, out of 353 least abundant proteins (Normalized spectral count =1)
in untreated samples, only 17 proteins are observed to have a lower ranking in ProteoMiner
treated sample (Supplement Sheet S23). In other words, 336 out of 353 least abundant proteins
from untreated sample are enriched due to ProteoMiner treatment.
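The depletion measure used above (how many of the N most abundant untreated proteins remain among the N most abundant treated proteins) can be sketched as a top-N overlap on normalized spectral counts. The protein names and counts below are placeholders invented for illustration, and the helper names are hypothetical; the actual values are in Supplement S23.

```python
from typing import Dict, List


def top_n(counts: Dict[str, float], n: int) -> List[str]:
    """Names of the n proteins with the highest normalized spectral counts."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def retained_in_top_n(untreated: Dict[str, float],
                      treated: Dict[str, float], n: int) -> int:
    """Number of top-n untreated proteins that are also in the treated top-n."""
    return len(set(top_n(untreated, n)) & set(top_n(treated, n)))


# Placeholder proteins and counts, not data from the study.
untreated = {"P1": 900.0, "P2": 500.0, "P3": 40.0, "P4": 5.0}
treated = {"P1": 120.0, "P2": 10.0, "P3": 80.0, "P4": 60.0}

retained = retained_in_top_n(untreated, treated, n=2)
print(retained)
```

With the numbers reported in the text, 4 of the top 25 are retained, so 25 - 4 = 21 are counted as depleted.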
Scaffold results for normalized spectral counts for ProteoMiner treated and untreated
fractions.
a. Protein identifications from the untreated dataset were ranked according to their abundance
(spectral counts). The corresponding normalized counts in the ProteoMiner treated sample show
proteins (representative peaks denoted by gene symbols) that are enriched after treatment. The
list of proteins with their corresponding quantitative values is available in Supplementary
section SY.
b. Protein identifications from the ProteoMiner treated dataset were ranked according to
their abundance (spectral counts). The corresponding normalized counts in the untreated sample
show proteins (representative peaks denoted by gene symbols) that are greatly reduced after
treatment. The list of proteins with their corresponding quantitative values is available in
Supplement S23.
---------------------------------------------------------------------------------------------------------------------
Supplement S7. Gene Ontology (GO) analysis of the salivary proteome. The salivary proteome
was analyzed for its Gene Ontology (GO) categories, cellular component (a), biological
processes (b) and molecular function (c), and for molecular weight (d). The proteingroups.txt
output from the MaxQuant search was used for this analysis.
The whole saliva dataset was also analyzed using MaxQuant. MaxQuant’s output, in the form of the
proteingroups.txt file from a search against the human IPI database, can be used for parsing
information about the biological content of identified proteins. The columns in the text file
contain gene ontology terms (biological processes, molecular functions and cellular component),
biological pathways (KEGG) and PTMs (with associated localization scores).
[Figure S7 panel titles: Biological Processes (a), Cellular Components (b), Molecular functions (c), Molecular Weights (d)]
Supplement S8. Descriptive statistics for IPRG Phosphoproteome dataset.
A) Summary of dataset: Mascot generic format (.MGF) peaklists were generated from .RAW
files using ProteoWizard [Ref S3] (see 2A in the ‘Dataset Key’ column). In an alternative
workflow, the "Quant" module from MaxQuant was used to generate .MSM files, which were
then converted to .MGF format (see 2B in the ‘Dataset Key’ column). The .MGF files
generated by both conversion tools were searched using ProteinPilot v 4.0 against the Human IPI
database. The dataset was generated using an LTQ/Orbitrap mass spectrometer [Ref S4].
Dataset # (a):            Dataset 2
Description:              Human phosphoproteome
Sample preparation (c):   IMAC enriched
# of raw files:           3
MS acquisition mode:      Profile
Number of spectra:        28,448
Dataset Key    Peaklist generation (e)
2A             ProteoWizard
2B             MaxQuant "Quant" module
a All searches were conducted using ProteinPilot.
c ABRF 2010 iPRG study
e MaxQuant “Quant” processed peaklists reflect high mass accuracy.
S8 B) Identification at protein level, peptide level and spectral level for phosphoproteome
dataset. The phosphoproteome dataset (Dataset 2) was processed with ProteoWizard or MaxQuant
and then searched with sub-ppm instrument settings in ProteinPilot. Identifications were at the 5%
local FDR threshold at the protein level (a), distinct peptide level (b) and spectral level (c).
Supplement S8 C) Mass Accuracy plots from ProteinPilot searches of MaxQuant
processed and ProteoWizard processed peaklists. The distribution of the frequency of spectra
identified by ProteinPilot has been plotted against precursor Delta ppm. Spectra identified from
ProteinPilot searches using the MaxQuant processed peaklist are represented with a dark line.
Spectra identified from ProteinPilot searches using the ProteoWizard processed peaklist for the
phosphoproteome dataset (Dataset 2) are represented with a grey line.
Supplement S8 D) Cumulative Mass Accuracy plots from ProteinPilot searches of
MaxQuant processed and ProteoWizard processed peaklists.
The cumulative distribution of the percent of precursors identified by ProteinPilot has been plotted
against precursor Delta ppm. Spectra identified from ProteinPilot searches using the MaxQuant
processed peaklist are represented with a dark line. Spectra identified from ProteinPilot searches
using the ProteoWizard processed peaklist (phosphoproteome dataset) are represented with a
grey line.
Supplement S8 E) Distribution of peptide scores of confident identifications from
ProteinPilot search. The distribution of the frequency of spectra identified by ProteinPilot at 5%
local FDR has been plotted against Peptide Score (Sc). Spectra identified from ProteinPilot
searches using the MaxQuant processed peaklist are represented with a dark line and spectra
identified from ProteinPilot searches using the ProteoWizard processed peaklist from Dataset 2
(phosphoproteome dataset) are represented with a grey line.
Supplement S9. Descriptive statistics for Rat SILAC dataset.
A) Summary of dataset: Mascot generic format (.MGF) peaklists were generated from .RAW
files using ReAdW4Mascot2 [Ref S1] (see 3A in the ‘Dataset Key’ column). In an alternative
workflow, the "Quant" module from MaxQuant was used to generate .MSM files, which were
then converted to .MGF format (see 3B in the ‘Dataset Key’ column). The .MGF files
generated by both conversion tools were searched using ProteinPilot v 4.0 against the Rat IPI
database (Dataset 3). The dataset was generated using an LTQ/Orbitrap mass spectrometer.
Dataset # (a):            Dataset 3
Description:              Rat L6 cell line
Sample preparation:       SILAC (K+8, R+10)
# of raw files:           9
MS acquisition mode:      Profile
Number of spectra:        52,164
Dataset Key    Peaklist generation (e)
3A             ReAdW4Mascot2
3B             MaxQuant "Quant" module
a All searches were conducted using ProteinPilot.
e MaxQuant “Quant” processed peaklists reflect high mass accuracy.
Supplement S9 B) Identification at protein level, peptide level and spectral level for Rat
SILAC dataset. The Rat SILAC dataset (Dataset 3) was processed with ReAdW or MaxQuant
and then searched with sub-ppm instrument settings using ProteinPilot. Identifications were at the
5% local FDR threshold at the protein level (a), distinct peptide level (b) and spectral level (c).
Supplement S9 C) Mass Accuracy plots from ProteinPilot searches of MaxQuant
processed and ReAdW processed peaklists. The distribution of the frequency of spectra
identified by ProteinPilot has been plotted against precursor Delta ppm. Spectra identified from
ProteinPilot searches using the MaxQuant processed peaklist are represented with a dark line.
Spectra identified from ProteinPilot searches using the ReAdW processed peaklist from Dataset 3
(SILAC dataset) are represented with a grey line.
Supplement S9 D) Cumulative Mass Accuracy plots from ProteinPilot searches of
MaxQuant processed and ReAdW processed peaklists.
The cumulative distribution of the percent of precursors identified by ProteinPilot has been plotted
against precursor Delta ppm. Spectra identified from ProteinPilot searches using the MaxQuant
processed peaklist are represented with a dark line. Spectra identified from ProteinPilot searches
using the ReAdW processed peaklist (SILAC dataset) are represented with a grey line.
Supplement S9 E) Distribution of peptide scores of confident identifications from
ProteinPilot search. The distribution of the frequency of spectra identified by ProteinPilot at 5%
local FDR has been plotted against Peptide Score (Sc). Spectra identified from ProteinPilot
searches using the MaxQuant processed peaklist are represented with a dark line and spectra
identified from ProteinPilot searches using the ReAdW processed peaklist from Dataset 3
(SILAC dataset) are represented with a grey line.
Supplement S10. Tranche hyperlinks for the data presented in Jagtap et al.:
The individual components of the data associated with this manuscript can be
downloaded from ProteomeCommons.org Tranche by using the hyperlinks or using associated
hash code and the passphrase mentioned below.
Passphrase to access data: maxquantproteinpilot
The hash may be used to prove exactly what files were published as part of this manuscript's data
set, and the hash may also be used to check that the data has not changed since publication.
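The idea of checking that a file has not changed since publication can be sketched with a generic checksum comparison. The example below uses SHA-256 from the Python standard library, which is an assumption made for illustration; it does not reproduce the Tranche hash format listed below, and the file name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file standing in for a downloaded peaklist.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.mgf"
    sample.write_bytes(b"abc")
    recorded = sha256_of_file(sample)               # digest noted at publication time
    unchanged = sha256_of_file(sample) == recorded  # re-check after download
print(unchanged)
```

Any change to the file content, however small, yields a different digest, which is what makes a published hash usable as a proof of what was deposited.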
PEAKLISTS :
FmlX6MPmEeKG152QgcCVU/spcWHiTiR+sJahjzKYUj0XUXXJkhInFqin0pdvgLHlhwQPtyh
7+kjQwUHSlRV3fhq4VyEAAAAAAABcIg==
a) Salivary dataset – peaklist
(Peak list for salivary proteome fractions in MGF format)
b) Phosphoproteome dataset – peaklist
(Peak list for phosphoproteome fractions in MGF format)
c) Rat SILAC dataset – peaklist
(Peak list for RAT SILAC fractions in MGF format)
PROTEINPILOT SEARCH RESULTS :
pTyZ5Eo2DSH0U3h3Teq3U95txifLGStPmPsTUe8esa0I3AhN7mXRxV9DQuAFEbSxnudrxOJ
14R83Nwkc6EoY8vgG99gAAAAAAAAljg==
ProteinPilot Search results and False Positive Rate Analysis
(ProteinPilot .group files, PDST outputs, FDR Analysis Results including results for large
salivary dataset)
MAXQUANT SEARCH RESULTS :
fJhRymxnWQwYdjKtEAtiDDSWdWCL0wtdyAlKRgBz/gthCWvh8PE67OgEcX1Dv29K+CSr
YOToj2T7eJbwEUwHIqKOMHoAAAAAAAALcw==
MaxQuant Search results for large salivary dataset. (MaxQuant files)
MASCOT-SCAFFOLD-WORKFLOW RESULTS :
q2SqBKZd9usRNx3Z4KhMOVMYcD1nq1cAAMmYCGZt14pJ8eVoKCzZpNLEqluKeJPKim
wgkNYhwEtXqHdbl9RpnwgBABsAAAAAAAAG3g==
Mascot-Scaffold-Workflow for measuring relative protein abundance.
(Mascot search results (.dat files) and Scaffold results (.sf3 file))
Supplement S11. MATERIALS AND METHODS
Sample datasets
Three LTQ Orbitrap datasets were processed and analyzed. The effect of MaxQuant processing
on protein and peptide identifications and dataset specific metrics were analyzed with Datasets 1,
2, and 3. Dataset 1, a subset of large whole salivary dataset, was a set of twenty .RAW files
generated from 3D-separated human whole saliva [Ref S2]. Dataset 2 was the iPRG 2010 data,
generated from a phosphopeptide-enriched sample [Ref S4]. Dataset 3 was generated from a
SILAC-labeled rat nuclear cell culture preparation. The datasets varied in MS acquisition mode
(centroid for Dataset 1, profile for Dataset 2 and 3) and peaklist generation software
(ReAdW4Mascot2 for Datasets 1A and 3A and ProteoWizard for Dataset 2A; MaxQuant Quant
module for Datasets 1B, 2B and 3B).
The whole saliva dataset was generated as described in Bandhakavi et al 2009. Briefly, whole
saliva sample was processed as ‘untreated saliva’ or treated with hexapeptide libraries
(ProteoMiner™; Bio-Rad Laboratories) for protein dynamic range compression (DRC). Protein
samples were trypsinized and fractionated by preparative IEF (OFFGEL) based on their
isoelectric points (pH 3−10). Peptide fractions were analyzed directly by C18 RP-LC-MS (2D
LC-MS fractionation) or fractionated by SCX prior to C18 LC-MS (3D LC-MS fractionation).
For sample treatment and fractionation scheme see Supplement S1.
Preprocessing of datasets and Protein and Peptide Detection
Orbitrap datasets were searched with ProteinPilot v 4.0 (ProteinPilot Software 4.0.8085;
Revision: 148085; Paragon Algorithm: 4.0.0.0. 1458083). The .RAW files were converted to
.MSM files with MaxQuant’s (v 1.0.13.13) "Quant" module. The .MSM files are .MGF files
with high precursor mass accuracy and limited product ion ‘noise’ peaks. After changing the
file extension from .MSM to .MGF, the files were searched using Paragon. ProteinPilot
search parameters common to all datasets (1 – 3) were: Instrument: LTQ/Orbitrap subppm;
Digestion: Trypsin; ID Focus: Biological Modifications; Search effort: Thorough; Protein
identification threshold: 10% Conf. Datasets 1, 2 and 4 were searched against the Human IPI
(hIPI) database (v3.52, Nov 2008) plus contaminant proteins (from the MaxQuant installation) with
148372 forward plus reversed sequences. Cys alkylation was defined as none (Datasets 1 and 4)
and Iodoacetamide (Datasets 2 and 3); Sample Type was set to Identification for Datasets 1, 2 and
4. For Dataset 2, the Special factors setting was Phosphorylation emphasis.
Dataset 3 was searched against Rat IPI v3.52 database (November 2008; 80336 forward plus
reversed sequences); Sample Type was defined as SILAC (K+8, R+10) and Cys alkylation was
set to Iodoacetamide.
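The databases above are described as containing "forward plus reversed sequences". As a minimal sketch of how such a concatenated target-decoy FASTA could be built, the Python function below appends a reversed copy of every entry. The function name reverse_decoys and the REV_ header prefix are hypothetical; the source does not state which tool or decoy naming convention was used to build the databases.

```python
def reverse_decoys(fasta_text, prefix="REV_"):
    """Return FASTA text with each forward entry followed by its reversed decoy.

    The 'REV_' prefix is an illustrative assumption, not the convention
    used for the databases described in the text.
    """
    entries = []
    header, seq_lines = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if any.
            if header is not None:
                entries.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        entries.append((header, "".join(seq_lines)))

    out = []
    for header, seq in entries:
        out.append(f">{header}\n{seq}")               # forward (target) entry
        out.append(f">{prefix}{header}\n{seq[::-1]}")  # reversed (decoy) entry
    return "\n".join(out) + "\n"
```

With this approach the output holds twice as many sequences as the input, which matches the way the database sizes above are reported as forward plus reversed counts.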
For comparisons of results as a function of input file (i.e., MaxQuant “Quant” module vs.
alternate peaklist generation methods), .MGF files were created from .RAW files with
ReAdW4Mascot2 [Ref S1] (for Dataset 1 and 3) or with ProteoWizard [Ref S3] (for Dataset 2)
and analyzed with ProteinPilot. The search parameters were identical to those
described above for Datasets 1, 2 and 3.
Dataset 4 was searched against the "target-decoy" version of hIPI v3.52 with ProteinPilot v 4.0
and MaxQuant v1.0.13.13. For the ProteinPilot search, .RAW files were converted to .MSM files
with MaxQuant’s “Quant” module as described above. Additional ProteinPilot parameters were:
Sample Type: Identification; Cys alkylation: None. The Mascot search parameters for the
MaxQuant search were: Fragment tolerance: 0.50 Da (Monoisotopic); Precursor tolerance:
Adjusted individually using Quant; Variable Modifications: Met Oxidation; Digestion Enzyme:
Trypsin; Maximum Missed Cleavages: 2. In the “Identify” module, the protein identification
threshold was set to 1% FDR and the peptide threshold to a minimum of one peptide at 1%
FDR.
Post-search data analysis
For each ProteinPilot search, Protein and Peptide Summaries were exported and FDR reports
were generated. The FDR reports included the numbers of proteins identified, distinct peptides
matched and spectra matched at local and global FDR thresholds. The FDR reports also
generated Numeric ROC plots, Estimated FDRs and non-linear fitting curves at spectral, peptide
and protein levels. The Protein Descriptive Statistics Template (PDST, v3.61), an Excel-based
tool developed by AB SCIEX, was used for post-processing. The PDST tool generates extensive,
dataset-specific metrics from ProteinPilot output (peptide and protein exports and FDR). From
PDST, the effect of dynamic range compression on predicted modifications and spectral
utilization was determined for the whole salivary dataset. The effect of dynamic range compression
on protein abundance for Dataset 4 was estimated after analysis of Mascot results for untreated
and treated samples in Scaffold Q+ v3.0.
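To illustrate what a global FDR threshold means in a target-decoy search, the sketch below ranks hits by score and estimates the global FDR at each rank as the ratio of decoy hits to target hits. This is a simplified, generic estimate made under that assumption. It does not reproduce the local FDR values or the non-linear fitting in the ProteinPilot FDR reports, and the function names fdr_table and count_at_fdr are hypothetical.

```python
def fdr_table(hits):
    """Rank hits by descending score and estimate the global FDR at each rank.

    hits: iterable of (score, is_decoy) pairs.
    Returns a list of (score, n_target, n_decoy, global_fdr) tuples.
    The FDR estimate is decoys / targets (an assumption of this sketch).
    """
    rows = []
    n_target = n_decoy = 0
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdr = n_decoy / n_target if n_target else 1.0
        rows.append((score, n_target, n_decoy, fdr))
    return rows


def count_at_fdr(hits, threshold=0.01):
    """Largest number of target hits whose estimated global FDR is <= threshold."""
    best = 0
    for _score, n_target, _n_decoy, fdr in fdr_table(hits):
        if fdr <= threshold:
            best = n_target
    return best
```

The same counting can be applied at the spectral, distinct-peptide or protein level by changing what one "hit" represents.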
Supplemental S12: Protocol for converting .RAW files from LTQ/Orbitrap to High mass
accuracy .MGF files for ProteinPilot search.
RAW data conversion using MaxQuant :
Requirements :
XCalibur v2.1
MaxQuant v1.0.13.13
32-bit Windows PC
1) Install MaxQuant v1.0.13.13 on your 32-bit Windows machine.
The latest MaxQuant version uses Andromeda as its search engine and does not produce the
required .MSM files. Please contact the MaxQuant Google group
(http://groups.google.com/group/maxquant-list) for the earlier version (v1.0.13.13) of MaxQuant.
2) Once installed, please visit http://mediamill.cla.umn.edu/mediamill/display/61837 for a webinar on
the use of MaxQuant and the requirements for setting it up. Ensure that the conf folder has been
replaced with the appropriate file from the Mascot ‘config’ folder.
3) Transfer your .RAW files into an accessible folder on the Windows machine (C:\ drive or an
external drive connected to the Windows machine)
4) Once transferred click on Quant.exe to start MaxQuant’s “Quant” module.
5) In the “Quant” module, click on “Select Files” to select RAW files from your folder.
6) Alternatively, you can also click on “Select Folder” for a folder with RAW files.
7) Once the RAW files are loaded, click on the Parameters tab. Also set the number of threads
to the maximum available on your PC.
8) In the Parameters tab, set the SILAC type to singlets for a non-quantitative, identification-only
study. Add or remove variable and fixed modifications from the list, according to the
sample preparation used for the dataset. Choose the correct database and enzyme for the search.
9) Click on Raw files and press “Start” for data processing.
10) The bottom tab shows the status of data processing. Wait until the tab shows “Done”. This
typically takes 30 minutes per RAW file.
11) Once the Quant module displays the message “Done”, go to the RAW files folder and look
for .MSM files in individual folders for each RAW file.
12) For example, in the processed folder for A1, sort by file type, select the .MSM files
with the extensions sil0 and peaks, and copy them into a separate folder. Repeat the same
procedure for all RAW files.
47
Jagtap et al. Optimal and robust analysis of high-mass-accuracy Orbitrap datasets.
13) Change the .msm extension of the files in the MSM files folder to .mgf.
14) These files can now be used for a ProteinPilot search. Because of the peaklists’ high
precursor mass accuracy, the files can be searched with the “Orbi/FT MS (sub-ppm), LTQ MS/MS”
Instrument setting.
For webinar modules on use of ProteinPilot please visit : http://tinyurl.com/proteinpilotin
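Steps 11 to 13 above (collecting the sil0 and peaks .MSM files and changing their extension to .mgf) can also be scripted. The Python sketch below is one possible way to do this, not part of the original protocol. It assumes that the tags "sil0" and "peaks" appear in the .MSM file names and that file names are unique across the per-RAW-file folders; the function name collect_msm_as_mgf and the folder paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def collect_msm_as_mgf(raw_root, out_dir, tags=("sil0", "peaks")):
    """Copy selected .msm files found under raw_root into out_dir as .mgf files.

    Assumption: the tags 'sil0' and 'peaks' occur in the file names.
    The original .msm files are left untouched.
    """
    raw_root, out_dir = Path(raw_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(raw_root.rglob("*")):
        # Match the extension case-insensitively (.msm or .MSM).
        if not path.is_file() or path.suffix.lower() != ".msm":
            continue
        if not any(tag in path.name for tag in tags):
            continue
        target = out_dir / path.with_suffix(".mgf").name
        if target.exists():
            # Refuse to overwrite silently if two folders hold the same name.
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(path, target)
        copied.append(target)
    return copied


# Example call with hypothetical paths:
# collect_msm_as_mgf(r"C:\data\saliva_raw", r"C:\data\saliva_mgf")
```

Copying rather than renaming in place keeps the MaxQuant output folders intact, so the conversion can be repeated if needed.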
Supplementary References :
S1: http://chemdata.nist.gov/mass-spc/ftp/download/peptide_library/software/current_releases/ReAdw4Mascot2
S2: Bandhakavi S, Stone MD, Onsongo G, Van Riper SK et al. (2009) A dynamic range
compression and three-dimensional peptide fractionation analysis platform expands proteome
coverage and the diagnostic potential of whole saliva. J Proteome Res. 8(12): 5590-5600.
S3: Kessner D, Chambers M, Burke R, Agus D et al. (2008) ProteoWizard: open source software
for rapid proteomics tools development. Bioinformatics. 24(21): 2534-2536.
S4: Rudnick P, Askenazi M, Clauser K, Lane W et al. (2010) ABRF iPRG2010 Study:
Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization Results
from Multiple Proteomics Laboratories. Proceedings of the 58th ASMS Conference on Mass
Spectrometry and Allied Topics, Salt Lake City, UT.