pmic7685-sup-0001-SuppMat

Supporting Information

Materials and Methods

1. Datasets
Dataset 1 (UPS1): This dataset is a protein standard derived from a set of 48 human proteins (Sigma, Universal Proteomics Standard UPS1). It was previously used to validate the accuracy of MascotPercolator [1]. The MS/MS spectra (8191 spectra) were downloaded from ftp://ftp.sanger.ac.uk/pub4/resources/software/mascotpercolator/ .
Dataset 2 (iPS-201B7): This dataset is from a large-scale proteome analysis of human induced pluripotent stem cells (201B7-P32), analyzed on an AB Sciex TripleTOF 5600 System. The detailed generation process is described in reference [2]. The raw data were downloaded from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the dataset identifier PXD000071.
Datasets 3 and 4 (Yeast-ETD and Yeast-ETcaD): The yeast dataset was collected from 12 SCX fractions analyzed over a 40 min gradient on a modified hybrid linear ion trap-Orbitrap (Thermo Scientific). The raw data were downloaded from PeptideAtlas [3] (PAe001453) and are described in reference [4]. We used two of the available datasets from PeptideAtlas, ETD (Trial 2, BioRep 2) (Yeast-ETD) and ETcaD (Single Trial) (Yeast-ETcaD).
Datasets 5 and 6 (Ecoli-CID and Ecoli-ETcaD): The E. coli dataset is from the previous MascotPercolator study by Wright et al. and was analyzed by LC-MS/MS using a dual-pressure linear ion trap Orbitrap instrument capable of both CID and ETcaD fragmentation [5]. The data are available for download from PRIDE (http://www.ebi.ac.uk/pride) under accessions 18990 and 19002.

2. MS/MS database searching
Dataset 1 (UPS1): Peak lists (8190 spectra) from the UPS1 experiments were searched against a bipartite database [6] and a decoy database using the Open Mass Spectrometry Search Algorithm (OMSSA 2.1.9) and Mascot 2.3.02 (Matrix Science, London, UK). The bipartite database contained the UniProt sequences for the 48 standard proteins, plus common contaminants and entrapment protein sequences from the H. influenzae proteome database (UniProtKB/Swiss-Prot, downloaded on 7 July 2013, 1709 target sequences). The decoy database was generated by reversing the full bipartite target database. Mascot used the following parameters: enzyme = Trypsin; maximum missed cleavages = 3; fixed modifications = Carbamidomethyl (C); variable modifications = Oxidation (M), Deamidated (NQ), Acetyl (Protein N-term); precursor mass tolerance = 20 ppm; fragment mass tolerance = 0.5 Da; instrument = Default; decoy = 0. The majority of OMSSA settings were kept the same as for Mascot: “-to 0.5 -te 20 -teppm -i 1,4 -mf 3 -mv 1,4,10 -e 0 -nt 8 -v 3 -zh 4 -zl 2 -zcc 1 -w -he 1000” (the OMSSA search parameters are described in Table S2). The resulting PSMs from the bipartite database were filtered, and hits to the entrapment proteins were used to estimate false positives over a range of OMSSAPercolator q-values.
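The entrapment check described for Dataset 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the PSM tuple layout, the `ENTRAP_` accession prefix, and the function name are assumptions made for the example, and the value reported is the raw fraction of accepted PSMs that hit entrapment proteins, with no correction for the relative sizes of the two database parts.

```python
from typing import Iterable, List, Tuple

# Each PSM is (q_value, protein_accession); this layout is hypothetical.
PSM = Tuple[float, str]


def entrapment_fraction(psms: Iterable[PSM],
                        thresholds: Iterable[float],
                        entrapment_prefix: str = "ENTRAP_"
                        ) -> List[Tuple[float, int, int, float]]:
    """For each q-value threshold, count accepted PSMs and entrapment hits.

    Returns (threshold, accepted, entrapment_hits, fraction) tuples.
    """
    psm_list = list(psms)
    rows = []
    for t in thresholds:
        accepted = [acc for q, acc in psm_list if q <= t]
        hits = sum(1 for acc in accepted if acc.startswith(entrapment_prefix))
        fraction = hits / len(accepted) if accepted else 0.0
        rows.append((t, len(accepted), hits, fraction))
    return rows


if __name__ == "__main__":
    # Made-up accessions, for illustration only.
    example = [
        (0.001, "UPS_A"),
        (0.008, "ENTRAP_B"),
        (0.020, "UPS_C"),
        (0.045, "ENTRAP_D"),
    ]
    for row in entrapment_fraction(example, [0.01, 0.05]):
        print(row)
```

Any PSM that passes the threshold but maps to an entrapment protein is, by construction, a known false positive, which is what makes the bipartite database useful for checking reported q-values.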
Dataset 2 (iPS-201B7): The raw MS data files were processed and converted into MGF file format using ProteoWizard 3.0.4472 (http://proteowizard.sourceforge.net/) [7]. The MS/MS spectra were then searched by OMSSA 2.1.9 and Mascot 2.3.02 against the human UniProt database (UniProtKB/Swiss-Prot, downloaded on 7 July 2013, 88354 target sequences) concatenated with a decoy database, which was generated by reversing the full target database. Mascot used the following parameters: enzyme = Trypsin; maximum missed cleavages = 2; fixed modifications = Carbamidomethyl (C); variable modifications = Oxidation (M), Deamidated (NQ), Acetyl (Protein N-term); precursor mass tolerance = 50 ppm; fragment mass tolerance = 0.1 Da; instrument = Default; decoy = 0; peptide isotope error = 1. The majority of OMSSA settings were kept the same as for Mascot: “-to 0.1 -te 50 -teppm -i 1,4 -mf 3 -mv 1,4,10 -e 0 -nt 8 -v 2 -zcc 1 -zh 4 -zl 2 -cp 1 -tem 4 -ti 1 -w -he 1000” (the OMSSA search parameters are described in Table S2).
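The Mascot setting "peptide isotope error = 1" and the OMSSA flags "-tem 4 -ti 1" both let a precursor match even when the instrument picked the first isotopic peak rather than the monoisotopic one. The sketch below illustrates that idea with a ppm tolerance check. It is a simplified model written for this note, not the logic of either search engine; the function name and the choice of the calculated mass as the ppm denominator are assumptions.

```python
# Mass difference between 13C and 12C, in Da.
C13_C12_DELTA = 1.0033548


def within_tolerance(calc_mass: float,
                     obs_mass: float,
                     tol_ppm: float,
                     max_isotope_error: int = 0) -> bool:
    """Return True if obs_mass matches calc_mass within tol_ppm,
    allowing the observed mass to be up to max_isotope_error
    isotopic peaks heavier than the monoisotopic mass."""
    for k in range(max_isotope_error + 1):
        shifted = obs_mass - k * C13_C12_DELTA
        error_ppm = abs(shifted - calc_mass) / calc_mass * 1e6
        if error_ppm <= tol_ppm:
            return True
    return False


if __name__ == "__main__":
    # 25 ppm off: accepted at a 50 ppm tolerance.
    print(within_tolerance(2000.0, 2000.05, 50.0))
    # About one isotopic peak off: accepted only with isotope error = 1.
    print(within_tolerance(2000.0, 2001.0034, 50.0))
    print(within_tolerance(2000.0, 2001.0034, 50.0, max_isotope_error=1))
```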
Datasets 3 and 4 (Yeast-ETD and Yeast-ETcaD): The raw MS data were processed as described in the MascotPercolator study [5]. The MS/MS spectra were searched by OMSSA 2.1.9 and Mascot 2.3.02 against the translated Saccharomyces Genome Database (SGD [8], http://www.yeastgenome.org/) concatenated with a decoy database, which was generated by reversing the full target database. Mascot used the following parameters: enzyme = Lys-C; maximum missed cleavages = 3; fixed modifications = Carbamidomethyl (C); variable modifications = Oxidation (M), Deamidated (NQ), Acetyl (Protein N-term); precursor mass tolerance = 50 ppm; fragment mass tolerance = 0.5 Da; instrument = ETD-TRAP; peptide isotope error = 1; decoy = 0. The majority of OMSSA settings were kept the same as for Mascot: “-w -he 1000 -to 0.5 -te 50 -teppm -i 2,4,5 -mv 1,4,10 -mf 3 -e 5 -v 3 -zh 7 -nt 8 -zcc 1 -hl 3 -h1 3 -h2 3 -cp 1 -tem 4 -ti 1” (the OMSSA search parameters are described in Table S2).
Datasets 5 and 6 (Ecoli-CID and Ecoli-ETcaD): Peak lists were searched by OMSSA 2.1.9 and Mascot 2.3.02 against the same database. Both the peak lists and the database were generated as described by Wright et al. [5]. For the CID dataset, Mascot used the following parameters: enzyme = Trypsin; maximum missed cleavages = 3; fixed modifications = Carbamidomethyl (C); variable modifications = Oxidation (M), Deamidated (NQ); precursor mass tolerance = 50 ppm; fragment mass tolerance = 1.5 Da; instrument = Default; decoy = 0. The majority of OMSSA settings were kept the same as for Mascot: “-w -he 1000 -to 1.5 -te 50 -teppm -i 1,4 -mv 1,4 -mf 3 -e 0 -v 3 -zh 4 -nt 8 -zcc 1 -cp 1 -hl 3 -h1 3 -h2 3” (the OMSSA search parameters are described in Table S2). For the ETD dataset, Mascot used the following parameters: enzyme = Trypsin; maximum missed cleavages = 3; fixed modifications = Carbamidomethyl (C); variable modifications = Oxidation (M), Deamidated (NQ); precursor mass tolerance = 50 ppm; fragment mass tolerance = 1.5 Da; instrument = ETD-TRAP; decoy = 0. The majority of OMSSA settings were kept the same as for Mascot: “-w -he 1000 -to 1.5 -te 50 -teppm -i 2,4,5 -mv 1,4 -mf 3 -e 0 -v 3 -nt 8 -zh 7 -zcc 1 -cp 1 -hl 3 -h1 3 -h2 3” (the OMSSA search parameters are described in Table S2).
The decoy databases were generated by the Perl script decoy.pl, which is provided by Matrix Science (http://www.matrixscience.com/help/decoy_help.html). Mascot Percolator v2.02 [1] was downloaded from http://www.sanger.ac.uk/resources/software/mascotpercolator/ and Percolator v2.04 was downloaded from http://per-colator.com/.

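Generating a decoy database by reversing every target sequence can be sketched as below. This is an illustrative Python sketch of the reversal idea only, not a port of decoy.pl; the `REV_` header prefix and the function names are assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reversed_decoys(lines: Iterable[str], prefix: str = "REV_") -> List[str]:
    """Return FASTA lines holding the reversed form of every input sequence.

    The prefix marks decoy entries; its value is a hypothetical choice.
    """
    out: List[str] = []
    for header, seq in read_fasta(lines):
        out.append(">" + prefix + header)
        out.append(seq[::-1])
    return out


if __name__ == "__main__":
    target = [">example_protein", "MKT", "AYR"]
    print("\n".join(reversed_decoys(target)))
```

Reversal keeps the amino acid composition and length of each protein unchanged, so the decoy set has roughly the same size and mass distribution as the target set.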
Table S1. Features used in OMSSAPercolator. In total, 28 features are applied in OMSSAPercolator as an input feature vector to Percolator.

| Index | Features | Description |
|---|---|---|
| 1 | Log10Evalue | negative log10-value of E-value |
| 2 | Mass | calculated peptide mass in Da |
| 3 | Charge | peptide charge |
| 4-5 | DeltaMass, DeltaMassPPM | calculated minus observed peptide mass (in Dalton and ppm) |
| 6-7 | absDM, absDMppm | absolute value of calculated minus observed peptide mass (in Dalton and ppm) |
| 8-9 | isoDM, isoDMppm | calculated minus observed peptide mass, isotope error corrected (in Dalton and ppm) |
| 10 | VarModRatio | the number of sites with variable modifications divided by the number of sites with potential variable modifications |
| 11 | TotalIntensity | total intensity, natural logarithm transformed |
| 12 | MatchedIonInt | total intensity of matched ions, natural logarithm transformed |
| 13 | relTotMatchedIonInt | the total intensity of all matched ions divided by the total intensity of the spectrum |
| 14 | MaxMatchedIonInt | max intensity of matched fragment ions |
| 15 | FragError | mean mass error of matched fragment ions (in Dalton) |
| 16-17 | FragDeltaM_Med, FragDeltaM_MedPPM | median mass error of matched fragment ions (in Dalton and ppm) |
| 18-19 | FragDeltaM_Iqr, FragDeltaM_IqrPPM | inter-quartile range of mass errors of matched fragment ions (in Dalton and ppm) |
| 20 | Qmatch | the number of peptide matches for which an MS/MS match was attempted (peptide to query, 1:n) |
| 21 | Longest | longest matched fragment ion series |
| 22 | EnzTryC | C-terminal enzymatic (tryptic) site, boolean |
| 23 | EnzTryN | N-terminal enzymatic (tryptic) site, boolean |
| 24 | PepLen | length of peptide sequence |
| 25 | Log10Pvalue | positive log10-value of P-value |
| 26 | fracIonSeries | fraction of calculated ions matched, reported separately for each ion series |
| 27 | relMatchedIonInt | relative ion intensity of each ion series |
| 28 | EnzN | the number of enzymatic sites excluding terminal sites |

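A few of the features in Table S1 follow directly from their descriptions and can be sketched as below. This is a hedged illustration, not the OMSSAPercolator source: the function names, the use of the calculated mass as the ppm denominator, the E-value floor, and the quartile method (the default of Python's `statistics.quantiles`) are all assumptions.

```python
import math
from statistics import median, quantiles
from typing import Dict, Sequence


def mass_features(calc_mass: float, obs_mass: float) -> Dict[str, float]:
    """DeltaMass, DeltaMassPPM, absDM and absDMppm (features 4-7)."""
    delta = calc_mass - obs_mass              # calculated minus observed
    delta_ppm = delta / calc_mass * 1e6       # denominator is an assumption
    return {
        "DeltaMass": delta,
        "DeltaMassPPM": delta_ppm,
        "absDM": abs(delta),
        "absDMppm": abs(delta_ppm),
    }


def log10_evalue(evalue: float, floor: float = 1e-300) -> float:
    """Log10Evalue (feature 1): negative log10 of the E-value.

    The floor avoids log10(0); its value is a hypothetical choice.
    """
    return -math.log10(max(evalue, floor))


def fragment_error_features(errors_da: Sequence[float]) -> Dict[str, float]:
    """FragError, FragDeltaM_Med and FragDeltaM_Iqr (features 15, 16, 18).

    Needs at least two matched fragment ions for the quartiles.
    """
    q1, _, q3 = quantiles(errors_da, n=4)
    return {
        "FragError": sum(errors_da) / len(errors_da),
        "FragDeltaM_Med": median(errors_da),
        "FragDeltaM_Iqr": q3 - q1,
    }


if __name__ == "__main__":
    print(mass_features(1000.0, 999.99))
    print(log10_evalue(1e-5))
    print(fragment_error_features([-0.2, -0.1, 0.0, 0.1, 0.2]))
```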
Table S2. Descriptions of the search parameters used in OMSSA.

| Parameter | Description |
|---|---|
| -w | include spectra and search params in search results |
| -he | the maximum E-value allowed in the hit list |
| -to | product ion m/z tolerance in Da |
| -te | precursor ion m/z tolerance in Da (or ppm if -teppm flag set) |
| -teppm | search precursor masses in units of ppm |
| -i | id numbers of ions to search (comma delimited, no spaces) |
| -mf | comma delimited (no spaces) list of id numbers for fixed modifications |
| -mv | comma delimited (no spaces) list of id numbers for variable modifications |
| -e | id number of enzyme to use |
| -v | number of missed cleavages allowed |
| -nt | number of search threads to use |
| -zh | maximum precursor charge to search when not 1+ |
| -zcc | how should precursor charges be determined? (1 = believe the input file, 2 = use a range) |
| -zl | minimum precursor charge to search when not 1+ |
| -cp | eliminate charge reduced precursors in spectra (0 = no, 1 = yes) |
| -hl | maximum number of hits retained per precursor charge state per spectrum |
| -h1 | number of peaks allowed in single charge window (0 = number of ion species) |
| -h2 | number of peaks allowed in double charge window (0 = number of ion species) |
| -tem | precursor ion search type (0 = mono, 1 = avg, 2 = N15, 3 = exact, 4 = multiisotope) |
| -ti | when doing multiisotope search, number of isotopic peaks to search (0 = monoisotopic peak only) |

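To read one of the OMSSA parameter strings quoted in the Methods against Table S2, a small parser such as the one below may help. It is a convenience sketch written for this note: the function names are hypothetical, only a subset of the Table S2 descriptions is included, and `-w` and `-teppm` are treated as value-less switches because that is how they appear in the quoted strings.

```python
import shlex
from typing import Dict, List, Union

# Flags that take no value in the parameter strings quoted in the Methods.
SWITCHES = {"-w", "-teppm"}

# Subset of Table S2, abbreviated.
DESCRIPTIONS = {
    "-to": "product ion m/z tolerance in Da",
    "-te": "precursor ion m/z tolerance (Da, or ppm if -teppm is set)",
    "-teppm": "search precursor masses in units of ppm",
    "-mf": "fixed modification ids",
    "-mv": "variable modification ids",
    "-e": "enzyme id",
    "-v": "missed cleavages allowed",
}


def parse_omssa_args(arg_string: str) -> Dict[str, Union[str, bool]]:
    """Turn an OMSSA parameter string into a flag -> value mapping."""
    tokens = shlex.split(arg_string)
    parsed: Dict[str, Union[str, bool]] = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag in SWITCHES:
            parsed[flag] = True
            i += 1
        else:
            if i + 1 >= len(tokens):
                raise ValueError(f"flag {flag} is missing its value")
            parsed[flag] = tokens[i + 1]
            i += 2
    return parsed


def describe(parsed: Dict[str, Union[str, bool]]) -> List[str]:
    """Return one 'flag value: description' line per known flag."""
    return [f"{flag} {value}: {DESCRIPTIONS[flag]}"
            for flag, value in parsed.items() if flag in DESCRIPTIONS]


if __name__ == "__main__":
    ups1 = ("-to 0.5 -te 20 -teppm -i 1,4 -mf 3 -mv 1,4,10 -e 0 -nt 8 "
            "-v 3 -zh 4 -zl 2 -zcc 1 -w -he 1000")
    for line in describe(parse_omssa_args(ups1)):
        print(line)
```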
Figure S1. Performance comparison between OMSSAPercolator (OP), OMSSA, Mascot Percolator (MP) and Mascot at different empirical PSM level q-values on (A) UPS1, (B) iPS-201B7, (C) Yeast-ETcaD and (D) Ecoli-ETcaD. The number of target PSMs was plotted against each q-value threshold.

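The curves in Figure S1 count target PSMs accepted at each q-value threshold. One common way to obtain such counts from a target-decoy search is sketched below. It is a generic target-decoy estimate offered as an illustration, assuming higher scores are better; it is not Percolator's estimator and is not claimed to be the exact procedure behind the figure.

```python
from typing import List, Sequence, Tuple

# (score, is_decoy); this input layout is hypothetical.
Scored = Tuple[float, bool]


def q_values(scored: Sequence[Scored]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score."""
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this score or any lower score.
    qs = []
    running = float("inf")
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        qs.append(running)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ordered, qs)]


def targets_at(results: Sequence[Tuple[float, bool, float]],
               threshold: float) -> int:
    """Count target PSMs with q-value at or below the threshold."""
    return sum(1 for _, is_decoy, q in results
               if not is_decoy and q <= threshold)


if __name__ == "__main__":
    demo = [(10, False), (9, False), (8, True),
            (7, False), (6, False), (5, True)]
    res = q_values(demo)
    for t in (0.01, 0.25):
        print(t, targets_at(res, t))
```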
Figure S2. Performance comparison between OMSSAPercolator (OP), OMSSA, Mascot Percolator (MP) and Mascot at different empirical peptide level q-values on (A) UPS1, (B) iPS-201B7, (C) Yeast-ETcaD, (D) Ecoli-ETcaD, (E) Yeast-ETD and (F) Ecoli-CID. The number of peptides was plotted against each q-value threshold.

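Peptide-level results such as those in Figure S2 require reducing the PSM list to one entry per peptide before q-values are estimated. The text does not say how this reduction was done; the sketch below assumes the common convention of keeping the best-scoring PSM for each peptide sequence, with hypothetical names and input layout.

```python
from typing import Dict, Iterable, List, Tuple

# (peptide_sequence, score, is_decoy); this layout is hypothetical.
PeptidePSM = Tuple[str, float, bool]


def best_psm_per_peptide(psms: Iterable[PeptidePSM]) -> List[PeptidePSM]:
    """Keep the highest-scoring PSM for each peptide sequence.

    Returns one entry per peptide, sorted by descending score, ready
    for a peptide-level q-value calculation.
    """
    best: Dict[str, Tuple[float, bool]] = {}
    for peptide, score, is_decoy in psms:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, is_decoy)
    collapsed = [(pep, score, dec) for pep, (score, dec) in best.items()]
    collapsed.sort(key=lambda item: item[1], reverse=True)
    return collapsed


if __name__ == "__main__":
    demo = [
        ("PEPK", 5.0, False),
        ("PEPK", 9.0, False),   # better PSM for the same peptide
        ("AAAR", 3.0, True),
        ("LLLK", 7.0, False),
    ]
    for row in best_psm_per_peptide(demo):
        print(row)
```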
References
[1] Brosch, M., Yu, L., Hubbard, T., Choudhary, J., Accurate and sensitive peptide identification with Mascot Percolator. Journal of Proteome Research 2009, 8, 3176-3181.
[2] Yamana, R., Iwasaki, M., Wakabayashi, M., Nakagawa, M., et al., Rapid and deep profiling of human induced pluripotent stem cell proteome by one-shot NanoLC-MS/MS analysis with meter-scale monolithic silica columns. Journal of Proteome Research 2013, 12, 214-221.
[3] Desiere, F., Deutsch, E. W., King, N. L., Nesvizhskii, A. I., et al., The PeptideAtlas project. Nucleic Acids Research 2006, 34, D655-658.
[4] Swaney, D. L., McAlister, G. C., Coon, J. J., Decision tree-driven tandem mass spectrometry for shotgun proteomics. Nature Methods 2008, 5, 959-964.
[5] Wright, J. C., Collins, M. O., Yu, L., Kall, L., et al., Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator. Molecular & Cellular Proteomics 2012, 11, 478-491.
[6] Klimek, J., Eddes, J. S., Hohmann, L., Jackson, J., et al., The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. Journal of Proteome Research 2008, 7, 96-103.
[7] Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P., ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534-2536.
[8] Cherry, J. M., Adler, C., Ball, C., Chervitz, S. A., et al., SGD: Saccharomyces Genome Database. Nucleic Acids Research 1998, 26, 73-79.