Supplemental Materials
Table 1 supplemental data:
LTQ Orbitrap Velos file FIG1C_TIIMAC_500UG_AUTO_1.raw downloaded from PRIDE
(http://www.ebi.ac.uk/pride/archive/projects/PXD000892) and converted to mzXML using
ReAdW version 2014.1.1.
Sequence database searched: UniProt proteomes human sequence database (downloaded
10/30/2014) with common contaminants appended (88,855 total sequence entries) or UniProt
proteomes yeast sequence database (downloaded 10/30/2014) with common contaminants
appended (6,792 total sequence entries). URL
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/proteomes/.
Relevant search parameters common to all searches:
decoy_search = 1
num_threads = 0
peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1
use_sparse_matrix = 0
add_C_cysteine = 57.021464
High-res searches:
fragment_bin_tol = 0.02
fragment_bin_offset = 0.0
theoretical_fragment_ions = 0
Low-res searches:
fragment_bin_tol = 1.0005
fragment_bin_offset = 0.4
theoretical_fragment_ions = 1
Human searches add:
variable_mod02 = 79.966331 STY 0 3 -1 0 0
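To make the tolerance settings above concrete, the following is a minimal Python sketch rather than Comet's own code. It assumes that peptide_mass_units = 2 denotes ppm (consistent with the 20.00 ppm precursor tolerance listed for Thermo SEQUEST in this document) and that a fragment mass is assigned to a bin as int(mass / fragment_bin_tol + (1 - fragment_bin_offset)). The function names ppm_window and fragment_bin are illustrative only.

```python
def ppm_window(mass, tol_ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def fragment_bin(mass, bin_tol, bin_offset):
    """Map a fragment mass to an integer bin index.

    Assumed formula (not taken from this document):
    int(mass / bin_tol + (1 - bin_offset)).
    """
    return int(mass / bin_tol + (1.0 - bin_offset))


if __name__ == "__main__":
    low, high = ppm_window(1500.0, 20.0)
    print(f"20 ppm window at 1500 Da: {low:.4f} to {high:.4f}")
    # Same fragment mass under the low-res and high-res settings listed above.
    print("low-res bin: ", fragment_bin(500.25, 1.0005, 0.4))
    print("high-res bin:", fragment_bin(500.25, 0.02, 0.0))
```

Under these assumptions, the high-res settings spread the same mass range over roughly fifty times as many bins as the low-res settings, since the bin width drops from 1.0005 to 0.02.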
Figure 3 supplemental data:
LTQ Orbitrap Velos file FIG1C_TIIMAC_500UG_AUTO_1.raw downloaded from PRIDE
(http://www.ebi.ac.uk/pride/archive/projects/PXD000892), converted to mzXML using ReAdW
version 2014.1.1, and then converted to ms2 format using MzXML2Search (TPP v4.8.0). Copies of
the ms2 file were concatenated until the input to the Comet search contained 50,000 spectra.
Sequence database searched: UniProt proteomes human sequence database (downloaded
10/30/2014) with common contaminants appended (88,855 total sequence entries).
Relevant search parameters common to all searches:
decoy_search = 0
num_threads = 4
peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005
fragment_bin_offset = 0.4
theoretical_fragment_ions = 1
use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1
use_sparse_matrix = 1
add_C_cysteine = 57.021464
Figure 5 supplemental data:
LTQ Orbitrap Velos files PRO_CAD_IT_01.raw and PRO_CAD_IT_02.raw downloaded from
SCOR (http://scor.chem.wisc.edu/data/raw/Frag_meth_Detect_comparison.tar.gz) and
converted to mzXML using ReAdW version 2014.1.1.
Sequence database searched: UniProt proteomes human sequence database (downloaded
10/30/2014) with common contaminants appended (88,855 total sequence entries).
Relevant search parameters common to all searches:
decoy_search = 0
num_threads = 0
peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005
theoretical_fragment_ions = 1
use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1
use_sparse_matrix = 1
add_C_cysteine = 57.021464
Figure 5 supplemental data:
LTQ Orbitrap Velos file PRO_CAD_IT_01.raw downloaded from SCOR
(http://scor.chem.wisc.edu/data/raw/Frag_meth_Detect_comparison.tar.gz) and converted to
mzXML using ReAdW version 2014.1.1.
Sequence database searched: UniProt proteomes human sequence database (downloaded
10/30/2014) with common contaminants appended, concatenated with the corresponding reversed
decoy sequences (177,710 total sequence entries).
Relevant Comet search parameters:
peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005
theoretical_fragment_ions = 1
use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1
use_sparse_matrix = 1
add_C_cysteine = 57.021464
Relevant UW SEQUEST search parameters:
peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2
allowed_missed_cleavage = 2
diff_search_options: 15.9949 M 0.0 X 0.0 X 0.0 X 0.0 X 0.0 X
fragment_bin_tol = 1.0005
fragment_bin_startoffset = 0.4
ion_series = 0 0 1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
theoretical_fragment_ions = 1
use_sparse_matrix = 1
add_C_cysteine = 57.021464
Relevant Thermo SEQUEST search parameters:
peptide mass tolerance: 20.00 ppm
fragment mass tolerance: 0.8 Da
variable modification: oxidation M
static modification: carbamidomethyl C
fragment ion types: b, y
mass range: 600 to 5000 Da
fragment and precursor mass type: monoisotopic
enzyme: trypsin, full
missed cleavages: 2
Sparse Matrix Format: determining the optimum number of bins in a segment
Evaluation of the sparse matrix horizontal dimension. The Table 1 data were searched against the
yeast database while the number of bins per sparse matrix segment was varied from 10 to 500, and
the effects on search speed and memory use were measured. Varying the number of bins had no
significant effect on search times (2:20 to 2:28). Memory use did vary, reaching its minimum of
3.3 GB at 100 bins. Thus 100 bins per segment are applied in the sparse matrix.
bins per segment    memory use (GB)    run time (mm:ss)
10                  8                  2:28
25                  4.5                2:22
50                  3.5                2:25
100                 3.3                2:20
200                 3.7                2:21
300                 4.2                2:25
500                 5.2                2:21
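One way to see why memory use can have a minimum at an intermediate segment size is the toy model below. It is a sketch under stated assumptions, not Comet's actual memory accounting: each spectrum is stored as an array of per-segment pointers, a segment's values are allocated only if at least one of its bins is non-zero, and the pointer size (8 bytes), value size (4 bytes), and example spectrum are all hypothetical.

```python
import math


def dense_bytes(n_bins, value_bytes=4):
    """Memory for one fully allocated (dense) array of n_bins values."""
    return n_bins * value_bytes


def segmented_bytes(nonzero_bins, n_bins, bins_per_segment,
                    pointer_bytes=8, value_bytes=4):
    """Memory for a segmented sparse layout (toy model).

    One pointer is stored per segment; the values of a segment are
    allocated only if at least one of its bins is non-zero.
    """
    n_segments = math.ceil(n_bins / bins_per_segment)
    occupied = {b // bins_per_segment for b in nonzero_bins}
    pointer_cost = n_segments * pointer_bytes
    value_cost = len(occupied) * bins_per_segment * value_bytes
    return pointer_cost + value_cost


if __name__ == "__main__":
    # Hypothetical spectrum: 3000 bins with signal in every 7th bin below bin 300.
    nonzero = range(0, 300, 7)
    print("dense:", dense_bytes(3000))
    for size in (10, 100, 500):
        print(size, segmented_bytes(nonzero, 3000, size))
```

In this model, small segments pay mostly for pointers and large segments pay mostly for stored zeros. The byte counts are illustrative and are not comparable to the measured GB values in the table.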
Decoy database support
Comet’s internally generated decoy sequences, invoked using the “decoy_search” parameter,
are based on the pseudo-reversed strategy that was initially implemented by Sage-N Sorcerer.
This strategy takes each target peptide and reverses all amino acids except for either the
C-terminal residue (most enzymes) or the N-terminal residue (for AspN digestion), which is kept in
place. For example, the tryptic peptide DLSTYAK would generate a decoy peptide AYTSLDK.
If the protease AspN were applied in a search, the target peptide DVLNHGST would generate a
corresponding decoy peptide DTSGHNLV. The benefit of this decoy strategy is multi-fold:
every target peptide will have a corresponding decoy peptide of the exact same mass so the
target and decoy mass distributions are exactly the same; the number of target and decoy
peptides analyzed is exactly the same; and all decoy peptides maintain an appropriate terminal
residue consistent with the enzyme applied.
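The pseudo-reversal can be sketched in a few lines of Python. This is an illustration that reproduces the two worked examples above (DLSTYAK to AYTSLDK, and DVLNHGST to DTSGHNLV), not Comet's implementation; the function name pseudo_reverse and its keep argument are hypothetical.

```python
def pseudo_reverse(peptide, keep="C"):
    """Reverse a peptide while keeping one terminal residue in place.

    keep="C" fixes the last residue (e.g. tryptic peptides);
    keep="N" fixes the first residue (e.g. AspN peptides).
    """
    if keep not in ("C", "N"):
        raise ValueError("keep must be 'C' or 'N'")
    if len(peptide) < 2:
        return peptide
    if keep == "C":
        # Reverse everything except the final residue.
        return peptide[:-1][::-1] + peptide[-1]
    # Reverse everything except the first residue.
    return peptide[0] + peptide[1:][::-1]


if __name__ == "__main__":
    print(pseudo_reverse("DLSTYAK", keep="C"))
    print(pseudo_reverse("DVLNHGST", keep="N"))
```

Because the decoy is a permutation of the target's residues, its composition, and therefore its mass, is identical to that of the target.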
Comet itself applies no filtering based on the target or decoy entries. This means every peptide
hit, whether target or decoy, is faithfully reported. The user can choose to run decoy searches
as if the database is concatenated (target and decoy entries are scored against each other in
competition) or as if the target and decoy databases were searched separately (resulting in
separate search results reporting the target hits and the decoy hits for each spectrum query).
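The difference between the two reporting modes can be illustrated with a short Python sketch. It is a simplified model and not Comet's code: hits are plain (peptide, score) tuples, higher scores are assumed to be better, and the function name report_hits, the mode strings, and the example scores are hypothetical.

```python
from operator import itemgetter


def report_hits(target_hits, decoy_hits, mode="concatenated"):
    """Arrange the scored hits for one spectrum; nothing is filtered out.

    "concatenated": targets and decoys compete in one ranked list.
    "separate": targets and decoys are ranked and reported independently.
    """
    targets = [(pep, score, "target") for pep, score in target_hits]
    decoys = [(pep, score, "decoy") for pep, score in decoy_hits]
    score_of = itemgetter(1)
    if mode == "concatenated":
        return sorted(targets + decoys, key=score_of, reverse=True)
    if mode == "separate":
        return {
            "target": sorted(targets, key=score_of, reverse=True),
            "decoy": sorted(decoys, key=score_of, reverse=True),
        }
    raise ValueError("mode must be 'concatenated' or 'separate'")


if __name__ == "__main__":
    targets = [("PEPTIDEK", 2.1), ("DLSTYAK", 3.4)]
    decoys = [("AYTSLDK", 1.2), ("EDITPEPK", 2.8)]
    print(report_hits(targets, decoys, "concatenated"))
    print(report_hits(targets, decoys, "separate"))
```

In the concatenated case a decoy can outrank a target for the same spectrum, whereas in the separate case each list has its own top hit.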