Supporting information
Technical Brief
IQuant: an automated pipeline for quantitative proteomics based upon isobaric tags
Authors
Bo Wen1, Ruo Zhou1, Qiang Feng1,2, Quanhui Wang1,5, Jun Wang1,2,3,4 and Siqi Liu1,5*
1BGI-Shenzhen, Shenzhen, 518083, China.
2Department of Biology, University of Copenhagen, Copenhagen, Denmark.
3King Abdulaziz University, Jeddah, Saudi Arabia.
4The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark.
5Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China.
Correspondence: Dr. Siqi Liu, BGI-Shenzhen, Shenzhen, 518083, China
E-Mail: siqiliu@genomics.cn
Fax: 86-10-80485324
1. Datasets
Data set D1: This is a same-same data set in which a single complex sample was labeled with different reagents and the labeled aliquots were mixed, so that the precision of quantitative analysis can be evaluated. The complex sample consisted of the soluble protein extract from wild-type Erwinia carotovora (ECA). The IQuant workflow identified 650 proteins in this data set, and most of the protein ratios were close to 1. The detailed generation process is described in reference [1]. This data set can be accessed at the PRIDE database (http://www.ebi.ac.uk/pride) under accession numbers 9266 to 9283.
Data set D2: This data set is derived from four spiked-in proteins of known ratios and was used to evaluate the accuracy of quantitative analysis. The ratios of the four proteins range from 0.25 to 4. The detailed generation process is described in reference [1]. This data set can be accessed at the PRIDE database (http://www.ebi.ac.uk/pride) under accession number 10635.
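Precision (D1) and accuracy (D2) are both judged by comparing measured protein ratios with the expected ones: 1 for the same-same data and 0.25 to 4 for the spiked proteins. A minimal sketch, assuming relative error is defined as |measured - expected| / expected (the exact definition behind Figures S1 and S2 is not stated in this document); the function name and the example ratios are hypothetical.

```python
def relative_error(measured: float, expected: float) -> float:
    """Relative error of a measured protein ratio against its expected ratio.

    Assumed definition: |measured - expected| / expected.
    """
    if expected <= 0:
        raise ValueError("expected ratio must be positive")
    return abs(measured - expected) / expected


# Hypothetical ratios: expected 1 for a same-same protein (D1),
# expected 0.25 and 4 for spiked proteins at the ends of the D2 range.
pairs = [(0.95, 1.0), (0.30, 0.25), (3.2, 4.0)]
for measured, expected in pairs:
    err = relative_error(measured, expected)
    print(f"measured={measured} expected={expected} relative error={err:.3f}")
```

Dividing by the expected ratio makes errors on the 0.25 and 4 spikes comparable with errors on ratios near 1.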
Data set D3: This data set is derived from an experiment in which six aliquots of tryptic BSA peptides were labeled with the TMT 6-plex reagents (Thermo Scientific), mixed in a 1:1:1:1:1:1 ratio, and analyzed by nano-LC/LTQ-Orbitrap XL. Detailed information is given in reference [2].
Data set D4: This data set is derived from previously reported differential proteomics experiments [3] and was used to evaluate the utility of IQuant in finding differentially expressed proteins. We chose the data set of biological replicate 1 and technical replicate 3 for this test.
Data sets D5 and D6: These two data sets are derived from the Chromosome-Centric Human Proteome Project (C-HPP) [4, 5] and were used to evaluate the applicability of IQuant to large-scale differential proteomics experiments. Data set D5 is derived from human breast cancer tissues [4]. The raw data were obtained directly from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) with the data set identifier PXD000066, and only the HCD spectra were used in this paper. Data set D6 is derived from human colorectal cancer tissue samples [5]. The raw data were obtained directly from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) with the data set identifier PXD000089, and only the HCD spectra of the “CRC_iTRAQ” data set were used in this paper. The raw MS data files were converted into MGF format using ProteoWizard 3.04472 (http://proteowizard.sourceforge.net/) [6]. The detailed generation processes for the two data sets are described in references [4] and [5], respectively.
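The MGF (Mascot Generic Format) files produced by this conversion are plain text: each spectrum sits between BEGIN IONS and END IONS lines, with KEY=value header lines (such as TITLE, PEPMASS and CHARGE) followed by one "m/z intensity" pair per line. A minimal reader is sketched below for orientation only; it is not part of IQuant or ProteoWizard, the name parse_mgf is hypothetical, and treating lines that start with #, ;, ! or / as comments is an assumption based on common MGF usage.

```python
from typing import Dict, List


def parse_mgf(text: str) -> List[Dict]:
    """Parse MGF text into spectra with header fields and (m/z, intensity) peaks.

    Lines outside BEGIN IONS / END IONS (global parameters) are ignored.
    """
    spectra: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed comment characters).
        if not line or line[0] in "#;!/":
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if line[0].isdigit():
                # Peak line: m/z, optional intensity; extra columns are ignored.
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
            elif "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=2+
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
    return spectra
```

For iTRAQ or TMT data, the low-mass peaks of each parsed spectrum carry the reporter ions used for quantification.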
2. Protein identification and quantification
Data set D1: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) and ProteinPilot™ (v4.5, AB Sciex) against the UniProt Erwinia database (http://www.uniprot.org, taxon identifier: 218491, entries: 2146, downloaded: March 2013). Mascot parameters were set as follows: trypsin with a maximum of two missed cleavages; fixed modifications Methylthio (C), iTRAQ 4-plex (K) and iTRAQ 4-plex (N-term); variable modifications Oxidation (M) and iTRAQ 4-plex (Y); peptide mass tolerance 1.0 Da; fragment mass tolerance 0.5 Da. The automatic Mascot decoy database search was performed. The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 4plex (Peptide Labeled); Cys Alkylation: MMTS; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins identified at 1% global FDR with two or more peptides were used for further analysis.
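The missed-cleavage setting (two here and for D5, one for D6) bounds how many internal cleavage sites a candidate peptide may span. A minimal sketch, assuming the classic trypsin rule (cleave after K or R, except before P); the function name is hypothetical, and a search engine additionally applies mass, length and modification constraints that are not modelled here.

```python
import re
from typing import List


def tryptic_peptides(protein: str, max_missed: int = 2) -> List[str]:
    """In silico trypsin digest allowing up to `max_missed` missed cleavages."""
    # Cleavage positions: directly after K or R when the next residue is not P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining fragments i..j-1 leaves (j - i - 1) sites uncleaved.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


# Hypothetical sequence; R before P is not a cleavage site.
print(tryptic_peptides("PEPTIDEKAAARPGGKLL", max_missed=1))
```

Raising max_missed enlarges the candidate peptide space, which is why the setting is reported with each search.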
For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimal number of peptides for protein quantitation was set to two, and variance stabilization normalization (VSN) was performed.
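A q-value threshold of 0.01 means accepting identifications down to the score at which the estimated FDR is at most 1%. The sketch below shows the plain target-decoy estimate (decoy hits divided by target hits above a score threshold); it is an illustration under that assumption, not Mascot Percolator's implementation, whose estimator differs in details. Ties in score are not treated specially, and the function name is hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return a q-value for each (score, is_decoy) PSM; higher scores are better."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR over all thresholds that still accept the PSM,
    # i.e. a running minimum taken from the worst score upwards.
    for k in range(len(fdrs) - 2, -1, -1):
        fdrs[k] = min(fdrs[k], fdrs[k + 1])
    qvalues = [0.0] * len(psms)
    for rank, i in enumerate(order):
        qvalues[i] = fdrs[rank]
    return qvalues
```

Target PSMs with a q-value of 0.01 or less would then be kept for quantification.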
For IsobariQ (version 1.3.2) quantification, the Mascot .dat file was imported with the preset configuration for iTRAQ 4-plex. The minimum ion score was set to control the FDR at 1% or less using the target-decoy strategy [9]. The minimal number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.
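Both tools require at least two peptides per quantified protein, and IsobariQ uses only unique peptides. The exact summarization each tool applies is not described here; the sketch below assumes one common choice, the median of peptide log2 ratios transformed back to a linear ratio. The input layout and function name are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List, Set, Tuple


def protein_ratios(
    peptides: List[Tuple[str, List[str], float]], min_peptides: int = 2
) -> Dict[str, float]:
    """Roll (sequence, matching accessions, ratio) tuples up to protein ratios.

    Only unique peptides (mapping to one protein) with a positive ratio are
    used; proteins with fewer than `min_peptides` distinct peptides are dropped.
    """
    logs: Dict[str, List[float]] = defaultdict(list)
    seqs: Dict[str, Set[str]] = defaultdict(set)
    for seq, accessions, ratio in peptides:
        if len(accessions) == 1 and ratio > 0:
            logs[accessions[0]].append(math.log2(ratio))
            seqs[accessions[0]].add(seq)
    # Median in log space, then back to a linear ratio.
    return {
        acc: 2 ** median(values)
        for acc, values in logs.items()
        if len(seqs[acc]) >= min_peptides
    }
```

Working in log2 space keeps two-fold up and two-fold down changes symmetric around zero before the median is taken.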
Data set D2: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) and ProteinPilot™ (v4.5, AB Sciex) against the UniProt Erwinia database (http://www.uniprot.org, taxon identifier: 218491, entries: 2146, downloaded: March 2013), to which the sequences of the four known proteins BSA, CYT, ENO and PHOSB were added. Mascot parameters were set as follows: trypsin with a maximum of two missed cleavages; fixed modifications Methylthio (C), iTRAQ 4-plex (K) and iTRAQ 4-plex (N-term); variable modifications Oxidation (M) and iTRAQ 4-plex (Y); peptide mass tolerance 1.0 Da; fragment mass tolerance 0.5 Da. The automatic Mascot decoy database search was performed. The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 4plex (Peptide Labeled); Cys Alkylation: MMTS; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins identified at 1% global FDR with two or more peptides were used for further analysis.
For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimal number of peptides for protein quantitation was set to two, and median normalization was performed.
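Median normalization suits D2 because the Erwinia background proteins are present in equal amounts, so any shift of a channel's median reflects loading or labeling differences rather than biology. A minimal sketch, assuming rows are peptides, columns are reporter channels, and each channel is rescaled so its median matches the median of all channel medians; the exact scaling used by IQuant and IsobariQ may differ, and the function name is hypothetical.

```python
from statistics import median
from typing import List


def median_normalize(intensities: List[List[float]]) -> List[List[float]]:
    """Rescale each reporter channel so that all channel medians agree."""
    n_channels = len(intensities[0])
    # Median intensity of each channel across all rows.
    medians = [median(row[c] for row in intensities) for c in range(n_channels)]
    # Common target: the median of the channel medians.
    target = median(medians)
    factors = [target / m for m in medians]
    return [[value * f for value, f in zip(row, factors)] for row in intensities]
```

Spiked proteins at 0.25 to 4 barely move the channel medians, so their known ratios survive this correction.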
For IsobariQ (version 1.3.2) quantification, the Mascot .dat file was imported with the preset configuration for iTRAQ 4-plex. The minimum ion score was set to control the FDR at 1% or less using the target-decoy strategy [9]. The minimal number of peptides for protein quantitation was set to two, and median normalization was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.
Data set D3: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the SwissProt database (version 57.15, taxonomy: Mammalia). The automatic Mascot decoy database search was performed. The other Mascot parameters were the same as in reference [2].
For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimal number of peptides for protein quantitation was set to two, and VSN was performed.
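For orientation, VSN replaces the logarithm with an arsinh (generalized logarithm) transformation whose per-channel offset a_i and scale b_i are fitted to the data; the fitted values are not reported in this document. The expansion below shows why the transformed values behave like log ratios at high intensity, with the b_i acting as per-channel scaling (normalization) factors, while the variance is stabilized at low intensity.

```latex
h_i(x) = \operatorname{arsinh}(a_i + b_i x),
\qquad
\operatorname{arsinh}(z) = \ln\left(z + \sqrt{z^{2} + 1}\right)

% For large z, arsinh(z) \approx \ln(2z), so for large intensities
h_i(x) \approx \ln(2 b_i x) = \ln x + \ln(2 b_i)
\quad\Rightarrow\quad
h_i(x) - h_j(y) \approx \ln\frac{x}{y} + \ln\frac{b_i}{b_j}
```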
For IsobariQ (version 1.3.2) quantification, the Mascot .dat file was imported with the preset configuration for TMT 6-plex. The minimum ion score was set to control the FDR at 1% or less using the target-decoy strategy [9]. The minimal number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.
Data set D4: The raw MS/MS data were converted into MGF format with Proteome Discoverer 1.2 (Thermo Fisher Scientific, Waltham, MA, USA), and the Mascot search parameters were the same as in reference [3]. The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 8plex (Peptide Labeled); Cys Alkylation: Iodoacetamide; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins identified at 1% global FDR with two or more peptides were used for further analysis.
For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimal number of peptides for protein quantitation was set to two, and VSN was performed.
For IsobariQ (version 1.3.2) quantification, the Mascot .dat file was imported with the preset configuration for iTRAQ 8-plex. The minimum ion score was set to control the FDR at 1% or less using the target-decoy strategy [9]. The minimal number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.
Data set D5: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the Swiss-Prot database (20130505, human, 20256 sequences). Mascot parameters were set as follows: trypsin with a maximum of two missed cleavages; fixed modifications Carbamidomethyl (C), iTRAQ4plex (K) and iTRAQ4plex (N-term); variable modifications Oxidation (M) and iTRAQ4plex (Y); peptide mass tolerance 7 ppm; fragment mass tolerance 0.01 Da. The automatic Mascot decoy database search was performed. False discovery rates were controlled using MascotPercolator V2.02 at a q-value of 0.01 or less. For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimal number of peptides for protein quantitation was set to two, and VSN was performed.
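Unlike the absolute 1.0 Da precursor tolerance used for D1 and D2, the 7 ppm tolerance here is relative, so its width in Da grows with mass: at 1000 Da, 7 ppm corresponds to 1000 × 7 / 10^6 = 0.007 Da, more than 100 times narrower than 1.0 Da. A minimal sketch with hypothetical helper names, assuming the tolerance is applied relative to the theoretical mass:

```python
def ppm_to_da(mass: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm into an absolute tolerance in Da."""
    return mass * ppm / 1e6


def within_tolerance(observed: float, theoretical: float, ppm: float) -> bool:
    """True if `observed` lies within +/- `ppm` of `theoretical`."""
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm)


for mass in (1000.0, 2000.0):
    print(f"7 ppm at {mass} Da = {ppm_to_da(mass, 7):.3f} Da")
```

The fixed 0.01 Da fragment tolerance, by contrast, is already absolute and needs no conversion.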
Data set D6: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the Swiss-Prot database (20130505, human, 20256 sequences). Mascot parameters were set as follows: trypsin with a maximum of one missed cleavage; fixed modifications Carbamidomethyl (C), iTRAQ4plex (K) and iTRAQ4plex (N-term); variable modifications Oxidation (M) and iTRAQ4plex (Y); peptide mass tolerance 7 ppm; fragment mass tolerance 0.01 Da. The automatic Mascot decoy database search was performed for all data sets. False discovery rates were controlled using MascotPercolator at a q-value of 0.01 or less. The IQuant parameters were the same as for data set D5.
Figure S1. Performance of IQuant on a same-same data set in which equal amounts of sample were labeled. Protein abundance ratios were calculated by IQuant, IsobariQ and ProteinPilot, and the distribution of the relative errors of these ratios is shown.
Figure S2. Comparison of the quantitation accuracy of IQuant, IsobariQ and ProteinPilot using a spiked-in standards data set. The protein ratios measured by IQuant, IsobariQ and ProteinPilot were compared with the expected ratios.
3. References
[1] Karp, N. A., Huber, W., Sadowski, P. G., Charles, P. D., et al., Addressing accuracy and precision issues in iTRAQ quantitation. Molecular & Cellular Proteomics 2010, 9, 1885-1897.
[2] Arntzen, M. O., Koehler, C. J., Barsnes, H., Berven, F. S., et al., IsobariQ: Software for Isobaric Quantitative Proteomics using IPTL, iTRAQ, and TMT. Journal of Proteome Research 2011, 10, 913-920.
[3] Chen, Z., Wen, B., Wang, Q., Tong, W., et al., Quantitative proteomics reveals the temperature-dependent proteins encoded by a series of cluster genes in Thermoanaerobacter tengcongensis. Molecular & Cellular Proteomics 2013.
[4] Muraoka, S., Kume, H., Adachi, J., Shiromizu, T., et al., In-depth membrane proteomic study of breast cancer tissues for the generation of a chromosome-based protein list. Journal of Proteome Research 2013, 12, 208-213.
[5] Shiromizu, T., Adachi, J., Watanabe, S., Murakami, T., et al., Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project. Journal of Proteome Research 2013, 12, 2414-2421.
[6] Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P., ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534-2536.
[7] Brosch, M., Yu, L., Hubbard, T., Choudhary, J., Accurate and sensitive peptide identification with Mascot Percolator. Journal of Proteome Research 2009, 8, 3176-3181.
[8] Käll, L., Canterbury, J. D., Weston, J., Noble, W. S., MacCoss, M. J., Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature Methods 2007, 4, 923-925.
[9] Elias, J. E., Gygi, S. P., Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 2007, 4, 207-214.