Supporting information

Technical Brief

IQuant: an automated pipeline for quantitative proteomics based upon isobaric tags

Authors

Bo Wen1, Ruo Zhou1, Qiang Feng1,2, Quanhui Wang1,5, Jun Wang1,2,3,4 and Siqi Liu1,5*

1BGI-Shenzhen, Shenzhen, 518083, China.
2Department of Biology, University of Copenhagen, Copenhagen, Denmark.
3King Abdulaziz University, Jeddah, Saudi Arabia.
4The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark.
5Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China.

Correspondence: Dr. Siqi Liu, BGI-Shenzhen, Shenzhen, 518083, China
E-Mail: siqiliu@genomics.cn
Fax: 86-10-80485324

1. Datasets

Data set D1: This is a same-same data set, derived from a single complex sample that was labeled with different reagents and mixed together, and it is used to evaluate the precision of quantitative analysis. The complex sample consisted of the soluble protein extract from wild-type Erwinia carotovora (ECA). With the IQuant workflow, 650 proteins were identified in this data set, and most of the protein ratios were close to 1. The detailed generation process is described in reference [1]. This data set can be accessed at the PRIDE database (http://www.ebi.ac.uk/pride) through accession numbers 9266 to 9283.

Data set D2: This data set is derived from four spiked proteins of known ratios and is used to evaluate the accuracy of quantitative analysis. The ratios of the four proteins range from 0.25 to 4. The detailed generation process is described in reference [1]. This data set can be accessed at the PRIDE database (http://www.ebi.ac.uk/pride) through accession number 10635.

Data set D3: This data set is derived from an experiment in which six aliquots of tryptic BSA peptides were labeled with the TMT 6-plex reagents (Thermo Scientific), mixed in a 1:1:1:1:1:1 ratio, and analyzed by nano-LC/LTQ-Orbitrap XL. Detailed information is given in reference [2].
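Because the expected ratios of these benchmark data sets are known (1 for the same-same and equimolar designs, 0.25 to 4 for the spiked proteins), each measured ratio can be scored against its expected value. The following is a minimal sketch, assuming the relative error is defined as |measured - expected| / expected. This definition, the function name and the example ratios are illustrative assumptions and are not taken from the IQuant implementation.

```python
from statistics import median


def relative_errors(measured, expected=1.0):
    """Return |measured - expected| / expected for each measured ratio.

    The definition of relative error used here is an assumption made
    for illustration only.
    """
    return [abs(m - expected) / expected for m in measured]


# Hypothetical protein ratios from a same-same comparison (expected ratio 1).
same_same_ratios = [0.95, 1.02, 1.10, 0.88, 1.00]
errors = relative_errors(same_same_ratios)
median_error = median(errors)

# Hypothetical measurement of a spiked protein with an expected ratio of 4.
spiked_error = relative_errors([3.6], expected=4.0)[0]
```

A small median error on a same-same data set indicates good precision, while a small error on a spiked protein indicates good accuracy.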
Data set D4: This data set is derived from a previously reported differential proteomics experiment [3] and is used to evaluate the utility of IQuant in finding differential proteins. We chose the data set of biological replicate 1, technical replicate 3 for this test.

Data sets D5 and D6: These two data sets are derived from the Chromosome-Centric Human Proteome Project (C-HPP) [4, 5] and are used to evaluate the applicability of IQuant to large-scale differential proteomics experiments. Data set D5 is derived from human breast cancer tissues [4]. The raw data were obtained directly from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) with the data set identifier PXD000066, and only the HCD spectra were used in this paper. Data set D6 is derived from human colorectal cancer tissue samples [5]. The raw data were obtained directly from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) with the data set identifier PXD000089, and only the HCD spectra of the “CRC_iTRAQ” data set were used in this paper. The raw MS data files were processed and converted into MGF format using ProteoWizard 3.04472 (http://proteowizard.sourceforge.net/) [6]. The detailed generation processes for the two data sets are described in references [4] and [5], respectively.

2. Protein identification and quantification

Data set D1: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) and ProteinPilot™ (v4.5, AB Sciex) against the UniProt Erwinia database (http://www.uniprot.org, Taxon identifier: 218491, entries: 2146, downloaded: March 2013). Mascot parameters were set as follows: a maximum of two missed trypsin cleavages; fixed modifications of Methylthio (C), iTRAQ 4-plex (K) and iTRAQ 4-plex (N-term); variable modifications of Oxidation (M) and iTRAQ 4-plex (Y); peptide mass tolerance of 1.0 Da; fragment mass tolerance of 0.5 Da. The automatic Mascot decoy database search was performed.
The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 4plex (Peptide Labeled); Cys Alkylation: MMTS; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins at 1% global FDR and with two or more peptides were used for further analysis.

For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimum number of peptides for protein quantitation was set to two, and Variance Stabilization Normalization (VSN) was performed.

For IsobariQ (version 1.3.2) quantification, the dat file was imported with the preset configuration for iTRAQ 4-Plex. The minimum ion score was set to keep the FDR at or below 1% with the target-decoy strategy [9]. The minimum number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.

Data set D2: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) and ProteinPilot™ (v4.5, AB Sciex) against the UniProt Erwinia database (http://www.uniprot.org, Taxon identifier: 218491, entries: 2146, downloaded: March 2013), to which the sequences of the four known proteins BSA, CYT, ENO, and PHOSB were added. Mascot parameters were set as follows: a maximum of two missed trypsin cleavages; fixed modifications of Methylthio (C), iTRAQ 4-plex (K) and iTRAQ 4-plex (N-term); variable modifications of Oxidation (M) and iTRAQ 4-plex (Y); peptide mass tolerance of 1.0 Da; fragment mass tolerance of 0.5 Da. The automatic Mascot decoy database search was performed.
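The target-decoy score filtering described above can be illustrated as follows. This is a minimal sketch and not the IsobariQ or Mascot implementation. It assumes that each peptide-spectrum match (PSM) carries a score and a decoy flag, that the FDR at a given cutoff is estimated as decoy hits divided by target hits, and that the lowest cutoff meeting the limit is chosen. The function name and the example scores are hypothetical.

```python
def min_score_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms is an iterable of (score, is_decoy) pairs. The FDR at a cutoff is
    estimated as decoys / targets among PSMs scoring at or above the cutoff
    (an assumed estimator). Returns None if no cutoff meets the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Count every PSM tied at this score before evaluating the cutoff.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical search result: (ion score, is_decoy).
example_psms = (
    [(60, False)] * 100
    + [(40, True)]
    + [(40, False)] * 50
    + [(20, True)] * 10
    + [(20, False)] * 10
)
cutoff = min_score_for_fdr(example_psms, max_fdr=0.01)
```

In this example a cutoff of 40 retains 150 target and 1 decoy PSM (estimated FDR below 1%), whereas lowering the cutoff to 20 would admit 11 decoys against 160 targets.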
The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 4plex (Peptide Labeled); Cys Alkylation: MMTS; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins at 1% global FDR and with two or more peptides were used for further analysis.

For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimum number of peptides for protein quantitation was set to two, and median normalization was performed.

For IsobariQ (version 1.3.2) quantification, the dat file was imported with the preset configuration for iTRAQ 4-Plex. The minimum ion score was set to keep the FDR at or below 1% with the target-decoy strategy [9]. The minimum number of peptides for protein quantitation was set to two, and median normalization was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.

Data set D3: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the SwissProt database (version 57.15, taxonomy: Mammalia). The automatic Mascot decoy database search was performed. The other Mascot parameters were the same as in reference [2]. For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimum number of peptides for protein quantitation was set to two, and VSN was performed. For IsobariQ (version 1.3.2) quantification, the dat file was imported with the preset configuration for TMT 6-Plex. The minimum ion score was set to keep the FDR at or below 1% with the target-decoy strategy [9].
The minimum number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.

Data set D4: The raw MS/MS data were converted into MGF format by Proteome Discoverer 1.2 (Thermo Fisher Scientific, Waltham, MA, USA), and the Mascot search parameters were the same as in reference [3]. The parameters for ProteinPilot™ Software 4.5 were as follows: Sample Type: iTRAQ 8plex (Peptide Labeled); Cys Alkylation: Iodoacetamide; Digestion: Trypsin; Special Factors: no selection; Species: no selection; Search Effort: Rapid; FDR Analysis: Yes; Quantitate: On; Bias Correction: On; Background Correction: Off; ID Focus: no selection. For ProteinPilot quantification, only proteins at 1% global FDR and with two or more peptides were used for further analysis.

For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimum number of peptides for protein quantitation was set to two, and VSN was performed.

For IsobariQ (version 1.3.2) quantification, the dat file was imported with the preset configuration for iTRAQ 8-Plex. The minimum ion score was set to keep the FDR at or below 1% with the target-decoy strategy [9]. The minimum number of peptides for protein quantitation was set to two, and VSN was performed. The significance threshold was set at 0.05, and only unique peptides were included in quantitation.

Data set D5: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the Swiss-Prot database (20130505, human, 20256 sequences).
Mascot parameters were set as follows: a maximum of two missed trypsin cleavages; fixed modifications of Carbamidomethyl (C), iTRAQ4plex (K) and iTRAQ4plex (N-term); variable modifications of Oxidation (M) and iTRAQ4plex (Y); peptide mass tolerance of 7 ppm; fragment mass tolerance of 0.01 Da. The automatic Mascot decoy database search was performed. False discovery rates were controlled using MascotPercolator V2.02 at a q-value of 0.01 or less. For IQuant quantification, false discovery rates were controlled using MascotPercolator V2.03beta [7, 8] at a q-value of 0.01 or less, the minimum number of peptides for protein quantitation was set to two, and VSN was performed.

Data set D6: The MS/MS spectra were searched with Mascot (v2.3.02, Matrix Science) against the Swiss-Prot database (20130505, human, 20256 sequences). Mascot parameters were set as follows: a maximum of one missed trypsin cleavage; fixed modifications of Carbamidomethyl (C), iTRAQ4plex (K) and iTRAQ4plex (N-term); variable modifications of Oxidation (M) and iTRAQ4plex (Y); peptide mass tolerance of 7 ppm; fragment mass tolerance of 0.01 Da. The automatic Mascot decoy database search was performed for all data sets. False discovery rates were controlled using MascotPercolator at a q-value of 0.01 or less. The IQuant parameters were the same as for data set D5.

Figure S1. Performance of IQuant on a same-same data set in which equal amounts of sample were labeled. Protein abundance ratios were calculated by IQuant, IsobariQ and ProteinPilot, and the distribution of the relative errors of these ratios is presented.

Figure S2. Comparison of the quantitation accuracy of IQuant, IsobariQ and ProteinPilot using a spiked-in standards data set. The protein ratios measured by IQuant, IsobariQ and ProteinPilot were compared with the expected ratios.

3. References

[1] Karp, N. A., Huber, W., Sadowski, P. G., Charles, P. D., et al., Addressing accuracy and precision issues in iTRAQ quantitation. Molecular & cellular proteomics: MCP 2010, 9, 1885-1897.
[2] Arntzen, M. O., Koehler, C. J., Barsnes, H., Berven, F. S., et al., IsobariQ: Software for Isobaric Quantitative Proteomics using IPTL, iTRAQ, and TMT. Journal of proteome research 2011, 10, 913-920.
[3] Chen, Z., Wen, B., Wang, Q., Tong, W., et al., Quantitative proteomics reveals the temperature-dependent proteins encoded by a series of cluster genes in Thermoanaerobacter tengcongensis. Molecular & cellular proteomics: MCP 2013.
[4] Muraoka, S., Kume, H., Adachi, J., Shiromizu, T., et al., In-depth membrane proteomic study of breast cancer tissues for the generation of a chromosome-based protein list. Journal of proteome research 2013, 12, 208-213.
[5] Shiromizu, T., Adachi, J., Watanabe, S., Murakami, T., et al., Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project. Journal of proteome research 2013, 12, 2414-2421.
[6] Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P., ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534-2536.
[7] Brosch, M., Yu, L., Hubbard, T., Choudhary, J., Accurate and sensitive peptide identification with Mascot Percolator. Journal of proteome research 2009, 8, 3176-3181.
[8] Kall, L., Canterbury, J. D., Weston, J., Noble, W. S., MacCoss, M. J., Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature methods 2007, 4, 923-925.
[9] Elias, J. E., Gygi, S. P., Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature methods 2007, 4, 207-214.