PROTEORED Multicentric Study
QUANTITATIVE PROTEOMICS METHOD: SPECTRAL COUNT
2010
UNIVERSITY OF BARCELONA
Salamanca, 16th March

Objectives:
• Test each laboratory's ability to perform quantitative proteomic analysis.
• Compare methodologies for relative quantitative analysis of proteomes. The study should provide data to assess and compare the performance of different methodologies and their intra- and inter-laboratory reproducibility.
• Evaluate data reporting and data sharing tools (MIAPE documents, standard formats, public repositories).

Samples:
• Each participating laboratory will receive two protein mixture samples, labeled A and B, each containing 100 µg of total protein.
• Each mixture is dissolved in 6 M urea / 1% CHAPS at a concentration of 6 µg/µL.

Samples contain:
• A mixture of around 150 E. coli proteins (identical in each sample). This mixture was prepared by fractionation of the cytoplasmic proteome of E. coli. It contains soluble proteins covering a wide range of pI and Mw.
• Four spiked mammalian proteins:
  – CYC_HORSE (Cytochrome C, Mw 12362), added at ~1 pmol per µg of total protein
  – MYG_HORSE (Apomyoglobin, Mw 16952), at ~200 fmol per µg of total protein
  – ALDOA_RABIT (Aldolase, Mw 39212), at ~25 fmol per µg of total protein
  – ALBU_BOVIN (Serum albumin, Mw 66430), at ~1 fmol per µg of total protein
• These four proteins have been spiked in different amounts in samples A and B, with ratios between the two samples ranging from 1.5:1 to 5:1.

Purpose of the analysis:
• The intended purpose of the analysis is to measure the ratios between samples A and B for the four spiked proteins. The "matrix" E. coli proteins, which should be unchanged, provide a measure of the dispersion of the method used.
• The samples can also be used to test methods for absolute quantitation, if desired.
• To evaluate reproducibility on a homogeneous dataset, we ask each laboratory to perform a minimum of 4 replicate analyses of the samples.
(Depending on the method of choice, this would require a maximum of 4 + 4 LC-MS runs.)

Methods:
• Sample complexity has been chosen to allow analysis of the mixture in single LC-MS runs. In principle, there is no need for pre-fractionation. A sufficiently long gradient (90-120 min) is suggested, although this will depend strongly on the MS instrument available for the analysis.
• 1-2 µg of total protein per run should be enough to cover the range of abundances of the spiked proteins in the samples. Again, this depends heavily on the instrument used and should be adjusted by each laboratory according to its expertise.
• The sample is primarily intended to test non-targeted relative quantitation methodologies. Both label-based methods (ICPL, iTRAQ, TMT, 18O, ...) and label-free methods (based on spectral counts, Hi3, "LC-MS image analysis", ...) can be used to analyze the samples. Some of them will require 4 + 4 LC-MS runs, while others (e.g. 8-plex iTRAQ) could require a single run to provide comparable measurements of reproducibility. Choose the number of replicate analyses so that 4 independent measurements of each A:B ratio are obtained and comparable statistics can be calculated.
• The sample can also be used, if desired, to test targeted methods such as MRM for relative or absolute quantitation. The concentration of the spiked proteins is probably too high to pose a real challenge for those methods, but the sample can still be useful for test purposes (for example, testing accuracy and sensitivity on serial dilutions of the sample).
• The amount of sample provided, as well as the concentration of the spiked proteins, should also allow a 2D-DIGE analysis of the samples, although this is not the main purpose of the experiment.
Quantitative Proteomic Approaches
• Label free
  – Spectral counting
  – Ion current based (extracted ion chromatograms)
  – Other
• Stable isotope labeling
  – Stable isotope labeling reagents such as ICAT and iTRAQ
  – Metabolic labeling (SILAC, 15N)
  – Others

Shotgun Proteomics
• Digestion of proteins and separation of peptides
  – Extensive chromatographic separation (one- or multi-dimensional separations, columns, ...)
• Data acquisition
  – Data-dependent acquisition (automated acquisition of MS/MS spectra from as many precursor ions as possible)
• Data analysis
  – Automated interpretation of the MS/MS spectra (database search)

Spectral Counting
Summary
• Spectral count correlates well with protein abundance
• Fold change can be calculated and statistically evaluated
• Simple and straightforward implementation
• Sensitive to protein abundance changes
  – for abundant proteins, a 2-fold change is easily detected with high confidence
(Fu et al., 2006)
Limitations
• The response to increasing protein amount is saturable
• Noisy data at low spectral counts
  – a large difference in spectral count is necessary to determine a significant change

Spectral count reflects the relative abundance of a protein (r² ≥ 0.99)
Issues to address:
– Variability of spectral counts
– Sensitivity of spectral count to protein abundance changes
– How to determine relative changes between two samples

Variability of Spectral Counting
LC-MS/MS analysis of replicate SCX fractions of K562 cell lysates, G-test (Old W. et al., MCP 2005)

How to determine relative changes between two samples
Fold change determination (Old W. et al., MCP 2005)
• Practical issue – no peptides found in one of the compared samples
• Data discontinuity (spectral counts are integers) – not amenable to Student's t-test
• Differences in sampling depth
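To make the issues listed for fold change determination concrete, here is a minimal Python sketch that contrasts a naive log2 ratio of raw counts with the RSC expression of Old et al. that this deck gives next. The function names and the example counts are hypothetical and chosen only for illustration; the value f = 1.25 is the correction factor quoted in the deck.

```python
import math

F = 1.25  # correction factor quoted in the deck (Beissbarth et al., 2004)


def naive_log2_ratio(n1, n2):
    """log2(n2 / n1) on raw spectral counts.

    Returns None when either count is zero, because the ratio is then
    undefined (the 'no peptides found in one sample' case).
    """
    if n1 == 0 or n2 == 0:
        return None
    return math.log2(n2 / n1)


def rsc(n1, n2, t1, t2, f=F):
    """RSC as written in the deck: log2 ratio of sample 2 over sample 1.

    n1, n2: spectral counts of the protein in samples 1 and 2
    t1, t2: total spectral counts (sampling depth) of samples 1 and 2
    f:      pseudocount that keeps the ratio finite at zero counts
    """
    return (math.log2((n2 + f) / (n1 + f))
            + math.log2((t1 - n1 + f) / (t2 - n2 + f)))


# Hypothetical counts, for illustration only.
print(naive_log2_ratio(0, 6))          # undefined: protein missing in sample 1
print(rsc(0, 6, 10_000, 10_000))       # finite thanks to the pseudocount f
print(rsc(10, 20, 10_000, 20_000))     # close to 0: doubled count, doubled depth
```

The last call shows the sampling depth correction: a count that doubles only because twice as many spectra were acquired gives an RSC near zero.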
RSC = log2[(n2 + f)/(n1 + f)] + log2[(t1 - n1 + f)/(t2 - n2 + f)]
• n1, n2 – spectral counts for samples 1 and 2
• t1, t2 – total spectral count (sampling depth) for samples 1 and 2
• f – correction factor, 1.25 (Beissbarth et al., Bioinformatics 2004)
Observed RSC correlates well with expected RSC for standard proteins spiked into complex samples (Old W. et al., MCP 2005)

• 100 µg of each protein mixture, A and B, dissolved in 6 M urea / 1% CHAPS at a concentration of 6 µg/µL. Samples were kept at -20 ºC.
• Precipitation with TCA/acetone
• Resuspension in 100 µL of 0.3% SDS / 50 mM Tris-HCl pH 8.0 / 200 mM DTT
• 5 µL (5 µg) of sample digested with trypsin overnight at a 1/100 ratio
• Separation by nanoHPLC (4 replicates, 1 µL each)
• MS/MS on an LTQ Velos Orbitrap

Analysis
Spectra analyzed
Proteored A1   Proteored A2   Proteored A3   Proteored A4
11,976         12,090         12,567         14,889
Proteored B1   Proteored B2   Proteored B3   Proteored B4
14,444         14,936         15,115         15,852
TOTAL 111,869

SEQUEST PARAMS
peptide_mass_tolerance = 0.07
fragment_ion_tolerance = 0.6
diff_search_options = 15.9949 M 0.000 C 0.000 X

Results per LC-MS run. The last column is headed "Total Sample A-B (combined A/B 1-4 runs)**" on the slide; its values are the means over the 8 runs.

Item                                            A-1     A-2     A-3     A-4     B-1     B-2     B-3     B-4     A-B**
1   Number of MS/MS spectra acquired            11976   12090   12567   14889   14444   14936   15115   15852   13984
2   Number of total assigned peptides id.       1724    1702    1705    2660    2136    2388    2440    2576    2166
3   Number of unique peptides id.               1226    1183    1190    1559    1389    1447    1499    1660    1394
4   Number of E. coli proteins id. (total)      209     213     211     266     223     250     244     266     235
5   Number of E. coli single-hit proteins id.   19      31      28      28      18      20      18      30      24
6   Number of spiked proteins id.               4       3       3       3       4       4       4       4       3.6
7   FDR*                                        0       0       0       0       0       0       0       0.0038  0.0005

8   Total number of proteins quantitated                  5
9   Number of proteins quantitated, > 3 peptides          2
10  Number of proteins quantitated, > 2 peptides          3
11  Number of proteins quantitated, 1 peptide             (no value given)
A/B ratio
12  Average of A/B ratios for E. coli proteins            (no value given)
13  Standard deviation of A/B ratios                      (no value given)
14  % CV of A/B ratios, E. coli proteins                  (no value given)

The Normalized Spectrum Counts bar chart shows a protein's relative abundance across different samples. The y-axis is the normalized count of the spectra matching any of the peptides in the protein. This count depends on the protein, the peptide, the required modifications and the search filters set on the Samples page. Each bar along the x-axis corresponds to a different biological sample. The bars are color coded, with a different color for each sample category. The bar chart can be used as a visual confirmation of a differential expression flagged by the Quantitative Analysis in the Samples view.

FDR = 0, 242 proteins

               A1   A2   A3   A4   B1   B2   B3   B4   Av A    STD A   Av B    STD B   A/B    B/A
ALDOA_RABIT    16   18   21   18   10   12   10   10   18.25   2.06    10.50   1.00    1.74   0.58
ALBU_BOVIN      2    1    0    1    5    1    1    1    1.00   0.82     2.00   2.00    0.50   2.00
MYG_HORSE      17   21   15   19    9   14   14   12   18.00   2.58    12.25   2.36    1.47   0.68
CYC_HORSE      13   14   12   21   22   27   27   19   15.00   4.08    23.75   3.95    0.63   1.58

[Figure: bar chart of spectral counts for ALDOA_RABIT, MYG_HORSE, CYC_HORSE and ALBU_BOVIN; y-axis 0-30]

P-values and SEQUEST scores of example spectra, in the order listed on the slides:
• ALBU_BOVIN, P-value = 0.52
  – Sample A2: Xcorr 0.88, DeltaCn 0.46
  – Sample A1: Xcorr 3.16, DeltaCn 0.43
  – Sample B2: Xcorr 2.9, DeltaCn 0.57
  – Sample A1: Xcorr 3.23, DeltaCn 0.64
• ALDOA_RABIT, P-value = 0.00053
  – Sample A2: Xcorr 2.12, DeltaCn 0.5
  – Sample A3: Xcorr 5.33, DeltaCn 0.78
  – Sample B3: Xcorr 5.35, DeltaCn 0.77
• CYC_HORSE, P-value = 0.018
  – Sample B3: Xcorr 5.46, DeltaCn 0.66
  – Sample A1: Xcorr 4.78, DeltaCn 0.66
  – Sample B1: Xcorr 3.87, DeltaCn 0.55
• MYG_HORSE, P-value = 0.000019

[Figure: annotated MS/MS spectrum of a MYG_HORSE peptide, relative intensity vs m/z (0-1500), with b- and y-ion series labeled; precursor 1,605.85 AMU, +2 H (parent error: 3.7 ppm)]

[Figure: annotated MS/MS spectrum, Sample A3, Xcorr 4.66, DeltaCn 0.72, relative intensity vs m/z (0-1750), with b- and y-ion series labeled; precursor 1,852.96 AMU, +2 H (parent error: 1.3 ppm)]

[Figure: bar chart for the E. coli proteins; y-axis 0-1.2; data labels 0.95, 0.30 and 0.12]

FDR = 0, 219

Conclusions:
• Spectral counting can be an easy way to perform quantitative proteome analysis, but:
  – It requires the ability to perform different LC runs with very low dispersion.
  – The response to increasing protein amount is saturable.
  – Data are noisy at low spectral counts: a large difference in spectral count is necessary to determine a significant change.

Proteomic Facility, University of Barcelona
M José Fidalgo
Eva Olmedo
Francisco Fernández
Josep M Estanyol
Oriol Bachs

               A1   A2   A3   A4   B1   B2   B3   B4   Mean A   SD A   Mean B   SD B   A/B    B/A
ALDOA_RABIT    23   27   26   29   18   17   15   15   26.25    2.50   16.25    1.50   1.62   0.62
ALBU_BOVIN      2    1    0    1    5    1    1    2    1.00    0.82    2.25    1.89   0.44   2.25
MYG_HORSE      31   28   31   31   18   17   15   18   30.25    1.50   17.00    1.41   1.78   0.56
CYC_HORSE      22   18   17   26   28   33   31   27   20.75    4.11   29.75    2.75   0.70   1.43

[Figure: bar chart of spectral counts for ALDOA_RABIT, MYG_HORSE, CYC_HORSE and ALBU_BOVIN; y-axis 0-35]

fmol/µg E. coli protein

               MW      A      B      B/A    A/B
ALDOA_RABIT    39212   50     25     0.50   2.00
ALBU_BOVIN     66430   1      5      5.00   0.20
MYG_HORSE      16952   520    200    0.38   2.60
CYC_HORSE      12362   1000   1500   1.50   0.67
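The summary columns of the spectral count tables (mean, standard deviation, A/B) can be recomputed from the per-run counts and set against the expected ratios. The sketch below does this in Python for the first counts table (FDR = 0, 242 proteins). The helper name `summarize` and the dictionary layout are assumptions made for illustration, and the use of the sample standard deviation (n - 1 denominator) is inferred from the tabulated values rather than stated in the slides.

```python
from statistics import mean, stdev

# Spectral counts per run (A1-A4, B1-B4), copied from the first counts table.
counts = {
    "ALDOA_RABIT": ([16, 18, 21, 18], [10, 12, 10, 10]),
    "ALBU_BOVIN":  ([2, 1, 0, 1],     [5, 1, 1, 1]),
    "MYG_HORSE":   ([17, 21, 15, 19], [9, 14, 14, 12]),
    "CYC_HORSE":   ([13, 14, 12, 21], [22, 27, 27, 19]),
}

# Expected A/B ratios from the spiked amounts (fmol/µg E. coli protein).
expected_ab = {
    "ALDOA_RABIT": 2.00,
    "ALBU_BOVIN": 0.20,
    "MYG_HORSE": 2.60,
    "CYC_HORSE": 0.67,
}


def summarize(a, b):
    """Mean, sample standard deviation and A/B ratio of mean counts."""
    mean_a, mean_b = mean(a), mean(b)
    return {
        "mean_a": mean_a,
        "sd_a": stdev(a),   # sample SD (n - 1 denominator)
        "mean_b": mean_b,
        "sd_b": stdev(b),
        "ab": mean_a / mean_b,
    }


for protein, (a, b) in counts.items():
    s = summarize(a, b)
    print(f"{protein:12s} A/B observed {s['ab']:.2f}  expected {expected_ab[protein]:.2f}  "
          f"(A {s['mean_a']:.2f} ± {s['sd_a']:.2f}, B {s['mean_b']:.2f} ± {s['sd_b']:.2f})")
```

On these counts the direction of change is recovered for all four spiked proteins, while the magnitudes can differ markedly from the expected values (for example 1.47 observed against 2.60 expected for MYG_HORSE).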