Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+
Brian C. Searle and Mark Turner, Proteome Software Inc.
ASMS 2012, Vancouver, Canada (Creative Commons Attribution)

[Figure: ANOVA-style comparison of iTRAQ reporter channels 114-117 against a reference channel; after Oberg et al 2008 (doi:10.1021/pr700734f)]

"High Quality" Data
• Virtually no missing data
• Symmetric distribution
• High kurtosis

"Normal Quality" Data
• High skew due to truncation
• >20% of intensities are missing in this channel!
• Either ignore channels with any missing data (only 0.8^4 ≈ 41% of the data survive across four channels)…
• …or deal with very non-Gaussian data!

Contents
• A Simple, Non-parametric Normalization Model
• Refinement 1: Intelligent Intensity Weighting
• Refinement 2: Standard Deviation Estimation
• Refinement 3: Kernel Density Estimation
• Refinement 4: Permutation Testing

A Simple, Non-parametric Normalization Model

Additive Effects on Log Scale
log2(intensity) = experiment + sample + peptide + error
• Experiment: sample handling effects across MS acquisitions (LC and MS variation, calibration, etc.)
• Sample: sample handling effects between channels (pipetting errors, etc.)
• Peptide: ionization effects
• Error: variation due to imprecise measurements
Oberg et al 2008 (doi:10.1021/pr700734f)

Additive Effects on Log Scale
Effect     | Subtract                               | Add Back                  | Across
Experiment | median of all intensities in the MS/MS | median of all intensities | entire experiment
Sample     | median for each channel                | median of all channels    | each MS/MS
Peptide    | summed intensity for each peptide      | median summed intensity   | each protein

Median Polish ("Non-Parametric ANOVA")
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects, iterated 3x

Refinement 1: Intensity Weighting

Linear Intensity Weighting
• Low intensity, low weight; high intensity, high weight

Desired Intensity Weighting
• Most data, high weight
• Saturated data, decreased weight
• Low intensity, low weight

[Figure: variance at different intensities]

Estimate Confidence from Protein Deviation
• t_ij = (x_ij − x̄_j) / s
• P_ij = 2 × cumulative t-distribution(t_ij)
• P_i = (1 / n_i) Σ_{j=1..n_i} P_ij
  where i = raw intensity bin, j = each spectrum in bin i, x̄_j = protein median for spectrum j

Data Dependent Intensity Weighting
[Figure: the data-dependent weights reproduce the desired profile: most data get high weight, low-intensity data get low weight, saturated data get decreased weight]

Algorithm Schematic
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects (3x) → Data Dependent Intensity Weighting

Refinement 2: Standard Deviation Estimation

Standard Deviation Estimation
• Stdev_i = sqrt( Σ_{j=1..n_i} (x_ij − x̄_j)² / n_i )
  where i = intensity bin, j = each spectrum in bin i, x̄_j = protein median for spectrum j

Data Dependent Standard Deviation Estimation
[Figure: standard deviation estimated as a function of intensity]

Algorithm Schematic
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects (3x) → Data Dependent Intensity Weighting → Data Dependent Standard Dev Estimation

Refinement 3: Kernel Density Estimation

Protein Variance Estimation
• Each quantified spectrum contributes a kernel to the protein's distribution
• Kernel height = P_i (1.0 for every spectrum in the basic model); kernel width = Stdev_i

Kernel Density Estimation
• Summing the kernels gives a density for the protein's quantitative value on the log2 scale (see the sketch below)
[Figure: a deviation that shifts the distribution by 0.3 on the log2 scale]
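A minimal Python sketch of this kernel density step (an illustration only, not the Scaffold Q+ implementation; the helper name protein_fold_change_density, the grid range, and the example ratios are assumptions):

```python
import numpy as np

def protein_fold_change_density(log2_ratios, weights=None, stdevs=None, grid=None):
    """Sum one Gaussian kernel per quantified spectrum.

    Basic kernels: every spectrum gets weight 1.0 and one global standard
    deviation.  The improved kernels described next simply swap in the
    intensity-based weight P_i and intensity-based Stdev_i per spectrum.
    """
    ratios = np.asarray(log2_ratios, dtype=float)
    n = len(ratios)
    if grid is None:
        grid = np.linspace(-3.0, 3.0, 601)       # log2 fold-change axis
    if weights is None:
        weights = np.ones(n)                     # basic model: flat weight
    if stdevs is None:
        stdevs = np.full(n, ratios.std(ddof=1))  # basic model: one global stdev
    density = np.zeros_like(grid)
    for mu, w, s in zip(ratios, weights, stdevs):
        density += w * np.exp(-0.5 * ((grid - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    density /= density.sum() * (grid[1] - grid[0])   # normalize to unit area
    return grid, density

# Example: five spectra centered near a 0.3 shift on the log2 scale
grid, dens = protein_fold_change_density([0.25, 0.35, 0.40, 0.20, 0.31])
print(round(grid[np.argmax(dens)], 2))               # mode of the estimated density
```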
Improved Kernels
• We have a better estimate for P_i: the intensity-based weight!
• We have a better estimate for Stdev_i: the intensity-based standard deviation!

Improved Kernel Density Estimation
[Figure: with the improved kernels, a significant deviation worth investigating separates from unimportant deviations]
• A 1.0 shift on the log2 scale = 2-fold change

Refinement 4: Permutation Testing

Why Use Permutation Testing?
• Why go through all this work just to use a t-test or ANOVA?
• Rank-based Mann-Whitney and Kruskal-Wallis tests "work", but lack power

Basic Permutation Test
• Compute the test statistic for the observed grouping (T = 4.84 in the example)
• Shuffle the group labels and recompute (T = 1.49, 1.34, 1.14, ...), repeated x1000
• If 950 permuted statistics fall below the observed value and 50 fall above, p-value = 50/1000 = 0.05
• (A worked sketch of this basic test appears after the acknowledgements.)

Improvements…
• N is frequently very small
• Instead of randomizing N points, randomly select N points from the Kernel Densities
• Expensive! What if you want more precision?

Extrapolating Precision
• With all 1000 permutations below an actual T-statistic of 6.6 (0 above), the p-value cannot be read off directly (0/1000 = ?)
• Extrapolate the tail of the permutation distribution beyond the last usable permutation (Knijnenburg et al 2011, doi:10.1186/1471-2105-12-411)
• Extrapolated p-value = 0.0000018

Conclusions
Normalization
• Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects (3x)
• Data Dependent Intensity Weighting
• Data Dependent Standard Dev Estimation
Interpretation
• Kernel Density Estimation (Fold Changes)
• Permutation Testing (P-Values)
• All of these ideas work for SILAC/ICAT as well!

Acknowledgements
Proteome Software Team: Bryan Head, Jana Lee, Audrey Lester, Susan Ludwigsen, Jimar Millar, De’Mel Mojica, Mark Turner, Nick Vincent-Maloney, Luisa Zini
Institute of Molecular Pathology: Karl Mechtler
Colorado State University: Jessica Prenni, Karen Dobos
Mayo Clinic, MN: Ann Oberg
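As referenced in the permutation testing section, here is a minimal Python sketch of the basic permutation test (not the Scaffold Q+ implementation; the function name, the use of SciPy's two-sample t-statistic, and the two-group split of the example values are assumptions). The poster's refinements (drawing the N points from the protein kernel densities, and extrapolating the tail per Knijnenburg et al 2011) are not shown.

```python
import numpy as np
from scipy import stats

def permutation_p_value(group_a, group_b, n_perm=1000, seed=None):
    """Basic permutation test on the two-sample t-statistic.

    Shuffles the group labels n_perm times and counts how often the
    permuted |t| meets or exceeds the observed |t|.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(stats.ttest_ind(a, b).statistic)
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # randomize the group labels
        t = stats.ttest_ind(pooled[:len(a)], pooled[len(a):]).statistic
        exceed += abs(t) >= observed
    return exceed / n_perm                        # e.g. 50/1000 = 0.05

# Example with values taken from the poster's illustrative grid
# (the split into two groups here is an assumption for demonstration)
print(permutation_p_value([1.1, 1.1, 0.8, 1.1, 1.4],
                          [0.0, 0.5, 0.3, 0.7, 1.0], seed=0))
```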