Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+
Brian C. Searle and Mark Turner
Proteome Software Inc.
Vancouver, Canada, ASMS 2012
Creative Commons Attribution
[Figure: ANOVA normalization of reporter channels 114, 115, 116, 117 (and replicates) against a reference channel; intensity axis 0 to 2. Oberg et al 2008 (doi:10.1021/pr700734f)]
“High Quality” Data
• Virtually no missing data
• Symmetric distribution
• High kurtosis
“Normal Quality” Data
• High skew due to truncation
• >20% of intensities are missing in this channel!
• Either ignore channels with any missing data (0.8⁴ = 41%; see the worked line below) …
“Normal Quality” Data
…Or deal with very non-Gaussian data!
Contents
• A Simple, Non-parametric Normalization Model
• Refinement 1: Intelligent Intensity Weighting
• Refinement 2: Standard Deviation Estimation
• Refinement 3: Kernel Density Estimation
• Refinement 4: Permutation Testing
Simple, Non-parametric Normalization Model
Additive Effects on Log Scale
$\log_2(\mathrm{intensity}) = \mathrm{experiment} + \mathrm{sample} + \mathrm{peptide} + \mathrm{error}$
• Experiment: sample handling effects across MS acquisitions (LC and MS variation, calibration, etc.)
• Sample: sample handling effects between channels (pipetting errors, etc.)
• Peptide: ionization effects
• Error: variation due to imprecise measurements
Oberg et al 2008 (doi:10.1021/pr700734f)
Additive Effects on Log Scale
Effect     | Subtract                                        | Add Back
Experiment | median of all intensities in each MS/MS acquisition | median of all intensities across the entire experiment
Sample     | median for each channel                         | median of all channels in each MS/MS
Peptide    | summed intensity for each peptide               | median summed intensity for each protein
Median Polish
“Non-Parametric ANOVA”
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects, iterated 3x
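A minimal sketch of this median-polish loop in Python, assuming log2 intensities arranged as a spectra-by-channels matrix (the peptide-to-protein roll-up from the table above is simplified here to a per-row centering; this is not the Scaffold Q+ source):

import numpy as np

def median_polish(log2_matrix, experiment_ids, n_iter=3):
    """Median polish ("non-parametric ANOVA") normalization sketch.

    log2_matrix    : 2D float array, rows = spectra, columns = reporter channels
    experiment_ids : 1D array, the MS acquisition each row came from
    """
    x = np.asarray(log2_matrix, dtype=float).copy()
    experiment_ids = np.asarray(experiment_ids)
    grand_median = np.nanmedian(x)
    for _ in range(n_iter):
        # Remove inter-experiment effects: recenter each acquisition on the grand median.
        for exp in np.unique(experiment_ids):
            rows = experiment_ids == exp
            x[rows] += grand_median - np.nanmedian(x[rows])
        # Remove intra-sample effects: recenter each reporter channel (column).
        x += grand_median - np.nanmedian(x, axis=0, keepdims=True)
        # Remove peptide effects: recenter each spectrum (row).
        x += grand_median - np.nanmedian(x, axis=1, keepdims=True)
    return x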
Refinement 1: Intensity Weighting
Linear Intensity Weighting
[Figure: low intensity, low weight; high intensity, high weight]
Desired Intensity Weighting
[Figure: most data, high weight; saturated data, decreased weight; low intensity, low weight]
Variance At Different Intensities
Estimate Confidence from Protein Deviation
• $P_{ij} = 2 \times$ cumulative t-distribution of $t_{ij}$, where
  i = raw intensity bin
  j = each spectrum in bin i
  x = protein median for spectrum j
  $t_{ij} = \dfrac{x - x_{ij}}{s}$
• $P_i = \dfrac{1}{n_i} \sum_{j=1}^{n_i} P_{ij}$
Data Dependent Intensity Weighting
[Figure: the data-dependent weights reproduce the desired profile: most data, high weight; low intensity, low weight; saturated data, decreased weight]
Algorithm Schematic
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects, iterated 3x, then Data Dependent Intensity Weighting
Refinement 2: Standard Deviation Estimation
Standard Deviation Estimation
$\mathrm{Stdev}_i = \sqrt{\dfrac{\sum_j (x - x_{ij})^2}{n_i}}$
  i = intensity bin
  j = each spectrum in bin i
  x = protein median for spectrum j
Data Dependent Standard Deviation Estimation
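A minimal sketch of the per-bin estimate in Python (the root-mean-square form is one reading of the formula above; the exact estimator in Scaffold Q+ may differ):

import numpy as np

def binned_stdev(deviation, bin_index, n_bins):
    """Per-intensity-bin standard deviation of deviations from the protein median.

    deviation : 1D array, (protein median - spectrum value) on the log2 scale
    bin_index : 1D int array assigning each spectrum to an intensity bin
    """
    deviation = np.asarray(deviation, dtype=float)
    bin_index = np.asarray(bin_index)
    stdev = np.full(n_bins, np.nan)
    for i in range(n_bins):
        d = deviation[bin_index == i]
        if d.size:
            # Stdev_i = sqrt( sum_j (x - x_ij)^2 / n_i )
            stdev[i] = np.sqrt(np.mean(d ** 2))
    return stdev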
Algorithm Schematic
Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects, iterated 3x, then Data Dependent Intensity Weighting and Data Dependent Standard Dev Estimation
Refinement 3: Kernel Density Estimation
Protein Variance Estimation
Kernels: each measurement contributes a kernel with $P_i = 1.0$ and $\mathrm{Stdev}_i = \dfrac{\max - \min}{n}$
Kernel Density Estimation
[Figure: summing the kernels gives a density over log2 ratios; a 0.3 shift on the log2 scale marks a deviation that shifts the distribution]
Improved Kernels
• We have a better estimate for P_i: the intensity-based weight!
• We have a better estimate for Stdev_i: the intensity-based standard deviation!
Improved Kernel Density Estimation
[Figure: with the improved kernels, significant deviation worth investigating separates from unimportant deviation; a 1.0 shift on the log2 scale = 2-fold change]
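A minimal sketch of the improved estimate in Python: each quantified spectrum contributes one Gaussian kernel whose weight and width come from the two intensity-based refinements above (function and argument names are illustrative, not the Scaffold Q+ API):

import numpy as np

def improved_kde(ratios, weights, bandwidths, grid):
    """Weighted kernel density estimate over log2 fold changes.

    ratios     : 1D array, one log2 ratio per quantified spectrum
    weights    : 1D array, per-spectrum weight (the intensity-based P)
    bandwidths : 1D array, per-spectrum kernel width (the intensity-based Stdev)
    grid       : 1D array of log2-ratio values at which to evaluate the density
    """
    ratios = np.asarray(ratios, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bandwidths = np.asarray(bandwidths, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian kernel per measurement, scaled by its weight.
    z = (grid[:, None] - ratios[None, :]) / bandwidths[None, :]
    kernels = weights[None, :] * np.exp(-0.5 * z ** 2) / (bandwidths[None, :] * np.sqrt(2.0 * np.pi))
    # Normalize so the density integrates to roughly 1.
    return kernels.sum(axis=1) / weights.sum()

The protein's fold change can then be read off the peak of this density; a 1.0 shift on the log2 scale corresponds to a 2-fold change.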
Refinement 4: Permutation Testing
Why Use Permutation Testing?
• Why go through all this work just to use a t-test or ANOVA?
• Rank-based Mann-Whitney and Kruskal-Wallis tests “work”, but lack power
Basic Permutation Test
[Figure: two groups of ten log2 ratios give an observed statistic of T = 4.84; shuffling the group labels at random (x1000) yields permuted statistics such as T = 1.49, T = 1.34, and T = 1.14]
Basic Permutation Test
950 permuted statistics fall below the observed T and 50 fall above:
p-value = 50/1000 = 0.05
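A minimal sketch of this test in Python, assuming a Welch-style t statistic and 1000 label shuffles (the poster does not specify which statistic Scaffold Q+ permutes):

import numpy as np

def permutation_p_value(group_a, group_b, n_perm=1000, seed=None):
    """Shuffle the group labels n_perm times and count how often the
    permuted statistic meets or exceeds the observed one."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled = np.concatenate([a, b])

    def t_stat(x, y):
        # Welch-style t statistic (an illustrative choice).
        return abs(x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)

    observed = t_stat(a, b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        exceed += t_stat(pooled[:a.size], pooled[a.size:]) >= observed
    # e.g. 50 of 1000 permutations at or above the observed T gives p = 0.05
    return exceed / n_perm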
Improvements…
• N is frequently very small
• Instead of randomizing N points, randomly select N points from the kernel densities (see the sketch below)
• Expensive! What if you want more precision?
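A minimal sketch of the resampling idea from the second bullet above, assuming the weighted Gaussian kernels of the previous section (names are illustrative): pick each synthetic point's kernel in proportion to its weight, then draw from that Gaussian.

import numpy as np

def sample_from_kernels(ratios, weights, bandwidths, n, seed=None):
    """Draw n synthetic measurements from the kernel density instead of
    permuting the N observed points directly."""
    rng = np.random.default_rng(seed)
    ratios = np.asarray(ratios, dtype=float)
    bandwidths = np.asarray(bandwidths, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(ratios.size, size=n, p=p)       # choose kernels in proportion to weight
    return rng.normal(ratios[idx], bandwidths[idx])  # draw from the chosen Gaussians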
Extrapolating Precision
With an actual T-statistic of 6.6, all 1000 permuted statistics fall below it and 0 fall above:
p-value = 0/1000 = ?
The largest permuted statistic is the last usable permutation.
Extrapolating Precision
Following Knijnenburg et al 2011 (doi:10.1186/1471-2105-12-411), fit the tail of the permutation distribution beyond the last usable permutation and extrapolate the p-value for the observed statistic.
[Figure: the extrapolated tail gives p-value = 0.0000018]
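A minimal sketch of that extrapolation in the spirit of Knijnenburg et al 2011, fitting a generalized Pareto distribution to the largest permuted statistics (the fixed tail size is an assumption; the paper selects it adaptively with a goodness-of-fit test):

import numpy as np
from scipy.stats import genpareto

def extrapolated_p_value(perm_stats, observed, tail_size=250):
    """Extrapolate a p-value beyond the last usable permutation by fitting
    a generalized Pareto distribution to the permutation tail."""
    perm_stats = np.sort(np.asarray(perm_stats, dtype=float))
    threshold = perm_stats[-tail_size - 1]            # exceedance threshold
    excesses = perm_stats[-tail_size:] - threshold
    shape, _, scale = genpareto.fit(excesses, floc=0.0)
    # P(T >= observed) = P(exceed threshold) * P(excess >= observed - threshold)
    tail_prob = tail_size / perm_stats.size
    return tail_prob * genpareto.sf(observed - threshold, shape, loc=0.0, scale=scale)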
Conclusions
Normalization: Remove Inter-Experiment Effects → Remove Intra-Sample Effects → Remove Peptide Effects, iterated 3x
Interpretation: Data Dependent Intensity Weighting, Data Dependent Standard Dev Estimation, Kernel Density Estimation (Fold Changes), Permutation Testing (P-Values)
• All of these ideas work for SILAC/ICAT as well!
Acknowledgements
Proteome Software Team
–Bryan Head
–Jana Lee
–Audrey Lester
–Susan Ludwigsen
–Jimar Millar
–De’Mel Mojica
–Mark Turner
–Nick Vincent-Maloney
–Luisa Zini
Institute of Molecular Pathology
–Karl Mechtler
Colorado State University
–Jessica Prenni
–Karen Dobos
Mayo Clinic, MN
–Ann Oberg