Powerpoint slides

A "Consistency" Test for Determining
the Significance of Gene Expression
Changes on Replicate Samples
and
Two Convenient Variance-stabilizing
Transformations
Peter J. Munson, Ph.D.
Mathematical and Statistical Computing Laboratory
DCB, CIT, NIH
munson@helix.nih.gov
Page 1
P. J. Munson, National Institutes of Health, Nov. 2001
Introduction
• Math. Stat. Comp. Lab. at NIH
• Run Affy LIMS database
– Started Dec 2000, stores >700 chips
– Serves 3 core facilities at NIH
• Study 1
– 2 treatments, 5 time points, 6 subjects, 60 U95A chips, PBMC cells
• Study 2
– 3 treatments, 5 time points, 5 subjects, 75 Hu6800 chips, human cells in culture
• Study 3
– 4 doses, 2 time points, 20 subjects, 20 RG U34A chips, blood cells
Page 2
Outline
• Development of Consistency Test
• Variance-stabilizing transforms
– Generalized Logarithm, GLog
– Adaptive transform for Average Diff, TAD
• Normalization
– Normal quantile + adaptive transform
• Application
• Probe-pair data visualization:
– Parallel Axis Coordinate Display
Page 3
Comparing Two Cell Lines
• Don’t subtract background
• Ignore background-level points
• Calibrate on median intensity of each cell type
• Over 3-fold change = outside dashed lines
• Are these expression level changes significant? Real?
Data from Carlisle, et al., Mol.Carcinogen., 2000
Page 4
Duplicate Experiments and
"Consistency" Plot
Identifies Real Changes in Expression
Keratin 5
Vimentin
Page 5
Replication Permits Calculation of
Significance (P-values)
4 False-positives
Out of 5760 spots:
P ≈ 4/5760 = 0.0007
Page 6
Consistency Plot
[Figure axis label: L21b**exp45]
• Compare duplicate experiments, Log Ratio scale
• Set cutoffs for over- and under-expression
• Calculate number detected, D
• Assume independence, calculate expected number, E, above both or below both cutoffs
• Estimate false positive rate, E/D
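The counting steps in the bullets above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the talk: the function name `consistency_counts`, its arguments, and the cutoff values are hypothetical.

```python
def consistency_counts(x1, x2, lo, hi):
    """Count spots beyond the cutoffs in both duplicate experiments.

    x1, x2: log ratios for the same spots in two replicate experiments.
    lo, hi: cutoffs for under- and over-expression (hypothetical values).
    Returns, for each direction, the detected count D, the count E expected
    if the two experiments were independent, and the ratio E/D.
    """
    n = len(x1)
    result = {}
    for name, beyond in (("up", lambda v: v > hi), ("down", lambda v: v < lo)):
        n1 = sum(beyond(v) for v in x1)  # beyond the cutoff in experiment 1
        n2 = sum(beyond(v) for v in x2)  # beyond the cutoff in experiment 2
        d = sum(beyond(a) and beyond(b) for a, b in zip(x1, x2))  # in both
        e = n1 * n2 / n  # independence: n * (n1 / n) * (n2 / n)
        result[name] = {"D": d, "E": e, "E/D": e / d if d else float("nan")}
    return result
```

A small E/D suggests that few of the D spots detected in both replicates are expected to be chance coincidences.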
Page 7
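[Figure: consistency plot of duplicate experiments on a log-ratio scale, both axes running from -1 to 1; horizontal axis L12b**exp44. The cutoffs divide the plot into a 3 x 3 grid. Each cell shows the observed count, with the count expected under independence in parentheses. Rows follow the vertical axis, columns the horizontal axis.]

                  Below cutoff   Between cutoffs   Above cutoff   Total
Above cutoff      0 (0.3)        22 (45.2)         24 (0.6)       46
Between cutoffs   11 (26.1)      4074 (4036.6)     28 (50.4)      4113
Below cutoff      16 (0.6)       74 (88.4)         0 (1.1)        90
Total             27             4170              52             4249

Above both cutoffs: D=24, E=0.6, E/D=3%
Below both cutoffs: D=16, E=0.6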
p53 +/+ cells 6 hrs,
replicate reciprocal experiment
[Figure: consistency plots on a log-ratio scale, axes running from -1 to 1; axis labels L21**exp64, L12**exp44, L12**exp63]
Page 8
Consistency Test on Relative Expression
DEFINE:
$x(g, i)$ = relative expression value for gene $g$ ($g = 1, \dots, n$) in experiment $i$ ($i = 1, \dots, m$)
$F_i(x)$ = empirical cdf of $x(\cdot, i)$ across genes (spots)
$c = \min_j x(g, j)$, the minimum across experiments
THEN
assuming that $\{ x(g, i),\ g = 1, \dots, n \}$ are an independent sample from distribution $F_i$,
the probability that $x(g, i)$ is consistently large is:
$p_{up}(g) = \Pr(X_i \ge c \text{ for all } i) = \prod_i \bigl(1 - F_i(c)\bigr)$
Page 9
Consistency Test on Relative Expression- 2
DEFINE:
$x(g, i)$ = relative expression value for gene $g$ ($g = 1, \dots, n$) in experiment $i$ ($i = 1, \dots, m$)
$p_{up}(g) = \prod_i \bigl(1 - F_i(\min_j x(g, j))\bigr)$
$p_{dn}(g) = \prod_i F_i(\max_j x(g, j))$
THEN
Expected number of false positives:
$E(g) = n \cdot p(g)$
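The definitions above can be sketched in plain Python. This is an illustrative reading of the slide, not the author's implementation: the names `tail_fractions` and `consistency_test` are hypothetical, and the upper tail is computed as the fraction of genes at or above c (matching the "≥" in the probability statement), which is an assumption about how ties with c are counted.

```python
from bisect import bisect_left, bisect_right

def tail_fractions(values):
    """Return (upper, lower) tail functions of the empirical distribution."""
    s = sorted(values)
    n = len(s)
    upper = lambda c: (n - bisect_left(s, c)) / n   # fraction of values >= c
    lower = lambda c: bisect_right(s, c) / n        # fraction of values <= c
    return upper, lower

def consistency_test(x):
    """x[g][i] = relative expression of gene g in experiment i.

    Returns one (p_up, p_dn, e_up, e_dn) tuple per gene, where
    e = n * p is the expected number of false positives.
    """
    n, m = len(x), len(x[0])
    tails = [tail_fractions([row[i] for row in x]) for i in range(m)]
    out = []
    for row in x:
        c_min, c_max = min(row), max(row)
        p_up = p_dn = 1.0
        for upper, lower in tails:
            p_up *= upper(c_min)   # product over experiments of 1 - F_i(c)
            p_dn *= lower(c_max)   # product over experiments of F_i(c)
        out.append((p_up, p_dn, n * p_up, n * p_dn))
    return out
```

Genes would then be called consistently over- or under-expressed when the corresponding expected false-positive count falls below a chosen threshold.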
Page 10
Assumptions of Consistency Test
• Independence between experiments
• “Exchangeability” of genes
• Homogeneity of variance across genes (i.e. across expression intensity)
Does NOT require:
• Identical distribution in separate experiments
But variance homogeneity is violated for Affy Avg. Diff. data
Page 11
Variance Stabilizing Transformations
• Logarithm
• Box-Cox, power
• Generalized Logarithm, GLog
• Adaptive, TAD
Page 12
Model Variance as Function of Mean AD
Page 13
Model Variance as Function of Mean AD
$\mathrm{Var}(y) = a_0$
$\mathrm{Var}(y) = a_0 + a_1 y$
$\mathrm{Var}(y) = a_0 + a_1 y + a_2 y^2$
$\mathrm{Var}(y) = a_2 y^2$
=> use logarithms
What about:
$\mathrm{Var}(y) = a_0 + a_2 y^2$
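A short derivation may help connect these variance models to a transformation. It uses the standard delta-method argument (an addition here, not stated on the slide): a transform $h$ has approximately constant variance when $h'(y)$ is proportional to $1/\sqrt{\mathrm{Var}(y)}$.

```latex
% Variance-stabilizing transform from a variance model (delta method)
h(y) = \int^{y} \frac{du}{\sqrt{\mathrm{Var}(u)}}

% Case Var(y) = a_2 y^2, y > 0: the logarithm
h(y) = \frac{1}{\sqrt{a_2}} \ln y

% Case Var(y) = a_0 + a_2 y^2, with c = \sqrt{a_0 / a_2}
h(y) = \frac{1}{\sqrt{a_2}} \operatorname{arcsinh}\!\left(\frac{y}{c}\right)
     = \frac{1}{\sqrt{a_2}} \ln\!\left( \frac{y}{c} + \sqrt{1 + \frac{y^2}{c^2}} \right)
```

Up to the constant factor $1/\sqrt{a_2}$, the second case has the form of the generalized logarithm, GLog.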
Page 14
Generalized Log Transform (G-Log)
$\mathrm{Var}(y) = a_0 + a_2 y^2 = a_0 \left( 1 + (y/c)^2 \right)$, where $c = \sqrt{a_0 / a_2}$
$\mathrm{GLog}(y; c) = \mathrm{sign}(y) \cdot \ln\left( |y/c| + \sqrt{1 + y^2/c^2} \right)$
e.g. $c$ = (s.d. at $y = 0$) / CV = 10 / 0.1 = 100
Page 15
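The G-Log transform defined above is straightforward to compute. The following is a minimal Python sketch; the function name `glog` is chosen here for illustration, and the value c = 100 is the slide's own example.

```python
import math

def glog(y, c):
    """Generalized log: sign(y) * ln(|y/c| + sqrt(1 + (y/c)^2)).

    This equals asinh(y / c): roughly linear for |y| << c and
    roughly logarithmic, ln(2|y|/c), for |y| >> c.
    """
    r = abs(y) / c
    return math.copysign(math.log(r + math.sqrt(1.0 + r * r)), y)

# Example with the slide's value c = 10 / 0.1 = 100
for y in (-50.0, 0.0, 100.0, 10000.0):
    print(y, glog(y, 100.0))
```

Unlike the ordinary logarithm, this form is defined for zero and negative Average Difference values.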
Quantile Normalization for AD (before)
Page 16
Quantile Normalization for AD (after)
Page 17
Normal Quantile Transform after GLog(AD)
(it’s almost linear)
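The slides do not spell out how the normal quantile transform is computed. A common rank-based version is sketched below as an assumption: each value is replaced by the standard-normal quantile of its rank. The function name `normal_quantile_transform` and the plotting position (rank - 0.5) / n are illustrative choices, not taken from the talk.

```python
from statistics import NormalDist

def normal_quantile_transform(values):
    """Map each value to the standard-normal quantile of its rank.

    Ranks run from 1 to n; tied values share their average rank.
    The plotting position (rank - 0.5) / n is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in order[i:j + 1]:
            ranks[k] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Applied chip by chip, a transform of this kind gives every chip the same distribution of values, which is the sense in which it normalizes.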
Page 18
Adaptive Transform of AD (TAD) - 1
Model variance (over many replicates) vs. mean AD
Plot: Log(SD), or Wilson-Hilferty transform SD^(2/3), vs. mean of NQ(AD)
Fit a smooth function, g, which predicts SD
Page 19
Adaptive Transform of AD (TAD) - 2
$T(X) = \int_{-\infty}^{X} \frac{1}{g(u)} \, du$
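The integral defining T can be approximated numerically once a fitted g is available. The sketch below tabulates T on a grid with the trapezoid rule. It is an illustration under assumptions: the name `adaptive_transform` is hypothetical, g is any positive function predicting SD, and the integration starts at the first grid point rather than at minus infinity, which only shifts T by a constant.

```python
def adaptive_transform(grid, g):
    """Tabulate T(x) = integral of 1/g(u) du from grid[0] to x.

    grid: increasing x values covering the data range (hypothetical input).
    g:    function returning the predicted SD at x; must be positive.
    Returns a list of T values aligned with grid, with T(grid[0]) = 0.
    """
    t = [0.0]
    for a, b in zip(grid, grid[1:]):
        # Trapezoid rule on the integrand 1/g over the interval [a, b]
        t.append(t[-1] + 0.5 * (1.0 / g(a) + 1.0 / g(b)) * (b - a))
    return t

# With constant SD the transform is linear in x
print(adaptive_transform([0.0, 1.0, 2.0], lambda x: 2.0))
```

Values between grid points can be obtained by interpolation. On the transformed scale a difference of 1 corresponds to roughly one predicted standard deviation.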
Page 20
Adaptive Transform of AD (TAD)
Page 21
Consistency Test p-values
[Figure: histograms of consistency test p-values (horizontal axis from 0 to 1, vertical axis count). Panels: Treatment and Sham groups, each for Time 1 vs. Time 0 and Time 2 vs. Time 0.]
Page 22
Results of Study 1
(5 time points, 2 treatments, 6 subjects)
Table 1. Number of genes detected by consistency test with expected false positives set to 1.0

Group      Any time   1-0   2-0   3-0   4-0
Treated         385    13   340    22    19
Controls         83    21    23    26    24
Both              2     0     1     2     1

Table 3. Number of genes detected by maximum TAD greater than 1

Group      Any time   1-0   2-0   3-0   4-0
Treated         275     5   264     4     5
Controls          6     1     2     4     4
Both              1     0     0     0     1
Page 23
Probe Pair Data, Delta TAD = 2
Parallel Axis Coordinate Display
Page 24
Probe Pair Data, Delta TAD = 0.5
Page 25
Probe Pair Data, Delta TAD = -1.5
Page 26
Probe Pair Data, Delta TAD = -0.5
Page 27
Acknowledgements
Lynn Young, MSCL
Vinay Prabhu, MSCL
Jennifer Barb, MSCL
Howard Shindel, MSCL
Andrew Schwartz, CIT
Steve Bailey, CIT
Sayed Daoud, NCI
Yves Pommier, NCI
John Weinstein, NCI
Robert Danner, CC
Anthony Suffredini, CC
Peter Eichacker, CC
James Shelhamer, CC
Eric Gerstenberger, CC
David Rocke, UC Davis
David Krizman, NCI
Alex Carlisle, NCI
Page 28