Lecture 7 - Tresch Group

Achim Tresch
UoC / MPIPZ
Cologne
Statistics
treschgroup.de/OmicsModule1415.html
tresch@mpipz.mpg.de
Multiple Testing
Challenge
• You test plants/patients/… in two settings (or from
different populations).
• You want to know which / how many genes are
differentially expressed (alternate).
• You don’t want to make too many mistakes
(declaring a gene to be alternate, i.e., differentially
expressed, when in fact it is null, i.e., not
differentially expressed).
The Multiple Testing Problem
• You choose a significance level, say 0.05.
• You calculate p-values of the differences in
expression.
• The p-value of a gene g is the probability that, if g is null
(not differentially expressed), its test statistic (e.g.,
t-statistic) would be at least as extreme as the one observed.
• You declare all genes with p-value ≤ 0.05
to be truly different.
What’s the problem?
The Multiple Testing Problem
You are testing many genes at the same time.
• Suppose that you test 10,000 genes, but no
genes are truly differentially expressed.
• About 5% of the tested genes will still have a
p-value ≤ 0.05 by chance alone.
• You will find about 500 “significant” genes, all of them false positives.
• Bad.
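The arithmetic above can be checked with a small simulation. This is only a sketch: it assumes every gene is null and draws each p-value directly from a uniform distribution on [0, 1] instead of running a real test. The function name and the seed are illustrative.

```python
import random

def count_false_positives(n_genes=10_000, alpha=0.05, seed=0):
    """Count null genes whose p-value passes the significance level by chance."""
    rng = random.Random(seed)
    # Assumption: all genes are null, so each p-value is uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p <= alpha for p in p_values)

hits = count_false_positives()
print(f"{hits} of 10000 null genes have p <= 0.05")
```

The count fluctuates around 500 from seed to seed, and every one of these genes is a false positive.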
Bonferroni Correction (FWER control)
• With m genes tested at significance level α, call a gene
significant only if its p-value ≤ α/m.
• Then Pr(at least one null gene
found diff. expr.) ≤ m · α/m = α.
Bonferroni controls the probability that our list of
differentially expressed genes contains at least one mistake
= Family-wise error rate (FWER). This is very strict.
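A minimal sketch of the Bonferroni rule follows. It compares each p-value with α divided by the number of tests. The function name and the five p-values are made up for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests that pass the Bonferroni threshold alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]

# Hypothetical p-values for five genes (illustrative only).
example_p = [0.0001, 0.004, 0.02, 0.03, 0.6]
print(bonferroni_significant(example_p))  # threshold is 0.05 / 5 = 0.01
```

Only the first two genes pass. The genes with p = 0.02 and p = 0.03 would have passed an uncorrected 0.05 level, which shows how strict the correction is.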
False Discovery Rate (FDR) estimation
A Fundamental Insight
• All truly null genes (i.e. not truly differentially
expressed) are equally likely to have any p-value.
• This holds by construction of the p-value: under the null
hypothesis, 1% of the genes fall in the top 1
percentile, 1% fall between the 89th and 90th
percentile, and so on. The p-value is just a way of stating the
percentile under the null condition.
[Figure: p-value axis from 0 to 1]
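The uniformity of null p-values can be illustrated with a simulation. This sketch assumes a two-sided z-test on samples of ten standard normal values with true mean 0. The test, sample size and seed are choices made here, not taken from the lecture.

```python
import random
from statistics import NormalDist, mean

def null_p_value(rng, n_samples=10):
    """Two-sided z-test p-value for data drawn under the null (true mean 0, sd 1)."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    z = mean(sample) * n_samples ** 0.5  # standardized sample mean
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def decile_fractions(p_values):
    """Fraction of p-values in each of the ten bins [0, 0.1), ..., [0.9, 1]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return [c / len(p_values) for c in counts]

rng = random.Random(1)
p_values = [null_p_value(rng) for _ in range(20_000)]
print([round(f, 3) for f in decile_fractions(p_values)])  # each entry should be near 0.1
```

Each decile holds roughly 10% of the p-values, which is the "percentile under the null" reading of a p-value.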
False Discovery Rate (FDR) estimation
Idea: The observed p-value distribution is a mixture of
null genes (light blue marbles) and truly different
genes (red marbles).
• If the chosen test is appropriate, red marbles should
be concentrated at the low p-values.
[Figure: marbles stacked along the p-value axis from 0 to 1; legend: differential gene, non-differential gene]
False Discovery Rate (FDR) estimation
• Of course, we don’t know the colors of the
marbles, i.e., we don’t know which genes are true
alternates.
• However, we know that null marbles are equally
likely to have any p-value.
• So, in the region where the height of the marble stacks
levels off, we have primarily light blue marbles (null
genes).
False Discovery Rate (FDR) estimation
• This is because, if all genes/marbles were null, the heights
would be about uniform.
• Provided the reds are concentrated near the low
p-values, the flat regions will be primarily light blues.
• We estimate the baseline of null marbles from the flat region.
[Figure: histogram of p-values from 0 to 1, y-axis: absolute frequency; area under the baseline ≈ nondifferential genes]
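One possible way to read the baseline off a histogram is sketched below. It rests on assumptions not stated in the lecture: the histogram is taken to be flat for p-values ≥ 0.5, the baseline is the mean bin height in that region, and the simulated alternates are generated as `random() ** 8` just to pile them up near 0.

```python
import random

def estimate_null_baseline(p_values, n_bins=20, flat_from=0.5):
    """Estimate the per-bin count of null genes from the flat part of the histogram.

    Assumption: the histogram is flat for p-values >= flat_from, and the
    baseline is the mean bin height in that region.
    """
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    first_flat_bin = int(flat_from * n_bins)
    flat_counts = counts[first_flat_bin:]
    return sum(flat_counts) / len(flat_counts)

rng = random.Random(3)
nulls = [rng.random() for _ in range(9_000)]
alternates = [rng.random() ** 8 for _ in range(1_000)]  # assumed shape: piled up near 0
baseline = estimate_null_baseline(nulls + alternates)
print(f"estimated nulls per bin: {baseline:.0f} (true value: {9_000 / 20:.0f})")
```

The estimate comes out slightly above the true value, because a few alternates also land in the flat region. The baseline is therefore a conservative estimate of the number of nulls.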
False Discovery Rate (FDR) estimation
• Subtracting the “baseline” of true null hypotheses,
the remaining balls are primarily red, i.e., they are
true alternative hypotheses.
[Figure: histogram of p-values from 0 to 1, y-axis: absolute frequency; above the baseline ≈ differential genes, below the baseline ≈ nondifferential genes]
False Discovery Rate (FDR) estimation
• Given a p-value cutoff, we can estimate the rate of
false discoveries (FDR) that pass this threshold.
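FDR(p-cut) = (≈ nondifferential genes below p-cut) / ((≈ differential genes below p-cut) + (≈ nondifferential genes below p-cut))
[Figure: histogram of p-values from 0 to 1 with a p-value cutoff, y-axis: absolute frequency; regions labelled ≈ differential genes and ≈ nondifferential genes]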
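The ratio can be sketched in code as the expected number of nulls below the cutoff divided by the number of all genes below the cutoff. The baseline is again read off the region p > 0.5, which is an assumption. The function name and the toy data (100 evenly spaced null p-values plus 20 alternates at p = 0.001) are illustrative only.

```python
def estimate_fdr(p_values, cutoff, flat_from=0.5):
    """Estimated FDR at a p-value cutoff: expected nulls below it / all genes below it."""
    n_called = sum(p <= cutoff for p in p_values)
    if n_called == 0:
        return 0.0  # nothing is called, so there are no false discoveries
    # Baseline of nulls per unit of p-value, read off the flat region (assumption).
    n_flat = sum(p > flat_from for p in p_values)
    null_density = n_flat / (1.0 - flat_from)
    expected_nulls = null_density * cutoff
    return min(1.0, expected_nulls / n_called)

# Toy data: 100 evenly spaced null p-values and 20 alternates at p = 0.001.
nulls = [(i + 0.5) / 100 for i in range(100)]
alternates = [0.001] * 20
fdr = estimate_fdr(nulls + alternates, cutoff=0.1)
print(round(fdr, 3))
```

At the cutoff 0.1, 30 genes are called and the baseline predicts 10 nulls among them, so the estimated FDR is one third.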
FDR-based p-value cutoff
• Given a desired FDR (e.g., 20%), we can find the
largest p-value cutoff for which this FDR is achieved.
[Figure: histogram of p-values from 0 to 1 with the baseline of nulls, y-axis: absolute frequency; estimated FDR at three cutoffs: FDR(p-cut1 = 0.1) = 9%, FDR(p-cut2 = 0.2) = 20%, FDR(p-cut3 = 0.7) = 52%]
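The search for the largest admissible cutoff can be sketched as a scan over the sorted p-values. It uses a baseline-style FDR estimate with the flat region assumed to start at 0.5. The function name and the toy data are hypothetical.

```python
def largest_cutoff_for_fdr(p_values, target_fdr, flat_from=0.5):
    """Largest observed p-value whose estimated FDR stays at or below target_fdr.

    Returns None when no cutoff qualifies.
    """
    ordered = sorted(p_values)
    n_flat = sum(p > flat_from for p in ordered)
    null_density = n_flat / (1.0 - flat_from)  # estimated nulls per unit of p-value
    best = None
    for rank, cutoff in enumerate(ordered, start=1):
        # rank counts the genes up to this position in the sorted list; within a
        # run of tied p-values, the last entry carries the full count.
        fdr = min(1.0, null_density * cutoff / rank)
        if fdr <= target_fdr:
            best = cutoff
    return best

# Toy data: 100 evenly spaced null p-values and 20 alternates at p = 0.001.
nulls = [(i + 0.5) / 100 for i in range(100)]
alternates = [0.001] * 20
best = largest_cutoff_for_fdr(nulls + alternates, target_fdr=0.2)
print(best)
```

Every gene with a p-value at or below the returned cutoff is reported. If the function returns None, no gene is reported at the requested FDR.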
Example: All null
• Consider the all null case (all marbles are blue).
• For any p-value cutoff, the estimated FDR will be
close to 100%.
• For any sensible FDR (substantially below 100%),
there will be no suitable p-value cutoff, and the
method will not return any gene.
• Good.
[Figure: all-null case, p-value axis from 0 to 1]
Example: All alternate
• Consider the all alternate case (all marbles are red).
• For a large range of p-value cutoffs, the estimated
FDR will be close to 0.
• For sensible FDR cutoffs (e.g. 20%), the
corresponding p-value cutoff will be high.
• The method will return many genes.
• Good.
[Figure: all-alternate case, p-value axis from 0 to 1]
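The two extreme cases can be compared in a short simulation. This is a sketch under assumptions: null p-values are uniform, alternate p-values are generated as `random() ** 20` purely to concentrate them near 0, and the FDR estimate uses a baseline read off the region p > 0.5.

```python
import random

def estimate_fdr(p_values, cutoff, flat_from=0.5):
    """Expected nulls below the cutoff divided by all genes below the cutoff."""
    n_called = sum(p <= cutoff for p in p_values)
    if n_called == 0:
        return 0.0
    n_flat = sum(p > flat_from for p in p_values)
    expected_nulls = n_flat / (1.0 - flat_from) * cutoff
    return min(1.0, expected_nulls / n_called)

rng = random.Random(7)
all_null = [rng.random() for _ in range(10_000)]
all_alternate = [rng.random() ** 20 for _ in range(10_000)]  # assumed shape
for cutoff in (0.1, 0.2, 0.7):
    print(cutoff,
          round(estimate_fdr(all_null, cutoff), 2),
          round(estimate_fdr(all_alternate, cutoff), 2))
```

In the all-null column the estimate stays near 1 at every cutoff. In the all-alternate column it stays small even at the cutoff 0.7.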
Conclusions
• A flat p-value distribution may force us to the far
left in order to get a low False Discovery Rate.
• This may eliminate genes of interest.
• If subsequent validation experiments are not too
expensive, we can accept a higher False Discovery
Rate (e.g., 20%).
• The FDR and the significance level are entirely
different things!
Gene Set Enrichment
Fisher‘s exact test, once more
Gene Ontology Example
[Figures: GO term examples (immune response) and (macromolecule biosynthesis)]
Kolmogorov-Smirnov Test
[Figure annotation: < 10^-10]
• Walk down the ranked gene list (N genes in total, K of them in group a).
• Move 1/K up when you see a
gene from group a.
• Move 1/(N-K) down when you
see a gene not in group a.
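The up/down rule can be sketched as a running sum whose largest absolute excursion serves as the test statistic. The sketch assumes that K is the number of ranked genes in group a and N the total number of ranked genes. The function name and gene labels are hypothetical, and no p-value is computed.

```python
def running_sum_statistic(ranked_genes, group_a):
    """Maximum absolute deviation of the running sum over a ranked gene list.

    Steps up by 1/K for genes in group_a and down by 1/(N-K) otherwise.
    """
    group = set(group_a)
    n = len(ranked_genes)
    k = sum(gene in group for gene in ranked_genes)
    if k == 0 or k == n:
        raise ValueError("group_a must contain some, but not all, ranked genes")
    position, max_deviation = 0.0, 0.0
    for gene in ranked_genes:
        position += 1.0 / k if gene in group else -1.0 / (n - k)
        max_deviation = max(max_deviation, abs(position))
    return max_deviation

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical ranking, best first
print(running_sum_statistic(ranked, {"g1", "g2"}))  # group at the top of the list
print(running_sum_statistic(ranked, {"g1", "g4"}))  # group spread over the list
```

A group concentrated at the top of the ranking drives the running sum far from zero. A group scattered through the list keeps it close to zero.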
GO scoring: general problem
GO Independence Assumption
[Figure: GO sets, highlighted in light yellow]
The elim method
[Figure: Top 10 significant nodes (boxes) obtained with the elim method]
Algorithms Summary
Evaluation: Top scoring GO term
[Figure: Significant GO terms in the ALL dataset]
Advantages & Disadvantages for ALL
Simulation Study
Introduce noise
Quality of GO scoring methods
[Figure panels: 10% noise level, 40% noise level]
Summary