False Discovery Rate, Concept and R Implementation

False Discovery Rate
Concept, and Implementation in R, WEB
Joseph Jin and Bruce Ling
Biotechnology Core
Stanford University
http://biomarker.stanford.edu
Agenda
• FDR Concept and Procedure
• FDR Implementation In R
• Implementation DEMO
4/13/2015
Statistical Hypothesis Tests
A method of making statistical decisions from experimental data
– t-test when the data can be assumed to follow a known (normal) distribution, e.g. that of a control group;
– Mann-Whitney test when the distribution is unknown.
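As a minimal sketch of the two tests named above, the following R snippet runs both on simulated data. The group names `control` and `treated`, the sample sizes, and the simulated means are illustrative assumptions and are not taken from the slides.

```r
# Simulated two-group data (illustrative values only)
set.seed(1)
control <- rnorm(20, mean = 0)
treated <- rnorm(20, mean = 1)

# Two-sample t-test (Welch variant by default in R)
t_res <- t.test(treated, control)

# Mann-Whitney test, called the Wilcoxon rank-sum test in R
w_res <- wilcox.test(treated, control)

t_res$p.value
w_res$p.value
```

Both calls return an object whose `p.value` element can be compared against the chosen significance level.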
Single Hypothesis Test and Error
Example: coin tossing
Null Hypothesis: 50% heads
Data: 9 heads out of 10 tosses
Test: prob(9 or more heads) = (10 + 1) x (1/2)^10 = 11 x (1/2)^10 = 1.07% (p-value)
Significance Level: 0.05
Decision: p-value < 0.05 (coin is biased)

                          Actual Condition
                          Positive                          Negative
Test Result   Positive    True Positive                     False Positive (Type I Error)
              Negative    False Negative (Type II Error)    True Negative
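The coin-tossing p-value can be checked in R. This is a small sketch, assuming a one-sided test in which the p-value is the probability of 9 or more heads under the null hypothesis. The variable names are illustrative.

```r
n_tosses <- 10
alpha <- 0.05

# P(X >= 9) for X ~ Binomial(10, 0.5): (10 + 1) * (1/2)^10 = 11/1024
p_value <- sum(dbinom(9:10, size = n_tosses, prob = 0.5))
p_value

# The same one-sided p-value from the built-in exact binomial test
exact <- binom.test(9, n_tosses, p = 0.5, alternative = "greater")$p.value

# Decision at the 0.05 significance level
reject_null <- p_value < alpha
reject_null
```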
Multiple Comparison Problem
Repeat the test on 100 coins (m = 100)

                  P-value     P(no Type I error)         P(at least one Type I error)   Type I error (general)
Single test       0.0107      0.9893                     0.0107                         p-value (0.05)
Multiple tests    0.0107      (1 - 0.0107)^100 = 0.34    1 - 0.34 = 0.66                1 - (1 - p-value)^m
FWER control      p-value/m   > 1 - p-value              < p-value                      Bonferroni method
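The numbers in the table can be reproduced with a few lines of R. This is a sketch that assumes the m tests are independent, which is what the formula 1 - (1 - p-value)^m presumes. The variable names are illustrative.

```r
p <- 0.0107   # per-test Type I error probability from the coin example
m <- 100      # number of tests (coins)

# Probability of no Type I error across all m tests
p_none <- (1 - p)^m

# Family-wise error rate: probability of at least one Type I error
fwer <- 1 - p_none

# Bonferroni: test each hypothesis at alpha / m instead of alpha
alpha <- 0.05
bonferroni_threshold <- alpha / m
fwer_bonferroni <- 1 - (1 - bonferroni_threshold)^m

round(c(p_none = p_none, fwer = fwer, fwer_bonferroni = fwer_bonferroni), 4)
```

With the Bonferroni threshold the family-wise error rate stays below alpha, at the cost of a much stricter per-test cutoff.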
Issues in Multiplicity Using p-value
• If we ignore multiplicity and m is large, the probability of at least one Type I error (false positive) becomes too high.
• Controlling the FWER is too restrictive:
– A per-test threshold of 0.05/m is overly conservative. As a result, it is much harder to find significance, and the probability of false negatives increases.
– Why should a researcher be penalized for conducting a more thorough study (doing multiple tests)?
False Discovery Rate (FDR)
• Instead of controlling the probability of any Type I error (false positive), FDR controls the expected proportion of false positives among the declared positives.
• FDR definition:
– R, the number of declared positives, is an observable random variable.
– V, the number of false positives, is a non-observable random variable.
– FDR is the expectation of the random variable V/R (taken as 0 when R = 0).
FDR = E(V/R)

                            declared negative   declared positive   total
true null hypothesis        U                   V (Type I)          m0
non-true null hypothesis    T (Type II)         S                   m - m0
total                       m - R               R                   m
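To make the quantities U, V, T, S and R concrete, here is a small simulation sketch in R. In a simulation the true nulls are known, so V can be counted directly. Real data do not allow this, which is why V is non-observable. The values of m, m0, the effect size and the 0.05 cutoff are illustrative assumptions. Note that V/R from a single run is one realization of the ratio, while FDR is its expectation.

```r
set.seed(42)
m  <- 1000   # total number of hypotheses (assumed)
m0 <- 900    # number of true null hypotheses (assumed)
is_null <- c(rep(TRUE, m0), rep(FALSE, m - m0))

# z-scores: mean 0 under the null, mean 3 under the alternative (assumed)
z <- rnorm(m, mean = ifelse(is_null, 0, 3))
p <- 2 * pnorm(-abs(z))   # two-sided p-values

declared <- p < 0.05      # declared positive

U <- sum(!declared &  is_null)   # true nulls declared negative
V <- sum( declared &  is_null)   # false positives (Type I)
T_count <- sum(!declared & !is_null)   # false negatives (Type II)
S <- sum( declared & !is_null)   # true positives
R <- V + S                       # all declared positives

# Realized false discovery proportion for this one simulated data set
fdp <- if (R > 0) V / R else 0
c(U = U, V = V, T = T_count, S = S, R = R, fdp = fdp)
```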
Two Classes FDR R Implementation
Input: k features and m samples (sets of data) in two classes (1 and 2):

        1   …   1     2   …   2
f1      d11 … d1j   d1i … d1m
…..
fk      dk1 … dkj   dki … dkm

• Perform a paired test (t-test or Wilcoxon) on each feature to obtain the original p-values.
• Generate a new table T' by random permutation of the columns, and perform the test again. Repeat for n permutations.
• FDR = average number of positives on the permuted (false) data / number of positives on the original data.

Original        p01 p02 … p0k
permutation1    p11 p12 … p1k
permutationn    pn1 pn2 … pnk
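The permutation step above might be sketched in R as follows. This is not the authors' script. The function names `feature_pvalues` and `permutation_pvalues`, the simulated data, and the choice of a two-sample Welch t-test as the per-feature test are all assumptions made for illustration. `wilcox.test` could be substituted in the same place.

```r
# data:   k x m numeric matrix (features in rows, samples in columns)
# labels: length-m vector of class labels (1 or 2)
feature_pvalues <- function(data, labels) {
  apply(data, 1, function(row) {
    t.test(row[labels == 1], row[labels == 2])$p.value
  })
}

permutation_pvalues <- function(data, labels, n_perm = 20) {
  p_orig <- feature_pvalues(data, labels)
  # Shuffling the class labels is equivalent to permuting the columns
  p_perm <- t(replicate(n_perm, feature_pvalues(data, sample(labels))))
  list(original = p_orig, permuted = p_perm)   # permuted: n_perm x k
}

# Simulated example: 50 features, 20 samples, first 5 features shifted
set.seed(7)
k <- 50
m <- 20
labels <- rep(1:2, each = m / 2)
data <- matrix(rnorm(k * m), nrow = k)
data[1:5, labels == 2] <- data[1:5, labels == 2] + 2

res <- permutation_pvalues(data, labels, n_perm = 20)

# FDR estimate at one threshold: average positives on permuted data
# divided by positives on the original data
threshold <- 0.05
n_orig <- sum(res$original < threshold)
fdr <- if (n_orig > 0) mean(rowSums(res$permuted < threshold)) / n_orig else NA
fdr
```

Each row of `res$permuted` corresponds to one permutation row (permutation1 to permutationn) of the p-value table.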
Two Classes FDR R Implementation -2
• For each p-value threshold, incremented in steps of 0.01, count the p-values that are less than the threshold, in the original row and in each permutation row of the p-value table.

Threshold       0.01   0.02   …   1.00
Original        c01    c02    …   c0100
permutation1    c11    c12    …   c1100
permutationn    cn1    cn2    …   cn100

• For each column j,
FDRj = ((c1j + … + cnj)/n) / c0j
• Plot the FDR / p-value diagram.
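The counting and averaging over thresholds could look like the sketch below. The function name `fdr_curve` and the synthetic p-values are assumptions for illustration. In practice the inputs would be the original p-value vector and the n x k matrix of permutation p-values. The estimate is left as NA wherever the original count is zero.

```r
# p_orig: vector of k original p-values
# p_perm: n x k matrix of p-values, one row per permutation
fdr_curve <- function(p_orig, p_perm, step = 0.01) {
  thresholds <- seq_len(round(1 / step)) * step   # 0.01, 0.02, ..., 1.00
  # c0j: positives on the original data at threshold j
  c0 <- sapply(thresholds, function(th) sum(p_orig < th))
  # (c1j + ... + cnj) / n: average positives over the permutations
  c_perm_mean <- sapply(thresholds, function(th) mean(rowSums(p_perm < th)))
  fdr <- ifelse(c0 > 0, c_perm_mean / c0, NA_real_)
  data.frame(threshold = thresholds, original = c0,
             permuted_mean = c_perm_mean, fdr = fdr)
}

# Synthetic p-values: 40 features with small p-values, the rest uniform
set.seed(11)
k <- 200
n <- 25
p_orig <- c(rbeta(40, 0.2, 5), runif(k - 40))
p_perm <- matrix(runif(n * k), nrow = n)

curve <- fdr_curve(p_orig, p_perm)
head(curve)

# FDR / p-value diagram
plot(curve$threshold, curve$fdr, type = "l",
     xlab = "p-value threshold", ylab = "Estimated FDR")
```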
Web User Interface for R script
• A user-friendly web interface for an R script means the script can be used by anyone, even without any knowledge of R.
• Features:
– Facility for uploading input files
– Results files displayed on results page and available for
download.
– Repeat analyses with different parameters and data files; new results are added to the results list as links to the corresponding results pages. (future)
– Real time progress information displayed when running
the application. (future)
• Examples: t-test, Wilcoxon test, False Discovery Rate
Credits
• This work is under the supervision of Dr. Bruce Ling
• References:
– Benjamini, Y., and Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society. Series B (Methodological) 57 (1), 289–300.
– Tibshirani, Robert. "Diagnosis of Multiple Cancer Types by
Shrunken Centroids of Gene Expression." PNAS 10 Oct.
2001. 20 Jan. 2008
<http://www.pnas.org/cgi/reprint/99/10/6567>.