Formalized Data Snooping Based on Generalized Error Rates

Joseph P. Romano¹, Azeem S. Shaikh², and Michael Wolf³
¹ Department of Statistics, Stanford University
² Department of Economics, University of Chicago
³ Institute for Empirical Research in Economics, University of Zurich

Outline
1. Problem Formulation
2. Existing Methods
3. New Methods
4. Some Theory
5. Empirical Applications
6. Conclusions

The Dilemma

Empirical research in economics and finance often involves data snooping.

Common situation 1:
Many strategies are compared to a benchmark
Which strategies beat the benchmark?
If many strategies are tested, some will appear better just by chance

Common situation 2:
Multiple regression model with many regressors
Which regressors are significant?
If many regressors are tested, some will appear significant just by chance

Goal: take data snooping into account!

General Set-Up & Notation

Data is collected in a T × L matrix XT
Interest focuses on the parameter vector θ = (θ1, . . . , θS)′
The individual hypotheses concern the elements θs, for s = 1, . . . , S, and can be (all) one-sided or (all) two-sided

One-sided hypotheses: Hs : θs ≤ θ0,s vs. Hs′ : θs > θ0,s
Two-sided hypotheses: Hs : θs = θ0,s vs. Hs′ : θs ≠ θ0,s

Test statistic zT,s = (wT,s − θ0,s)/σ̂T,s, where σ̂T,s is a standard error for wT,s or σ̂T,s ≡ 1/√T
p̂T,s is an individual p-value

Example 1

Absolute Performance of Investment Strategies:
Matrix XT records historic returns
µs is the average return of strategy s
µB is the average return of the benchmark
θs = µs − µB
θ0,s = 0 and the individual tests are one-sided

Test statistics:
wT,s = x̄T,s − x̄T,B, where x̄T,s = (1/T) ∑Tt=1 xt,s and x̄T,B = (1/T) ∑Tt=1 xt,B
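
A minimal numpy sketch of these test statistics, assuming the returns sit in a (T, S) array with one column per strategy and using a plain i.i.d. standard error (with serially dependent returns a HAC or block-bootstrap standard error would be used instead); the function name and the simulated data are for illustration only:

```python
import numpy as np

def strategy_test_statistics(returns, benchmark):
    """z_{T,s} = (xbar_{T,s} - xbar_{T,B}) / sigma_hat_{T,s} for H_s: theta_s <= 0.

    returns   : (T, S) array, one column of historic returns per strategy
    benchmark : (T,)  array of benchmark returns
    """
    T = returns.shape[0]
    diff = returns - benchmark[:, None]           # x_{t,s} - x_{t,B}
    w = diff.mean(axis=0)                         # w_{T,s} = xbar_{T,s} - xbar_{T,B}
    se = diff.std(axis=0, ddof=1) / np.sqrt(T)    # naive i.i.d. standard error
    return w, w / se

# illustration with simulated data (T = 120 months, S = 210 strategies)
rng = np.random.default_rng(0)
X = rng.normal(0.005, 0.02, size=(120, 210))
x_bench = rng.normal(0.003, 0.001, size=120)
w, z = strategy_test_statistics(X, x_bench)
```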

Example 2

Multiple Regression:
yt = δ + θ1 xt,1 + . . . + θS xt,S + et
Matrix XT records the response variable and the regressors
θ0,s = 0 and the individual tests are two-sided

Test statistics:
wT,s = θ̂T,s (estimated by OLS, say)
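
A sketch of the corresponding individual statistics: OLS estimates, t-statistics, and two-sided p-values for Hs : θs = 0. The classical homoskedastic standard errors are an assumption made to keep the sketch short; robust standard errors would often replace them in practice.

```python
import numpy as np
from scipy import stats

def ols_individual_tests(y, X):
    """OLS of y on an intercept plus the columns of X; returns theta_hat_s,
    t-statistics, and two-sided p-values for H_s: theta_s = 0
    (classical homoskedastic standard errors, for illustration only)."""
    T, S = X.shape
    Z = np.column_stack([np.ones(T), X])              # intercept delta + regressors
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ beta
    df = T - S - 1
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    se = np.sqrt(np.diag(cov))
    theta_hat, se = beta[1:], se[1:]                  # drop the intercept
    tstat = theta_hat / se
    pval = 2 * stats.t.sf(np.abs(tstat), df)
    return theta_hat, tstat, pval
```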

The Traditional Error Rate

Consider S individual tests Hs vs. Hs′
We need a formal concept to account for data snooping

Familywise error rate:
P is the (unknown) probability mechanism
FWEP = P{Reject at least one true Hs}

Goal: (strong) asymptotic control of the FWE at level α:
lim sup_{T→∞} FWEP ≤ α for all P

Problem: If S is very large, this criterion might be too strict.

Generalized Error Rates

Generalized familywise error rate:
k-FWEP = P{Reject at least k of the true Hs}
Note that 1-FWEP = FWEP

False discovery proportion:
F = # false rejections; R = # total rejections
FDP = (F/R) · 1{R > 0}
Control P{FDP > γ} for γ ∈ [0, 1)

False discovery rate:
Control FDR = E(FDP)

Common philosophy: gain power by relaxing the strict FWE.
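
In a simulation study these error rates can be estimated by recording, for each simulated data set, whether at least k true nulls were rejected and what the realized FDP was; a small helper (name and signature are mine) could look as follows:

```python
import numpy as np

def realized_error_measures(rejected, null_is_true, k=1):
    """Given indicators of rejected hypotheses and of true nulls, return
    (indicator of the k-FWE event, realized FDP) for one simulated data set.
    Averaging the first over Monte Carlo repetitions estimates k-FWE_P;
    averaging the second estimates the FDR."""
    rejected = np.asarray(rejected, dtype=bool)
    null_is_true = np.asarray(null_is_true, dtype=bool)
    F = np.sum(rejected & null_is_true)          # false rejections
    R = np.sum(rejected)                         # total rejections
    fdp = F / R if R > 0 else 0.0                # FDP = (F/R) * 1{R > 0}
    return int(F >= k), fdp
```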

Existing Methods: FWE Control

Bonferroni method (single-step):
p̂T,s is the individual p-value for testing hypothesis Hs
Reject Hs if p̂T,s ≤ α/S
Can be very conservative

Holm method (stepwise):
Order the p-values from smallest to largest
Reject H(s) if p̂T,(j) ≤ α/(S − j + 1) for all j = 1, . . . , s
More powerful than Bonferroni, but can still be conservative

Problem:
These methods are based on the individual p-values only
They neglect the dependence structure across the p-values
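
A compact sketch of both p-value based procedures (the array bookkeeping is my own phrasing of the standard step-down algorithm):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Single-step Bonferroni: reject H_s iff p_s <= alpha / S."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / pvals.size

def holm(pvals, alpha=0.05):
    """Holm step-down: reject H_(s) as long as p_(j) <= alpha / (S - j + 1)
    for every j up to s; stop at the first failure."""
    pvals = np.asarray(pvals)
    S = pvals.size
    order = np.argsort(pvals)
    thresholds = alpha / (S - np.arange(S))       # alpha/S, alpha/(S-1), ..., alpha
    passed = pvals[order] <= thresholds
    n_reject = np.argmin(passed) if not passed.all() else S
    rejected = np.zeros(S, dtype=bool)
    rejected[order[:n_reject]] = True
    return rejected
```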

Existing Methods: k-FWE Control

Generalized Bonferroni method (single-step):
p̂T,s is the individual p-value for testing hypothesis Hs
Reject Hs if p̂T,s ≤ kα/S
Due to Hommel and Hoffmann (1988).

Generalized Holm method (stepwise):
Order the p-values from smallest to largest
Reject H(s) if p̂T,(j) ≤ αj for all j = 1, . . . , s, with
αj = kα/S for j ≤ k and αj = kα/(S + k − j) for j > k
Due to Lehmann and Romano (2005).
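
A sketch of the Lehmann-Romano step-down rule with these αj; the single-step generalized Bonferroni method would simply reject every Hs with p̂T,s ≤ kα/S.

```python
import numpy as np

def generalized_holm_kfwe(pvals, k=1, alpha=0.05):
    """Step-down k-FWE control based on individual p-values only:
    alpha_j = k*alpha/S for j <= k and k*alpha/(S + k - j) for j > k."""
    pvals = np.asarray(pvals)
    S = pvals.size
    j = np.arange(1, S + 1)
    alpha_j = np.where(j <= k, k * alpha / S, k * alpha / (S + k - j))
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha_j
    n_reject = np.argmin(passed) if not passed.all() else S
    rejected = np.zeros(S, dtype=bool)
    rejected[order[:n_reject]] = True
    return rejected
```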

Existing Methods: FDR Control

Benjamini and Hochberg (1995) method:
Let j* = max{j : p̂T,(j) ≤ γj}, where γj = jγ/S
Reject H(1), . . . , H(j*)
Achieves FDR ≤ γ
This is a step-up method, since it starts with the least significant hypothesis

Comments:
The original proof assumes independence of the p-values
Validity has been extended to certain types of dependence

Still missing:
A method to account for an unknown dependence structure.
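
A sketch of the Benjamini-Hochberg step-up rule as stated above:

```python
import numpy as np

def benjamini_hochberg(pvals, gamma=0.1):
    """Step-up FDR control at level gamma: reject H_(1), ..., H_(j*),
    where j* = max{ j : p_(j) <= j*gamma/S }."""
    pvals = np.asarray(pvals)
    S = pvals.size
    order = np.argsort(pvals)
    thresholds = np.arange(1, S + 1) * gamma / S
    below = pvals[order] <= thresholds
    rejected = np.zeros(S, dtype=bool)
    if below.any():
        j_star = np.max(np.flatnonzero(below)) + 1    # step-up: take the largest such j
        rejected[order[:j_star]] = True
    return rejected
```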

Existing Methods: FDP Control

Generalized Holm method:
Order the p-values from smallest to largest
Reject H(s) if p̂T,(j) ≤ αj for all j = 1, . . . , s, with
αj = (⌊γj⌋ + 1)α / (S + ⌊γj⌋ + 1 − j)
Achieves P{FDP > γ} ≤ α
Due to Lehmann and Romano (2005).

Comments:
Valid under certain types of dependence, e.g., independence
Can be made valid under arbitrary dependence by multiplying the αj by a common constant =⇒ very conservative in general
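
The corresponding FDP-controlling step-down rule, sketched in the same style:

```python
import numpy as np

def generalized_holm_fdp(pvals, gamma=0.1, alpha=0.05):
    """Lehmann-Romano step-down method aiming at P{FDP > gamma} <= alpha:
    alpha_j = (floor(gamma*j) + 1) * alpha / (S + floor(gamma*j) + 1 - j)."""
    pvals = np.asarray(pvals)
    S = pvals.size
    j = np.arange(1, S + 1)
    m = np.floor(gamma * j) + 1
    alpha_j = m * alpha / (S + m - j)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha_j
    n_reject = np.argmin(passed) if not passed.all() else S
    rejected = np.zeros(S, dtype=bool)
    rejected[order[:n_reject]] = True
    return rejected
```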

New Methods: k-FWE Control

Context:
One-sided tests Hs : θs ≤ θ0,s vs. Hs′ : θs > θ0,s
Test statistics zT,s = (wT,s − θ0,s)/σ̂T,s

We start with a single-step method to build intuition:
What is the ideal common critical value, called d1?
It's the (1 − α) quantile under P of k-max{(wT,s − θs)/σ̂T,s}
Then reject all Hs for which zT,s ≥ d1
Equivalent to inverting the generalized confidence region
[wT,1 − σ̂T,1 d1, ∞) × . . . × [wT,S − σ̂T,S d1, ∞)

Feasible solution:
Use the bootstrap to estimate d1 as the (1 − α) quantile under P̂T of k-max{(w∗T,s − wT,s)/σ̂∗T,s}
Important: P̂T is an unrestricted estimate of P
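
A sketch of the feasible single-step rule. It assumes bootstrap replicates of the estimates and of their standard errors are already available as (B, S) arrays, produced by whatever resampling scheme suits the data (i.i.d. or block bootstrap); the argument names are mine.

```python
import numpy as np

def single_step_kfwe(z, w, boot_w, boot_se, k=1, alpha=0.05):
    """Single-step bootstrap k-FWE rule: estimate the common critical value d1
    as the (1 - alpha) quantile, under the bootstrap distribution, of
    k-max_s (w*_{T,s} - w_{T,s}) / sigma*_{T,s}, then reject H_s if z_{T,s} >= d1.

    z, w            : (S,)   observed test statistics and estimates
    boot_w, boot_se : (B, S) bootstrap estimates and standard errors
    """
    centered = (boot_w - w) / boot_se            # centered, studentized bootstrap statistics
    kmax = np.sort(centered, axis=1)[:, -k]      # k-th largest in each bootstrap draw
    d1 = np.quantile(kmax, 1 - alpha)
    return z >= d1, d1
```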

New Methods: k-FWE Control (k-StepM Method)

Power can be increased by considering a stepwise method!
Imagine R1 ≥ k hypotheses were rejected in the first step.
What should be the critical value, called d2, in step two?

Let K be an index set corresponding to k − 1 of the rejected hypotheses and all remaining hypotheses
dK is the (1 − α) quantile of k-max_{s∈K} {(wT,s − θs)/σ̂T,s}
Then d2 = max{dK} [there are (R1 choose k − 1) such dK]

Comments:
Again, d2 can be estimated via the bootstrap in practice
In case (R1 choose k − 1) is too large, the method can be made operative by maximizing over a feasible subset

Continue with such steps until no further rejections occur . . .
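
A self-contained sketch of the resulting step-down procedure. The enumeration of index sets is capped at max_subsets, a crude version of the feasible-subset device mentioned above, and for k = 1 the procedure reduces to the usual StepM method; the argument names and the cap are my own choices.

```python
import itertools
import numpy as np

def k_stepm(z, w, boot_w, boot_se, k=1, alpha=0.05, max_subsets=5000):
    """Step-down k-StepM sketch. Step 1 uses the k-max over all S hypotheses;
    later steps maximize the (1 - alpha) quantile over index sets K made up of
    all not-yet-rejected hypotheses plus any k - 1 of the rejected ones."""
    centered = (boot_w - w) / boot_se              # (B, S) centered bootstrap statistics
    S = z.shape[0]
    rejected = np.zeros(S, dtype=bool)
    step = 0
    while True:
        step += 1
        active = list(np.flatnonzero(~rejected))
        if not active:
            break
        if step == 1:
            index_sets = [list(range(S))]
        elif rejected.sum() < k:                   # slide assumes R1 >= k before stepping on
            break
        else:
            combos = itertools.islice(
                itertools.combinations(np.flatnonzero(rejected), k - 1), max_subsets)
            index_sets = [active + list(c) for c in combos]
        d = max(np.quantile(np.sort(centered[:, K], axis=1)[:, -k], 1 - alpha)
                for K in index_sets)
        newly = (~rejected) & (z >= d)
        if not newly.any():
            break
        rejected |= newly
    return rejected
```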

New Methods: FDP Control (FDP-StepM Method)

Illustrate the method using γ = 0.1:
Start with 1-FWE control (at level α)
If fewer than 9 hypotheses are rejected, stop
Otherwise, move on to 2-FWE control (at level α)
If fewer than 19 hypotheses are rejected, stop
Otherwise, move on to 3-FWE control (at level α)
And so on . . .

General stopping rule at any given step j:
Stop if # rejections < j/γ − 1

Comments:
Works for any ‘underlying’ k-FWE controlling method
But in the interest of power, use the k-StepM method
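
A sketch of this wrapper; kfwe_method is a hypothetical argument standing for any k-FWE controlling routine (for instance the k_stepm sketch above).

```python
import numpy as np

def fdp_stepm(z, w, boot_w, boot_se, kfwe_method, gamma=0.1, alpha=0.05):
    """FDP-StepM sketch: run k-FWE control for k = 1, 2, 3, ... (each at level
    alpha) and stop at step j as soon as fewer than j/gamma - 1 hypotheses are
    rejected; the last rejection set is returned."""
    j = 0
    rejected = np.zeros(z.shape[0], dtype=bool)
    while True:
        j += 1
        rejected = kfwe_method(z, w, boot_w, boot_se, k=j, alpha=alpha)
        if rejected.sum() < j / gamma - 1:
            return rejected
```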

Alternative to FDR Control: Median(FDP) Control

The FDR is given by E(FDP).
Alternatively, control Median(FDP) by choosing α = 0.5:
P{FDP > γ} ≤ 0.5 =⇒ Median(FDP) ≤ γ

Some advantages:
Valid for any dependence structure
Implicitly accounts for the true dependence to improve power

Caveat:
Both the FDR and Median(FDP) are central tendencies of the sampling distribution of the FDP
The realized FDP can easily be greater than γ

Augmentation Methods

van der Laan et al. have a different approach to account for dependence while controlling generalized error rates.

k-FWE control:
Start out controlling the FWE (accounting for dependence)
Assume R hypotheses have been rejected
Then also reject the next k − 1 most significant hypotheses

FDP control:
Start out controlling the FWE (accounting for dependence)
Assume R hypotheses have been rejected
Then also reject the next D most significant hypotheses, where D is the largest integer satisfying D/(D + R) ≤ γ
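
A sketch of both augmentation steps, assuming the initial FWE rejection set and the individual p-values are given; D = ⌊γR/(1 − γ)⌋ is the closed form of the largest integer with D/(D + R) ≤ γ.

```python
import numpy as np

def augment_kfwe(pvals, fwe_rejected, k=1):
    """k-FWE augmentation: additionally reject the k-1 most significant
    hypotheses among those not rejected by the FWE-controlling method."""
    rejected = np.asarray(fwe_rejected, dtype=bool).copy()
    remaining = np.flatnonzero(~rejected)
    extra = remaining[np.argsort(np.asarray(pvals)[remaining])][:k - 1]
    rejected[extra] = True
    return rejected

def augment_fdp(pvals, fwe_rejected, gamma=0.1):
    """FDP augmentation: with R hypotheses rejected by the FWE method, also
    reject the next D most significant ones, D = floor(gamma*R/(1-gamma))."""
    rejected = np.asarray(fwe_rejected, dtype=bool).copy()
    R = int(rejected.sum())
    D = int(np.floor(gamma * R / (1.0 - gamma)))
    remaining = np.flatnonzero(~rejected)
    extra = remaining[np.argsort(np.asarray(pvals)[remaining])][:D]
    rejected[extra] = True
    return rejected
```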

Validity of the Bootstrap Methods

Assumption:
The sampling distribution of √T(WT − θ) under P converges to a continuous limit distribution
The bootstrap consistently estimates this distribution
√T σ̂T,s and √T σ̂∗T,s converge to the same constant in probability (for s = 1, . . . , S)

Theorem:
(i) If θs > 0, then Hs will be rejected with probability → 1 as T → ∞
(ii) The bootstrap methods asymptotically control the k-FWE and the FDP at level α

Modification & Extension

Modification to two-sided tests:
Hs : θs = θ0,s vs. Hs′ : θs ≠ θ0,s
Test statistics are now |zT,s|
The critical value in the first step, say, is the (1 − α) quantile of the distribution of k-max{|wT,s − θs|/σ̂T,s}

Extension to non-standard cases:
The bootstrap does not always work
Often this happens when the rate of convergence is not equal to √T or when the limiting distribution is not normal
Example: Manski's (1975) maximum score estimator
In such cases, one can use subsampling instead

Application 1: Hedge Fund Evaluation

Set-up:
Use S = 210 hedge funds in the CISDM database with a complete return history from 01/1994 until 12/2003 (so T = 120)
The common benchmark is the risk-free rate
Consider absolute performance: θs = µs − µB
Tests are one-sided and θ0,s = 0 always

Question:
How many funds are identified as ‘outperformers’ using different methods and error rates?

Application 1: Hedge Fund Evaluation

Number of funds identified as ‘outperformers’:

gHolm
                  α = 5%    α = 10%
  1-FWE             10         13
  3-FWE             16         22
  FDP0.1            13         22
  FDR0.1           101   (no α; γ = 0.1)
  Naïve            102        130

Bootstrap
                  α = 5%    α = 10%
  1-FWE             11         16
  3-FWE             29         33
  FDP0.1            17         36
  Median-FDP0.1    127   (α = 50%)
  Naïve            102        130

Application 2: Multiple Regression

Set-up:
Mincer regression with log-wage as the response variable and S = 291 explanatory variables in total
There are T = 4,975 persons in the sample
From the Austrian social security database on 08/10/2001
Tests are two-sided and θ0,s = 0 always

Question:
How many explanatory variables are identified as ‘important’ using different methods and error rates?

Application 2: Multiple Regression

Number of explanatory variables identified as ‘important’:

gHolm
                  α = 0.05   α = 0.10
  1-FWE              0           0
  5-FWE              9          11
  FDP0.1             0           0
  FDR0.1            16    (no α; γ = 0.1)
  Naïve             23          33

Bootstrap
                  α = 0.05   α = 0.10
  1-FWE              5           6
  5-FWE             12          12
  FDP0.1             5           6
  Median-FDP0.1     12    (α = 0.50)
  Naïve             23          33

Conclusions

Methodology:
Generalized error rates result in greater power by relaxing the strict FWE criterion
Bootstrap methods that account for the dependence structure improve upon existing methods based on the individual p-values
Stepwise methods provide a ‘free lunch’ compared to their single-step counterparts

Moral:
A wide array of powerful multiple testing methods is now available, satisfying different needs
There should be no more excuse for data snooping!