Formalized Data Snooping Based on Generalized Error Rates

Joseph P. Romano (1), Azeem S. Shaikh (2), and Michael Wolf (3)

(1) Department of Statistics, Stanford University
(2) Department of Economics, University of Chicago
(3) Institute for Empirical Research in Economics, University of Zurich

Outline

1 Problem Formulation
2 Existing Methods
3 New Methods
4 Some Theory
5 Empirical Applications
6 Conclusions

1 Problem Formulation

The Dilemma

Empirical research in economics and finance often involves data snooping.

Common situation 1: Many strategies are compared to a benchmark
- Which strategies beat the benchmark?
- If many strategies are tested, some will appear better just by chance

Common situation 2: Multiple regression model with many regressors
- Which regressors are significant?
- If many regressors are tested, some will appear significant just by chance

Goal: take data snooping into account!

General Set-Up & Notation

- Data is collected in a $T \times L$ matrix $X_T$
- Interest focuses on the parameter vector $\theta = (\theta_1, \ldots, \theta_S)'$
- The individual hypotheses concern the elements $\theta_s$, for $s = 1, \ldots, S$, and can be (all) one-sided or (all) two-sided

One-sided hypotheses: $H_s: \theta_s \le \theta_{0,s}$ vs. $H_s': \theta_s > \theta_{0,s}$

Two-sided hypotheses: $H_s: \theta_s = \theta_{0,s}$ vs. $H_s': \theta_s \neq \theta_{0,s}$

- Test statistic: $z_{T,s} = (w_{T,s} - \theta_{0,s}) / \hat\sigma_{T,s}$
- $\hat\sigma_{T,s}$ is a standard error for $w_{T,s}$, or $\hat\sigma_{T,s} \equiv 1/\sqrt{T}$
- $\hat p_{T,s}$ is an individual p-value

Example 1

Absolute Performance of Investment Strategies:
- Matrix $X_T$ records historic returns
- $\mu_s$ is the average return of strategy $s$
- $\mu_B$ is the average return of the benchmark
- $\theta_s = \mu_s - \mu_B$
- $\theta_{0,s} = 0$ and the individual tests are one-sided

Test statistics:
$$w_{T,s} = \bar x_{T,s} - \bar x_{T,B}, \qquad \bar x_{T,s} = \frac{1}{T} \sum_{t=1}^{T} x_{t,s} \quad \text{and} \quad \bar x_{T,B} = \frac{1}{T} \sum_{t=1}^{T} x_{t,B}$$

Example 2

Multiple Regression:
$$y_t = \delta + \theta_1 x_{t,1} + \ldots + \theta_S x_{t,S} + e_t$$
- Matrix $X_T$ records the response variable and the regressors
- $\theta_{0,s} = 0$ and the individual tests are two-sided

Test statistics: $w_{T,s} = \hat\theta_{T,s}$ (estimated by OLS, say)

The Traditional Error Rate

- Consider $S$ individual tests $H_s$ vs. $H_s'$
- We need a formal concept to account for data snooping

Familywise error rate:
- $P$ is the (unknown) probability mechanism
- $\mathrm{FWE}_P = P\{\text{Reject at least one true } H_s\}$

Goal: (strong) asymptotic control of the FWE at level $\alpha$:
$$\limsup_{T \to \infty} \mathrm{FWE}_P \le \alpha \quad \text{for all } P$$

Problem: If $S$ is very large, this criterion might be too strict.
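To see why unadjusted testing inflates the familywise error rate, consider a back-of-the-envelope sketch. It assumes, purely for illustration, that all $S$ null hypotheses are true and that the $S$ tests are independent, which the slides do not assume; the function name `fwe_independent` is hypothetical.

```python
def fwe_independent(alpha: float, S: int) -> float:
    """P{at least one false rejection} when each of S tests is run at level alpha.

    Illustrative assumption: all S nulls are true and the tests are independent,
    so the chance of no false rejection is (1 - alpha) ** S.
    """
    return 1.0 - (1.0 - alpha) ** S


if __name__ == "__main__":
    for S in (1, 10, 100):
        print(S, round(fwe_independent(0.05, S), 3))
```

Under these assumptions the probability of at least one spurious finding approaches one quickly as $S$ grows, which is the sense in which some strategies "appear better just by chance".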
Generalized Error Rates

Generalized familywise error rate:
- $k\text{-FWE}_P = P\{\text{Reject at least } k \text{ of the true } H_s\}$
- Note that $1\text{-FWE}_P = \mathrm{FWE}_P$

False discovery proportion:
- $F$ = # false rejections; $R$ = # total rejections
- $\mathrm{FDP} = \dfrac{F}{R} \, 1\{R > 0\}$
- Control $P\{\mathrm{FDP} > \gamma\}$ for $\gamma \in [0, 1)$

False discovery rate:
- Control $\mathrm{FDR} = E(\mathrm{FDP})$

Common philosophy: gain power by relaxing the strict FWE.
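The quantities behind these error rates can be made concrete with a small sketch. It assumes we know which nulls are true, which is only possible in a simulation; the function name `error_counts` and its arguments are hypothetical.

```python
import numpy as np


def error_counts(reject, null_true, k=1, gamma=0.1):
    """Realized F, R and FDP for one set of rejections.

    reject    : boolean mask, True where H_s was rejected
    null_true : boolean mask, True where H_s is actually true (simulation only)
    """
    reject = np.asarray(reject, dtype=bool)
    null_true = np.asarray(null_true, dtype=bool)
    R = int(reject.sum())                  # total rejections
    F = int((reject & null_true).sum())    # false rejections
    fdp = F / R if R > 0 else 0.0          # FDP = (F / R) * 1{R > 0}
    return {
        "R": R,
        "F": F,
        "FDP": fdp,
        "k_fwe_event": F >= k,             # event whose probability is the k-FWE
        "fdp_exceeds": fdp > gamma,        # event whose probability is P{FDP > gamma}
    }
```

Averaging `k_fwe_event`, `fdp_exceeds`, and `FDP` over many simulated data sets would estimate the k-FWE, $P\{\mathrm{FDP} > \gamma\}$, and the FDR, respectively.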
2 Existing Methods

Existing Methods: FWE Control

Bonferroni method (single-step):
- $\hat p_{T,s}$ is the individual p-value for testing hypothesis $H_s$
- Reject $H_s$ if $\hat p_{T,s} \le \alpha / S$
- Can be very conservative

Holm method (stepwise):
- Order p-values from smallest to largest
- Reject $H_{(s)}$ if $\hat p_{T,(j)} \le \alpha / (S - j + 1)$ for all $j = 1, \ldots, s$
- More powerful than Bonferroni
- But can still be conservative

Problem:
- Methods are based on the individual p-values only
- They neglect the dependence structure across p-values
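The Bonferroni and Holm rules can be sketched in a few lines of NumPy. This is a minimal illustration of the two rejection rules as stated on the slide, not the authors' code; the function names are hypothetical.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Single-step rule: reject H_s if p_s <= alpha / S."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def holm(pvals, alpha=0.05):
    """Stepdown rule: reject H_(s) if p_(j) <= alpha / (S - j + 1) for all j <= s."""
    p = np.asarray(pvals, dtype=float)
    S = p.size
    order = np.argsort(p)                        # indices from smallest to largest p-value
    thresholds = alpha / (S - np.arange(S))      # alpha/S, alpha/(S-1), ..., alpha/1
    passed = p[order] <= thresholds
    # Stop at the first ordered p-value that fails its threshold.
    n_reject = S if passed.all() else int(np.argmin(passed))
    reject = np.zeros(S, dtype=bool)
    reject[order[:n_reject]] = True
    return reject
```

For p-values (0.001, 0.014, 0.02, 0.04) at $\alpha = 0.05$, Bonferroni rejects only the first hypothesis while Holm rejects all four, which illustrates the power gain of the stepwise version.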
Existing Methods: k-FWE Control

Generalized Bonferroni method (single-step):
- $\hat p_{T,s}$ is the individual p-value for testing hypothesis $H_s$
- Reject $H_s$ if $\hat p_{T,s} \le k\alpha / S$
- Due to Hommel and Hoffmann (1988).

Generalized Holm method (stepwise):
- Order p-values from smallest to largest
- Reject $H_{(s)}$ if $\hat p_{T,(j)} \le \alpha_j$ for all $j = 1, \ldots, s$, with
$$\alpha_j = \begin{cases} k\alpha / S & \text{for } j \le k \\ k\alpha / (S + k - j) & \text{for } j > k \end{cases}$$
- Due to Lehmann and Romano (2005).

Existing Methods: FDR Control

Benjamini and Hochberg (1995) method:
- Let $j^* = \max\{j : \hat p_{T,(j)} \le \gamma_j\}$, where $\gamma_j = j\gamma / S$
- Reject $H_{(1)}, \ldots, H_{(j^*)}$
- Achieves $\mathrm{FDR} \le \gamma$
- This is a stepup method, since it starts with the least significant hypothesis

Comments:
- Original proof assumes independence of p-values
- Validity has been extended to certain dependence types

Still missing: Method to account for unknown dependence structure.
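The generalized Holm and Benjamini-Hochberg rules differ mainly in their critical constants and in the direction of stepping. The sketch below implements both from the formulas on the slides; the function names and the shared `stepdown` helper are hypothetical conveniences, not part of any cited work.

```python
import numpy as np


def stepdown(pvals, crit):
    """Reject H_(1), ..., H_(s) for the largest s with p_(j) <= crit[j-1] for all j <= s."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    passed = p[order] <= crit
    n_reject = p.size if passed.all() else int(np.argmin(passed))
    reject = np.zeros(p.size, dtype=bool)
    reject[order[:n_reject]] = True
    return reject


def generalized_holm(pvals, k=1, alpha=0.05):
    """k-FWE stepdown constants: k*alpha/S for j <= k, k*alpha/(S + k - j) for j > k."""
    S = len(pvals)
    j = np.arange(1, S + 1)
    crit = np.where(j <= k, k * alpha / S, k * alpha / (S + k - j))
    return stepdown(pvals, crit)


def benjamini_hochberg(pvals, gamma=0.1):
    """Stepup rule: reject H_(1), ..., H_(j*) with j* = max{j : p_(j) <= j*gamma/S}."""
    p = np.asarray(pvals, dtype=float)
    S = p.size
    order = np.argsort(p)
    below = np.nonzero(p[order] <= gamma * np.arange(1, S + 1) / S)[0]
    reject = np.zeros(S, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True    # everything up to the LAST crossing
    return reject
```

With $k = 1$ the generalized Holm constants reduce to the ordinary Holm constants, so `generalized_holm(p, k=1)` reproduces the FWE-controlling procedure.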
Existing Methods: FDP Control

Generalized Holm method:
- Order p-values from smallest to largest
- Reject $H_{(s)}$ if $\hat p_{T,(j)} \le \alpha_j$ for all $j = 1, \ldots, s$, with
$$\alpha_j = \frac{(\lfloor \gamma j \rfloor + 1)\,\alpha}{S + \lfloor \gamma j \rfloor + 1 - j}$$
- Achieves $P\{\mathrm{FDP} > \gamma\} \le \alpha$
- Due to Lehmann and Romano (2005).

Comments:
- Valid under certain dependence types, e.g., independence
- Can be made always valid by multiplying the $\alpha_j$ with a common constant $\Longrightarrow$ very conservative in general

3 New Methods

New Methods: k-FWE Control

Context:
- One-sided tests $H_s: \theta_s \le \theta_{0,s}$ vs. $H_s': \theta_s > \theta_{0,s}$
- Test statistics $z_{T,s} = (w_{T,s} - \theta_{0,s}) / \hat\sigma_{T,s}$

We start with a single-step method to build intuition:
- What is the ideal common critical value, called $d_1$?
- It is the $(1 - \alpha)$ quantile under $P$ of $k\text{-max}\{(w_{T,s} - \theta_s) / \hat\sigma_{T,s}\}$, where $k\text{-max}$ denotes the $k$-th largest value
- Then reject all $H_s$ for which $z_{T,s} \ge d_1$
- Equivalent to inverting the generalized confidence region
$$[w_{T,1} - \hat\sigma_{T,1} d_1, \infty) \times \ldots \times [w_{T,S} - \hat\sigma_{T,S} d_1, \infty)$$

Feasible solution:
- Use the bootstrap to estimate $d_1$ as the $(1 - \alpha)$ quantile under $\hat P_T$ of $k\text{-max}\{(w^*_{T,s} - w_{T,s}) / \hat\sigma^*_{T,s}\}$
- Important: $\hat P_T$ is an unrestricted estimate of $P$

New Methods: k-FWE Control (k-StepM Method)

Power can be increased by considering a stepwise method!

Imagine $R_1 \ge k$ hypotheses were rejected in the first step. What should be the critical value, called $d_2$, in step two?
- Let $K$ be an index set corresponding to $k - 1$ of the rejected hypotheses and all remaining hypotheses
- $d_K$ is the $(1 - \alpha)$ quantile of $k\text{-max}_{s \in K}\{(w_{T,s} - \theta_s) / \hat\sigma_{T,s}\}$
- Then $d_2 = \max_K \{d_K\}$ [there are $\binom{R_1}{k-1}$ such $d_K$]

Comments:
- Again, $d_2$ can be estimated via the bootstrap in practice
- In case $\binom{R_1}{k-1}$ is too large, the method can be made operative by maximizing over a feasible subset

Continue with such steps until no further rejections occur.
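The single-step bootstrap critical value $d_1$ can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the setting of Example 1 with $\theta_{0,s} = 0$, columns of `X` holding excess returns over the benchmark, an i.i.d. row bootstrap (time-series data may call for a block bootstrap instead), and standard errors from the sample standard deviation. The stepwise refinement with $d_2, d_3, \ldots$ is not shown, and the names `k_max` and `single_step_kfwe` are hypothetical.

```python
import numpy as np


def k_max(values, k):
    """Return the k-th largest entry of a 1-D array."""
    return np.sort(values)[-k]


def single_step_kfwe(X, k=1, alpha=0.05, n_boot=1000, seed=0):
    """Single-step k-FWE test of H_s: mean_s <= 0 for each column of X (shape T x S)."""
    rng = np.random.default_rng(seed)
    T, S = X.shape
    w = X.mean(axis=0)                               # w_{T,s}
    se = X.std(axis=0, ddof=1) / np.sqrt(T)          # assumed i.i.d. standard error
    z = w / se                                       # z_{T,s} with theta_{0,s} = 0
    stats = np.empty(n_boot)
    for b in range(n_boot):
        Xb = X[rng.integers(0, T, size=T)]           # resample rows with replacement
        wb = Xb.mean(axis=0)                         # w*_{T,s}
        seb = Xb.std(axis=0, ddof=1) / np.sqrt(T)    # sigma*_{T,s}
        stats[b] = k_max((wb - w) / seb, k)          # centered at w, not at theta_0
    d1 = np.quantile(stats, 1 - alpha)               # bootstrap estimate of d_1
    return z >= d1, d1
```

Resampling whole rows keeps the cross-sectional dependence between strategies intact, which is how the critical value reflects the dependence structure. Centering the bootstrap statistics at $w_{T,s}$ rather than at $\theta_{0,s}$ corresponds to using an unrestricted estimate of $P$.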
New Methods: FDP Control (FDP-StepM Method)

Illustrate the method using $\gamma = 0.1$:
- Start with 1-FWE control (at level $\alpha$)
- If fewer than 9 hypotheses are rejected, stop
- Otherwise, move on to 2-FWE control (at level $\alpha$)
- If fewer than 19 hypotheses are rejected, stop
- Otherwise, move on to 3-FWE control (at level $\alpha$)
- And so on

General stopping rule at any given step $j$:
- Stop if # rejections $< j/\gamma - 1$

Comments:
- Works for any 'underlying' k-FWE controlling method
- But in the interest of power, use the k-StepM method

Alternative to FDR Control: Median(FDP) Control

The FDR is given by $E(\mathrm{FDP})$.

Alternatively, control Median(FDP) by choosing $\alpha = 0.5$:
$$P\{\mathrm{FDP} > \gamma\} \le 0.5 \Longrightarrow \mathrm{Median}(\mathrm{FDP}) \le \gamma$$

Some advantages:
- Valid for any dependence structure
- Implicitly accounts for true dependence to improve power

Caveat:
- Both the FDR and Median(FDP) are central tendencies of the sampling distribution of the FDP
- The realized FDP can easily be greater than $\gamma$

Augmentation Methods

van der Laan et al. have a different approach to account for dependence while controlling generalized error rates.

k-FWE control:
- Start out controlling the FWE (accounting for dependence)
- Assume $R$ hypotheses have been rejected
- Then also reject the next $k - 1$ most significant hypotheses

FDP control:
- Start out controlling the FWE (accounting for dependence)
- Assume $R$ hypotheses have been rejected
- Then also reject the next $D$ most significant hypotheses, where $D$ is the largest integer satisfying
$$\frac{D}{D + R} \le \gamma$$

4 Some Theory

Validity of the Bootstrap Methods

Assumption
- The sampling distribution of $\sqrt{T}(W_T - \theta)$ under $P$ converges to a continuous limit distribution
- The bootstrap consistently estimates this distribution
- $\sqrt{T}\,\hat\sigma_{T,s}$ and $\sqrt{T}\,\hat\sigma^*_{T,s}$ converge to the same constant in probability (for $s = 1, \ldots, S$)

Theorem
(i) If $\theta_s > \theta_{0,s}$, then $H_s$ will be rejected with probability tending to 1 as $T \to \infty$
(ii) The bootstrap methods asymptotically control the k-FWE and the FDP at level $\alpha$

Modification & Extension

Modification to two-sided tests:
- $H_s: \theta_s = \theta_{0,s}$ vs. $H_s': \theta_s \neq \theta_{0,s}$
- Test statistics are now $|z_{T,s}|$
- Critical value in the first step, say, is the $(1 - \alpha)$ quantile of the distribution of $k\text{-max}\{|w_{T,s} - \theta_s| / \hat\sigma_{T,s}\}$

Extension to non-standard cases:
- The bootstrap does not always work
- Often this happens when the rate of convergence is not equal to $\sqrt{T}$ or when the limiting distribution is not normal
- Example: Manski's (1975) maximum score estimator
- In such cases, one can use subsampling instead

5 Empirical Applications

Application 1: Hedge Fund Evaluation

Set-up:
- Use $S = 210$ hedge funds in the CISDM database with complete return history from 01/1994 until 12/2003 (so $T = 120$)
- The common benchmark is the riskfree rate
- Consider absolute performance: $\theta_s = \mu_s - \mu_B$
- Tests are one-sided and $\theta_{0,s} = 0$ always

Question: How many funds are identified as 'outperformers' using different methods and error rates?
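The FDP figures reported for this application come from stepping through k-FWE procedures with the stopping rule "stop if # rejections $< j/\gamma - 1$". A minimal sketch of that loop follows. It accepts any k-FWE controlling method as a callback; the generalized Bonferroni rule is used here only as a simple stand-in, whereas the slides recommend the bootstrap k-StepM method for power. All function names are hypothetical.

```python
import numpy as np


def generalized_bonferroni(pvals, k, alpha):
    """Stand-in k-FWE method: reject H_s if p_s <= k * alpha / S."""
    p = np.asarray(pvals, dtype=float)
    return p <= k * alpha / p.size


def fdp_control(pvals, kfwe_method, gamma=0.1, alpha=0.05):
    """Apply k-FWE control for k = 1, 2, ... until # rejections < k/gamma - 1.

    kfwe_method(pvals, k, alpha) must return a boolean rejection mask.
    Returns the final rejection mask and the value of k at which the loop stopped.
    """
    k = 1
    while True:
        reject = kfwe_method(pvals, k, alpha)
        R = int(reject.sum())
        # R < k/gamma - 1 rewritten as gamma * (R + 1) < k; the small tolerance
        # guards against floating-point error at exact boundaries.
        if gamma * (R + 1) < k - 1e-12:
            return reject, k
        k += 1
```

The loop always terminates because the number of rejections is bounded by $S$ while the threshold $k/\gamma - 1$ grows with $k$. Setting `alpha=0.5` in this loop corresponds to the Median(FDP) variant.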
Application 1: Hedge Fund Evaluation

Number of funds identified as 'outperformers':

gHolm       1-FWE   3-FWE   FDP_0.1   FDR_0.1      Naïve
5%             10      16        13                  102
10%            13      22        22                  130
No α                                      101

Bootstrap   1-FWE   3-FWE   FDP_0.1   FDP^Med_0.1  Naïve
5%             11      29        17                  102
10%            16      33        36                  130
50%                                       127

Application 2: Multiple Regression

Set-up:
- Mincer regression with log-wage as response variable and a combined $S = 291$ explanatory variables
- There are a total of $T = 4{,}975$ persons in the sample
- From the Austrian Social Security database on 08/10/2001
- Tests are two-sided and $\theta_{0,s} = 0$ always

Question: How many explanatory variables are identified as 'important' using different methods and error rates?

Number of explanatory variables identified as 'important':

gHolm       1-FWE   5-FWE   FDP_0.1   FDR_0.1      Naïve
0.05            0       9         0                   23
0.10            0      11         0                   33
No α                                       16

Bootstrap   1-FWE   5-FWE   FDP_0.1   FDP^Med_0.1  Naïve
0.05            5      12         5                   23
0.10            6      12         6                   33
0.50                                       12

6 Conclusions

Conclusions

Methodology:
- Generalized error rates result in greater power by relaxing the strict FWE criterion
- Bootstrap methods that account for the dependence structure improve upon existing methods based on the individual p-values
- Stepwise methods provide a 'free lunch' compared to their single-step counterparts

Moral:
- A wide array of powerful multiple testing methods is now available, satisfying different needs
- There should be no more excuse for data snooping!