TAU Bootstrap Seminar 2011, Dr. Saharon Rosset
(Better) Bootstrap Confidence Intervals
Shachar Kaufman
Based on Efron and Tibshirani's "An Introduction to the Bootstrap", Chapter 14

Agenda
• What's wrong with the simpler intervals?
• The (nonparametric) BCa method
• The (nonparametric) ABC method
  – Not really (only an introduction)

Example: simpler intervals are bad
• The parameter of interest is $\theta \triangleq \operatorname{var}(A)$, with plug-in estimate
  $\hat\theta = \frac{1}{n}\sum_{i=1}^{n}(A_i - \bar A)^2$

Example: simpler intervals are bad
• Under the assumption that $(A_i, B_i) \sim N(\mu, \Sigma)$ i.i.d.
  → Have an exact analytical interval
  → Can do parametric bootstrap
• Under the assumption that $(A_i, B_i) \sim F$ i.i.d.
  → Can do nonparametric bootstrap

Why are the simpler intervals bad?
• The standard (normal) confidence interval assumes symmetry around $\hat\theta$
• Bootstrap-t is often erratic in practice
  – "Cannot be recommended for general nonparametric problems"
• The percentile interval suffers from low coverage
  – Assumes the nonparametric distribution of $\hat\theta^*$ is representative of that of $\hat\theta$ (e.g., has mean $\theta$, like $\hat\theta$ does)
• The standard and percentile methods assume homogeneous behavior of $\hat\theta$, whatever $\theta$ is
  – (e.g., that the standard deviation of $\hat\theta$ does not change with $\theta$)

A more flexible inference model
• Account for higher-order statistics: mean, standard deviation, skewness
  [figure: histogram of the bootstrap replications $\hat\theta^*$]

A more flexible inference model
• If $\hat\theta \sim N(\theta, \sigma^2)$ doesn't work for the data, maybe we can find a transform $\varphi \triangleq g(\theta)$ and constants $z_0$ and $a$ for which we can accept that
  $\hat\varphi \sim N(\varphi - z_0\sigma_\varphi,\ \sigma_\varphi^2)$, with $\sigma_\varphi \triangleq 1 + a\varphi$
• Additional unknowns:
  – $g(\cdot)$ allows a flexible parameter-description scale
  – $z_0$ allows bias: $\Pr(\hat\varphi < \varphi) = \Phi(z_0)$
  – $a$ allows "$\sigma^2$" to change with $\varphi$
• As we know, "more flexible" is not necessarily "better"
• Under broad conditions, in this case it is (TBD)

Where does this new model lead?
$\hat\varphi \sim N(\varphi - z_0\sigma_\varphi,\ \sigma_\varphi^2)$, $\sigma_\varphi \triangleq 1 + a\varphi$
• Assume $a$ is known and $z_0 = 0$, and initially that $\varphi = \varphi_{lo,0} \triangleq 0$, hence $\sigma_{\varphi,0} \triangleq 1$
• Calculate a standard $\alpha$-confidence endpoint from this:
  $z^{(\alpha)} \triangleq \Phi^{-1}(\alpha)$, $\quad \varphi_{lo,1} \triangleq z^{(\alpha)}\sigma_{\varphi,0} = z^{(\alpha)}$
• Now reexamine the actual standard deviation, this time assuming that $\varphi = \varphi_{lo,1}$. According to the model, it will be
  $\sigma_{\varphi,1} \triangleq 1 + a\varphi_{lo,1} = 1 + az^{(\alpha)}$

Where does this new model lead?
$\hat\varphi \sim N(\varphi - z_0\sigma_\varphi,\ \sigma_\varphi^2)$, $\sigma_\varphi \triangleq 1 + a\varphi$
• OK, but this leads to an updated endpoint
  $\varphi_{lo,2} \triangleq z^{(\alpha)}\sigma_{\varphi,1} = z^{(\alpha)}\left(1 + az^{(\alpha)}\right)$
• Which leads to an updated
  $\sigma_{\varphi,2} = 1 + az^{(\alpha)}\left(1 + az^{(\alpha)}\right) = 1 + az^{(\alpha)} + \left(az^{(\alpha)}\right)^2$
• If we continue iterating this way to infinity, we end up with the confidence interval endpoint
  $\varphi_{lo,\infty} = \frac{z^{(\alpha)}}{1 - az^{(\alpha)}}$

Where does this new model lead?
• Do this exercise considering $z_0 \neq 0$ and get
  $\varphi_{lo,\infty} = z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\left(z_0 + z^{(\alpha)}\right)}$
• Similarly for $\varphi_{up,\infty}$, with $z^{(1-\alpha)}$

Enter BCa
• "Bias-corrected and accelerated"
• Like the percentile confidence interval
  – Both ends are percentiles $\hat\theta^{*(\alpha_1)}, \hat\theta^{*(\alpha_2)}$ of the $B$ bootstrap instances of $\hat\theta^*$
  – Just not the simple $\alpha_1 \triangleq \alpha$, $\alpha_2 \triangleq 1 - \alpha$

BCa
• Instead
  $\alpha_1 \triangleq \Phi\left(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\left(z_0 + z^{(\alpha)}\right)}\right)$
  $\alpha_2 \triangleq \Phi\left(z_0 + \frac{z_0 + z^{(1-\alpha)}}{1 - a\left(z_0 + z^{(1-\alpha)}\right)}\right)$
• $z_0$ and $a$ are parameters we will estimate
  – When both are zero, we get the good old percentile CI
• Notice we never had to explicitly find $\varphi \triangleq g(\theta)$

BCa
• $z_0$ tackles bias: $\Pr(\hat\varphi < \varphi) = \Phi(z_0)$, so
  $\hat z_0 \triangleq \Phi^{-1}\left(\frac{\#\left\{\hat\theta^*(b) < \hat\theta\right\}}{B}\right)$
  (since $g$ is monotone)
• $a$ accounts for a standard deviation of $\hat\varphi$ which varies with $\varphi$ (linearly, on the "normal scale" $\varphi$)

BCa
• One suggested estimator for $a$ is via the jackknife:
  $\hat a \triangleq \frac{\sum_{i=1}^{n}\left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^3}{6\left[\sum_{i=1}^{n}\left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^2\right]^{3/2}}$
  where $\hat\theta_{(i)} \triangleq t(x)$ computed without sample $i$, and $\hat\theta_{(\cdot)} \triangleq \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$
• You won't find the rationale behind this formula in the book (though it is clearly related to one of the standard ways to define skewness)

Theoretical advantages of BCa
• Transformation respecting
  – If the interval for $\theta$ is $\left[\theta_{lo}, \theta_{up}\right]$, then the interval for a monotone $u(\theta)$ is $\left[u(\theta_{lo}), u(\theta_{up})\right]$
  – So no need to worry about finding
transforms of $\theta$ where confidence intervals perform well
    • Which is necessary in practice with the bootstrap-t CI
    • And with the standard CI (e.g., Fisher's correlation-coefficient transform)
  – The percentile CI is also transformation respecting

Theoretical advantages of BCa
• Accuracy
  – We want $\theta_{lo}$ such that $\Pr(\theta < \theta_{lo}) = \alpha$
  – But a practical $\theta_{lo}$ is an approximation, where $\Pr(\theta < \theta_{lo}) \approx \alpha$
  – BCa (and bootstrap-t) endpoints are "second-order accurate":
    $\Pr(\theta < \theta_{lo}) = \alpha + O\left(\frac{1}{n}\right)$
  – This is in contrast to the standard and percentile methods, which only converge at rate $\frac{1}{\sqrt{n}}$ ("first-order accurate") → errors an order of magnitude greater

But BCa is expensive
• The use of direct bootstrapping to calculate delicate statistics such as $z_0$ and $a$ requires a large $B$ to work satisfactorily
• Fortunately, BCa can be analytically approximated (with a Taylor expansion, for differentiable $t(x)$) so that no Monte Carlo simulation is required
• This is the ABC method, which retains the good theoretical properties of BCa

The ABC method
• Only an introduction (Chapter 22)
• Discusses the "how", not the "why"
• For additional details see DiCiccio and Efron 1992 or 1996

The ABC method
• Given the estimator in resampling form $\hat\theta = T(P^*)$
  – Recall that $P^*$, the "resampling vector", is an $n$-dimensional random variable whose component $P_i^*$ is the proportion of the bootstrap sample equal to $x_i$
  – Recall $P^0 \triangleq \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$
• Second-order Taylor analysis of the estimate, as a function of the bootstrap resampling methodology:
  $\dot T_i \triangleq \left.\frac{\partial T(P)}{\partial P_i}\right|_{P = P^0}$, $\quad \ddot T_i \triangleq \left.\frac{\partial^2 T(P)}{\partial P_i^2}\right|_{P = P^0}$

The ABC method
• Can approximate all the BCa parameter estimates (i.e.
estimate the parameters in a different way):
  – $\hat\sigma = \left[\frac{1}{n^2}\sum_{i=1}^{n} \dot T_i^2\right]^{1/2}$
  – $\hat a = \frac{1}{6} \cdot \frac{\sum_{i=1}^{n} \dot T_i^3}{\left(\sum_{i=1}^{n} \dot T_i^2\right)^{3/2}}$
  – $\hat z_0 = \hat a - \hat\gamma$, where
    • $\hat\gamma \triangleq \frac{\hat b}{\hat\sigma} - \hat c_q$
    • $\hat b \triangleq \frac{1}{2n^2}\sum_{i=1}^{n} \ddot T_i$
    • $\hat c_q \triangleq$ something akin to a Hessian component, but along a specific direction not perpendicular to any natural axis (the "least favorable family" direction)

The ABC method
• And the ABC interval endpoint:
  $\hat\theta_{ABC}[1-\alpha] \triangleq T\left(P^0 + \lambda\hat\delta\right)$
• Where
  – $\lambda \triangleq \frac{w}{(1 - \hat a w)^2}$, with $w \triangleq \hat z_0 + z^{(1-\alpha)}$
  – $\hat\delta \triangleq \frac{\dot T}{n^2\hat\sigma}$ (componentwise, $\hat\delta_i = \dot T_i / (n^2\hat\sigma)$)
• Simple and to the point, ain't it?
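The deck never shows code, so here is a minimal sketch of the nonparametric BCa recipe from the slides: bootstrap replications of the statistic, the bias correction z0 from the fraction of replications below the plug-in estimate, the acceleration a from the jackknife formula, and the adjusted percentile levels α1, α2. The function name `bca_interval` and all default arguments are ours, not from the book.

```python
import numpy as np
from statistics import NormalDist


def bca_interval(x, t, alpha=0.025, B=2000, seed=0):
    """Nonparametric BCa interval (theta*^(alpha1), theta*^(alpha2))
    for the statistic t evaluated on the 1-D sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    norm = NormalDist()
    theta_hat = t(x)

    # B bootstrap replications theta*(b) = t(x*)
    idx = rng.integers(0, n, size=(B, n))
    theta_star = np.array([t(x[row]) for row in idx])

    # Bias correction: z0 = Phi^-1( #{theta*(b) < theta_hat} / B )
    z0 = norm.inv_cdf((theta_star < theta_hat).mean())

    # Acceleration via the jackknife (skewness-like) formula
    theta_jack = np.array([t(np.delete(x, i)) for i in range(n)])
    d = theta_jack.mean() - theta_jack          # theta(.) - theta(i)
    a = (d ** 3).sum() / (6.0 * (d ** 2).sum() ** 1.5)

    # Adjusted percentile levels alpha1, alpha2
    def adjusted(z):
        return norm.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    a1 = adjusted(norm.inv_cdf(alpha))
    a2 = adjusted(norm.inv_cdf(1.0 - alpha))
    return np.quantile(theta_star, a1), np.quantile(theta_star, a2)
```

Note that setting z0 = a = 0 inside `adjusted` makes it return exactly α and 1 − α, which is a quick numerical check of the slide's remark that BCa reduces to the percentile CI when both parameters vanish.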