(Better) Bootstrap Confidence Intervals

TAU Bootstrap Seminar 2011
Dr. Saharon Rosset
Shachar Kaufman
Based on Efron and Tibshirani’s “An Introduction to the Bootstrap”, Chapter 14
Agenda
• What’s wrong with the simpler intervals?
• The (nonparametric) BCa method
• The (nonparametric) ABC method
– Not really (only a brief introduction)
Example: simpler intervals are bad
$\theta := \operatorname{var}(A)$
$\hat\theta := \frac{1}{n}\sum_{i=1}^{n} \left(A_i - \bar{A}\right)^2$
• Under the assumption that $(A_i, B_i) \sim \mathcal{N}(\mu, \Sigma)$ i.i.d.
– Have an exact analytical interval
– Can do a parametric bootstrap
• Under the assumption that $(A_i, B_i) \sim F$ i.i.d.
– Can do a nonparametric bootstrap
Why are the simpler intervals bad?
• Standard (normal) confidence interval assumes symmetry around $\hat\theta$
• Bootstrap-t is often erratic in practice
– “Cannot be recommended for general nonparametric problems”
• Percentile suffers from low coverage
– Assumes the nonparametric bootstrap distribution of $\hat\theta^*$ is representative of the distribution of $\hat\theta$ (e.g. that $\hat\theta^*$ is centered on $\hat\theta$ the way $\hat\theta$ is centered on $\theta$)
• Standard & percentile methods assume homogeneous behavior of $\hat\theta$, whatever $\theta$ is
– (e.g. the standard deviation of $\hat\theta$ does not change with $\theta$)
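For concreteness, here is a minimal sketch of the two simplest intervals criticized above, the standard interval and the percentile interval, for the plug-in variance. It uses only the Python standard library. The function names, the nearest-rank quantile rule, and the simulated exponential data are illustrative assumptions, not the book's data or code.

```python
import math
import random
import statistics

def plugin_var(xs):
    """Plug-in estimate theta_hat = sum((x_i - mean)^2) / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bootstrap_replicates(xs, stat, B, rng):
    """Evaluate stat on B nonparametric bootstrap resamples of xs."""
    n = len(xs)
    return [stat([rng.choice(xs) for _ in range(n)]) for _ in range(B)]

def standard_interval(theta_hat, reps, alpha):
    """theta_hat -/+ z^(1-alpha) * se_boot: symmetric around theta_hat by construction."""
    se = statistics.stdev(reps)
    z = statistics.NormalDist().inv_cdf(1 - alpha)
    return theta_hat - z * se, theta_hat + z * se

def percentile_interval(reps, alpha):
    """alpha and 1-alpha empirical quantiles of the replicates (nearest-rank rule)."""
    s = sorted(reps)
    B = len(s)
    lo = s[max(math.ceil(alpha * B) - 1, 0)]
    hi = s[min(math.ceil((1 - alpha) * B) - 1, B - 1)]
    return lo, hi

# Hypothetical skewed sample (exponential), not the book's data
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(30)]
theta_hat = plugin_var(data)
reps = bootstrap_replicates(data, plugin_var, 2000, rng)
std_ci = standard_interval(theta_hat, reps, 0.05)
pct_ci = percentile_interval(reps, 0.05)
print("standard:  ", std_ci)
print("percentile:", pct_ci)
```

The standard interval is symmetric around $\hat\theta$ whatever the skewness of the replicates, which is exactly the first objection above.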
A more flexible inference model
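Account for higher-order statistics of the bootstrap distribution of $\hat\theta^*$:
• Mean
• Standard deviation
• Skewness
[Figure: distribution of $\hat\theta^*$ annotated with its mean, standard deviation and skewness]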
• If $\hat\theta \sim \mathcal{N}(\theta, \sigma^2)$ doesn’t work for the data, maybe we could find a monotone transform $\phi := m(\theta)$, $\hat\phi := m(\hat\theta)$, and constants $z_0$ and $a$ for which we can accept that
$\hat\phi \sim \mathcal{N}\left(\phi - z_0\sigma_\phi,\ \sigma_\phi^2\right), \quad \sigma_\phi := 1 + a\phi$
• Additional unknowns
– $m(\cdot)$ allows a flexible parameter-description scale
– $z_0$ allows bias: $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$
– $a$ allows “$\sigma^2$” to change with $\theta$
• As we know, “more flexible” is not necessarily “better”
• Under broad conditions, in this case it is (justified under the theoretical advantages of BCa)
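A quick Monte Carlo check of the bias claim follows. Under the model, standardizing gives $(\phi - (\phi - z_0\sigma_\phi))/\sigma_\phi = z_0$, so $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$ for any $\phi$ and $a$, as long as $\sigma_\phi > 0$. The parameter values below are arbitrary illustrations, and `prob_below` is a hypothetical helper.

```python
import random
from statistics import NormalDist

def prob_below(phi, z0, a, n_sims, seed=0):
    """Monte Carlo estimate of P(phi_hat < phi) when phi_hat ~ N(phi - z0*sigma, sigma^2), sigma = 1 + a*phi."""
    rng = random.Random(seed)
    sigma = 1 + a * phi  # the model's sigma_phi; must be positive
    hits = sum(rng.gauss(phi - z0 * sigma, sigma) < phi for _ in range(n_sims))
    return hits / n_sims

z0 = 0.3
target = NormalDist().cdf(z0)  # Phi(z0)
est_1 = prob_below(2.0, z0, 0.1, 200_000)
est_2 = prob_below(-1.0, z0, 0.2, 200_000, seed=1)
print(target, est_1, est_2)
```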
Where does this new model lead?
Recall the model: $\hat\phi \sim \mathcal{N}\left(\phi - z_0\sigma_\phi,\ \sigma_\phi^2\right)$, $\sigma_\phi := 1 + a\phi$
Assume known $a$ and $z_0 = 0$, and initially that $\phi = \hat\phi_{\mathrm{lo},0} := 0$, hence $\sigma_{\phi,0} := 1$
Calculate a standard $\alpha$-confidence endpoint from this:
$z^{(\alpha)} := \Phi^{-1}(\alpha), \quad \hat\phi_{\mathrm{lo},1} := z^{(\alpha)}\sigma_{\phi,0} = z^{(\alpha)}$
Now reexamine the actual stdev, this time assuming that $\phi = \hat\phi_{\mathrm{lo},1}$. According to the model, it will be
$\sigma_{\phi,1} := 1 + a\hat\phi_{\mathrm{lo},1} = 1 + a z^{(\alpha)}$
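OK, but this leads to an updated endpoint
$\hat\phi_{\mathrm{lo},2} := z^{(\alpha)}\sigma_{\phi,1} = z^{(\alpha)}\left(1 + a z^{(\alpha)}\right)$
which leads to an updated standard deviation
$\sigma_{\phi,2} = 1 + a z^{(\alpha)}\left(1 + a z^{(\alpha)}\right) = 1 + a z^{(\alpha)} + \left(a z^{(\alpha)}\right)^2$
If we continue iteratively to infinity this way, $\sigma_\phi$ becomes the geometric series $\sum_k \left(a z^{(\alpha)}\right)^k$ (convergent when $|a z^{(\alpha)}| < 1$), and we end up with the confidence interval endpoint
$\hat\phi_{\mathrm{lo},\infty} = \frac{z^{(\alpha)}}{1 - a z^{(\alpha)}}$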
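The iteration can be run numerically to confirm the limit. This sketch assumes, as on the slide, $z_0 = 0$ and a known $a$. It converges only when $|a z^{(\alpha)}| < 1$. The function name is hypothetical.

```python
from statistics import NormalDist

def iterate_endpoint(z_alpha, a, n_iter):
    """Repeat phi_lo <- z^(alpha) * sigma, sigma <- 1 + a * phi_lo, starting from sigma = 1."""
    sigma = 1.0        # sigma_{phi,0} = 1 + a * 0
    phi_lo = 0.0       # phi_lo,0 = 0
    for _ in range(n_iter):
        phi_lo = z_alpha * sigma   # phi_lo,k+1 = z^(alpha) * sigma_{phi,k}
        sigma = 1 + a * phi_lo     # sigma_{phi,k+1} = 1 + a * phi_lo,k+1
    return phi_lo

z = NormalDist().inv_cdf(0.05)
a = 0.1
closed_form = z / (1 - a * z)
print(iterate_endpoint(z, a, 1), iterate_endpoint(z, a, 2), iterate_endpoint(z, a, 60), closed_form)
```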
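• Do this exercise considering $z_0 \neq 0$ and get
$\hat\phi_{\mathrm{lo},\infty} = z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\left(z_0 + z^{(\alpha)}\right)}$
• Similarly for $\hat\phi_{\mathrm{up},\infty}$, with $z^{(1-\alpha)}$ in place of $z^{(\alpha)}$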
Enter BCa
• “Bias-corrected and accelerated”
• Like the percentile confidence interval
– Both ends are percentiles $\left(\hat\theta^{*(\alpha_1)}, \hat\theta^{*(\alpha_2)}\right)$ of the $B$ bootstrap instances of $\hat\theta^*$
– Just not the simple $\alpha_1 := \alpha$, $\alpha_2 := 1 - \alpha$
BCa
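• Instead
$\alpha_1 := \Phi\left(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\left(z_0 + z^{(\alpha)}\right)}\right)$
$\alpha_2 := \Phi\left(z_0 + \frac{z_0 + z^{(1-\alpha)}}{1 - a\left(z_0 + z^{(1-\alpha)}\right)}\right)$
• $z_0$ and $a$ are parameters we will estimate
– When both are zero, we get the good old percentile CI
• Notice we never had to explicitly find $\phi := m(\theta)$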
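The adjusted levels are a direct transcription of the two formulas above. This is a minimal sketch using `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$. The function name `bca_levels` is hypothetical, and the sketch assumes $1 - a(z_0 + z) > 0$.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal: Phi = _N.cdf, Phi^{-1} = _N.inv_cdf

def bca_levels(z0, a, alpha):
    """Return (alpha1, alpha2): the percentile levels at which BCa reads off theta*."""
    def adjust(z):
        w = z0 + z                           # z0 + z^(alpha) or z0 + z^(1-alpha)
        return _N.cdf(z0 + w / (1 - a * w))  # assumes 1 - a*w > 0
    return adjust(_N.inv_cdf(alpha)), adjust(_N.inv_cdf(1 - alpha))

print(bca_levels(0.0, 0.0, 0.05))  # z0 = a = 0: the plain percentile levels
print(bca_levels(0.2, 0.1, 0.05))
```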
• $z_0$ tackles bias, $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$:
$\hat z_0 := \Phi^{-1}\left(\frac{\#\{\hat\theta^*(b) < \hat\theta\}}{B}\right)$
(since $m$ is monotone)
• $a$ accounts for a standard deviation of $\hat\theta$ which varies with $\theta$ (linearly, on the “normal scale” $\phi$)
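The bias-correction estimate is a single count over the bootstrap replicates. Here is a sketch with a hypothetical function name. Note that `inv_cdf` raises an error if none, or all, of the replicates fall below $\hat\theta$.

```python
from statistics import NormalDist

def estimate_z0(reps, theta_hat):
    """z0_hat = Phi^{-1}(#{theta*(b) < theta_hat} / B)."""
    frac = sum(r < theta_hat for r in reps) / len(reps)
    # inv_cdf raises StatisticsError when frac is 0 or 1
    return NormalDist().inv_cdf(frac)

print(estimate_z0([1.0, 2.0, 3.0, 4.0], 2.5))  # half the replicates below theta_hat
print(estimate_z0([float(k) for k in range(10)], 7.5))
```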
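• One suggested estimator for $a$ is via the jackknife
$\hat a := \frac{\sum_{i=1}^{n}\left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^3}{6\left[\sum_{i=1}^{n}\left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^2\right]^{1.5}}$
where $\hat\theta_{(i)} := t(x)$ without sample $i$, and $\hat\theta_{(\cdot)} := \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$
• You won’t find the rationale behind this formula in the book (though it is clearly related to one of the standard ways to define skewness)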
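The jackknife estimate of $a$ can be sketched as follows. Here `stat` is a hypothetical stand-in for $t(\cdot)$ applied to a list of observations. For the sample mean, $\hat\theta_{(\cdot)} - \hat\theta_{(i)} = (x_i - \bar x)/(n-1)$, so $\hat a$ reduces to a scaled sample skewness, which gives a handy check.

```python
def jackknife_acceleration(xs, stat):
    """a_hat = sum d_i^3 / (6 * (sum d_i^2)^1.5), with d_i = theta_(.) - theta_(i)."""
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]  # theta_hat_(i): leave sample i out
    mean_loo = sum(loo) / n                              # theta_hat_(.)
    d = [mean_loo - t for t in loo]
    return sum(v ** 3 for v in d) / (6 * sum(v ** 2 for v in d) ** 1.5)

def sample_mean(xs):
    return sum(xs) / len(xs)

a_sym = jackknife_acceleration([-2.0, -1.0, 0.0, 1.0, 2.0], sample_mean)   # symmetric data
a_skew = jackknife_acceleration([0.0, 0.0, 0.0, 1.0, 10.0], sample_mean)  # right-skewed data
print(a_sym, a_skew)
```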
Theoretical advantages of BCa
• Transformation respecting
– If the interval for $\theta$ is $\left(\hat\theta_{\mathrm{lo}}, \hat\theta_{\mathrm{up}}\right)$, then the interval for a monotone (increasing) $u(\theta)$ is $\left(u(\hat\theta_{\mathrm{lo}}), u(\hat\theta_{\mathrm{up}})\right)$
– So no need to worry about finding transforms of $\theta$ where confidence intervals perform well
• Which is necessary in practice with the bootstrap-t CI
• And with the standard CI (e.g. Fisher’s z-transform of the correlation coefficient)
• The percentile CI is also transformation respecting
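The transformation-respecting property of the percentile interval is easy to see numerically. An increasing $u$ preserves the ordering of the replicates, so the same order statistics are selected. This sketch uses a nearest-rank quantile with no interpolation (an assumption; with interpolated quantiles the property holds only approximately), simulated stand-in replicates, and $u = \log$.

```python
import math
import random

def percentile_interval(reps, alpha):
    """Nearest-rank alpha and 1-alpha quantiles of the replicates."""
    s = sorted(reps)
    B = len(s)
    return s[max(math.ceil(alpha * B) - 1, 0)], s[min(math.ceil((1 - alpha) * B) - 1, B - 1)]

rng = random.Random(2)
reps = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]  # simulated stand-in for theta*
lo, hi = percentile_interval(reps, 0.05)
lo_u, hi_u = percentile_interval([math.log(r) for r in reps], 0.05)  # interval for u(theta) = log(theta)
print(math.log(lo) == lo_u, math.log(hi) == hi_u)
```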
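• Accuracy
– We want $\hat\theta_{\mathrm{lo}}^{(\alpha)}$ s.t. $\mathbb{P}\left(\theta < \hat\theta_{\mathrm{lo}}^{(\alpha)}\right) = \alpha$
– But a practical $\hat\theta_{\mathrm{lo}}^{(\alpha)}$ is an approximation where $\mathbb{P}\left(\theta < \hat\theta_{\mathrm{lo}}^{(\alpha)}\right) \cong \alpha$
– BCa (and bootstrap-t) endpoints are “second order accurate”, where $\mathbb{P}\left(\theta < \hat\theta_{\mathrm{lo}}^{(\alpha)}\right) = \alpha + O\left(\frac{1}{n}\right)$
– This is in contrast to the standard and percentile methods, which only converge at rate $\frac{1}{\sqrt{n}}$ (“first order accurate”), i.e. errors one order of magnitude greater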
But BCa is expensive
• The use of direct bootstrapping to calculate delicate statistics such as $z_0$ and $a$ requires a large $B$ to work satisfactorily
• Fortunately, BCa can be analytically approximated (with a Taylor expansion, for differentiable $t(x)$) so that no Monte Carlo simulation is required
• This is the ABC (“approximate bootstrap confidence”) method, which retains the good theoretical properties of BCa
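Putting the pieces together, a complete nonparametric BCa interval might look like the sketch below. It is an illustrative assumption, not the book's code. `bca_interval` and `plugin_var` are hypothetical names, quantiles use the nearest-rank rule, and $B = 2000$ is chosen to reflect the point that BCa needs many replicates.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()

def plugin_var(xs):
    """Plug-in variance sum((x_i - mean)^2) / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bca_interval(xs, stat, alpha=0.05, B=2000, seed=0):
    """Nonparametric BCa interval sketch; returns (lo, up, z0_hat, a_hat)."""
    rng = random.Random(seed)
    n = len(xs)
    theta_hat = stat(xs)
    reps = sorted(stat([rng.choice(xs) for _ in range(n)]) for _ in range(B))
    # Bias correction (fails if no replicate, or every replicate, is below theta_hat)
    z0 = _N.inv_cdf(sum(r < theta_hat for r in reps) / B)
    # Acceleration from the jackknife
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    d = [mean_loo - t for t in loo]
    a = sum(v ** 3 for v in d) / (6 * sum(v ** 2 for v in d) ** 1.5)

    def level(z):  # Phi(z0 + (z0 + z) / (1 - a (z0 + z)))
        w = z0 + z
        return _N.cdf(z0 + w / (1 - a * w))

    def quantile(p):  # nearest-rank empirical quantile of the sorted replicates
        return reps[min(max(math.ceil(p * B) - 1, 0), B - 1)]

    lo = quantile(level(_N.inv_cdf(alpha)))
    up = quantile(level(_N.inv_cdf(1 - alpha)))
    return lo, up, z0, a

rng = random.Random(3)
data = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed sample
lo, up, z0_hat, a_hat = bca_interval(data, plugin_var)
print(lo, up, z0_hat, a_hat)
```

The cost is $B$ evaluations of `stat` for the percentiles and $\hat z_0$, plus $n$ more for the jackknife $\hat a$.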
The ABC method
• Only an introduction (Chapter 22)
• Discusses the “how”, not the “why”
• For additional details see DiCiccio and Efron (1992 or 1996)
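• Given the estimator in resampling form, $\hat\theta = T(P)$
– Recall $P$, the “resampling vector”, is an $n$-dimensional random variable with components $P_j := \mathbb{P}\left(x^*_1 = x_j\right)$
– Recall $P^0 := \left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right)$
• Second-order Taylor analysis of the estimate
– as a function of the resampling vector $P$, around $P^0$
$\dot T_i := \left.\frac{\partial T(P)}{\partial P_i}\right|_{P=P^0}, \quad \ddot T_i := \left.\frac{\partial^2 T(P)}{\partial P_i^2}\right|_{P=P^0}$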
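In practice $\dot T_i$ and $\ddot T_i$ are often computed numerically. The sketch below uses central differences along the direction from $P^0$ toward the $i$-th vertex $e_i$, i.e. $T(P^0 + \varepsilon(e_i - P^0))$. This directional convention and the helper names are assumptions. For the plug-in variance $T(P) = \sum_j P_j (x_j - \sum_k P_k x_k)^2$, a hand calculation under this convention gives $\dot T_i = (x_i - \bar x)^2 - \hat\theta$ and $\ddot T_i = -2(x_i - \bar x)^2$.

```python
def T_var(P, xs):
    """Plug-in variance in resampling form: sum_j P_j (x_j - sum_k P_k x_k)^2."""
    m = sum(p * x for p, x in zip(P, xs))
    return sum(p * (x - m) ** 2 for p, x in zip(P, xs))

def influence_derivatives(T, xs, eps=1e-3):
    """Central-difference T_dot_i and T_ddot_i along P0 + e * (e_i - P0)."""
    n = len(xs)
    P0 = [1.0 / n] * n
    t0 = T(P0, xs)
    t_dot, t_ddot = [], []
    for i in range(n):
        d = [(1.0 if j == i else 0.0) - P0[j] for j in range(n)]  # e_i - P0
        plus = T([p + eps * v for p, v in zip(P0, d)], xs)
        minus = T([p - eps * v for p, v in zip(P0, d)], xs)
        t_dot.append((plus - minus) / (2 * eps))
        t_ddot.append((plus - 2 * t0 + minus) / eps ** 2)
    return t_dot, t_ddot

t_dot, t_ddot = influence_derivatives(T_var, [1.0, 2.0, 4.0, 7.0])
print(t_dot)
print(t_ddot)
```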
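• Can approximate all the BCa parameter estimates (i.e. estimate the parameters in a different way)
– $\hat\sigma = \left(\frac{1}{n^2}\sum_{i=1}^{n}\dot T_i^2\right)^{1/2}$
– $\hat a = \frac{1}{6}\,\frac{\sum_{i=1}^{n}\dot T_i^3}{\left(\sum_{i=1}^{n}\dot T_i^2\right)^{3/2}}$
– $\hat z_0 = \hat a - \hat\gamma$, where
• $\hat\gamma := \frac{\hat b}{\hat\sigma} - \hat c_q$
• $\hat b := \frac{1}{2n^2}\sum_{i=1}^{n}\ddot T_i$
• $\hat c_q :=$ something akin to a Hessian component, but along a specific direction not perpendicular to any natural axis (the “least favorable family” direction)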
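Given the vectors $\dot T$ and $\ddot T$, the closed-form constants above take a few lines. This sketch (hypothetical function name) omits $\hat c_q$, and hence $\hat\gamma$ and $\hat z_0$, because the slide does not spell out its formula. As a sanity check, for the sample mean $\dot T_i = x_i - \bar x$, so $\hat\sigma$ reduces to the plug-in standard error $\sqrt{\sum_i (x_i - \bar x)^2}/n$. The usage line plugs in $\dot T_i = (x_i - \bar x)^2 - \hat\theta$ and $\ddot T_i = -2(x_i - \bar x)^2$ for the plug-in variance of $x = (1, 2, 4, 7)$, values worked out by hand.

```python
import math

def abc_constants(t_dot, t_ddot):
    """Return (sigma_hat, a_hat, b_hat) from the derivatives T_dot_i and T_ddot_i."""
    n = len(t_dot)
    s2 = sum(t * t for t in t_dot)
    sigma = math.sqrt(s2 / n ** 2)                    # (sum T_dot_i^2 / n^2)^(1/2)
    a = sum(t ** 3 for t in t_dot) / (6 * s2 ** 1.5)  # (1/6) sum T_dot^3 / (sum T_dot^2)^(3/2)
    b = sum(t_ddot) / (2 * n ** 2)                    # sum T_ddot_i / (2 n^2)
    return sigma, a, b

sigma_hat, a_hat, b_hat = abc_constants([1.0, -3.0, -5.0, 7.0], [-12.5, -4.5, -0.5, -24.5])
print(sigma_hat, a_hat, b_hat)
```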
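• And the ABC interval endpoint
$\hat\theta_{ABC}[1-\alpha] := T\left(P^0 + \frac{\lambda\delta}{\hat\sigma}\right)$
• Where
– $\lambda := \frac{\omega}{\left(1 - \hat a\omega\right)^2}$
– $\delta := \left(\dot T_1, \dots, \dot T_n\right)/n^2$
with $\omega := \hat z_0 + z^{(1-\alpha)}$
• Simple and to the point, ain’t it?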
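Here is a sketch of the endpoint computation. It assumes $\delta = (\dot T_1, \dots, \dot T_n)/n^2$, the scaling under which, for the sample mean with $\hat a = \hat z_0 = 0$, the endpoint reduces to $\bar x + z^{(1-\alpha)}\hat\sigma$. It takes $\hat z_0$ as an input, since $\hat c_q$ is not given explicitly. `T` is a hypothetical function of the weight vector and the data, and $\dot T_i$ is obtained by central differences toward each vertex (also an assumption).

```python
import math
from statistics import NormalDist

def abc_endpoint(T, xs, z0, alpha, eps=1e-3):
    """ABC endpoint theta_ABC[1 - alpha] = T(P0 + lambda * delta / sigma); z0 is supplied by the caller."""
    n = len(xs)
    P0 = [1.0 / n] * n
    # T_dot_i by central differences along P0 + e * (e_i - P0)
    t_dot = []
    for i in range(n):
        d = [(1.0 if j == i else 0.0) - P0[j] for j in range(n)]
        plus = T([p + eps * v for p, v in zip(P0, d)], xs)
        minus = T([p - eps * v for p, v in zip(P0, d)], xs)
        t_dot.append((plus - minus) / (2 * eps))
    s2 = sum(t * t for t in t_dot)
    sigma = math.sqrt(s2) / n                          # sigma_hat
    a = sum(t ** 3 for t in t_dot) / (6 * s2 ** 1.5)   # a_hat
    omega = z0 + NormalDist().inv_cdf(1 - alpha)       # omega = z0 + z^(1-alpha)
    lam = omega / (1 - a * omega) ** 2                 # lambda
    delta = [t / n ** 2 for t in t_dot]                # delta = T_dot / n^2 (assumed scaling)
    return T([p + lam * dl / sigma for p, dl in zip(P0, delta)], xs)

def T_mean(P, xs):
    """Sample mean in resampling form: sum_j P_j x_j."""
    return sum(p * x for p, x in zip(P, xs))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
upper = abc_endpoint(T_mean, xs, 0.0, 0.05)
lower = abc_endpoint(T_mean, xs, 0.0, 0.95)
print(lower, upper)
```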