Bootstrapping

Bruce Dunham∗
Marie Auger-Méthé†

1 Bootstrapping motivation

In some situations it is difficult to make assumptions about the sampling distribution of a statistic. In particular, there may be no obvious way to robustly assess the variance of an estimator.

Example 1. Sonoda et al. (2011) report experiments involving a remarkable labrador dog named Marine, who has apparently been trained to detect the presence or absence of bowel and prostate cancer in humans. Samples of breath and faeces were taken from a pool of 48 people with confirmed bowel cancer and 258 disease-free controls. In one study, Marine sniffed five breath samples, only one of which was from a cancer patient. To identify the cancerous sample, she was required to sit in front of it and ignore the others. She did this correctly in 33 out of 36 trials. How accurate is Marine?

Example 2. Sniffer dogs in Germany have been trained to detect the presence or absence of lung cancer in humans, including humans without lung cancer but suffering from chronic obstructive pulmonary disease (COPD); see Ehmann et al. (2011). The dogs can apparently detect lung cancer in human breath. In part of the study, the dogs correctly detected the absence of cancer in 372 of the 400 subjects who did not have lung cancer. How accurate are these dogs?

1.1 Introducing the bootstrap

Suppose we have a sample of size n, which we denote Sn = {x1, x2, . . . , xn}. For this sample, let t denote the value of a statistic T of interest. For instance, t could be the sample mean, median or variance.

We draw M simple random samples with replacement from Sn. That is, each observation in a bootstrap sample is taken by picking one of {x1, x2, . . . , xn} at random, then replacing that observation before selecting the next. For each sample selected, the statistic t is found. In this way a set of M values of T is drawn, which we denote {t∗1, t∗2, . . . , t∗M}.
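In code, this resampling scheme is a short loop. The sketch below (Python, standard library only; the helper name bootstrap_ebd and the toy data are illustrative, not part of the handout) draws M samples with replacement and computes the sample mean of each, though any statistic could be substituted for statistics.mean:

```python
import random
import statistics

def bootstrap_ebd(sample, M, stat=statistics.mean, seed=1):
    """Draw M simple random samples with replacement from `sample`
    and return the statistic computed on each resample."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    return [stat([rng.choice(sample) for _ in range(n)])
            for _ in range(M)]

# Toy data standing in for the observed sample Sn
ebd = bootstrap_ebd([3, 1, 4, 1, 5], M=200)
```

Each entry of ebd is one t∗ value, so the list as a whole plays the role of {t∗1, . . . , t∗M}.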
These M values comprise the empirical bootstrap distribution (EBD) for T. The EBD provides an approximation to the bootstrap distribution, which is the distribution of values of T that would arise if all possible samples with replacement were taken from Sn. In fact there are (2n − 1 choose n) such samples, so it is only feasible to compute the bootstrap distribution exactly for small n. As long as M is large, the EBD provides a close approximation to the bootstrap distribution. Computer software can make implementing bootstrapping relatively simple.

∗ 99.99% written by Bruce!! Department of Statistics, University of British Columbia.
† Department of Statistics, Institute for the Oceans and Fisheries, University of British Columbia.

Example 3. Prof Statto is concerned about his systolic blood pressure. His doctor has told him that hypertension is diagnosed when systolic blood pressure reaches 140 mmHg. Prof Statto measures his blood pressure eight times in a day, and observes the following measurements (in mmHg):

139, 135, 144, 134, 139, 132, 132, 137

Prof Statto wonders whether a 95% confidence interval for the mean would contain the value 140 mmHg. Based on his data, he considers creating a confidence interval using Normal distribution theory and the t distribution, but he recognizes that this approach may be unsound if the data are non-Normal or if the values are not independent of one another. The small sample size makes Normality hard to assess, and blood pressure readings taken on the same day are unlikely to be independent of one another. Thus he opts for bootstrapping, and resamples from his data one hundred times, computing the mean each time as follows:

                1    2    3    4    5    6    7    8    Mean
Sample 1:     132  134  135  132  139  132  139  144   135.88
Sample 2:     139  135  137  132  135  132  132  134   134.50
   ...
Sample 100:   132  139  132  132  132  144  132  139   135.25

Based on the above, the empirical bootstrap distribution for x̄ is illustrated by a histogram:

[Histogram: frequencies of the 100 bootstrap means (in mmHg), ranging from about 134 to 140.]

The term "bootstrap" was coined by Bradley Efron, to whom the idea is usually attributed. See Efron (1979) for the first paper on bootstrap sampling. The methods we describe here are nonparametric, in that no explicit model is ever assumed for our data. Parametric bootstrapping, where we repeatedly sample from a fitted model, is not discussed here, but see Davison and Hinkley (1997) for a fuller description of bootstrapping methods.

1.2 Properties of the Empirical Bootstrap Distribution (EBD)

Some general properties of the EBD are summarised below:

1. The EBD is centred on the sample value t.
2. The mean of the EBD is an estimate of the mean of the sampling distribution of T.
3. The standard deviation of the EBD estimates the standard deviation of the sampling distribution of T.
4. For α ∈ (0, 1), the 100(α/2)th and 100(1 − α/2)th percentile points of the EBD give a 100(1 − α)% bootstrap confidence interval for the parameter estimated by the statistic T.

Example 4. Prof Statto finds the average of his 100 bootstrap sample means is 136.56 mmHg, very close to the sample mean of 136.50 mmHg. The standard deviation of the bootstrap means is 1.29 mmHg, which here is smaller than the estimated standard deviation of x̄ that would be found using

    s/√n = 4.11/√8 = 1.45.

To obtain a 95% bootstrap interval for Prof Statto's mean blood pressure, he sorts his 100 bootstrap sample means into order. The 2.5th percentile is taken as the mean of the second and third smallest values, which is 134.19 mmHg. Similarly, the 97.5th percentile of the EBD is found by averaging the 97th and 98th values in order, which in this case is 139.06 mmHg. Hence the 95% bootstrap interval is (134.19, 139.06). This interval does not contain 140 mmHg.
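The percentile interval in property 4, as applied in Example 4, amounts to sorting the EBD and averaging the pair of order statistics that straddle each percentile. A minimal sketch (Python; percentile_interval is an illustrative helper, and this straddle-averaging rule is just one of several percentile conventions in use):

```python
def percentile_interval(ebd, alpha=0.05):
    """100(1 - alpha)% bootstrap percentile interval from an EBD,
    averaging the two order statistics that straddle each percentile."""
    t = sorted(ebd)
    M = len(t)

    def pctile(p):
        pos = p * M                   # e.g. M = 100, p = 0.025 -> position 2.5
        i = int(pos)
        if pos == i:                  # lands exactly on an order statistic
            return t[i - 1]
        return (t[i - 1] + t[i]) / 2  # average the straddling pair

    return pctile(alpha / 2), pctile(1 - alpha / 2)

# Toy EBD of the integers 1..100: the 2.5th percentile averages the 2nd and
# 3rd smallest values, the 97.5th averages the 97th and 98th, as in Example 4
lo, hi = percentile_interval(list(range(1, 101)))  # -> (2.5, 97.5)
```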
The bootstrap approach is most effective when it is difficult to specify the sampling distribution of the statistic T. As such, unless we have a very small sample (as in Prof Statto's case), the bootstrap may not be especially useful when T is the sample mean, since sampling theory already gives good information about the sampling distribution of x̄, and its variance in particular.

Remark 1. Those taking STAT 344 may learn that when sampling with replacement from a finite population Sn of size n with mean X̄ and standard deviation S, the mean x̄ of a sample of size n is unbiased for X̄. Moreover, the variance of x̄ is

    Var(x̄) = (S²/n)(1 − 1/n) ≈ S²/n.

In the case when Sn is itself a random sample from some distribution with variance σ², then S² is an unbiased estimator of σ², and we can conclude that S²/n is an unbiased estimator of σ²/n, the variance of x̄. So when the variation in the sample mean is of interest, there may be no great benefit from using the bootstrap as long as n is not very small.

Example 5. Let us return to the example of the dog Marine, who correctly picked out the cancerous breath sample from among five in 33 out of 36 trials. A simple model takes each trial as a Bernoulli variable (1 for success, 0 for failure) with some unknown probability p of success. Assuming independence between trials, the total number of successes then follows the B(36, p) distribution. Inference for a Binomial distribution could make use of the Normal approximation, under which the sample proportion p̂ approximately satisfies

    p̂ ∼ N( p, p(1 − p)/36 ),

that is, Normal with mean p and standard deviation √(p(1 − p)/36). The estimate p̂ may be used in place of p above to construct confidence intervals for p. We may be uncomfortable using that approach here, though, largely because p is evidently quite close to 1 and the sampling distribution of p̂, being confined to [0, 1], must be skewed. An alternative approach uses the bootstrap.
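As a concrete sketch of this alternative (shown in Python rather than the R used later in the handout; the seed and variable names are illustrative, and the endpoints vary with the resamples), we can resample Marine's 36 recorded outcomes and read off a percentile interval:

```python
import random

# Marine's 36 trials coded as Bernoulli outcomes: 33 successes, 3 failures
trials = [1] * 33 + [0] * 3

rng = random.Random(0)  # seeded only to make the sketch reproducible
M = 200
props = sorted(
    sum(rng.choice(trials) for _ in range(len(trials))) / len(trials)
    for _ in range(M)
)

# 95% bootstrap percentile interval: the 5th and 195th of the 200 sorted
# resampled proportions
ci = (props[4], props[194])
```

Resampling the 0/1 outcomes with replacement and averaging them is exactly resampling the Bernoulli data and recording each bootstrap sample's proportion.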
We can repeatedly sample from the data with replacement, and record the sample proportion for each bootstrap sample. Suppose we take 200 samples with replacement from the data set. This provides an EBD for p̂, and we can take the 5th and 195th of the ordered values as a 95% bootstrap confidence interval for p. In one such bootstrap procedure, the 95% confidence interval was found to be (0.806, 1.00).

1.3 Bootstrap hypothesis tests

It is possible to use a suitable EBD to create a bootstrap alternative to a classical hypothesis test. The theory of how to choose the test statistic in a bootstrap hypothesis test is not simple, so we restrict attention here to a case where the bootstrap test resembles a well-known test.

Recall the one-sample t test: a sample Sn = {x1, x2, . . . , xn} is taken at random from a distribution with mean µ. In testing the null hypothesis H0 : µ = µ0, we construct the test statistic

    t = (x̄ − µ0) / (s/√n),

in which x̄ and s are the sample mean and standard deviation respectively. Under the assumption that the data are from a Normal distribution, t follows the tn−1 distribution when H0 is true. The null hypothesis is rejected when t is appreciably different from zero. Although this test is quite robust to departures from its assumptions, it may not be justifiable when the sample size is small.

There is a bootstrap version of the one-sample t test. The studentized test statistic is found for each bootstrap sample as follows:

1. Draw a simple random sample with replacement from Sn.
2. Compute the mean and standard deviation, x̄∗ and s∗, of the bootstrap sample.
3. Compute the studentized test statistic

    t∗ = (x̄∗ − x̄) / (s∗/√n).

Repeating the above steps M times creates the EBD for t. Note the difference between the definitions of t∗ and t: while t is centred on µ0, t∗ is centred on x̄. The p-value for this bootstrap test is determined by the EBD and the alternative hypothesis. The cases are as follows:

1. When Ha : µ ≠ µ0, the p-value is the proportion of t∗ values either greater than |t| or less than −|t|.
2. When Ha : µ > µ0, the p-value is the proportion of t∗ values greater than t.
3. When Ha : µ < µ0, the p-value is the proportion of t∗ values less than t.

Example 6. Recall that Prof Statto is interested in his mean systolic blood pressure. He is worried that his mean blood pressure may have risen to 140 mmHg. In performing a test for the underlying mean, we test H0 : µ = 140 against Ha : µ < 140. The t-test statistic for the sample is

    t = (136.50 − 140) / (4.11/√8) = −2.4086.

We can take bootstrap samples from the data, compute the studentized test statistic

    t∗ = (x̄∗ − 136.50) / (s∗/√8),

and count how many are less than −2.4086. An empirical bootstrap distribution based on 200 resamples gave 9 studentized test statistics smaller than −2.4086.

[Histogram: EBD of the 200 studentized t∗ statistics, ranging from about −6 to 4.]

The bootstrap p-value for this test is 9/200 = 0.045, so at the 5% significance level we would reject the null hypothesis that Prof Statto's mean blood pressure is 140 mmHg.

1.4 Bootstrapping in R

A package named bootstrap contains a command for bootstrapping a statistic from a data set. Once the package is installed, the bootstrap command has the following syntax:

library(bootstrap)
bootstrap(x, nboot, theta)

in which x contains the data set to be sampled from, nboot is the number of bootstrap samples to be taken, and theta is the statistic to be computed for each sample. The value thetastar gives the nboot bootstrap values of the statistic theta. Often theta is chosen to be a built-in R function, such as sd, median, or mean.

To demonstrate the bootstrap function, let's create a random sample of 100 from the standard Normal distribution.
We use the rnorm command (whose default parameter values give the standard Normal) to create a vector x as follows:

x <- rnorm(100)

Now we can create 500 bootstrap samples for the median from x using

boot1 <- bootstrap(x, 500, median)

This line of code creates an object called boot1. We can extract the actual bootstrap sample values by entering

boot1$thetastar

[Output: the 500 bootstrap medians, printed five per line.]

A histogram of these values can be created by entering

hist(boot1$thetastar)

[Histogram of boot1$thetastar: the 500 bootstrap medians, ranging from about −0.3 to 0.3.]

The commands sort and quantile can be useful for finding confidence intervals from an EBD. In some cases, it may be necessary to define a function for the statistic theta to be computed from each bootstrap sample.

Example 7.
If we enter Prof Statto's blood pressure data into R in a vector called StattoBlood, the bootstrap t test above can be performed by defining a function as follows:

StattoBlood <- c(139, 135, 144, 134, 139, 132, 132, 137)
tteststat <- function(x){(mean(x) - 136.50)/(sd(x)/sqrt(8))}
results <- bootstrap(StattoBlood, 200, tteststat)

A histogram of the results can be created via

hist(results$thetastar, main = "Empirical Bootstrap Distribution",
     xlab = "Studentized t statistic")

[Histogram: empirical bootstrap distribution of the studentized t statistic, ranging from about −6 to 4.]

2 Exercises

1. Return to Prof Statto's blood pressure readings. Suppose he is interested now in the standard deviation of his blood pressure, by which we mean a measure of how variable his recorded blood pressure may be across repeated readings. Create an empirical bootstrap distribution for the sample standard deviation, based on 400 resamples. Find the mean of your EBD, and a 95% bootstrap confidence interval for the true standard deviation.

2. The following are readings (in picocuries) from eleven home radon detectors, after the detectors were exposed to 100 picocuries of radon:

93.9, 103.4, 98.2, 101.2, 110.3, 98.4, 120.2, 115.6, 108.8, 102.6, 98.1

a. Create a graphic to display the data. Comment on what you observe.
b. Construct a 95% confidence interval for the mean by using the t distribution. Would you be comfortable using this approach for these data?
c. By taking 400 resamples, construct an empirical bootstrap distribution for the sample mean. Comment on its shape.
d. Find a 95% bootstrap confidence interval for the mean. Compare your interval with that found in (b).
e. Construct a bootstrap hypothesis test of the hypothesis that the mean radon level in the home is 100 picocuries.
f. Suppose instead the population median were of interest. Why might bootstrapping for the sample median not work well here, or more generally when the sample size is small?

3. The 2011 Baylor University Religion Survey polled 1717 US residents on their religious and philosophical views. The survey has 300 items, one of which investigates the level of agreement with the statement "Everyone starts with the same chance in life". Of the 1717 respondents, 456 claim to believe the statement.

a. What do you think was the target population for this study?
b. The study claims a "margin of error ±4%". How do you think this "margin of error" was determined?
c. Create a 95% bootstrap confidence interval for the population parameter of interest for this item on the survey. Comment on your findings.

4. As described here, bootstrap estimation is not in general unbiased, and in certain situations the standard bootstrap procedure given is unlikely to work well. Suppose we have data x1, x2, . . . , xn from the U(0, θ) distribution, where θ is unknown, and we wish to estimate θ. One natural approach to estimation in this case leads to the sample maximum as the estimator, that is, θ̂ = max{x1, x2, . . . , xn}.

a. Explain why θ̂ may be a sensible estimator for θ.
b. In taking a bootstrap sample, what is the probability that the sample contains θ̂?
c. Using software or otherwise, investigate how the probability you found in (b) behaves as n grows large. What does this tell us about a bootstrap estimator in this situation?

3 References

[1] Baylor University Religion Survey 2011. http://www.baylor.edu/content/services/document.php/153501.pdf (accessed 19th October 2012).
[2] Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
[3] Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics 7, 1-26.
[4] Ehmann, R., Boedeker, E., Friedrich, U., Sagert, J., Dippon, J., Friedel, G. and Walles, T. (2011). Canine scent detection in the diagnosis of lung cancer: revisiting a puzzling phenomenon.
European Respiratory Journal 39, 669-676.
[5] Sonoda, H., Kohnoe, S., Yamazato, T., Satoh, Y., Morizono, G., Shikata, K., Morita, M., Watanabe, A., Morita, M., Kakeji, Y., Inoue, F. and Maehara, Y. (2011). Colorectal cancer screening with odour material by canine scent detection. Gut 60(6), 814-819.