Bootstrapping

A way to estimate a sampling distribution.

Example: Recall the pigeons data. Birds were released on cloudy days and sunny days. The measurement was the absolute error angle as the bird disappeared over the horizon.

    cloudy: 8 10 38 43 45 57 73 76 83 105 112 126 141
    sunny:  6 7 9 17 18 18 22 28 32 35 36 42 42 42 48 48 51 52 53 55 56 57 58 63 72 83 91 97

Research hypothesis: Birds released on cloudy days will tend to have larger error angles than those released on sunny days.

    Mann-Whitney Test and CI: cloudy, sunny

             N   Median
    cloudy  13    73.00
    sunny   28    45.00

    Point estimate for ETA1-ETA2 is 25.00
    95.2 Percent CI for ETA1-ETA2 is (0.00,54.00)
    W = 344.0
    Test of ETA1 = ETA2 vs ETA1 > ETA2 is significant at 0.0241

[Figure: 85% CI-Boxplot; y-axis comb (error angle, 0 to 140), x-axis subs (cloudy, sunny)]

Recall further: W in the Minitab output is the sum of ranks of the first named column (cloudy in this case). Hence, W = sum of ranks of cloudy data = 344.

There are other forms for the MWW which were discussed in class. As an exercise, go back to those formulas and show that an estimate of p = P(C > S) is given by:

$$\hat{p} = \hat{P}(C > S) = \frac{U}{n_1 n_2} = \frac{W}{n_1 n_2} - \frac{n_1 + 1}{2 n_2} = \frac{344}{13 \times 28} - \frac{14}{56} = .695$$

where n1 = 13 (cloudy) and n2 = 28 (sunny).

This is an estimate and, hence, has a sampling distribution. We do not assume a null hypothesis is true in this case. A 90% confidence interval is determined by finding the .05 quantile and the .95 quantile of the sampling distribution, which capture the middle 90% of that distribution.

Earlier in the course we approximated the sampling distribution of the MWW statistic (under the null hypothesis) using permutations of the combined data. This will no longer work. Since we don't assume the null hypothesis, we can't assume that all permutations are equally likely (Permutation Principle).

How does the sampling distribution of MWW arise?

a. We generate a sample from the population of cloudy birds.
b. We generate a sample from the population of sunny birds.
c. We compute MWW and store it.
d. Repeat a, b, and c B = 1000 times; we then have 1000 values of the MWW statistic.
e. The histogram of the 1000 values of MWW is an approximation to the sampling distribution.

The difficulty is that we do not have the two populations to take samples from.

Solution: We let the sample of cloudy bird data represent (estimate) the cloudy bird population. So we can sample from the cloudy bird sample (with replacement). This is called a bootstrap sample. In Minitab the command is in the menu Calc > Random Data > Sample From Columns:

    Sample 13 'cloudy' c22;
    Replace.

Similarly for the sunny data (sample 28 with replacement). Hence, we approximate the sampling distribution of MWW by carrying out steps a through e above, with sampling from the populations replaced by sampling from the cloudy and sunny samples.

The following macro implements the bootstrap for P(Y > X):

    MTB > %bootp 'cloudy' 'sunny' c21

(c21 contains the bootstrap values of p)

    ESTIMATE OF P(Y > X) AND 90% CONFIDENCE INTERVAL

    Data Display

    .05-quantile    Pr(Y>X)    .95-quantile
        0.530220   0.692308        0.837912

[Figure: Histogram of P(Y>X); x-axis P(Y>X), y-axis Frequency]

Exercise: Get the coins data from the website. Get the CI-Boxplots. Get a 90% confidence interval for P(first > fourth). Get the histogram approximation of the sampling distribution of p and mark the confidence interval and the estimate on the histogram.
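
For readers without Minitab, the bootstrap steps a through e for the pigeon data can be sketched in Python. This is a minimal illustration and not the %bootp macro itself: the function names (p_hat, bootstrap_p, quantile), the seed, and the simple order-statistic quantile rule are all assumptions made here. Counting only strict inequalities gives 252/364 = 0.692308, which appears to match the macro's point estimate; giving ties weight 1/2 gives 253/364 = .695, the value obtained from W.

```python
import random

# Pigeon error angles from the notes (13 cloudy, 28 sunny).
cloudy = [8, 10, 38, 43, 45, 57, 73, 76, 83, 105, 112, 126, 141]
sunny = [6, 7, 9, 17, 18, 18, 22, 28, 32, 35, 36, 42, 42, 42,
         48, 48, 51, 52, 53, 55, 56, 57, 58, 63, 72, 83, 91, 97]


def p_hat(y, x, tie_weight=0.0):
    """Estimate P(Y > X) as the fraction of (y, x) pairs with y > x.

    Ties contribute tie_weight each (0.0 = strict count, 0.5 = U statistic).
    """
    count = 0.0
    for yi in y:
        for xj in x:
            if yi > xj:
                count += 1.0
            elif yi == xj:
                count += tie_weight
    return count / (len(y) * len(x))


def bootstrap_p(y, x, B=1000, seed=0):
    """Steps a-d: resample each sample with replacement, recompute p_hat."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    values = []
    for _ in range(B):
        y_star = rng.choices(y, k=len(y))  # bootstrap sample of cloudy
        x_star = rng.choices(x, k=len(x))  # bootstrap sample of sunny
        values.append(p_hat(y_star, x_star))
    return values


def quantile(values, q):
    """Simple order-statistic quantile (assumed rule; no interpolation)."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


if __name__ == "__main__":
    boot = bootstrap_p(cloudy, sunny, B=1000)
    print("estimate of P(C > S):", p_hat(cloudy, sunny))
    print("90% bootstrap CI:", quantile(boot, 0.05), quantile(boot, 0.95))
```

The interval endpoints depend on the random draws, so they will be close to, but not identical to, the Minitab values shown above. The list returned by bootstrap_p plays the role of column c21; a histogram of it is step e.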