STDEVCIBOOT

This macro is designed to calculate bootstrap confidence intervals for a population standard deviation.

RUNNING THE MACRO

Calling statement

stdevciboot c1 ;
siglev k1 (95) ;
nboot k1 (2000) ;
stdevs c1 ;
quantiles c1-c3.

Input

Input to the macro must be a single column containing only numerical values. Discrete or continuous data are allowed. Missing data are allowed.

Subcommands

siglev
The significance level of the confidence interval, expressed as a percentage confidence level. The default is 95 (corresponding to a 95% confidence interval); other standard choices are 90, 98 or 99.

nboot
The number of bootstrap samples used. The default is 2000. Using fewer than 1000 is not recommended for the construction of confidence intervals.

stdevs
Specify a column in which to store the bootstrap sample standard deviations.

quantiles
Specify three columns in which to store the ranks corresponding to the lower and upper confidence interval limits for the standard percentile method (column 1), the BC method (column 2) and the BCa method (column 3).

tvalues
Specify a column in which to store the bootstrap sample t-statistics.

Six ranks are given; the first two ranks correspond to the ranks for the lower and upper confidence limits for the standard percentile and bootstrap-t confidence intervals. The next two ranks correspond to the ranks for the bias-corrected (BC) percentile intervals, and the final two ranks correspond to the ranks for the accelerated bias-corrected (BCa) percentile intervals.

Output

Basic information (number of data points, significance level, number of bootstrap samples)
Sample standard deviation
Bootstrap standard deviation about the estimated standard deviation
Estimated bias correction (for BC and BCa methods)
Estimated acceleration (for BCa method)
Confidence interval using chi-squared approximation
Bootstrap confidence intervals using: estimate -/+ bootstrap standard deviation, Efron percentile method, Hall percentile method, BC method, BCa method.

Speed of macro: Fast.
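As a rough illustration of the resampling idea behind the Efron and Hall percentile intervals listed in the output, the following Python sketch (not the macro's own code) draws bootstrap resamples, computes the standard deviation of each, and reads the limits off the sorted values. The function name `bootstrap_sd_intervals`, the rounding rule used to choose the ranks, and the sample values are assumptions made for this example; the BC and BCa adjustments are omitted.

```python
import random
import statistics


def bootstrap_sd_intervals(data, siglev=95, nboot=2000, seed=0):
    """Efron and Hall percentile intervals for a standard deviation.

    Illustrative sketch only: the rank rule below is one simple
    convention and is not taken from the macro.
    """
    rng = random.Random(seed)
    n = len(data)
    estimate = statistics.stdev(data)  # sample SD (n - 1 divisor)

    # Resample with replacement and record the SD of each resample.
    boot = sorted(
        statistics.stdev(rng.choices(data, k=n)) for _ in range(nboot)
    )

    # 1-based ranks of the lower and upper limits in the sorted values.
    alpha = 1 - siglev / 100
    lo_rank = max(round(nboot * alpha / 2), 1)
    hi_rank = min(round(nboot * (1 - alpha / 2)), nboot)

    efron = (boot[lo_rank - 1], boot[hi_rank - 1])
    # Hall's method reflects the Efron limits about the estimate.
    hall = (2 * estimate - efron[1], 2 * estimate - efron[0])
    return estimate, efron, hall


# Made-up values, used only to show the call.
sample = [0.3, 1.9, 0.7, 2.4, 0.1, 1.2, 0.5, 3.1, 0.9, 1.6]
estimate, efron, hall = bootstrap_sd_intervals(sample)
print(f"Sample SD: {estimate:.3f}")
print(f"Efron percentile interval: {efron[0]:.3f} to {efron[1]:.3f}")
print(f"Hall percentile interval:  {hall[0]:.3f} to {hall[1]:.3f}")
```

Fixing the seed makes the sketch repeatable; in practice each run of a bootstrap procedure gives slightly different limits.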
ALTERNATIVE PROCEDURES

Standard procedure

There is no built-in Minitab function, but the macro incorporates a confidence interval obtained using the following approximation based upon the chi-squared distribution. The standard 100(1 – alpha)% confidence interval for a standard deviation has limits of

sqrt{(n – 1) * sample variance / appropriate quantile of the chi-squared distribution with n – 1 degrees of freedom},

where n is the sample size. For a 95% confidence interval, the 2.5% and 97.5% quantiles are used; the 97.5% quantile gives the lower limit and the 2.5% quantile gives the upper limit. This interval is based on the fact that, if the data are normally distributed, the quantity

Sample variance * (n – 1) / Population variance

has a chi-squared distribution with n – 1 degrees of freedom.

References

MANLY, B.F.J. (1997) Randomization, Bootstrap and Monte Carlo Methods in Biology, Chapman and Hall, London (Chapter 3).
EFRON, B. & TIBSHIRANI, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall, London (Chapters 12-14).

WORKED EXAMPLE FOR STDEVCIBOOT

Data

EXPONENTIAL (see MEANCIBOOT)

Aims of analysis

To create confidence intervals for the population standard deviation.

Randomization procedure

MTB > Retrieve "N:\resampling\Examples\Exponential.MTW".
Retrieving worksheet from file: N:\resampling\Examples\Exponential.MTW
# Worksheet was saved on 23/08/01 12:16:52

Results for: Exponential.MTW

MTB > % N:\resampling\library\stdevciboot c1 ;
SUBC> siglev 95 ;
SUBC> nboot 2000 ;
SUBC> stdevs c3 ;
SUBC> quantiles c5-c7.
Executing from file: N:\resampling\library\stdevciboot.MAC

Data Display

BOOTSTRAP CONFIDENCE INTERVALS FOR A POPULATION VARIANCE

Histogram simsort

General information

Data Display (WRITE)

Number of data values                              20
Observed variance                                  1.0597
Significance level for confidence intervals        95
Number of bootstrap samples                        2000
Bootstrap standard deviation about the variance    0.2474
Estimated bias-correction (for BC, BCa)            0.1231
Estimated acceleration (for BCa)                   0.1009

Confidence intervals

Data Display (WRITE)

Standard chi-squared based interval    0.7829    1.504
Estimate -/+ 1.96*bootstrap SE         0.5749    1.545
Efron percentile method                0.4842    1.436
Hall percentile method                 0.6836    1.635
BC percentile method                   0.5137    1.474
BCa percentile method                  0.5742    1.549

[Figure: histogram titled "Distribution of variances from bootstrap resamples"; horizontal axis: Sample variance; vertical axis: Frequency]

Modified worksheet

C3    A column containing 2000 sample standard deviations, one for each bootstrap resample
C5    Upper and lower rank positions for percentile confidence limits using the Efron method
C6    Upper and lower rank positions for percentile confidence limits using the BC method
C7    Upper and lower rank positions for percentile confidence limits using the BCa method

Columns c5 - c7 each contain 2 values.

Discussion

The different methods produce substantially different results. The confidence interval based upon standard methods is substantially shorter than any of the bootstrap confidence intervals. Manly (1997) performs a simulation study to investigate the coverage of the different methods for a sample of 20 from an exponential distribution with parameter 1. The coverages are extremely poor for all of the methods. Against a nominal coverage level of 95%, the bootstrap methods achieve coverages of between 65.9% (Efron method) and 72.7% (Hall method), whilst the standard method has a coverage of 72.7%.
Some improvement to the standard and Hall methods can be obtained by taking logarithms, but the coverage remains poor. We see that the bootstrap distribution of the standard deviation (see figure above) is very lumpy, and this may explain the poor performance of the methods.

ADDITIONAL SAMPLE DATASET FOR STDEVCIBOOT

Name of dataset

SPATIAL

Description

For each of 26 neurologically impaired children, the results of two tests of spatial perception, A and B, are recorded.

Source

EFRON, B. & TIBSHIRANI, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall, London.

Data

Number of observations = 26
Number of variables = 2

For each child, A scores (top) and B scores (bottom) are shown.

A    48 36 20 29 42 42 20 42 22 41 45 14  6  0 33 28 34  4 32
B    42 33 16 39 38 36 15 33 20 43 34 22  7 15 34 29 41 13 38

A    24 47 41 24 26 30 41
B    25 27 41 28 14 28 40

Aims of analysis

Efron and Tibshirani (1993) produce confidence intervals for test A scores.
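For comparison with the bootstrap intervals, the chi-squared based interval described under ALTERNATIVE PROCEDURES can be sketched in Python as below. This is an illustration under stated assumptions, not the macro's code: the function names `chi2_cdf`, `chi2_quantile` and `sd_chi2_interval` are invented for the example, the chi-squared quantiles are found by a series expansion and bisection so that only the standard library is needed, and the split of the listed values into A and B scores is inferred from the layout (A on top, B below).

```python
import math
import statistics


def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert the CDF by bisection (adequate for illustration)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sd_chi2_interval(variance, n, siglev=95):
    """Limits sqrt((n - 1) * variance / chi-squared quantile)."""
    alpha = 1 - siglev / 100
    df = n - 1
    lower = math.sqrt(df * variance / chi2_quantile(1 - alpha / 2, df))
    upper = math.sqrt(df * variance / chi2_quantile(alpha / 2, df))
    return lower, upper


# Test A scores, assuming the A row is the top row of each block.
a_scores = [48, 36, 20, 29, 42, 42, 20, 42, 22, 41, 45, 14, 6,
            0, 33, 28, 34, 4, 32, 24, 47, 41, 24, 26, 30, 41]
lower, upper = sd_chi2_interval(statistics.variance(a_scores), len(a_scores))
print(f"Sample SD of A scores: {statistics.stdev(a_scores):.3f}")
print(f"Chi-squared based 95% interval: {lower:.3f} to {upper:.3f}")
```

Called with the values reported in the worked example (n = 20, observed variance 1.0597), `sd_chi2_interval` gives limits of approximately 0.783 and 1.504, in line with the standard chi-squared based interval shown in the macro output. The interval relies on the normality assumption, so it serves only as a reference point for the bootstrap results.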