STDEVCIBOOT
This macro is designed to calculate bootstrap confidence intervals about a population standard deviation.
RUNNING THE MACRO
Calling statement
stdevciboot c1 ;
siglev k1 (95) ;
nboot k1 (2000);
stdevs c1 ;
quantiles c1-c3.
Input
Input to the macro must be a single column, containing only numerical values. Discrete or continuous data
are allowed. Missing data is allowed.
Subcommands
siglev
The confidence level of the interval, expressed as a percentage.
The default is 95 (giving a 95% confidence interval); other standard choices are
90, 98 or 99.
nboot
The number of bootstrap samples used. The default is 2000. It is not recommended to use
fewer than 1000 for the construction of confidence intervals.
stdevs
Specify a column in which to store bootstrap sample standard deviations.
quantiles
Specify three columns in which to store ranks corresponding to the lower and upper
confidence interval limits, for the standard percentile method (column 1), the BC method
(column 2) and the BCa method (column 3).
tvalues
Specify a column in which to store bootstrap sample t-statistics.
Six ranks are given; the first two ranks correspond to the ranks for the lower and upper
confidence limits for the standard percentile and bootstrap-t confidence intervals.
The next two ranks correspond to the ranks for the bias-corrected (BC) percentile intervals,
the final two ranks correspond to ranks for the accelerated bias-corrected (BCa) percentile
intervals.
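The rank positions for the percentile, BC and BCa limits can be sketched in Python as follows. This is a minimal illustration of the textbook formulas in Efron and Tibshirani (1993), not the macro's own code: the function names, the rounding rule used to turn a probability into a rank, and the jackknife estimate of the acceleration are all assumptions.

```python
import math
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def bias_correction(boot_stats, estimate):
    """z0: normal quantile of the share of bootstrap statistics below the estimate."""
    b = len(boot_stats)
    p = sum(1 for s in boot_stats if s < estimate) / b
    # Clamp so that inv_cdf is defined when no (or all) values fall below.
    p = min(max(p, 0.5 / b), 1.0 - 0.5 / b)
    return _norm.inv_cdf(p)


def acceleration(data, stat):
    """Jackknife estimate of the acceleration constant (assumed estimator)."""
    data = list(data)
    n = len(data)
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(jack) / n
    num = sum((centre - j) ** 3 for j in jack)
    den = 6.0 * sum((centre - j) ** 2 for j in jack) ** 1.5
    return num / den if den > 0 else 0.0


def adjusted_ranks(nboot, level, z0=0.0, a=0.0):
    """Lower and upper 1-based ranks in the sorted bootstrap statistics.

    z0 = 0 and a = 0 give the standard percentile ranks,
    a = 0 alone gives BC ranks, and both non-zero give BCa ranks.
    """
    alpha = 1.0 - level / 100.0
    ranks = []
    for p in (alpha / 2.0, 1.0 - alpha / 2.0):
        z = z0 + _norm.inv_cdf(p)
        adjusted = _norm.cdf(z0 + z / (1.0 - a * z))
        # Assumed rule: round to the nearest rank and keep it inside 1..nboot.
        ranks.append(min(nboot, max(1, round(nboot * adjusted))))
    return tuple(ranks)


# Percentile, BC and BCa ranks for 2000 resamples, using the bias-correction
# (0.1231) and acceleration (0.1009) reported in the worked example.
print(adjusted_ranks(2000, 95))
print(adjusted_ranks(2000, 95, z0=0.1231))
print(adjusted_ranks(2000, 95, z0=0.1231, a=0.1009))
```

The macro may use a different convention for converting adjusted probabilities to ranks, so the ranks it stores need not match these exactly.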
Output
• Basic information (number of data points, significance level, number of bootstrap samples)
• Sample standard deviation
• Bootstrap standard deviation about the estimated standard deviation
• Estimated bias correction (for BC and BCa methods)
• Estimated acceleration (for BCa method)
• Confidence interval using chi-squared approximation
• Bootstrap confidence intervals using : Estimate -/+ bootstrap standard deviation, Efron percentile method, Hall percentile method, BC method, BCa method.
Speed of macro : Fast.
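The three simplest bootstrap intervals listed in the output can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the macro's implementation: the function names, the rank rule for the percentile limits and the seed are hypothetical. The example data are the test A scores from the SPATIAL dataset given later in this document.

```python
import math
import random
from statistics import NormalDist


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def bootstrap_sd_intervals(data, level=95.0, nboot=2000, seed=None):
    """Normal-approximation, Efron percentile and Hall percentile intervals."""
    rng = random.Random(seed)
    n = len(data)
    estimate = sample_sd(data)
    # Resample with replacement and recompute the statistic each time.
    boot = sorted(sample_sd(rng.choices(data, k=n)) for _ in range(nboot))
    alpha = 1.0 - level / 100.0
    # Assumed rank rule: nearest rank to nboot * alpha / 2 and nboot * (1 - alpha / 2).
    lo_rank = max(1, round(nboot * alpha / 2.0))
    hi_rank = min(nboot, round(nboot * (1.0 - alpha / 2.0)))
    q_lo, q_hi = boot[lo_rank - 1], boot[hi_rank - 1]
    # Bootstrap standard deviation of the statistic.
    boot_mean = sum(boot) / nboot
    boot_se = math.sqrt(sum((b - boot_mean) ** 2 for b in boot) / (nboot - 1))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "estimate": estimate,
        "boot_se": boot_se,
        "normal": (estimate - z * boot_se, estimate + z * boot_se),
        "efron": (q_lo, q_hi),
        # Hall's method reflects the Efron quantiles about the estimate.
        "hall": (2.0 * estimate - q_hi, 2.0 * estimate - q_lo),
    }


# Test A scores from the SPATIAL dataset.
a_scores = [48, 36, 20, 29, 42, 42, 20, 42, 22, 41, 45, 14, 6,
            0, 33, 28, 34, 4, 32, 24, 47, 41, 24, 26, 30, 41]
result = bootstrap_sd_intervals(a_scores, level=95.0, nboot=2000, seed=1)
for name in ("normal", "efron", "hall"):
    lo, hi = result[name]
    print(f"{name:>6}: {lo:.3f} to {hi:.3f}")
```

The Hall interval always has the same width as the Efron interval, because its limits are the Efron limits reflected about the estimate.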
ALTERNATIVE PROCEDURES
Standard procedure : No built-in Minitab function, but the macro incorporates a confidence interval
obtained using the following approximation based upon the chi-squared distribution :
The standard 100(1 – alpha)% confidence interval for a standard deviation has limits of
sqrt{(n – 1) * sample variance / appropriate quantile of the chi-squared distribution with n – 1 degrees of freedom},
where n is the sample size. For a 95% confidence interval, the 2.5% and 97.5% quantiles are used:
the 97.5% quantile gives the lower limit and the 2.5% quantile gives the upper limit.
This interval is based on the fact that, if data is normally distributed, the quantity
Sample variance * (n – 1) / Population variance
has a chi-squared distribution with n – 1 degrees of freedom.
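Written out, the argument runs as follows. The symbols s^2 (sample variance), sigma (population standard deviation) and chi^2_{n-1,p} (the p quantile of the chi-squared distribution with n – 1 degrees of freedom) are notation introduced here for illustration.

```latex
\[
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
P\!\left( \chi^2_{n-1,\,\alpha/2} \le \frac{(n-1)\,s^2}{\sigma^2} \le \chi^2_{n-1,\,1-\alpha/2} \right) = 1 - \alpha .
\]
Rearranging the inequalities for $\sigma$ gives the $100(1-\alpha)\%$ interval
\[
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{n-1,\,1-\alpha/2}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{n-1,\,\alpha/2}}} .
\]
```

As a check against the worked example below, n = 20 and a sample variance of 1.0597, with chi-squared quantiles of 8.907 (2.5%) and 32.852 (97.5%) on 19 degrees of freedom, give limits of sqrt(20.134 / 32.852) = 0.783 and sqrt(20.134 / 8.907) = 1.504, which agree with the chi-squared based interval in the macro output.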
References
MANLY, B.F.J. (1997) Randomization, bootstrap and Monte Carlo methods in biology,
Chapman and Hall, London (Chapter 3).
EFRON, B. & TIBSHIRANI, R.J. (1993) An introduction to the Bootstrap, Chapman and Hall, London
(Chapters 12-14).
WORKED EXAMPLE FOR STDEVCIBOOT
Data
EXPONENTIAL (see MEANCIBOOT)
Aims of analysis
To create confidence intervals for the population standard deviation.
Randomization procedure
MTB > Retrieve "N:\resampling\Examples\Exponential.MTW".
Retrieving worksheet from file: N:\resampling\Examples\Exponential.MTW
# Worksheet was saved on 23/08/01 12:16:52
Results for: Exponential.MTW
MTB > % N:\resampling\library\stdevciboot c1 ;
SUBC> siglev 95 ;
SUBC> nboot 2000 ;
SUBC> stdevs c3 ;
SUBC> quantiles c5-c7.
Executing from file: N:\resampling\library\stdevciboot.MAC
Data Display
BOOTSTRAP CONFIDENCE INTERVALS FOR A POPULATION VARIANCE
Histogram simsort
General information
Data Display (WRITE)
Number of data values                             20
Observed variance                                 1.0597
Significance level for confidence intervals       95
Number of bootstrap samples                       2000
Bootstrap standard deviation about the variance   0.2474
Estimated bias-correction (for BC, BCa)           0.1231
Estimated acceleration (for BCa)                  0.1009
Confidence intervals
Data Display (WRITE)
Standard chi-squared based interval   0.7829   1.504
Estimate -/+ 1.96*bootstrap SE        0.5749   1.545
Efron percentile method               0.4842   1.436
Hall percentile method                0.6836   1.635
BC percentile method                  0.5137   1.474
BCa percentile method                 0.5742   1.549
[Figure: histogram "Distribution of variances from bootstrap resamples"; horizontal axis: Sample variance, vertical axis: Frequency]
Modified worksheet
C3  A column containing 2000 sample standard deviations, one for each bootstrap resample
C5  Upper and lower rank positions for percentile confidence limits using the Efron method
C6  Upper and lower rank positions for percentile confidence limits using the BC method
C7  Upper and lower rank positions for percentile confidence limits using the BCa method
Columns c5 - c7 each contain 2 values.
Discussion
The different methods produce substantially different results. The confidence interval based upon
standard methods is substantially shorter than any of the bootstrap confidence intervals. Manly (1997)
performs a simulation study to investigate the coverage of the different methods for a sample of 20 from
an exponential distribution with parameter 1. The coverages are extremely poor for all of the methods.
Against a nominal coverage level of 95%, the bootstrap methods achieve coverages of between 65.9%
(Efron method) and 72.7% (Hall method), whilst the standard method has a coverage of 72.7%. Some
improvement to the standard and Hall methods can be obtained by taking logarithms, but the coverage
remains poor. We see that the bootstrap distribution of the standard deviation (see figure above) is very
lumpy, and this may explain the poor performance of the methods.
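A coverage study of this kind can be sketched in Python as follows. This is a minimal illustration for the Efron percentile method only, assuming samples of 20 from an exponential distribution with rate 1 (true standard deviation 1). It is not Manly's procedure: the function names, the rank rule, the seed and the deliberately small numbers of trials and resamples are assumptions chosen to keep the run short, so the estimate is noisy.

```python
import math
import random


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def efron_interval(data, rng, nboot=400, level=95.0):
    """Efron percentile interval for the standard deviation."""
    n = len(data)
    boot = sorted(sample_sd(rng.choices(data, k=n)) for _ in range(nboot))
    alpha = 1.0 - level / 100.0
    # Assumed rank rule: nearest rank, kept inside 1..nboot.
    lo_rank = max(1, round(nboot * alpha / 2.0))
    hi_rank = min(nboot, round(nboot * (1.0 - alpha / 2.0)))
    return boot[lo_rank - 1], boot[hi_rank - 1]


def estimate_coverage(trials=200, n=20, nboot=400, level=95.0, seed=0):
    """Share of simulated samples whose interval contains the true SD."""
    rng = random.Random(seed)
    true_sd = 1.0  # an exponential distribution with rate 1 has SD 1
    hits = 0
    for _ in range(trials):
        data = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = efron_interval(data, rng, nboot, level)
        if lo <= true_sd <= hi:
            hits += 1
    return hits / trials


print(f"Estimated coverage: {estimate_coverage():.3f}")
```

Increasing trials and nboot reduces the Monte Carlo error at the cost of a longer run.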
ADDITIONAL SAMPLE DATASET FOR STDEVCIBOOT
Name of dataset
SPATIAL
Description
For each of 26 neurologically impaired children, the results of two tests of spatial perception, A and B,
are recorded.
Source
EFRON, B. & TIBSHIRANI, R.J. (1993) An introduction to the Bootstrap, Chapman and Hall, London.
Data
Number of observations = 26
Number of variables = 2
For each child, the A score and the B score are shown (children listed in the same order in both rows).
A scores: 48 36 20 29 42 42 20 42 22 41 45 14 6 0 33 28 34 4 32 24 47 41 24 26 30 41
B scores: 42 33 16 39 38 36 15 33 20 43 34 22 7 15 34 29 41 13 38 25 27 41 28 14 28 40
Aims of analysis
Efron and Tibshirani (1993) produce confidence intervals for test A scores.