SAMPLE SIZE CALCULATION PROGRAMS

Vanderbilt University
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

PS: Power and Sample Size Calculation
PS version 2.1.31, 2004 (5.2 MB download)
by William D. Dupont and Walton D. Plummer, Jr.

PS is an interactive program for performing power and sample size calculations. It can be used for studies with dichotomous, continuous, or survival response measures. The alternative hypothesis of interest may be specified in terms of differing response rates, means, or survival times, or in terms of relative risks or odds ratios. Studies with dichotomous or continuous outcomes may involve either a matched or an independent study design. The program can determine the sample size needed to detect a specified alternative hypothesis with the required power, the power with which a specific alternative hypothesis can be detected with a given sample size, or the specific alternative hypotheses that can be detected with a given power and sample size.

The PS program can produce graphs to explore the relationships between power, sample size and detectable alternative hypotheses. It is often helpful to hold one of these variables constant and plot the other two against each other. The program can generate graphs of sample size versus power for a specific alternative hypothesis, sample size versus detectable alternative hypotheses for a specified power, or power versus detectable alternative hypotheses for a specified sample size. Linear or logarithmic scales may be used for either axis. Multiple curves can be plotted on a single graph.

Downloading the Software

The PS program is freely available on the Internet. To obtain the software, click the PS (5.2 MB) download link on the web page listed above and instruct your browser to save the file to a folder on your computer. A file called pssetup.exe will be downloaded to this location. Run pssetup.exe to extract the needed files and install the program.
The program runs on the Microsoft Windows operating systems (Windows 95 and later). We have not tested the PS program with Windows Vista, though we have had a few reports that it does work. The help functionality sometimes does not work under Vista; additional software that can be downloaded from Microsoft will help. See http://support.microsoft.com/kb/917607 for details.

To run the PS program after it has been installed, click the Start button, select Programs and then click PS. Click the Overview button for an introduction to the program and instructions on its use. PS is a self-documented program with extensive interactive help.

We are interested in feedback. If you have any questions or comments about our software, please send email to dale.plummer@vanderbilt.edu. It will be appreciated.

Study Designs That Can Be Evaluated By This Program

Survival Studies: Evaluation of independent cohorts using the log-rank test. The approach of Schoenfeld and Richter [3] is used. The ratio of the number of patients in the cohorts being compared may be specified by the user.

Continuous Response Measures in Two Groups: Paired and independent t tests. The approach of Dupont and Plummer [1] is used for paired and independent samples. The ratio of the number of patients in the samples being compared may be specified by the user. This method produces results that are in close agreement with those of Pearson and Hartley [4].

Linear Regression: Tests of slopes, comparisons of slopes and intercepts from independent regressions. The methods of Dupont and Plummer [2] are used. They may be used to design studies in which we wish to detect a regression slope of a given magnitude. They may also be used when we wish to determine whether the slopes or intercepts of two independent regression lines differ by a given amount.
The values of the independent (x) variable(s) of the regression line(s) may either be specified by the investigator or determined observationally when the study is performed. In the latter case, the investigator must estimate the standard deviation(s) of the independent variable(s).

Independent Case-Control Studies: Corrected and uncorrected chi-square contingency table tests, Fisher's exact test. The method of Schlesselman [5] is used for studies with independent case and control groups that will be analyzed using an uncorrected chi-square test; the method of Casagrande et al. [6] is used for independent studies that will be analyzed using continuity-corrected chi-square statistics or Fisher's exact test. When the case and control sample sizes are unequal, PS uses the generalization of Casagrande's method proposed by Fleiss [7]. The alternative hypotheses may be specified in terms of odds ratios or exposure prevalence rates.

Matched Case-Control Studies: McNemar's test. The method of Dupont [8] is used for studies with paired or matched cases and controls. The alternative hypotheses may be specified in terms of odds ratios or exposure prevalence rates.

Cohort Studies With Dichotomous Outcomes: Independent contingency table tests, McNemar's test. The methods of Schlesselman [5], Casagrande [6], Fleiss [7] and Dupont [8] are available. The alternative hypotheses may be specified in terms of relative risks or outcome probabilities.

References

1. Dupont WD, Plummer WD, Jr: Power and Sample Size Calculations: A Review and Computer Program. Controlled Clinical Trials 11:116-128, 1990
2. Dupont WD, Plummer WD, Jr: Power and Sample Size Calculations for Studies Involving Linear Regression. Controlled Clinical Trials 19:589-601, 1998
3. Schoenfeld DA, Richter JR: Nomograms for calculating the number of patients needed for a clinical trial with survival as an endpoint. Biometrics 38:163-170, 1982
4. Pearson ES, Hartley HO: Biometrika Tables for Statisticians Vol. I 3rd Ed. Cambridge: Cambridge University Press, 1970
5. Schlesselman JJ: Case-Control Studies: Design, Conduct, Analysis. New York: Oxford University Press, 1982
6. Casagrande JT, Pike MC, Smith PG: An improved approximate formula for calculating sample sizes for comparing two binomial distributions. Biometrics 34:483-486, 1978
7. Fleiss JL: Statistical Methods for Rates and Proportions. 2nd Ed. New York: John Wiley & Sons, 1981
8. Dupont WD: Power calculations for matched case-control studies. Biometrics 44:1157-1168, 1988

National Statistical Service: Australia
http://www.nss.gov.au/nss/home.NSF/pages/Sample+Size+Calculator+Description?OpenDocument
http://www.nss.gov.au/nss/home.nsf/NSS/0A4A642C712719DCCA2571AB00243DC6?opendocument

Sample Size Calculator

What does it do?

The sample size calculator on the next page allows you to calculate the required sample size, standard error, relative standard error (RSE), and a confidence interval (95% or 99%) for a proportion estimate, using just one of these criteria as an input. For example, if you know the maximum standard error you can accept to ensure the precision of your estimate, you can find out the sample size required to achieve it; if you know the likely size of the responding sample, you can estimate the standard error of your estimate and a confidence interval for it.

The Statistical Clearing House recommends that you set the level of precision that will meet the needs of the users of your data, and that you set it in conjunction with those users. You should not set the accuracy levels too high, as you will incur higher costs and place additional burden on the community. You should also not set the accuracy levels too low, as your data will not be appropriate for your users.

Depending on the intended uses of the information, precision may not be the only concern. Consideration also needs to be given to cost, turnaround and respondent burden. When deciding whether to increase precision, returns to scale must be considered.
A small increase in precision that incurs a large cost may not be justified.

The sample size calculator assumes simple random sampling. The results generated here are intended only as rough guidelines and should only be used as such; they are by no means the definitive "rule" about the size of a sample.

How do I use it?

Simply follow the steps outlined below.

1. Select the confidence level you want to work at.
2. If you are sampling from a finite population (one that isn't very large), enter the size of the population.
3. If you already roughly know the proportion you're estimating, or want to check the RSE of an existing estimate, fill in the proportion. If left blank, it will be assumed to be 0.5.
4. Fill in one of Confidence Interval Range, Standard Error, Relative Standard Error or Sample Size. Make sure the bullet point corresponding to the one you wish to specify is selected.
5. Press Calculate to perform the calculation, or Clear to start again.

What do the categories mean?

Confidence Level
This is the chance that the true value will be inside the calculated confidence interval. You can select 95% or 99%.

Population Size
This option allows you to specify the size of the population of interest. It can be left blank, in which case the population will be assumed to be very large (typically, populations of more than 100,000 are considered very large).

Proportion
This option allows you to specify the estimated proportion, if it is approximately known. This assists in calculating standard errors that are appropriate for your situation. The proportion may be sourced from previous cycles of the survey or from an educated guess.

Confidence Interval +/-
The Sample Size Calculator allows you to express the precision in terms of "some value plus or minus an amount". For example, if you want your result to be accurate to within 5% (i.e. plus or minus 5%), then you should specify 0.05 here. Note that the value must be entered as a proportion, not as a percentage.
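The relationship between the plus-or-minus precision, the confidence level, the proportion and the population size can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard textbook formulas for a proportion under simple random sampling (normal approximation, with a finite population correction when a population size is given). The source does not say which formulas the calculator itself uses, and the function names here are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided standard normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2.0)


def sample_size_for_margin(margin, proportion=0.5, confidence_level=0.95,
                           population=None):
    """Sample size so that the confidence interval is estimate +/- margin.

    Assumes simple random sampling; `margin` and `proportion` are
    proportions (0.05, not 5). Textbook formula, not the calculator's code.
    """
    z = z_value(confidence_level)
    n = z ** 2 * proportion * (1.0 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction for populations that are not very large.
        n = n / (1.0 + (n - 1.0) / population)
    return math.ceil(n)


def standard_error(n, proportion=0.5, population=None):
    """Standard error of a proportion estimated from a sample of size n."""
    se = math.sqrt(proportion * (1.0 - proportion) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1.0))
    return se


if __name__ == "__main__":
    # Accuracy of plus or minus 0.05 at 95% confidence, proportion left at 0.5.
    print(sample_size_for_margin(0.05))                    # 385
    print(sample_size_for_margin(0.05, population=10000))  # 370
```

Leaving the proportion at 0.5 gives the largest sample size for a given margin, which is presumably why it is the calculator's default when the field is blank.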
Upper and Lower
These are the upper and lower bounds of the confidence interval. You cannot enter them, but they will be displayed once the calculation is made.

Standard Error
This is the standard error of the estimate. Standard error is a measure of the variation of any estimate that is produced by sampling a given population. It gives us an idea of how likely the estimate is to be near the true value. The standard error is expressed in the same units as the estimate (for any calculation done with this calculator, it is a proportion). A higher standard error means the estimate is more variable.

Relative Standard Error (RSE)
This is the standard error expressed as a percentage of the estimate itself. For example, if the estimate is 0.5 and the standard error is 0.05, then the RSE is 10%. RSE is often used in preference to standard error when comparing the variability of estimates of different magnitudes, because it places the standard error in the context of the estimate. For example, a standard error of 0.1 would be of much greater concern for an estimate of 0.01 than for an estimate of 0.5. In the first case the RSE is 1000%, while in the second case it is much smaller (20%).

Sample Size
This is the sample size required for the standard error or confidence interval displayed. You can also specify the sample size to have the standard error and the confidence interval calculated for you.

University of Iowa
Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.
http://www.stat.uiowa.edu/~rlenth/Power/

Download to run locally

The file piface.jar may be downloaded so that you can run these applications locally. [Note: Some mail software (that thinks it is smarter than you) renames this file piface.zip. If this happens, simply rename it piface.jar; do not unzip the file.] You may also want the icon file piface.ico if you put the application on your desktop or a toolbar.
You will need to have the Java Runtime Environment (JRE) or the Java Development Kit (JDK) installed on your system. You probably already have it; if not, these are available for free download for several platforms from Sun. If you have JDK or JRE version 1.2 or later, then you can probably run the application just by double-clicking on piface.jar. Otherwise, you may run it from the command line in a terminal or DOS window, using a command like

    java -jar piface.jar

This will bring up a selector list similar to the one in this web page. A particular dialog can also be run directly from the command line if you know its name (names can be discovered by browsing piface.jar with a zip file utility such as WinZip). For example, the two-sample t-test dialog may be run using

    java -cp piface.jar rvl.piface.apps.TwoTGUI

Links to other sites

Interactive page - Michael Friendly (ANOVA designs)
Interactive page - David Schoenfeld (clinical trials designs; menu based on study type and measurement type)
Sample-size calculator for k-stage designs (by James Kepner)
UnifyPow - A SAS module for sample-size analysis by Ralph O'Brien.
SSize - ECHIP sample size calculator for Palm devices (freeware) by Bob Wheeler.
A review of power software for PCs (article, pdf format) by Len Thomas and Charles Krebs (of limited use now as it was published in 1997).
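
As a rough illustration of the kind of calculation that the two-sample t-test tools described above perform, the sketch below estimates the sample size per group and the power for comparing two independent means. It is a minimal sketch under stated assumptions: equal group sizes, a common known standard deviation, and the standard normal approximation. It is not the method used by PS or by piface, whose t-based calculations will typically give slightly larger sample sizes, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test.

    delta: difference in means to detect; sigma: common standard deviation.
    Normal approximation: n = 2 * ((z_alpha + z_beta) * sigma / delta)^2.
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power with n subjects per group (same assumptions).

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    standard_error = sigma * math.sqrt(2.0 / n)
    return STD_NORMAL.cdf(abs(delta) / standard_error - z_alpha)


if __name__ == "__main__":
    # Detect a difference of 5 units when the standard deviation is 10.
    n = n_per_group(delta=5, sigma=10)
    print(n)  # 63
    print(round(approx_power(n, delta=5, sigma=10), 3))
```

Holding one of sample size, power and detectable difference fixed and varying the other two, as in this sketch, is the same trade-off that the programs listed here let you explore interactively.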