Contents

Statistical Power And Sample Size Calculations
    When Do You Need Statistical Power Calculations, And Why?
    Preparation For The Question “What Is Statistical Power?”
    Statistical Hypothesis Testing
    When Ho Is True And You Reject It, You Make A Type I Error
    When Ho Is False And You Fail To Reject It, You Make A Type II Error
    The Definition Of Statistical Power
    Calculating Statistical Power
    How Do We Measure Effect Size?
    Cohen's Rules Of Thumb For Effect Size
        Calculating Cohen’s d
        Calculating Cohen’s d from a t test
    Conventions And Decisions About Statistical Power
    Considering Statistical Power When Reviewing Scientific Research
    Statistical Power Analysis In Minitab
    Summary: Factors That Influence Power
    Footnote
Using Minitab To Calculate Power And Minimum Sample Size
    Example 1: Statistical power of a t-test on scores in 2 groups
    Example 2: Required sample size for a given power: 2 group comparison
    Example 3: Example of calculating power for a one-way ANOVA
    Power and Sample Size Calculations for Other Designs and Tests
Sample Size Equations – Background Theory
    Determining The Necessary Sample Size For Estimating A Single Population Mean Or A Single Population Total With A Specified Level Of Precision
        Table of standard normal deviates (Zα) for various confidence levels; Examples
    Determining The Necessary Sample Size For Detecting Differences Between Two Means With Temporary Sampling Units
        Tables of standard normal deviates for Zα and Zß; Examples
    Determining The Necessary Sample Size For Detecting Differences Between Two Means When Using Paired Or Permanent Sampling Units
        Tables of standard normal deviates for Zα and Zß; Examples
    Determining The Necessary Sample Size For Estimating A Single Population Proportion With A Specified Level Of Precision
        Table of standard normal deviates (Zα) for various confidence levels; Examples
    Determining The Necessary Sample Size For Detecting Differences Between Two Proportions With Temporary Sampling Units
        Tables of standard normal deviates for Zα and Zß; Examples
Caveat
Bibliography
    Sample Size Calculations in Clinical Research by Shein-Chung Chow, Jun Shao and Hansheng Wang
    Sample size estimation: How many individuals should be studied? by Eng John
    Power and Sample Size Estimation in Research by Ajeneye Francis
    Sample size determination by Dell RB, Holleran S, Ramakrishnan R
    Statistical power and estimation of the number of required subjects for a study based on the t-test: A surgeon's primer by Livingston EH, Cassidy L
Sample Size Correction Table for Single Parameter Estimates


Statistical Power And Sample Size Calculations

When Do You Need Statistical Power Calculations, And Why?

People embarking on scientific projects in new fields need statistical power analysis in order to design their studies - particularly to decide how many cases are needed.

A prospective power analysis is used before collecting data, to consider design sensitivity - that is, the ability to detect what you are looking for, at a time when you can still do something about it. For example, you can increase the design sensitivity by increasing the sample size, or by taking measures to decrease the error variance, e.g., by controlling extraneous variables. Thus, prospective power analysis is what you use when you are planning your own project.

Readers and reviewers of the scientific literature may need to draw on retrospective power analysis, in order to know whether the studies they are interpreting were well enough designed - especially if those studies report failures to reach statistical significance. For example, suppose you read a paper about a study in which the authors had conducted an experiment and the data analysis did not reveal any statistically significant results.
You would need to know whether that study had ever had a chance of coming up with a significant result - e.g., whether the investigators gathered a big enough sample. To do this you would need to estimate whether the study had sufficient statistical power. You can use Minitab to perform both prospective and retrospective power studies.

Preparation For The Question “What Is Statistical Power?”

The answer to this question depends on your having a clear understanding of the following technical terms: the null hypothesis (“Ho”), the significance level α, Type I error, and Type II error. If you are very unsure about these, please refer to your own statistics notes and to your usual statistics textbook - but let's briefly review these concepts as a preliminary to understanding statistical power.

Statistical Hypothesis Testing

When you perform a statistical hypothesis test, there are four possible outcomes. These outcomes depend on whether the null hypothesis (Ho) is true or false, and whether you decide either to reject, or else to retain, provisional belief in Ho. These outcomes are summarised in the following table:

    Decision     Ho is really true                Ho is really false
                 (there is really no effect       (there really is an effect
                 to find)                         to be found)
    Retain Ho    correct decision: prob = 1 - α   Type II error: prob = β
    Reject Ho    Type I error: prob = α           correct decision: prob = 1 - β

When Ho Is True And You Reject It, You Make A Type I Error.

(Translation: when there really is no effect, but the statistical test comes out significant by chance, you make a Type I error.) When Ho is true, the probability of making a Type I error is called alpha (α). This probability is the significance level associated with your statistical test.

When Ho is False And You Fail To Reject It, You Make A Type II Error.
(Translation: when, in the population, there really is an effect, but your statistical test comes out nonsignificant, due to inadequate power and/or bad luck with sampling error, you make a Type II error.) When Ho is false (so that there really is an effect there waiting to be found), the probability of making a Type II error is called beta (β).

The Definition Of Statistical Power

Statistical power is the probability of not missing an effect, due to sampling error, when there really is an effect there to be found. In technical terms: power is the probability (prob = 1 - β) of correctly rejecting Ho when it really is false.

Calculating Statistical Power

Power depends on:

1. the sample size(s),
2. the level of statistical significance required, and (here's the tricky bit!)
3. the minimum size of effect that it is reasonable to expect.

How Do We Measure Effect Size?

For a comparison of two groups, e.g., an experimental group with a control group, the measure of effect size will probably be Cohen's d. This handy measure is defined as the difference between the means for the two groups, divided by an estimate of the standard deviation in the population - often we use the average of the standard deviations of the samples as a rough guide for the latter.

The reason why the issue of effect size is tricky is that, all too often in Psychology, we don't know how big an effect we should expect. One of the good things about the recent development of power analysis is that it throws us back to thinking about the appropriate psychological theory, which ought to tell us how big an effect to expect. In Physics, people don't just predict that one variable will have a statistically significant effect on another variable: they develop theories that predict how big the effect should be. It's good for us to be forced to think how big the effect should be. But, as Psychology is so much more complex than Physics, we often cannot do much more than guess at expected effect sizes.
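To make that definition concrete, here is a minimal Python sketch of Cohen's d computed from raw scores, using the rough average-of-the-sample-SDs estimate described above. The two data sets are invented purely for illustration:

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: difference between the two group means, divided by an
    estimate of the population SD (here, the average of the two sample SDs)."""
    sd_estimate = (stdev(group1) + stdev(group2)) / 2
    return (mean(group1) - mean(group2)) / sd_estimate

experimental = [10, 12, 14, 16, 18]   # invented scores: mean 14, SD ~3.16
control = [8, 10, 12, 14, 16]         # invented scores: mean 12, SD ~3.16
print(round(cohens_d(experimental, control), 2))   # prints 0.63
```

A d of about 0.63 would count as a medium-to-large effect on Cohen's rules of thumb, given below.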
Often we end up saying “Well, we can only afford to test so many subjects, so we will probably only be able to pick up an effect if it is big” - so we make some guess at what would be a “big” effect size. Even this is still likely to be useful, however: if the study is not powerful enough to pick up even a big effect, it is very unlikely to pick up a small one. Cohen (1992) gives useful rules of thumb about what to regard as a “big”, “medium” or “small” effect.

Cohen's Rules Of Thumb For Effect Size

    Effect size       Correlation coefficient   Difference between means
    “Small effect”    r = 0.1                   d = 0.2 standard deviations
    “Medium effect”   r = 0.3                   d = 0.5 standard deviations
    “Large effect”    r = 0.5                   d = 0.8 standard deviations

The two commonest effect-size measures are Pearson's r and Cohen's d. Cohen's tutorial paper gives similar rules of thumb for differences in proportions, partial correlations and ANOVA designs.

Cohen, J., (1992). A Power Primer. Psychological Bulletin 112: 155-159.
Cohen, J., (1977). Statistical power analysis for the behavioural sciences. San Diego, CA: Academic Press.

Calculating Cohen’s d

    d = (x̄1 - x̄2) / s_pooled

Where:
    d    Cohen’s d effect size
    x̄    Mean
    s    Standard deviation
    Subscripts 1 and 2 refer to the two conditions being compared.

The pooled standard deviation is:

    s_pooled = sqrt( [ (n1 - 1)s1² + (n2 - 1)s2² ] / (n1 + n2 - 2) )

Where:
    s    Standard deviation
    n    Sample size
    Subscripts 1 and 2 refer to the two conditions being compared.

Calculating Cohen’s d from a t test

    d = t (n1 + n2) / [ sqrt(n1 n2) sqrt(n1 + n2 - 2) ]

Where:
    d    Cohen’s d effect size
    t    t statistic
    Subscripts 1 and 2 refer to the two conditions being compared.

If the two sample sizes are approximately equal, this becomes

    d = 2t / sqrt(n - 2),   where n = n1 + n2.

If standard errors rather than standard deviations are available, then

    s = SE x sqrt(n)

Where:
    s    Standard deviation
    SE   Standard error
    n    Number of subjects

Conventions And Decisions About Statistical Power

For statistical significance, there is a convention that we will usually accept a 1 in 20 risk of making a Type I error.
Thus, we usually start thinking of findings as being statistically significant if they attain a significance level of 0.05 or lower (i.e., a risk of 1 in 20 or less). There is a similar rule of thumb for statistical power, but the acceptable risk of making a Type II error is usually set rather higher. There is an overall implicit consensus that Type I errors, which lead to false confirmation of incorrect predictions, are about four times as dangerous as Type II errors. Type II errors lead to a false disbelief in effects that are actually real, but there seems more chance that these mistakes will be corrected in due course, at less cost than the results of Type I errors.

The outcome of such considerations is that the conventional acceptable risk of a Type II error is often set at 1 in 5, i.e., a probability of 0.2. The conventionally uncontroversial value for “adequate” statistical power is therefore set at 1 - 0.2 = 0.8. Another way of expressing this is to say that people often regard the minimum acceptable statistical power for a proposed study as an 80% chance that an effect which really exists will show up as a significant finding. If anyone (e.g., an ethical committee) asks you “What is the proposed power of your study?”, you are on fairly safe ground if you can reply “0.8”. But this is just a convention, and it does depend, just like the setting of significance levels, on weighing the relative costs of doing the study, plus the costs of the various forms of getting the wrong answers, against the benefits of getting the right answer.

When you are deciding on acceptable values for α and β for a given study, you need to consider the seriousness of each type of error. The more serious the error, the less often you will be willing to allow it to occur. Therefore, you should demand smaller probability values for risks of more serious errors.
Ideally, you want to have high power to detect effects that you care about, and low power for any small effect that would be meaningless. This will affect decisions about, e.g., the number of subjects to test in a psychological experiment.

An example proposed in Minitab's Help pages: suppose you want to claim that children in your school scored higher than the general population on a standardised achievement test. You need to decide how much higher than the general population your test scores need to be, so that you are not making claims that are misleading. If your mean test score is a mere 0.7 points higher than the general population, on a 100-point test, do you really want to detect this as a significant difference? Probably not. (But for a note of caution, see Rosenthal (1991), p. 263 - outlined below.)

Considering Statistical Power When Reviewing Scientific Research

In this situation, we are usually considering the implications of a null finding. The researchers report, perhaps, that A did not correlate with B significantly, or that there was no significant difference between the experimental group and the control group. They are unlikely to say this about their main findings, or they would probably never have got their paper published. But if any substantive argument is being made on the basis of not finding a significant effect (e.g., “no evidence that X is dangerous to health”), we should certainly be alert to whether or not the power of the study was ever adequate for Ho to be rejected.

More often, researchers make positive claims on the basis of null results when discussing any checks they may have made concerning extraneous, or perhaps even potentially confounding, variables - e.g., whether the experimental group and the control group showed statistically significant differences in intelligence scores, social class, etc.
In deciding whether to accept the fact that there was no significant difference between groups as any kind of evidence that they were similar, we need to think about whether the comparison had adequate statistical power.

Statistical Power Analysis In Minitab

Minitab provides power and sample size calculations under its main STAT menu. It caters for the following procedures:

Stat > Power and Sample Size >
    1-Sample Z
    1-Sample t
    2-Sample t
    1 Proportion
    2 Proportions
    One-Way ANOVA
    2-Level Factorial Design
    Plackett-Burman Design

These facilities in Minitab are very easy to use, as should be evident from the accompanying Minitab examples.

Summary: Factors That Influence Power

The following factors influence power:

Sample size. As sample size increases, power increases.

Alpha (α), the probability that you are prepared to accept for making a Type I error (i.e., the level of significance that you are prepared to use, e.g., 0.05). As α, the probability of a Type I error, increases, β, the probability of a Type II error, decreases. Since power is 1 - β, as α increases and the significance level gets less stringent, statistical power to find a real effect also increases. Note that if you demand a very stringent level of significance, you are less likely to get a significant result, and your statistical power decreases. In Psychology, unduly high levels of stringency in the significance testing of post-ANOVA comparisons have probably been a major cause of failure to find predicted effects that really were there in the population (Rosenthal (1991), Psychosomatic Medicine 53: 247-271).

s, the standard deviation, which gives an estimate of variability in the population. As s increases, power decreases, because effects get lost in the “noise”.

The real size of the effect that we are looking for in the population. As the size of the effect in the population decreases, power decreases.
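These influences are easy to check numerically. The sketch below (an illustration assuming SciPy is available; it is not part of the Minitab procedures described in this handout) computes the two-sided power of a 2-sample t test from the noncentral t distribution, with n cases per group:

```python
from scipy import stats

def two_sample_t_power(n, diff, sigma=1.0, alpha=0.05):
    """Two-sided power of a 2-sample t test with n cases in each group,
    for a true difference `diff` between means and population SD `sigma`."""
    df = 2 * n - 2                                  # degrees of freedom
    ncp = (diff / sigma) * (n / 2) ** 0.5           # noncentrality parameter
    crit = stats.t.ppf(1 - alpha / 2, df)           # two-sided critical value
    # Power = probability that |t| exceeds the critical value under H1
    return (1 - stats.nct.cdf(crit, df, ncp)) + stats.nct.cdf(-crit, df, ncp)

p = two_sample_t_power(13, 0.8)                     # roughly 0.5
assert two_sample_t_power(20, 0.8) > p              # larger n: more power
assert two_sample_t_power(13, 0.8, alpha=0.10) > p  # laxer alpha: more power
assert two_sample_t_power(13, 0.8, sigma=2.0) < p   # more noise: less power
```

With 13 cases per group and a true difference of 0.8 standard deviations, this gives a power of roughly 0.5, agreeing with the Minitab output in Example 1 below.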
Footnote

Note of caution: be aware that not all small effects are meaningless - e.g., in the study of changes in serious risks, most effects are small, but they can still be very important. Rosenthal (1991) gives a telling example from a study with very high statistical power (due to its sample comprising 22,071 cases) on the effect of aspirin on the incidence of coronary heart disease in American physicians. The correlation between taking aspirin and not having a heart attack came out as 0.034 (and that is the value of Pearson's r, not the significance level!), but that was equivalent to a reduction of 4% in the incidence of heart attacks - a “weak” effect that was far too strong to ignore. In fact, the American Medical Association closed the trial down prematurely on the strength of these interim results, because it had been shown that the risk associated with being in the control group and not taking prophylactic aspirin was unacceptably large. (The participants in the study were members of the AMA.)

Using Minitab To Calculate Power And Minimum Sample Size

Example 1: Statistical power of a t-test on scores in 2 groups

Suppose we have two samples, each with n = 13, and that we propose to use the 0.05 significance level. The difference between means is 0.8 standard deviations (i.e., Cohen's d = 0.8).

1. Click on the Stat Menu, select “Power and Sample size”, and from that select “2-sample t”. A dialogue box appears. (If you now click on “Help”, you won't really need to read this document - but carry on doing so if you want to find out how easy it is.)

2. Go to the top section of the dialogue box, “Calculate power from sample size”. Against “Sample size:” enter 13. Against “Difference:” enter 0.8.

3. Go to the bottom left section of the dialogue box, “Sigma”, which will hold your estimate of the standard deviation in the population. Check that it contains the value 1.0.

4. Click on the “Options” button and check that the default significance level of 0.05 is shown.
Click the Options box “OK” button.

5. Now that you have specified sample sizes of 13, an effect size of 0.8 standard deviations, and alpha = 0.05, click on “OK”. You get the following output in the Session Window:

MTB > Power;
SUBC>   TTwo;
SUBC>   Sample 13;
SUBC>   Difference 0.8;
SUBC>   Sigma 1.0.

Power and Sample Size

2-Sample t Test

Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1

    Difference   Sample Size      Power
           0.8            13   0.499157

The sample size is for each group.

Minitab responds that the power will be 0.4992. If, in the population, there really is a difference of 0.8 standard deviations between the members of the two categories that would be sampled in the two groups, then using sample sizes of 13 each will give a 49.92% chance of getting a result that will be significant at the 0.05 level. A power value of approximately 0.5 is probably unacceptably low.

We see from the Minitab output that this is based on a 2-tailed test. If we are using power analysis, we probably know enough about what we are doing to have a theory that predicts which group should have the higher scores, so perhaps a one-tailed test is called for. To repeat the analysis basing it on a 1-tailed test, we repeat the procedure, but after clicking to obtain the “Options” dialogue box, we change the “Alternative Hypothesis”.

Example 2: Required sample size for a given power: 2 group comparison

Suppose we had reason to expect that, as above, there is an effect waiting to be found in the population, with a magnitude of 0.8 standard deviations between groups. Suppose we intend doing a one-tailed test, with a significance level of 0.05.

1. As above, pull down the Minitab Stat Menu, select “Power and Sample size”, and from that select “2-sample t”.

2. Click the radio button “Calculate sample size for each power”. Against “Power values”, enter 0.8. Against “Difference”, enter 0.8.
(It is a coincidence that in this example both values are the same.)

3. Go to the bottom left section of the dialogue box, “Sigma”, which will hold your estimate of the standard deviation in the population. Check that it contains the value 1.0.

4. As before, click the dialogue box “Options” button, and in the Options dialogue box select the “Greater than” Alternative Hypothesis radio button. Click the OK buttons.

MTB > Power;
SUBC>   TTwo;
SUBC>   Difference 0.8;
SUBC>   Power 0.8;
SUBC>   Sigma 1.;
SUBC>   Alternative 1.

Power and Sample Size

2-Sample t Test

Testing mean 1 = mean 2 (versus >)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1

    Difference   Sample Size   Target Power   Actual Power
           0.8            21            0.8       0.816788

The sample size is for each group.

Minitab’s output shows that for a target power of at least 0.8, there must be at least 21 cases in each of the two groups tested. This will give an actual power of 0.8168.

Interestingly, this result does not tally precisely with the power estimate given in Table 2 of Cohen’s 1992 paper in Psychological Bulletin. Maybe Cohen was proposing a 2-tailed test. Calculating the number of cases required for a power of 0.8, when the difference between group means is 0.8 standard deviations, if the t-test is to be 2-tailed, is left as an exercise for the reader.

Example 3: Example of calculating power for a one-way ANOVA

Suppose you are about to undertake an investigation to determine whether or not 4 treatments affect the yield of a product, using 5 observations per treatment. You know that the mean of the control group should be around 8, and you would like to detect differences of +4 as significant; thus, the maximum difference you are considering is 4 units. Previous research suggests the population σ is 1.64.

1. As above, pull down the Minitab Stat Menu, select “Power and Sample size”, and from that select “One-way ANOVA”.

2. In Number of levels, enter 4.

3. In Sample sizes, enter 5.

4.
In Values of the maximum difference between means, enter 4.

5. In Standard deviation, enter 1.64. Click the OK button.

MTB > Power;
SUBC>   OneWay 4;
SUBC>   Sample 5;
SUBC>   MaxDifference 4;
SUBC>   Sigma 1.64.

Power and Sample Size

One-way ANOVA

Alpha = 0.05  Assumed standard deviation = 1.64  Number of Levels = 4

    SS Means   Sample Size   Maximum Difference      Power
           8             5                    4   0.826860

The sample size is for each level.

To interpret the results: if you assign five observations to each treatment level, you have a power of 0.83 to detect a difference of 4 units or more between the treatment means. Minitab can also display the power curve for all possible combinations of maximum difference between means and power, for a one-way ANOVA with 5 samples per treatment. The symbol on the curve represents the difference value you specified.

Power and Sample Size Calculations for Other Designs and Tests

The methods are very similar for all the options that Minitab offers, such as differences between proportions and ANOVA designs. It is pretty self-evident what to do once you know how to do it for the t-test. The on-line help that you get by clicking the “Help” button in the power calculation dialogue box is very good.

Cohen (1992) remains a very useful introductory guide to power and effect size, in less than 5 pages. It includes a table with many useful power and effect size calculations already done for the reader.

Sample Size Equations – Background Theory

Five different sample size equations are presented in this section. Each separate description is designed to stand alone from the others. Each discussion includes the sample size equation, a description of each term in the equation, a table of appropriate coefficients, and a worked example. The examples included all refer to monitoring with a quadrat-based sampling procedure.
The equations and calculations also work with other kinds of monitoring data, such as measurements of plant height, number of flowers, or measures of cover.

For the equations that deal with comparing different sample means, all comparisons shown are for two-tailed tests. If a one-tailed test is desired, double the false-change (Type I) error rate (α) and look up the new, doubled α value in the table of coefficients (e.g., use α = 0.20 instead of α = 0.10 for a one-tailed test with a false-change (Type I) error rate of 0.10).

The coefficients used in all of the equations are from a standard normal distribution (Zα and Zß) instead of the t-distribution (tα and tß). These two distributions are nearly identical at large sample sizes, but at small sample sizes (n < 30) the Z coefficients will slightly underestimate the number of samples needed. The correction procedure described for the first example (using the sample size correction table, below) already adjusts the sample size using the appropriate t-value. For the other equations, tα and tß values can be obtained from a t-table and used in place of the Zα and Zß coefficients that are included with the sample size equations. The appropriate tα coefficient for the false-change (Type I) error rate can be taken directly from the 2α column of a t-table at the appropriate degrees of freedom (v). For example, for a false-change error rate of 0.10, use the 2α = 0.10 column. The appropriate tß coefficient for a specified missed-change error level can be looked up by calculating 2(1 - power) and looking up that value in the appropriate 2α column. For example, for a power of 0.90, the calculation for tß would be 2(1 - 0.90) = 0.20: use the 2α = 0.20 column at the appropriate degrees of freedom (v) to obtain the appropriate t-value.

Determining The Necessary Sample Size For Estimating A Single Population Mean Or A Single Population Total With A Specified Level Of Precision.

Estimating a sample mean vs. total population size.
The sample size needed to estimate confidence intervals that are within a given percentage of the estimated total population size is the same as the sample size needed to estimate confidence intervals that are within that percentage of the estimated mean value. The instructions below assume you are working with a sample mean.

Determining sample size for a single population mean or a single population total is a two- or three-step process.

(1) The first step is to use the equation provided below to calculate an uncorrected sample size estimate.

(2) The second step is to consult the Sample Size Correction Table appearing below these instructions to come up with the corrected sample size estimate. The use of the correction table is necessary because the equation below under-estimates the number of samples that will be needed to meet the specified level of precision. The use of the table to correct the under-estimated sample size is simpler than using a more complex equation that does not require correction.

(3) The third step is to multiply the corrected sample size estimate by the finite population correction factor if more than 5% of the population area is being sampled.

(1) Calculate an initial sample size using the following equation:

    n = (Zα² s²) / B²

Where:
    n    The uncorrected sample size estimate.
    Zα   The standard normal coefficient from the table below.
    s    The standard deviation.
    B    The desired precision level, expressed as half of the maximum acceptable confidence interval width. This needs to be specified in absolute terms rather than as a percentage. For example, if you wanted your confidence interval width to be within 30% of your sample mean, and your sample mean = 10 plants/quadrat, then B = 0.30 x 10 = 3.0.

Table Of Standard Normal Deviates (Zα) for Various Confidence Levels

    Confidence level   Alpha (α) level   Zα
    80%                0.20              1.28
    90%                0.10              1.64
    95%                0.05              1.96
    99%                0.01              2.58

(2) To obtain the adjusted sample size estimate, consult the sample size correction table below these instructions.
n is the uncorrected sample size value from the sample size equation. n* is the corrected sample size value.

(3) Additional correction for sampling finite populations. The above formula assumes that the population is very large compared to the proportion of the population that is sampled. If you are sampling more than 5% of the whole population, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate with the FPC for confidence intervals is:

n' = n* / (1 + n*/N)

Where:
n' = The new FPC-corrected sample size.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of one quadrat.

Example:

Management objective: Restore the population of species Y in population Z to a density of at least 30 plants/quadrat by the year 2001.

Sampling objective: Obtain estimates of the mean density and population size with 95% confidence intervals within 20% of the estimated true value.

Results of pilot sampling: Mean (x̄) = 25 plants/quadrat. Standard deviation (s) = 7 plants.

Given: The desired confidence level is 95%, so the appropriate Zα from the table above is 1.96. The desired confidence interval width is 20% (0.20) of the estimated true value. Since the estimated true value is 25 plants/quadrat, the desired confidence interval half-width (B) is 25 x 0.20 = 5 plants/quadrat.

Calculate an unadjusted estimate of the sample size needed by using the sample size formula:

n = (Zα² x s²) / B² = (1.96² x 7²) / 5² = 7.53

Round 7.53 plots up to 8 plots for the unadjusted sample size. To adjust this preliminary estimate, go to the sample size correction table and find n = 8 and the corresponding n* value in the 95% confidence level portion of the table. For n = 8, the corresponding value is n* = 15.
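The step (3) finite population correction can be sketched the same way in Python (names illustrative):

```python
def fpc_corrected_size(n_star, N):
    """Step (3): finite population correction, n' = n* / (1 + n*/N).

    n_star -- corrected sample size from the sample size correction table
    N      -- total number of possible quadrat locations
              (population area divided by the area of one quadrat)
    """
    return n_star / (1 + n_star / N)

# Only needed when more than 5% of the population area is sampled,
# e.g. n* = 15 quadrats out of N = 100 possible quadrat locations:
n_prime = fpc_corrected_size(15, 100)  # about 13.04
```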
The corrected estimated sample size needed to be 95% confident that the estimate of the population mean is within 20% (±5 plants) of the true mean is 15 quadrats.

Additional correction for sampling finite populations: The above formula assumes that the population is very large compared to the proportion of the population that is sampled. If you are sampling more than 5% of the whole population area, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate with the FPC for confidence intervals is:

n' = n* / (1 + n*/N)

Where:
n' = The new FPC-corrected sample size.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of one quadrat.

Example: If the pilot data described above were gathered using a 1m x 10m (10 m²) quadrat and the total population being sampled was located within a 20m x 50m macroplot (1000 m²), then N = 1000 m²/10 m² = 100. The corrected sample size would then be:

n' = n* / (1 + n*/N) = 15 / (1 + 15/100) = 13.04

The new, FPC-corrected, estimated sample size needed to be 95% confident that the estimate of the population mean is within 20% (±5 plants) of the true mean is 13 quadrats.

Determining The Necessary Sample Size For Detecting Differences Between Two Means With Temporary Sampling Units.

n = 2s²(Zα + Zβ)² / MDC²

Where:
n = The uncorrected sample size estimate.
s = Sample standard deviation.
Zα = Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ = Z-coefficient for the missed-change (Type II) error rate from the table below.
MDC = Minimum detectable change size. This needs to be specified in absolute terms rather than as a percentage.
For example, if you wanted to detect a 20% change in the sample mean from one year to the next and your first-year sample mean is 10 plants/quadrat, then MDC = 0.20 x 10 = 2 plants/quadrat.

Table of standard normal deviates for Zα

False-change (Type I) error rate (α)   Zα
0.40                                   0.84
0.20                                   1.28
0.10                                   1.64
0.05                                   1.96
0.01                                   2.58

Table of standard normal deviates for Zβ

Missed-change (Type II) error rate (β)   Power   Zβ
0.40                                     0.60    0.25
0.20                                     0.80    0.84
0.10                                     0.90    1.28
0.05                                     0.95    1.64
0.01                                     0.99    2.33

Example:

Management objective: Increase the density of species F at Site Y by 20% between 1999 and 2004.

Sampling objective: I want to be 90% certain of detecting a 20% change in mean plant density, and I am willing to accept a 10% chance that I will make a false-change error (conclude that a change took place when it really did not).

Results from pilot sampling: Mean (x̄) = 25 plants/quadrat. Standard deviation (s) = 7 plants.

Given: The acceptable false-change error rate (α) is 0.10, so the appropriate Zα from the table is 1.64. The desired power is 90% (0.90), so the missed-change error rate (β) is 0.10 and the appropriate Zβ coefficient from the table is 1.28. The Minimum Detectable Change (MDC) is 20% of the 1999 value, or 0.20 x 25 = 5 plants/quadrat.

Calculate the estimated necessary sample size using the equation provided above:

n = 2s²(Zα + Zβ)² / MDC² = 2 x 7² x (1.64 + 1.28)² / 5² = 33.42

Round up 33.42 to 34 plots. The final estimated sample size needed to be 90% confident of detecting a change of 5 plants between 1999 and 2004 with a false-change error rate of 0.10 is 34 quadrats. The sample size correction table is not needed for estimating sample sizes for detecting differences between two population means.

Correction for sampling finite populations: The above formula assumes that the population is very large compared to the proportion of the population that is sampled.
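The worked example above can be reproduced with a short Python sketch (the function name is illustrative, not from the source):

```python
import math

def sample_size_two_means(s, z_alpha, z_beta, mdc):
    """Uncorrected n = 2 s^2 (Za + Zb)^2 / MDC^2 for comparing two means
    from temporary (independent) sampling units.

    s       -- pilot sample standard deviation
    z_alpha -- coefficient for the false-change (Type I) error rate
    z_beta  -- coefficient for the missed-change (Type II) error rate
    mdc     -- minimum detectable change, in absolute units
    """
    return 2 * s ** 2 * (z_alpha + z_beta) ** 2 / mdc ** 2

# s = 7, alpha = 0.10 (Za = 1.64), power = 0.90 (Zb = 1.28), MDC = 5:
n = sample_size_two_means(7, 1.64, 1.28, 5)  # about 33.42
n_quadrats = math.ceil(n)                    # round up to 34 quadrats
```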
If you are sampling more than 5% of the whole population area, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate with the FPC is:

n' = n* / (1 + n*/N)

Where:
n' = The new sample size based upon inclusion of the finite population correction factor.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of the sampling unit.

Example: If the pilot data described above were gathered using a 1m x 10m (10 m²) quadrat and the total population being sampled was located within a 20m x 50m macroplot (1000 m²), then N = 1000 m²/10 m² = 100. The corrected sample size would then be:

n' = n* / (1 + n*/N) = 34 / (1 + 34/100) = 25.37

Round up 25.37 to 26. The new, FPC-corrected estimated sample size needed to be 90% certain of detecting a change of 5 plants between 1999 and 2004 with a false-change error rate of 0.10 is 26 quadrats.

Note on the statistical analysis for two-sample tests from finite populations: If you have sampled more than 5% of an entire population, then you should also apply the finite population correction factor to the results of the statistical test. This procedure involves dividing the test statistic by the square root of the finite population factor (1 − n/N). For example, if your t-statistic from a particular test turned out to be 1.645 and you sampled n = 26 quadrats out of a total of N = 100 possible quadrats, then your correction procedure would look like the following:

t' = t / √(1 − n/N) = 1.645 / √(1 − 26/100) = 1.912

Where:
t = The t-statistic from a t-test.
t' = The corrected t-statistic using the FPC.
n = The sample size from the equation above.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of each individual sampling unit.

You would need to look up the p-value of t' = 1.912 in a t-table at the appropriate degrees of freedom to obtain the correct p-value for this statistical test.

Determining The Necessary Sample Size For Detecting Differences Between Two Means When Using Paired Or Permanent Sampling Units.

When paired sampling units are being compared, or when data from permanent quadrats are being compared between two time periods, sample size determination requires a different procedure than if samples are independent of one another. The equation for determining the number of samples necessary to detect some "true" difference between two sample means is:

n = s²(Zα + Zβ)² / MDC²

Where:
n = The uncorrected sample size estimate.
s = Sample standard deviation of the differences between paired samples.
Zα = Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ = Z-coefficient for the missed-change (Type II) error rate from the table below.
MDC = Minimum detectable change size. This needs to be specified in absolute terms rather than as a percentage. For example, if you wanted to detect a 20% change in the sample mean from one year to the next and your first-year sample mean is 10 plants/quadrat, then MDC = 0.20 x 10 = 2 plants/quadrat.

Table of standard normal deviates for Zα

False-change (Type I) error rate (α)   Zα
0.40                                   0.84
0.20                                   1.28
0.10                                   1.64
0.05                                   1.96
0.01                                   2.58

Table of standard normal deviates for Zβ

Missed-change (Type II) error rate (β)   Power   Zβ
0.40                                     0.60    0.25
0.20                                     0.80    0.84
0.10                                     0.90    1.28
0.05                                     0.95    1.64
0.01                                     0.99    2.33

If the objective is to track changes over time with permanent sampling units and only a single year of data is available, then you will not have a standard deviation of differences between the paired samples.
If you have an estimate of the likely degree of correlation between the two years of data, and you assume that the among-sampling-unit standard deviation will be the same in the second time period, then you can use the equation below to estimate the standard deviation of differences:

sdiff = s1 x √(2 x (1 − corrdiff))

Where:
sdiff = Estimated standard deviation of the differences between paired samples.
s1 = Sample standard deviation among sampling units at the first time period.
corrdiff = Correlation coefficient between sampling unit values in the first time period and sampling unit values in the second time period.

Example:

Management objective: Achieve at least a 20% higher density of species F at Site Y in unburned areas compared to burned areas in 1999.

Sampling objective: I want to be able to detect a 20% difference in mean plant density between unburned areas and adjacent burned areas. I want to be 90% certain of detecting that difference, if it occurs, and I am willing to accept a 10% chance that I will make a false-change error (conclude that a difference exists when it really does not).

Results from pilot sampling: Five paired quadrats were sampled, where one member of each pair was excluded from burning and the other member of the pair was burned.

Number of plants/quadrat:

Quadrat number   burned   unburned   Difference
1                2        3          1
2                5        8          3
3                4        9          5
4                7        12         5
5                3        7          4

burned: x̄ = 4.20, s = 1.92; unburned: x̄ = 7.80, s = 3.27; difference: x̄ = 3.60, s = 1.67

The Minitab session for entering the data and computing the descriptive statistics:

MTB > set c1
DATA> 2 5 4 7 3
DATA> end
MTB > set c2
DATA> 3 8 9 12 7
DATA> end
MTB > name c1 'burned'
MTB > name c2 'unburned'
MTB > let c3 = 'unburned' - 'burned'
MTB > name c3 'difference'
MTB > Describe 'burned' 'unburned' 'difference';
SUBC> Mean;
SUBC> StDeviation.
Descriptive Statistics: burned, unburned, difference

Variable     Mean    StDev
burned       4.200   1.924
unburned     7.80    3.27
difference   3.600   1.673

Given: The sampling objective specified a desired minimum detectable difference (i.e., equivalent to the MDC) of 20%. Taking the larger of the two mean values and multiplying by 20% gives: MDC = 7.80 x 0.20 = 1.56 plants/quadrat. The appropriate standard deviation to use is 1.67, the standard deviation of the differences between the pairs. The acceptable false-change error rate (α) is 0.10, so the appropriate Zα from the table is 1.64. The desired power is 90% (0.90), so the missed-change error rate (β) is 0.10 and the appropriate Zβ coefficient from the table is 1.28.

Calculate the estimated necessary sample size using the equation provided above:

n = s²(Zα + Zβ)² / MDC² = 1.67² x (1.64 + 1.28)² / 1.56² = 9.77

Round up 9.77 to 10 plots. The final estimated sample size needed to be 90% certain of detecting a true difference of 1.56 plants/quadrat between the burned and unburned quadrats with a false-change error rate of 0.10 is 10 quadrats.

Example:

Management objective: Increase the density of species F at Site Q by 20% between 1999 and 2002.

Sampling objective: I want to be able to detect a 20% difference in mean plant density of species F at Site Q between 1999 and 2002. I want to be 90% certain of detecting that change, if it occurs, and I am willing to accept a 10% chance that I will make a false-change error (conclude that a difference exists when it really does not).

The procedure for determining the necessary sample size for this example would be very similar to the previous example. Just replace "burned" and "unburned" in the data table with "1999" and "2002", and the rest of the calculations would be the same. Because the sample size determination procedure needs the standard deviation of the difference between two samples, you will not have the necessary standard deviation term to plug into the equation until you have two years of data.
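A Python sketch of the paired-unit calculation, together with the first-year estimate of sdiff described above (function names are illustrative):

```python
import math

def sample_size_paired(s_diff, z_alpha, z_beta, mdc):
    """Uncorrected n = s^2 (Za + Zb)^2 / MDC^2 for paired or permanent
    sampling units, where s_diff is the standard deviation of the
    differences between paired samples."""
    return s_diff ** 2 * (z_alpha + z_beta) ** 2 / mdc ** 2

def estimate_s_diff(s1, corr_diff):
    """First-year estimate of the standard deviation of differences:
    sdiff = s1 * sqrt(2 * (1 - corrdiff))."""
    return s1 * math.sqrt(2 * (1 - corr_diff))

# Burned/unburned example: s_diff = 1.67, Za = 1.64, Zb = 1.28, MDC = 1.56:
n = sample_size_paired(1.67, 1.64, 1.28, 1.56)  # about 9.77, round up to 10
```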
The standard deviation of the difference can be estimated in the first year if some estimate of the correlation coefficient between sampling unit values in the first time period and sampling unit values in the second time period is available (see the sdiff equation above).

Correction for sampling finite populations: The above formula assumes that the population is very large compared to the proportion of the population that is sampled. If you are sampling more than 5% of the whole population area, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate is:

n' = n* / (1 + n*/N)

Where:
n' = The new sample size based upon inclusion of the finite population correction factor.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of the sampling unit.

Example: If the pilot data described above were gathered using a 1m x 10m (10 m²) quadrat and the total population being sampled was located within a 10m x 50m macroplot (500 m²), then N = 500 m²/10 m² = 50. The corrected sample size would then be:

n' = n* / (1 + n*/N) = 10 / (1 + 10/50) = 8.33

Round up 8.33 to 9. The new, FPC-corrected estimated sample size needed to be 90% confident of detecting a true difference of 1.56 plants/quadrat between the burned and unburned quadrats with a false-change error rate of 0.10 is 9 quadrats.

Note on the statistical analysis for two-sample tests from finite populations: If you have sampled more than 5% of an entire population, then you should also apply the finite population correction factor to the results of the statistical test. This procedure involves dividing the test statistic by the square root of (1 − n/N).
For example, if your t-statistic from a particular test turned out to be 1.782 and you sampled n = 9 quadrats out of a total of N = 50 possible quadrats, then your correction procedure would look like the following:

t' = t / √(1 − n/N) = 1.782 / √(1 − 9/50) = 1.968

Where:
t = The t-statistic from a t-test.
t' = The corrected t-statistic using the FPC.
n = The sample size from the equation above.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of each individual sampling unit.

You would need to look up the p-value of t' = 1.968 in a t-table at the appropriate degrees of freedom to obtain the correct p-value for this statistical test.

Determining The Necessary Sample Size For Estimating A Single Population Proportion With A Specified Level Of Precision.

The equation for determining the sample size for estimating a single proportion is:

n = p x q x Zα² / d²

Where:
n = Estimated necessary sample size.
Zα = The coefficient from the table of standard normal deviates below.
p = The value of the proportion as a decimal percent (e.g., 0.45).
q = 1 − p.
d = The desired precision level, expressed as half of the maximum acceptable confidence interval width. This is also expressed as a decimal percent (e.g., 0.15), and it represents an absolute rather than a relative value. For example, if your proportion value is 30% and you want a precision level of ±10%, this means you are targeting an interval width from 20% to 40%. Use 0.10 for the d-value and not 0.30 x 0.10 = 0.03.

Table of standard normal deviates (Zα) for various confidence levels

Confidence level   Alpha (α) level   Zα
80%                0.20              1.28
90%                0.10              1.64
95%                0.05              1.96
99%                0.01              2.58

Example:

Management objective: Maintain at least a 40% frequency (in 1m² quadrats) of species Y in population Z over the next 5 years.
Sampling objective: Estimate percent frequency with 95% confidence intervals no wider than ±10% of the estimated true value.

Results of pilot sampling: The proportion of quadrats with species Y is estimated to be p = 65% (0.65). Because q = 1 − p, q = 1 − 0.65 = 0.35.

Given: The desired confidence level is 95%, so the appropriate Zα from the table above is 1.96. The desired confidence interval half-width (d) is specified as 10% (0.10).

Using the equation provided above:

n = p x q x Zα² / d² = 0.65 x 0.35 x 1.96² / 0.10² = 87.39

Round up 87.39 to 88. The estimated sample size needed to be 95% confident that the estimate of the population percent frequency is within 10% (±0.10) of the true percent frequency is 88 quadrats.

This sample size formula works well as long as the proportion is more than 0.20 and less than 0.80. If you suspect the population proportion is less than 0.20 or greater than 0.80, use 0.20 or 0.80, respectively, as a conservative estimate of the proportion.

Correction for sampling finite populations: The above formula assumes that the population is very large compared to the proportion of the population that is sampled. If you are sampling more than 5% of the whole population area, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate is:

n' = n* / (1 + n*/N)

Where:
n' = The new sample size based upon inclusion of the finite population correction factor.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of the sampling unit.

Example: If the pilot data described above were gathered using a 1m x 1m (1 m²) quadrat and the total population being sampled was located within a 25m x 25m macroplot (625 m²), then N = 625 m²/1 m² = 625.
The corrected sample size would then be:

n' = n* / (1 + n*/N) = 88 / (1 + 88/625) = 77.13

Round up 77.13 to 78. The new, FPC-corrected, estimated sample size needed to be 95% confident that the estimate of the population percent frequency is within 10% (±0.10) of the true percent frequency is 78 quadrats.

Alternately, including a term for Type II errors gives:

n = p x (1 − p) x (zα/2 + zβ)² / d²

Where:
n = Estimated necessary sample size.
zα/2 = The standard normal deviate for α, the probability of a Type I error; conventionally α = 0.05.
zβ = The standard normal deviate for β, the probability of a Type II error; conventionally β = 0.2, meaning 1 − β is the power.
p = The value of the proportion as a decimal percent (e.g., 0.45).
d = The desired precision level, expressed as half of the maximum acceptable confidence interval width. This is also expressed as a decimal percent (e.g., 0.15), and it represents an absolute rather than a relative value. For example, if your proportion value is 30% and you want a precision level of ±10%, this means you are targeting an interval width from 20% to 40%. Use 0.10 for the d-value and not 0.30 x 0.10 = 0.03.

For example, suppose that the response rate of a patient population under study after treatment is expected to be around 50% (i.e., p = 0.50). At α = 0.05, the required sample size for having 80% power (i.e., β = 0.2) for correctly detecting a difference between the post-treatment response rate and the reference value of 30% (i.e., d = 0.2) is:

n = p x (1 − p) x (zα/2 + zβ)² / d² = 0.5 x (1 − 0.5) x (1.96 + 0.84)² / 0.2² = 49.05, rounded up to 50.

A cross-check via Minitab:

MTB > Power;
SUBC> POne;
SUBC> PCompare .7;
SUBC> Power .8;
SUBC> PNull .5;
SUBC> GPCurve.

Power and Sample Size

Test for One Proportion

Testing p = 0.5 (versus not = 0.5)
Alpha = 0.05

Comparison p   Sample Size   Target Power   Actual Power
0.7            47            0.8            0.803325

Determining The Necessary Sample Size For Detecting Differences Between Two Proportions With Temporary Sampling Units.
n = (Zα + Zβ)² x (p1q1 + p2q2) / (p2 − p1)²

Where:
n = Estimated necessary sample size.
Zα = Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ = Z-coefficient for the missed-change (Type II) error rate from the table below.
p1 = The value of the proportion for the first sample as a decimal (e.g., 0.65).
q1 = 1 − p1.
p2 = The value of the proportion for the second sample as a decimal (e.g., 0.45).
q2 = 1 − p2.

Table of standard normal deviates for Zα

False-change (Type I) error rate (α)   Zα
0.40                                   0.84
0.20                                   1.28
0.10                                   1.64
0.05                                   1.96
0.01                                   2.58

Table of standard normal deviates for Zβ

Missed-change (Type II) error rate (β)   Power   Zβ
0.40                                     0.60    0.25
0.20                                     0.80    0.84
0.10                                     0.90    1.28
0.05                                     0.95    1.64
0.01                                     0.99    2.33

Example:

Management objective: Decrease the frequency of invasive weed F at Site G by 20% between 1999 and 2001.

Sampling objective: I want to be 90% certain of detecting an absolute change of 20% frequency, and I am willing to accept a 10% chance that I will make a false-change error (conclude that a change took place when it really did not).

Note that the magnitude of change for detecting change over time for proportion data is expressed in absolute terms rather than in relative terms (relative terms were used in earlier examples that dealt with sample mean values). The reason absolute terms are used instead of relative terms relates to the type of data being gathered: percent frequency is already expressed as a relative measure. Think of taking your population area and dividing it into a grid where the size of each grid cell equals your quadrat size. When you estimate a percent frequency, you are estimating the proportion of these grid cells occupied by a particular species. If 45% of all the grid cells in the population are occupied by a particular species, then you hope that your sample values will be close to 45%.
If over time the population changes so that now 65% of all the grid cells are occupied, then the true percent frequency has changed from 45% to 65%, representing a 20% absolute change.

Results from pilot sampling: The proportion of quadrats with species Z in 1999 is estimated to be p1 = 65% (0.65). Because q1 = 1 − p1, q1 = 1 − 0.65 = 0.35. Because we are interested in detecting a 20% shift in percent frequency, we will assign p2 = 0.45. This represents a shift of 20% frequency from 1999 to 2001. A decline was selected instead of an increase (e.g., from 65% frequency to 85% frequency) because sample size requirements are higher at the mid-range of frequency values (i.e., closer to 50%) than they are closer to 0 or 100. Sticking closer to the mid-range gives us a more conservative sample size estimate. Because q2 = 1 − p2, q2 = 1 − 0.45 = 0.55.

Given: The acceptable false-change error rate (α) is 0.10, so the appropriate Zα from the table is 1.64. The desired power is 90% (0.90), so the missed-change error rate (β) is 0.10 and the appropriate Zβ coefficient from the table is 1.28.

Using the equation provided above:

n = (Zα + Zβ)² x (p1q1 + p2q2) / (p2 − p1)² = (1.64 + 1.28)² x (0.65 x 0.35 + 0.45 x 0.55) / (0.45 − 0.65)² = 101.25

Round up 101.25 to 102. The estimated sample size needed to be 90% sure of detecting a shift of 20% frequency, with a starting frequency of 65% and a false-change error rate of 0.10, is 102 quadrats.

Correction for sampling finite populations: The above formula assumes that the population is very large compared to the proportion of the population that is sampled. If you are sampling more than 5% of the whole population area, then you should apply a correction to the sample size estimate that incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the sample size estimate is:

n' = n* / (1 + n*/N)

Where:
n' = The new sample size based upon inclusion of the finite population correction factor.
n* = The corrected sample size from the sample size correction table.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of the sampling unit.

Example: If the pilot data described above were gathered using a 1m x 1m (1 m²) quadrat and the total population being sampled was located within a 10m x 30m macroplot (300 m²), then N = 300 m²/1 m² = 300. The corrected sample size would then be:

n' = n* / (1 + n*/N) = 102 / (1 + 102/300) = 76.11

Round up 76.11 to 77. The new, FPC-corrected estimated sample size needed to be 90% sure of detecting an absolute shift of 20% frequency, with a starting frequency of 65% and a false-change error rate of 0.10, is 77 quadrats.

Note on the statistical analysis for two-sample tests from finite populations: If you have sampled more than 5% of an entire population, then you should also apply the finite population correction factor to the results of the statistical test. For proportion data, this procedure involves dividing the test statistic by (1 − n/N). For example, if your χ²-statistic from a particular test turned out to be 2.706 and you sampled n = 77 quadrats out of a total of N = 300 possible quadrats, then your correction procedure would look like the following:

χ²' = χ² / (1 − n/N) = 2.706 / (1 − 77/300) = 3.640

Where:
χ² = The χ²-statistic from a χ²-test.
χ²' = The corrected χ²-statistic using the FPC.
n = The sample size from the equation above.
N = The total number of possible quadrat locations in the population. To calculate N, determine the total area of the population and divide by the size of each individual sampling unit.

You would need to look up the p-value of χ²' = 3.640 in a χ² table for the appropriate degrees of freedom to obtain the corrected p-value for this statistical test.

Caveat

It is well known that statistical power calculations can be valuable in planning an experiment.
There is also a large literature advocating that power calculations be made whenever one performs a statistical test of a hypothesis and obtains a statistically non-significant result. Advocates of such post-experiment power calculations claim the calculations should be used to aid in the interpretation of the experimental results. This approach, which appears in various forms, is fundamentally flawed. We document that the problem is extensive and present arguments to demonstrate the flaw in the logic.

The abuse of power: The pervasive fallacy of power calculations for data analysis, Hoenig JM and Heisey DM, American Statistician, 55:1, 19-24, 2001

See also:

Statistical methods in psychology journals - Guidelines and explanations, Wilkinson L and the Task Force on Statistical Inference, American Psychologist, 54:8, 594-604, 1999

The Incompleteness of Probability Models and the Resultant Implications for Theories of Statistical Inference, Macdonald, Ranald R., Understanding Statistics, 1:3, 167-189, 2002

Some Myths and Legends in Quantitative Psychology, Grayson, Dave, Understanding Statistics, 3:2, 101-134, 2004

Some practical guidelines for effective sample size determination, Lenth RV, American Statistician, 55:3, 187-193, 2001

Bibliography

Sample Size Calculations in Clinical Research by Shein-Chung Chow, Jun Shao and Hansheng Wang

Sample size calculation is usually conducted through a pre-study power analysis. The purpose is to select a sample size such that it will achieve a desired power for correctly detecting a pre-specified clinically meaningful difference at a given level of significance. In clinical research, however, it is not uncommon to perform sample size calculation with inappropriate test statistics for wrong hypotheses, regardless of what study design is employed.
This book provides formulas and/or procedures for determining the sample size required not only for testing equality, but also for testing non-inferiority/superiority and equivalence (similarity), based on both untransformed (raw) data and log-transformed data under a parallel-group design or a crossover design with equal or unequal ratios of treatment allocation. It provides not only a comprehensive and unified presentation of various statistical procedures for sample size calculation that are commonly employed at various phases of clinical development, but also a well-balanced summary of current regulatory requirements, methodology for design and analysis in clinical research, and recent developments in the area of clinical development. Chapman & Hall/CRC Biostatistics Series, Volume 20, 2007. ISBN: 9781584889823; ISBN-10: 1584889829.

Sample size estimation: How many individuals should be studied? by Eng J, Radiology, 227:2, 309-313, May 2003

The number of individuals to include in a research study, the sample size of the study, is an important consideration in the design of many clinical studies. This article reviews the basic factors that determine an appropriate sample size and provides methods for its calculation in some simple, yet common, cases. Sample size is closely tied to statistical power, which is the ability of a study to enable detection of a statistically significant difference when there truly is one. A trade-off exists between a feasible sample size and adequate statistical power. Strategies for reducing the necessary sample size while maintaining a reasonable power will also be discussed.
Power and Sample Size Estimation in Research by Ajeneye Francis, The Biomedical Scientist, November 2006, 988-990

Sample size determination by Dell RB, Holleran S and Ramakrishnan R, ILAR Journal, 43:4, 207-213, 2002. Note the correction to Sample size determination (vol 43, pg 207, 2002) by Dell RB, Holleran S and Ramakrishnan R, ILAR Journal, 44:3, 239, 2003.

Statistical power and estimation of the number of required subjects for a study based on the t-test: A surgeon's primer by Livingston EH and Cassidy L, Journal of Surgical Research, 126:2, 149-159, Jun 15 2005

The underlying concepts for calculating the power of a statistical test elude most investigators. Understanding them helps one know how the various factors contributing to statistical power enter into study design when calculating the required number of subjects to enter into a study. Most journals and funding agencies now require a justification for the number of subjects enrolled in a study, and investigators must present the principles of the power calculations used to justify these numbers. For these reasons, knowing how statistical power is determined is essential for researchers in the modern era. The number of subjects required for study entry depends on the following four concepts: 1) the magnitude of the hypothesized effect (i.e., how far apart the two sample means are expected to be); 2) the underlying variability of the outcomes measured (standard deviation); 3) the level of significance desired (e.g., α = 0.05); 4) the amount of power desired (typically 0.8). If the sample standard deviations are small or the means are expected to be very different, then smaller numbers of subjects are required to ensure avoidance of Type I and Type II errors. This review provides the derivation of the sample size equation for continuous variables when the statistical analysis will be the Student's t-test. We also provide graphical illustrations of how and why these equations are derived.
Sample Size Correction Table for Single Parameter Estimates

Sample size correction table for adjusting "point-in-time" parameter estimates. n is the uncorrected sample size value from the sample size equation; n* is the corrected sample size value. This table was created using the algorithm reported by Kupper and Hafner (1989) for a one-sample tolerance probability of 0.90. For more information consult: Kupper, L.L. and K.B. Hafner. 1989. How appropriate are popular sample size formulas? The American Statistician 43:101-105.

Corrected sample size n* at each confidence level (80%, 90%, 95%, 99%):

  n   80%  90%  95%  99%  |   n   80%  90%  95%  99%  |   n   80%  90%  95%  99%
  1    5    5    5    6   |  51   65   65   66   67   | 101  120  120  121  122
  2    6    6    7    8   |  52   66   66   67   68   | 102  121  122  122  123
  3    7    8    8    9   |  53   67   67   68   69   | 103  122  123  123  124
  4    9    9   10   11   |  54   68   69   69   70   | 104  123  124  124  125
  5   10   11   11   12   |  55   69   70   70   72   | 105  124  125  125  126
  6   11   12   12   14   |  56   70   71   71   73   | 106  125  126  126  128
  7   13   13   14   15   |  57   71   72   72   74   | 107  126  127  128  129
  8   14   15   15   16   |  58   73   73   74   75   | 108  128  128  129  130
  9   15   16   16   18   |  59   74   74   75   76   | 109  129  129  130  131
 10   17   17   18   19   |  60   75   75   76   77   | 110  130  130  131  132
 11   18   18   19   20   |  61   76   76   77   78   | 111  131  131  132  133
 12   19   20   20   22   |  62   77   78   78   79   | 112  132  132  133  134
 13   20   21   21   23   |  63   78   79   79   80   | 113  133  133  134  135
 14   22   22   23   24   |  64   79   80   80   82   | 114  134  134  135  136
 15   23   23   24   25   |  65   80   81   81   83   | 115  135  135  136  138
 16   24   25   25   26   |  66   82   82   83   84   | 116  136  136  137  139
 17   25   26   26   28   |  67   83   83   84   85   | 117  137  137  138  140
 18   27   27   28   29   |  68   84   84   85   86   | 118  138  138  139  141
 19   28   28   29   30   |  69   85   85   86   87   | 119  140  140  141  142
 20   29   29   30   31   |  70   86   86   87   88   | 120  141  141  142  143
 21   30   31   31   32   |  71   87   88   88   89   | 121  142  142  143  144
 22   31   32   32   34   |  72   88   89   89   90   | 122  143  143  144  145
 23   33   33   34   35   |  73   89   90   90   92   | 123  144  144  145  146
 24   34   34   35   36   |  74   90   91   91   93   | 124  145  145  146  147
 25   35   35   36   37   |  75   91   92   92   94   | 125  146  147  147  148
 26   36   37   37   38   |  76   93   93   94   95   | 126  147  148  148  149
 27   37   38   38   39   |  77   94   94   95   96   | 127  148  149  149  150
 28   38   39   39   41   |  78   95   95   96   97   | 128  149  150  150  151
 29   40   40   41   42   |  79   96   96   97   98   | 129  150  151  151  153
 30   41   41   42   43   |  80   97   97   98   99   | 130  151  152  152  154
 31   42   42   43   44   |  81   98   99   99  100   | 131  152  153  154  155
 32   43   44   44   45   |  82   99  100  100  101   | 132  154  154  155  156
 33   44   45   45   46   |  83  100  101  101  103   | 133  155  155  156  157
 34   45   46   46   48   |  84  101  102  102  104   | 134  156  156  157  158
 35   47   47   48   49   |  85  102  103  103  105   | 135  157  157  158  159
 36   48   48   49   50   |  86  104  104  104  106   | 136  158  158  159  160
 37   49   49   50   51   |  87  105  105  105  107   | 137  159  159  160  161
 38   50   50   51   52   |  88  106  106  106  108   | 138  160  161  161  163
 39   51   52   52   53   |  89  107  107  107  109   | 139  161  162  162  164
 40   52   53   53   55   |  90  108  108  108  110   | 140  162  163  163  165
 41   53   54   54   56   |  91  109  110  110  111   | 141  163  164  164  166
 42   55   55   56   57   |  92  110  111  111  112   | 142  164  165  165  167
 43   56   56   57   58   |  93  111  112  112  114   | 143  165  166  166  168
 44   57   57   58   59   |  94  112  113  113  115   | 144  166  167  168  169
 45   58   58   59   60   |  95  113  114  114  116   | 145  168  168  169  170
 46   59   60   60   61   |  96  115  115  116  117   | 146  169  169  170  171
 47   60   61   61   62   |  97  116  116  117  118   | 147  170  170  171  172
 48   61   62   62   64   |  98  117  117  118  119   | 148  171  171  172  173
 49   62   63   63   65   |  99  118  118  119  120   | 149  172  172  173  174
 50   64   64   65   66   | 100  119  119  120  121   | 150  173  173  174  175
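In use, the table is read in two steps: compute the uncorrected n from the sample size equation, then look up the corrected n* in the column for the desired confidence level. A minimal Python sketch of that lookup follows; the dictionary holds only a handful of 95%-confidence entries transcribed from the table above, and the function name is ours, not from Kupper and Hafner.

```python
# A few 95%-confidence-level (n -> n*) pairs transcribed from the table above.
CORRECTED_N_95 = {10: 18, 20: 30, 30: 42, 50: 65, 100: 120}

def corrected_sample_size(n, table=CORRECTED_N_95):
    """Return the corrected sample size n* for an uncorrected n.

    Only exact matches against the transcribed entries are handled here;
    consult the full table (or Kupper and Hafner 1989) for other values.
    """
    if n not in table:
        raise KeyError(f"n = {n} is not among the transcribed entries; use the full table")
    return table[n]

# An uncorrected estimate of 30 sampling units should be raised to 42.
print(corrected_sample_size(30))  # 42
```

The correction always increases n, and by proportionally more at small sample sizes and higher confidence levels.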