Introduction to Sample Size Determination: “How powerful do I need to be, anyway?”
Dennis G. Fisher, Ph.D.
Center for Behavioral Research and Services
California State University, Long Beach

Power and Design
The One-Group Pretest-Posttest Design.
Must have 2 points in time (about 6 months apart for the administration of the instrument).
Must have a method of linking time 1 responses to time 2 responses.
Three levels of measurement to consider: interval (ratio), ordinal, and nominal.

Interval (Ratio) Measurement
Equal intervals (interval) with a true zero (ratio).
Dependent-samples t-test.
“On how many occasions during the last 30 days have you had alcoholic beverages to drink (more than just a few sips)?”
Ratio scale.

Dependent-Samples t-test
$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
$\alpha = .05$
$d$ = difference scores (between time 1 and time 2); $\bar{d}$ is their mean.
$s_{\bar{d}}$ = standard error of the difference scores ($s_d / \sqrt{n}$).
$$t = \frac{\bar{d}}{s_{\bar{d}}}$$

Sample Size Determination for Dependent-Samples t-test
Formula for sample size determination:
$$n = \left[ \frac{(z_\alpha + z_\beta)\,\sigma}{\delta} \right]^2$$
$\delta$ = hypothesized difference (how do you know this?)
$\sigma$ = hypothesized standard deviation of the difference
$z_\alpha$ and $z_\beta$ are the standard normal values for the chosen alpha and beta levels.
If $\alpha = .05$, then $z_\alpha = 1.96$ for a two-tailed test.
If power = .80, then $z_\beta = 0.84$; if power = .90, then $z_\beta = 1.28$.

Ordinal Level of Measurement
“How much do you think people risk harming themselves (physically or in other ways) if they take one or two drinks of alcohol nearly every day?”
No risk, Slight risk, Moderate risk, Great risk
Ordering but not equal intervals.
Wilcoxon paired-sample test (aka signed-rank test).

Wilcoxon Paired-Sample Test
$H_0$: Perceived risk at time 1 is the same as perceived risk at time 2.
$H_a$: Perceived risk at time 1 is not the same as perceived risk at time 2.
$$W_e = \frac{n(n+1)}{4}$$
$$\sigma_w = \sqrt{\frac{(2n+1)\,W_e}{6}}$$
$$Z = \frac{W_1 - W_e}{\sigma_w}$$

Wilcoxon Signed-Rank Test
$W_1$ = smaller of the two rank sums.
$W_e$ = expected sum of rank scores.
$\sigma_w$ = standard deviation of rank scores.
Ties (pairs with no change between time 1 and time 2) are eliminated from the analysis.
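The two calculations above, the sample size for the dependent-samples t-test and the Wilcoxon Z statistic, can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the method of any of the programs listed later: the function names `paired_t_sample_size` and `wilcoxon_z`, the example inputs, and the use of average ranks for tied magnitudes are all hypothetical choices. The z values are taken from the standard normal distribution (about 1.96 for a two-tailed alpha of .05, about 0.84 for power of .80).

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_t_sample_size(delta, sigma, alpha=0.05, power=0.80, two_tailed=True):
    """Number of pairs from n = ((z_alpha + z_beta) * sigma / delta) ** 2, rounded up.

    delta: hypothesized mean difference; sigma: hypothesized SD of the differences.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


def wilcoxon_z(time1, time2):
    """Normal-approximation Z for the Wilcoxon paired-sample (signed-rank) test."""
    # Pairs with no change (zero difference) are dropped from the analysis.
    diffs = [b - a for a, b in zip(time1, time2) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are tied; Z is undefined")

    # Rank the absolute differences. Assumption: equal magnitudes share
    # the average of the ranks they occupy.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w1 = min(w_pos, w_neg)                 # W1: smaller of the rank sums
    w_e = n * (n + 1) / 4                  # We: expected rank sum
    sigma_w = sqrt((2 * n + 1) * w_e / 6)  # SD of the rank sum
    return (w1 - w_e) / sigma_w


# Hypothetical inputs: expected drop of 2 drinking occasions, SD of differences 5.
print(paired_t_sample_size(delta=2, sigma=5))

# Hypothetical perceived-risk codes (1 = No risk ... 4 = Great risk) at two times.
risk_time1 = [1, 2, 2, 3, 1, 2, 3, 2, 1, 3]
risk_time2 = [2, 3, 2, 4, 2, 2, 4, 3, 1, 4]
print(round(wilcoxon_z(risk_time1, risk_time2), 3))
```

Because the required n grows with the square of sigma / delta, halving the hypothesized difference roughly quadruples the number of pairs, which is why the estimate of effect deserves the most scrutiny.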

Nominal Scale of Measurement
McNemar’s Chi-Square Test
“I plan to get drunk sometime in the next year.”
False, True

                      Time 1
                  False    True
Time 2   False    f11      f12
         True     f21      f22

$$\chi^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$

McNemar’s Chi-Square
Power calculation (Miettinen, 1968):
$$n = \frac{\left\{ z_{1-\alpha/s}\,\psi + z_{1-\beta} \left[ \psi^2 - \tfrac{1}{4}\Delta^2 (3 + \psi) \right]^{1/2} \right\}^2}{\psi\,\Delta^2}$$
Here $\psi$ is the expected proportion of discordant pairs, $\Delta$ is the expected difference between the two discordant proportions, and $s$ is the number of tails (1 or 2).

Computer Programs
nQuery Advisor
Power and Precision
Statistica Power Analysis
Power Analysis and Sample Size (PASS)
SAS: The SAS Power and Sample Size Application
SPSS: SamplePower (stand-alone product)

What do you need to know before you use the computer program?
What is alpha? (What p value do you want? Usual value .05)
What is beta? (Actually 1 - beta, or what power do you want? Usual values .8, .85, .9)
What is your estimate of effect? (e.g., difference between means)
How do you find this information?
What is your estimate of variance? (or SD, etc.)
Obtain approximately 150% of the required sample at time 1 to account for loss to follow-up.

How to Increase Statistical Power
1. Add subjects. Simple and direct, but also expensive.
2. Add more subjects to the group that is cheaper or easier to recruit. If you can only add to one group, then do it, even though it will not be as efficient as keeping sample sizes equal between the two groups. The efficiency of this approach drops off sharply once the larger group is more than about twice the size of the smaller group.

Choose Less Stringent Alpha Level
Using a one-tailed test is the equivalent of changing alpha from .05 to .10 for a two-tailed test.
If you specify a one-tailed test a priori (and your thesis chair agrees), you can greatly increase power.

Increase Effect Size
1. By strengthening the intervention: increase the dose, increase the number of sessions, use multiple modalities, etc.
2. By weakening the comparison group: use a no-treatment control.
3. Use extreme groups.

Use as Few Groups as Possible
The more groups, the more the total sample will be split into smaller cell sizes.
The more groups, the smaller the number of subjects for any specific comparison or contrast.
Student-Newman-Keuls is more powerful than Tukey HSD in these situations.

Use Covariates or Blocking Variables
If the covariate or blocking variable is correlated with the dependent variable, then power will increase with the size of the correlation.

Use Cross-Over, Repeated Measures, Within-Subject Design
These designs can greatly increase power if there is a high correlation between adjacent measures.
For example, if the time 1 measure is highly correlated with the time 2 measure, then power is increased by using this kind of design.

For n-way ANOVA, Hypothesize Main Effects Instead of Interactions
The main effects tests have more power than the interaction tests.

Measurement Issues
The dependent variable (DV) should be sensitive to change as a result of the intervention.
The greater the reliability of the DV, the lower the model error and the greater the power.
This means that assessing reliability is important, as are quality control procedures to reduce variability in administration.

Direct Measures Instead of Indirect
The use of proximal instead of distal measures will increase power.
For instance, suppose an intervention increases knowledge, which is hoped to lead to behavior change, which in turn leads to change in physiological measures. There will be more power to assess the intervention if the DV is the change in a measure of knowledge.

References
Kuzma, J. W., & Bohnenblust, S. E. (2001). Basic statistics for the health sciences. Mountain View, CA: Mayfield.
Miettinen, O. S. (1968). On the matched-pairs design in the case of all-or-none responses. Biometrics, 24, 339-352.
Norman, G. R., & Streiner, D. L. (1998). Biostatistics: The bare essentials. Hamilton, Ontario: B. C. Decker.
Zar, J. H. (1984). Biostatistical analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.