Introduction to Sample Size Determination: “How powerful do I need to be,
Dennis G. Fisher, Ph.D.
Center for Behavioral
Research and Services
California State University,
Long Beach
Power and Design
The One-Group Pretest-Posttest Design.
Must have 2 points in time (about 6
months apart for the administration of the
Must have a method of linking time 1 responses to time 2 responses.
Three levels of measurement
Interval (Ratio) Measurement
Equal intervals (interval), with a true zero (ratio).
Dependent-samples t-test.
“On how many occasions during the last
30 days have you had alcoholic beverages
to drink (more than just a few sips)?”
Ratio scale.
Dependent-samples t-test
Ho: µd = 0
Ha: µd ≠ 0, α = .05
$t = \bar{d} / s_{\bar{d}}$, with $n - 1$ degrees of freedom
$d$ = difference scores (between time 1 and time 2)
$s_{\bar{d}}$ = standard error of the mean difference
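A minimal sketch of the test statistic, assuming the usual form t = (mean difference) / (standard error of the mean difference) with n - 1 degrees of freedom. The function name `paired_t` and the scores are hypothetical illustrations, not data from the study.

```python
import math
import statistics

def paired_t(time1, time2):
    """Return (t, df) for a dependent-samples t-test on linked scores."""
    if len(time1) != len(time2):
        raise ValueError("time 1 and time 2 responses must be linked pair by pair")
    d = [b - a for a, b in zip(time1, time2)]    # difference scores
    n = len(d)
    mean_d = statistics.mean(d)
    se_d = statistics.stdev(d) / math.sqrt(n)    # standard error of the mean difference
    return mean_d / se_d, n - 1

# Hypothetical "drinking occasions in the last 30 days" for 8 linked respondents
time1 = [6, 4, 8, 5, 7, 3, 9, 6]
time2 = [4, 4, 5, 3, 6, 2, 7, 5]
t, df = paired_t(time1, time2)
print(round(t, 2), df)   # -4.58 7
```

With 7 degrees of freedom the two-tailed critical value at α = .05 is 2.365, so these made-up data would lead to rejecting Ho.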
Sample Size Determination for
Dependent-Samples t-test
Formula for sample size determination.
$$n = \left[ \frac{(Z_\alpha + Z_\beta)\,\sigma}{\delta} \right]^2$$
δ = difference (how do you know this?)
σ=hypothesized standard deviation of difference
Zα and Zβ are the standard normal deviates for the chosen alpha and beta levels.
If α = .05 then Zα = 1.96 for two-tailed.
If power = .80 then Zβ = 0.84 (Zβ = 1.28 corresponds to power = .90).
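As a worked example, the sketch below assumes the usual normal-approximation form n = [(Zα + Zβ)σ / δ]² and looks up the Z values from the standard normal distribution. The function name and the inputs (δ = 2, σ = 4) are hypothetical.

```python
import math
from statistics import NormalDist

def paired_t_sample_size(delta, sigma, alpha=0.05, power=0.80, two_tailed=True):
    """Number of pairs needed to detect a mean difference delta,
    given sigma, the hypothesized SD of the difference scores."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n_80 = paired_t_sample_size(delta=2.0, sigma=4.0, power=0.80)
n_90 = paired_t_sample_size(delta=2.0, sigma=4.0, power=0.90)
print(n_80, n_90)   # 32 43
```

Rounding up is deliberate: rounding down would leave the study slightly underpowered.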
Ordinal level of measurement
“How much do you think people risk
harming themselves (physically or in other
ways) if they take one or two drinks of
alcohol nearly every day?”
No risk, Slight risk, Moderate risk, Great risk.
Ordering but not equal intervals.
Wilcoxon paired-sample test (aka signed-rank test)
Wilcoxon Paired-Sample Test
Ho: Perceived risk at time 1 same as
perceived risk at time 2.
Ha: Perceived risk at time 1 is not the
same as perceived risk at time 2.
$$W_e = \frac{n(n+1)}{4}$$
$$z = \frac{W_1 - W_e}{\sigma_w}, \qquad \sigma_w = \sqrt{\frac{(2n+1)\,W_e}{6}}$$
Wilcoxon Signed-Rank Test
W1 = Smaller of the two rank sums (positive and negative differences).
We = Expected sum of rank scores.
σw = Standard deviation of the rank sum.
Ties (pairs with no change) are eliminated from analysis; n counts the remaining pairs.
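The calculation can be sketched as follows, assuming the large-sample normal approximation with We = n(n+1)/4 and σw = sqrt((2n+1)We/6). The function name and the risk ratings (coded 1 = No risk to 4 = Great risk) are hypothetical, and the variance is not corrected for tied ranks.

```python
import math

def wilcoxon_signed_rank_z(time1, time2):
    """Return (z, n) for the Wilcoxon paired-sample test (normal approximation)."""
    # Pairs with no change are dropped from the analysis
    d = [b - a for a, b in zip(time1, time2) if b != a]
    n = len(d)
    # Rank the absolute differences; equal magnitudes share their average rank
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_pos = sum(r for r, x in zip(ranks, d) if x > 0)
    w_neg = sum(r for r, x in zip(ranks, d) if x < 0)
    w1 = min(w_pos, w_neg)                       # smaller rank sum
    w_e = n * (n + 1) / 4                        # expected rank sum under Ho
    sigma_w = math.sqrt((2 * n + 1) * w_e / 6)   # SD of the rank sum
    return (w1 - w_e) / sigma_w, n

time1 = [1, 2, 2, 3, 1, 2, 3, 2, 1, 2]
time2 = [2, 3, 2, 4, 3, 2, 3, 4, 2, 1]
z, n = wilcoxon_signed_rank_z(time1, time2)
print(round(z, 2), n)   # -1.86 7
```

With only 7 usable pairs the normal approximation is rough; exact tables are the safer choice for small samples.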
Nominal Scale of Measurement
McNemar’s Chi-Square Test
“I plan to get drunk sometime in the next year.”

                     Time 2
                     False    True
Time 1    False      f11      f12
          True       f21      f22

$$\chi^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$
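A minimal sketch of the statistic, assuming the uncorrected form χ² = (f12 - f21)² / (f12 + f21) with 1 degree of freedom, where f12 and f21 count the respondents whose answer changed between time 1 and time 2. The function name and the counts are hypothetical.

```python
import math

def mcnemar_chi_square(f12, f21):
    """Return (chi-square, p-value) from the two discordant cell counts."""
    if f12 + f21 == 0:
        raise ValueError("no discordant pairs, so the test is undefined")
    chi2 = (f12 - f21) ** 2 / (f12 + f21)
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical: 25 switched False -> True, 10 switched True -> False
chi2, p_value = mcnemar_chi_square(f12=25, f21=10)
print(round(chi2, 2), round(p_value, 3))   # 6.43 0.011
```

Respondents who gave the same answer at both times do not enter the statistic at all, which is why only the discordant counts are passed in.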
McNemar’s Chi-Square
Power calculation (Miettinen, 1968)
[Equation not recoverable from the source text; see Miettinen (1968).]
Computer Programs
nQuery Advisor
Power and Precision
Statistica Power Analysis
Power and Sample Size (PASS)
SAS – The SAS Power and Sample Size
SPSS – SamplePower – Stand-alone
What do you need to know before
you use the computer program?
What is alpha? (What p value do you want? Usual value .05)
What is beta? (Actually 1-beta, or what power do you want? Usual values .8, .85, .9)
What is your estimate of effect? (e.g. difference between means etc.) How do you find this?
What is your estimate of variance? (or SD etc.)
Obtain approximately 150% of required sample at time 1 to account for loss to follow-up.
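The 150% rule is simple arithmetic, sketched below with a hypothetical helper name. Recruiting 150% at time 1 amounts to assuming that roughly two thirds of participants will be retained at time 2.

```python
import math

def time1_recruitment_target(n_required, inflation=1.5):
    """Time 1 sample needed so that n_required pairs remain after loss to follow-up."""
    return math.ceil(n_required * inflation)

target = time1_recruitment_target(32)
print(target)   # 48
```

If local follow-up rates are known to be better or worse than about 67%, the `inflation` argument can be set to 1 / (expected retention rate) instead.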
How to Increase Statistical Power
1. Add Subjects
Simple and direct, but also expensive.
2. Add more subjects to the group that is cheaper.
If you can only add to one group, then do it, even though it will not be as efficient as keeping sample sizes equal between the two groups.
The efficiency of this approach drops sharply once the larger group is more than about twice the size of the smaller one.
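The diminishing return can be illustrated with the harmonic-mean sample size, assuming a two-group comparison of means with equal variances (the standard error depends on 1/n1 + 1/n2). The function name and group sizes are hypothetical.

```python
def effective_n_per_group(n1, n2):
    """Equal-group size that gives the same standard error as unequal groups n1 and n2."""
    return 2 * n1 * n2 / (n1 + n2)

n_small = 50
for ratio in (1, 2, 3, 4, 10):
    n_large = n_small * ratio
    print(ratio, n_large, round(effective_n_per_group(n_small, n_large), 1))
# 1 50 50.0
# 2 100 66.7
# 3 150 75.0
# 4 200 80.0
# 10 500 90.9
```

Doubling the cheaper group buys the equivalent of about 17 extra subjects per group; no matter how large that group becomes, the effective size can never reach 100 while the other group stays at 50.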
Choose Less Stringent Alpha Level
Using a one-tailed test is the equivalent of changing alpha from .05 to .10 for a two-tailed test. If you specify a one-tailed test a priori (and your thesis chair agrees) you can greatly increase power.
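A rough illustration, assuming a normal approximation to power for a mean difference δ with standard deviation σ and n pairs, and ignoring the negligible chance of rejecting in the wrong direction. All names and inputs are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()

def approx_power(delta, sigma, n, alpha=0.05, two_tailed=True):
    """Approximate power to detect a mean difference delta with n pairs."""
    crit = Z.inv_cdf(1 - alpha / 2) if two_tailed else Z.inv_cdf(1 - alpha)
    return Z.cdf(delta * n ** 0.5 / sigma - crit)

two_tailed_05 = approx_power(2.0, 4.0, 32, alpha=0.05, two_tailed=True)
one_tailed_05 = approx_power(2.0, 4.0, 32, alpha=0.05, two_tailed=False)
two_tailed_10 = approx_power(2.0, 4.0, 32, alpha=0.10, two_tailed=True)
print(round(two_tailed_05, 2), round(one_tailed_05, 2), round(two_tailed_10, 2))
# 0.81 0.88 0.88
```

The one-tailed test at .05 and the two-tailed test at .10 share the same critical value (1.645), which is why their power is identical here.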
Increase Effect Size
1. By strengthening the intervention: increase dose, increase number of sessions, use multiple modalities, etc.
2. By weakening the comparison group –
use no-treatment control.
3. Use extreme groups.
Use as Few Groups as Possible
The more groups, the more the total
sample will be split into smaller cell sizes.
The more groups, the smaller the number of subjects for any specific comparison or contrast. Student-Newman-Keuls is more powerful in these situations than Tukey.
Use Covariates or Blocking
If the blocking variable is correlated with the dependent variable, then the power will increase with the size of the correlation.
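One way to see the gain, assuming a two-group comparison in which a covariate correlated ρ with the dependent variable removes the fraction ρ² of the error variance (and ignoring the degree of freedom the covariate costs). The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_with_covariate(delta, sigma, rho, alpha=0.05, power=0.80):
    """Approximate n per group when a covariate correlated rho with the DV is used."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    residual_var = sigma ** 2 * (1 - rho ** 2)   # error variance left after adjustment
    return math.ceil(2 * z_sum ** 2 * residual_var / delta ** 2)

for rho in (0.0, 0.5, 0.7):
    print(rho, n_per_group_with_covariate(delta=2.0, sigma=4.0, rho=rho))
# 0.0 63
# 0.5 48
# 0.7 33
```

A weakly correlated covariate buys very little, because the saving depends on ρ squared.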
Use Cross-Over, Repeated
Measures, Within-Subject Design
These designs can greatly increase power
if there is a high correlation between the
adjacent measures. For example, if the
time 1 measure is highly correlated with
the time 2 measure, then power is
increased by using this kind of a design.
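The effect of the time 1 to time 2 correlation can be sketched by assuming equal standard deviations σ at both times, so that the standard deviation of the difference scores is σ·sqrt(2(1 - ρ)). The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def pairs_needed(delta, sigma, rho, alpha=0.05, power=0.80):
    """Pairs needed when time 1 and time 2 scores (each with SD sigma) correlate rho."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    sigma_d = sigma * math.sqrt(2 * (1 - rho))   # SD of the difference scores
    return math.ceil((z_sum * sigma_d / delta) ** 2)

for rho in (0.0, 0.3, 0.5, 0.7, 0.9):
    print(rho, pairs_needed(delta=2.0, sigma=4.0, rho=rho))
# 0.0 63
# 0.3 44
# 0.5 32
# 0.7 19
# 0.9 7
```

With ρ = 0 the within-subject design needs as many pairs as an independent-groups design needs per group; the saving comes entirely from the correlation.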
For n-way ANOVA, Hypothesize
Main Effects Instead of Interactions
The Main Effects tests have more power
than the Interaction tests.
Measurement Issues
The Dependent Variable should be sensitive to change as a result of the intervention.
The greater the reliability of the DV, the
lower the model error, the greater the
power. This means that assessing the
reliability is important, as well as quality
control procedures to reduce
administration variability.
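Under classical test theory (an assumption here, not something the slides spell out), measurement error inflates the observed variance, so the observed standardized effect is the true effect multiplied by the square root of the reliability. The function names and values below are hypothetical.

```python
import math

def attenuated_effect_size(true_d, reliability):
    """Observed standardized effect when the DV has the given reliability (0 to 1)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return true_d * math.sqrt(reliability)

def sample_size_inflation(reliability):
    """Factor by which required n grows relative to a perfectly reliable DV."""
    return 1 / reliability

print(round(attenuated_effect_size(0.5, 0.7), 3))   # 0.418
print(round(sample_size_inflation(0.7), 2))         # 1.43
```

Because required sample size varies with 1/d², a DV with reliability .70 needs roughly 43% more subjects than a perfectly reliable one to reach the same power.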
Direct Measures Instead of Indirect
The use of proximal instead of distal
measures will increase power.
For instance, if an intervention increases knowledge that hopefully will lead to behavior change, that will lead to change in physiological measures, there will be more power to assess the intervention if the DV is a change in measure of knowledge.
References
Kuzma, J. W., & Bohnenblust, S. E. (2001). Basic statistics for the health sciences. Mountain View, CA: Mayfield.
Miettinen, O. S. (1968). On the matched-pairs design in the case of all-or-none responses. Biometrics, 24, 339-352.
Norman, G. R., & Streiner, D. L. (1998). Biostatistics: The bare essentials. Hamilton, Ontario: B. C. Decker.
Zar, J. H. (1984). Biostatistical analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.