Nick Barrowman, PhD
Senior Statistician
Clinical Research Unit, CHEO Research Institute
March 29, 2010
• Example: lowering blood pressure
• Introduction to some statistical issues in sample size determination
• Two simple approximate formulas
• Descriptions of sample size calculations from the literature
• Physicians design an intervention to reduce blood pressure in patients with high blood pressure
• But does it work? Need a study.
• How many participants are required?
• Too few: may not detect an effect even if there is one.
• Too many: may unnecessarily expose patients to risk.
• For intervention studies, the null hypothesis is usually this: on average there is no effect.
• “Innocent until proven guilty”
• The physicians who designed the intervention believe the null hypothesis is false.
• The study is designed to test the null hypothesis.
• We often write H0 for the null hypothesis.
• The population is considered to be all people who might be eligible for the intervention (might depend on age, other medical conditions, etc.)
• Study participants are viewed as a sample from this population.
• Suppose for each study participant we measure blood pressure at baseline, and after 6 weeks of intervention
• Outcome is change in blood pressure
• H0 is that the mean change in BP is 0.
[Diagram: population vs. random sample. The sample mean of the change in blood pressure is obtained from the random sample by calculation, and is used for inference about the population mean of the change in blood pressure.]
Probability distributions
[Figure: population distribution of change in blood pressure, with the mean and ± 1 standard deviation marked]
Recall that variance is the square of the standard deviation, often written as $\sigma^2$
[Figures: population distribution of change in blood pressure, and sampling distributions of the mean change in blood pressure for N = 1, 2, 5, and 10]
Increasing sample size reduces the variability of the sample mean.
$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{N}}$$
where SE is the standard error of the sample mean and SD is the standard deviation.
• As we’ve seen, increasing the sample size is akin to reducing the variance
• Equivalently, reducing the variance (e.g. using a more precise measurement device) can reduce the sample size requirements
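The link between sample size and the variability of the sample mean can be checked numerically. The sketch below compares the formula SE = SD / sqrt(N) with the spread of simulated sample means. It assumes normally distributed changes in blood pressure with a standard deviation of 7 mmHg; both are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

def simulated_standard_error(sd, n, n_studies=20000, seed=1):
    """Empirical SD of the sample mean across many simulated studies of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sd) for _ in range(n))
        for _ in range(n_studies)
    ]
    return statistics.pstdev(means)

sd = 7.0  # assumed population SD in mmHg, for illustration only
for n in (1, 2, 5, 10):
    # Columns: N, SE from the formula, SE estimated by simulation
    print(n, round(standard_error(sd, n), 2), round(simulated_standard_error(sd, n), 2))
```

The two columns should agree closely, and both shrink as N grows.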
Hypothesis test
Sampling distribution of the mean under the null hypothesis, a.k.a. the null distribution
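Reject the null hypothesis if the observed mean is far in the tails of the null distribution (the rejection region), i.e. the result would be unlikely to arise by chance alone if the null hypothesis were true.
[Figure: null distribution with the observed mean and the rejection region marked]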
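To make the rejection region concrete, here is a minimal sketch of a two-sided test. It treats the sample mean as normally distributed with a known standard deviation, which is a simplifying assumption (in practice a t-test would usually be used). The function names and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def rejection_threshold(sd, n, alpha=0.05):
    """Distance from 0 beyond which the observed mean falls in the rejection region."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z * sd / sqrt(n)

def reject_null(observed_mean, sd, n, alpha=0.05):
    """True if the observed mean lies in either tail of the null distribution."""
    return abs(observed_mean) > rejection_threshold(sd, n, alpha)

# Hypothetical numbers: SD of 7 mmHg and 16 participants
print(round(rejection_threshold(7.0, 16), 2))
print(reject_null(-5.0, 7.0, 16))  # observed mean change of -5 mmHg
print(reject_null(-2.0, 7.0, 16))  # observed mean change of -2 mmHg
```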
Based on the study findings we infer either that the intervention has no effect (accept H0) or that the intervention has an effect (reject H0).
In reality, either the intervention has no effect (H0 is true) or the intervention has an effect (H0 is false).
Four combinations are possible:
• H0 is true and we accept H0: correctly accept H0
• H0 is true and we reject H0: type-I error
• H0 is false and we accept H0: type-II error
• H0 is false and we reject H0: correctly reject H0
If the null hypothesis is true, the rejection region of the test represents type-I error.
The probability of type-I error is the area of the rejection region (shaded red in the figure), and is denoted by alpha.
• Type-II error is failing to reject the null hypothesis when it is false.
• The probability of type-II error is denoted beta.
• It depends on how big the true effect is
• Sample size calculations require specification of an alternative hypothesis, which indicates the size of effect we would like to detect
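The dependence of the type-II error probability on the true effect can be sketched as follows. The calculation uses a normal approximation with a known standard deviation; the standard deviation of 7 mmHg, the sample size of 16, and the function name are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(delta, sd, n, alpha=0.05):
    """Probability of failing to reject H0 when the true mean change is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sd / sqrt(n))  # true effect in standard-error units
    # Probability that the standardized sample mean lands between the critical values
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Larger true effects are missed less often
for delta in (1, 3, 5, 7):
    print(delta, round(type_ii_error(delta, sd=7.0, n=16), 3))
```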
[Figures: relationship between type-I and type-II error, for alpha = 0.05, 0.10, and 0.20]
Relationship between type-I and type-II error
• Sample size calculations depend on the tradeoff between type-I and type-II error.
• We usually fix the probability of type-I error (alpha) at 5% and then try to minimize the probability of type-II error (beta).
• Define Power = 1 – beta
• We want to maximize power
• One way to do this is by increasing the sample size
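The tradeoff and the effect of sample size on power can be illustrated numerically. This is a sketch under a normal approximation with a known standard deviation; the scenario (true mean change of 5 mmHg, standard deviation of 7 mmHg) and the function name are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(delta, sd, n, alpha=0.05):
    """Probability of rejecting H0 (two-sided test) when the true mean change is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sd / sqrt(n))  # true effect in standard-error units
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta

# Allowing a larger alpha lowers beta at a fixed sample size
print("Fixed N = 10, varying alpha:")
for alpha in (0.05, 0.10, 0.20):
    beta = 1 - power(5.0, 7.0, 10, alpha)
    print(f"  alpha = {alpha:.2f}  beta = {beta:.3f}")

# Enrolling more participants raises power at a fixed alpha
print("Fixed alpha = 0.05, varying N:")
for n in (5, 10, 16, 30):
    print(f"  N = {n:2d}  power = {power(5.0, 7.0, n):.3f}")
```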
• Suppose the variance in the change in blood pressure, $\sigma^2$, is the same for the null and alternative hypotheses
• Suppose alpha is fixed at 0.05 and we use two-sided tests (allowing for the possibility that blood pressure could be either increased or decreased by the intervention)
• Then we will have approximately 80% power to detect a mean change in blood pressure of $\delta$ if we enroll N participants, where
$$N \approx \frac{8\sigma^2}{\delta^2}$$
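For readers wondering where the constant 8 comes from, the following is a sketch of the usual normal-approximation argument. It treats the standard deviation as known, and the $z$ notation for standard normal quantiles is introduced here for illustration.

```latex
% Power 1 - \beta is reached when the true effect, in standard-error units,
% equals the sum of the two standard normal quantiles:
\frac{\delta}{\sigma/\sqrt{N}} \approx z_{1-\alpha/2} + z_{1-\beta}
\quad\Longrightarrow\quad
N \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2}

% With \alpha = 0.05 (two-sided) and power 1 - \beta = 0.80:
\left(z_{0.975} + z_{0.80}\right)^2 \approx (1.960 + 0.842)^2 \approx 7.85 \approx 8
```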
• Suppose the standard deviation of the change in blood pressure is anticipated to be 7 mmHg (so the variance is 49)
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a mean change of 5 mmHg
• Then we would need about 16 participants
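The arithmetic behind this example can be reproduced in a few lines. The sketch below evaluates the rule of thumb and, for comparison, the same calculation with the constant computed from standard normal quantiles rather than rounded to 8. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_single_group_rule_of_thumb(sd, delta):
    """Approximate N for alpha = 0.05 (two-sided) and 80% power: 8 * variance / delta^2."""
    return ceil(8 * sd**2 / delta**2)

def n_single_group_normal(sd, delta, alpha=0.05, power=0.80):
    """Same calculation with the constant taken from normal quantiles (about 7.85)."""
    z = NormalDist()
    k = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ceil(k * sd**2 / delta**2)

# SD of 7 mmHg and a mean change of 5 mmHg: 8 * 49 / 25 = 15.68, rounded up
print(n_single_group_rule_of_thumb(7.0, 5.0))
print(n_single_group_normal(7.0, 5.0))
```

Rounding up is a design choice made here so that the target power is not undershot.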
• So far, the example has used a single group of study participants
• Usually we want to compare two groups: a control group that receives “standard of care” or placebo, and an experimental group that receives a new intervention
• This is how most randomized controlled trials are set up
• In this case, delta is the difference between the means of the two groups.
• For simplicity, assume that the variance is the same in the two groups.
• A similar approximate formula applies, again assuming alpha=0.05 and power=80%:
$$N_{\text{per group}} \approx \frac{16\sigma^2}{\delta^2}$$
• Careful! This is the required sample size per group.
• Also, note that the constant is double what it was for the case of a single group.
• So the total sample size is 4 times as large.
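The doubling of the constant, and the factor of 4 in the total, can be seen from the variance of a difference between two independent group means. This is a sketch under the same normal approximation, with $n$ denoting the number of participants per group and $z$ denoting standard normal quantiles (notation introduced here for illustration).

```latex
% The difference of two independent sample means has twice the variance of one mean:
\operatorname{Var}\!\left(\bar{X}_1 - \bar{X}_2\right)
  = \frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{2\sigma^2}{n}

% So the per-group sample size carries an extra factor of 2:
n \approx \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2}
  \approx \frac{16\,\sigma^2}{\delta^2}

% Total over both groups, compared with the single-group case:
N_{\text{total}} = 2n \approx \frac{32\,\sigma^2}{\delta^2} = 4 \times \frac{8\,\sigma^2}{\delta^2}
```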
• Suppose we want to compare patients randomized to placebo with patients randomized to a new intervention
• Suppose the standard deviation is anticipated to again be 7 mmHg (so the variance is 49)
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a difference of 5 mmHg between the two groups
• Then we would need about 32 participants per group, for a total of about 64 participants
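The two-group arithmetic can be checked the same way. This sketch applies the per-group rule of thumb to the numbers in the example; the function name is hypothetical.

```python
from math import ceil

def n_per_group_rule_of_thumb(sd, delta):
    """Approximate participants per group for alpha = 0.05 (two-sided) and 80% power."""
    return ceil(16 * sd**2 / delta**2)

per_group = n_per_group_rule_of_thumb(sd=7.0, delta=5.0)  # 16 * 49 / 25 = 31.36
total = 2 * per_group
print(per_group, total)
```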
Required sample size …
• increases with variance
• decreases as the size of effect to detect increases
• decreases as the allowed probability of type-I error, alpha, increases
• decreases as the allowed probability of type-II error, beta, increases
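Each of these four relationships can be checked by varying one input at a time. The sketch below uses a normal-approximation formula for a two-group comparison with the constant computed from quantiles, so the baseline comes out slightly below what the rounded constant of 16 would give. The function name and the alternative input values are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, beta=0.20):
    """Approximate participants per group for a two-sided, two-group comparison."""
    z = NormalDist()
    k = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2
    return ceil(k * sd**2 / delta**2)

baseline = n_per_group(sd=7.0, delta=5.0)
print("baseline:", baseline)
print("larger SD (10 mmHg):", n_per_group(sd=10.0, delta=5.0))
print("larger effect (8 mmHg):", n_per_group(sd=7.0, delta=8.0))
print("larger alpha (0.10):", n_per_group(sd=7.0, delta=5.0, alpha=0.10))
print("larger beta (0.30):", n_per_group(sd=7.0, delta=5.0, beta=0.30))
```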
• Different types of outcomes: dichotomous (e.g. mortality), time-to-event (e.g. survival time), etc.
• Different designs: observational studies (e.g. case-control), surveys, prevalence studies
• Practical considerations: e.g. costs, feasibility of recruitment
Review: A comedy of errors …
α = Probability of type-I error
Probability of a false conviction
(Rejecting the null hypothesis when it is in fact true.)
Power = 1 – β
Probability of a true conviction
(Rejecting the null hypothesis when it is in fact false.)