Susan Stewart, Ph.D. UC Davis School of Medicine November, 2014





Intro to sample size determination
Basic concepts
Estimating sample size parameters
Response variables
◦ Continuous
◦ Categorical
◦ Time-to-event

Components of sample size estimation


Why is it a good idea to do a sample size
calculation?
Why shouldn’t you just pick a size that’s
convenient?


Because the sample might be too small to
help you answer your research question,
Or the sample might be much larger than you
need.


Primary objective of a clinical trial: to evaluate
the efficacy and safety of an intervention.
Efficacy evaluation
◦ Compare the average response in the intervention
and control groups in the study sample.
◦ Decide whether the difference between the groups
indicates a true difference between treatments.

Usually the efficacy evaluation is performed in
the context of a hypothesis test.

Problem: Determine whether or not the
population means of the intervention and
control groups truly differ with respect to the
outcome of interest.
◦ We regard the intervention and control samples as
being drawn from the target population.

Solution: Assume that the two groups do not
differ, and see if the sample data disagree
with this assumption. That is, perform a
hypothesis test.



The null hypothesis (H0) assumes that there is
no difference in outcome between the two
groups.
The alternative hypothesis (HA) assumes that
one group has a more favorable outcome
than the other.
The research hypothesis is usually the
alternative hypothesis.

To do a hypothesis test:
◦ Calculate a test statistic from the data.
◦ Determine whether the value of the test
statistic is likely or unlikely under the null
hypothesis.
◦ If the value is very unlikely, reject the null
hypothesis.

Problem: we might reject the null hypothesis
when it is true.
◦ That is, we might commit Type I error.

Solution: Construct the test so that there is
only a 5% chance of incorrectly rejecting the
null hypothesis.
◦ That is, the level of the test (alpha) is 0.05.

Hypothesis tests can be 1-sided or 2-sided
◦ 1-sided: tests for differences in one direction only
▪ e.g., higher response rate in the intervention group
than in the control group
◦ 2-sided: tests for differences in both directions
▪ e.g., either higher or lower response rate in the
intervention group than in the control group

Even if you are primarily interested in one
direction, it is customary to do a 2-sided test

The p-value is the probability under the null
hypothesis of obtaining data at least as extreme as
that of the sample.
◦ That is, the p-value measures the strength of the evidence
against the null hypothesis: the smaller the p-value,
the stronger the evidence.

For a level 0.05 test, we reject the null
hypothesis if the p-value is 0.05 or less.
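
As a minimal sketch of the decision rule above, the p-value of a z statistic can be computed from the standard normal distribution. The function name `p_value` and the z values in the loop are illustrative assumptions, not part of the original slides.

```python
from statistics import NormalDist

def p_value(z, two_sided=True):
    """P-value of an observed z statistic under the null hypothesis."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    if two_sided:
        # Probability of a statistic at least this far from 0 in either direction.
        return 2 * (1 - std_normal.cdf(abs(z)))
    # 1-sided: probability of a statistic at least as large as the one observed.
    return 1 - std_normal.cdf(z)

alpha = 0.05  # level of the test
for z in (1.5, 1.96, 2.5):  # hypothetical test statistics
    p = p_value(z)
    print(f"z = {z}: p = {p:.3f}, reject H0: {p <= alpha}")
```

For the same z, the 1-sided p-value is half the 2-sided one when the difference lies in the hypothesized direction.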

Problem: we might fail to reject the null
hypothesis when the alternative is true.
◦ That is, we might commit Type II error.

Solution: Select a large enough sample so
that there is an 80% chance of rejecting the
null hypothesis if the alternative is true.
◦ Then the power to detect the alternative is 80%.






Specify null and alternative hypotheses, type I
error rate, and power.
Define the population under study.
Gather information relevant to parameters.
If measuring time to failure, model recruitment
process and choose length of follow-up period.
Calculate sample size over range of
parameters.
Select sample size to use.
Epidemiol Rev, 2002; 24(1):39-53

Parameters include
◦ Variability of the response
◦ Level of the response variable in the control group
◦ Difference anticipated or judged clinically relevant

May also need to consider
◦ Loss to follow-up
◦ Noncompliance

Sources of information
◦ Pilot studies: external or internal
◦ Literature: what others have found in similar studies

When a response variable is normally
distributed, the difference between the means
of two independent samples is assessed with
a 2-sample t-test.
◦ The t-test is robust to departures from normality.
◦ May need to transform the response variable (e.g.,
log transform) to obtain approximate normality.

The sample size for a z-test usually can be
used to estimate the sample size for a t-test.
◦ A z-test assumes that the population standard
deviation is known.
$$z = \frac{\bar{x} - \bar{y}}{\sigma\sqrt{2/n}}$$
▪ $\bar{x}$ = intervention group mean
▪ $\bar{y}$ = control group mean
▪ $\sigma^2$ = common variance in each group
▪ $n$ = sample size in each group
Epidemiol Rev, 2002; 24(1):39-53 (eq. 1)
$$n = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta_A^2}$$
▪ $\sigma^2$ = common variance in each group
▪ $z_{1-\alpha/2}$ = critical value for 2-sided level $\alpha$ test
▪ $z_{1-\beta}$ = value of a standard normal variable with
cumulative probability equal to $1-\beta$ (power)
▪ $\Delta_A$ = difference corresponding to alternative
hypothesis
Epidemiol Rev, 2002; 24(1):39-53 (eq. 2)
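
The sample size formula for the z-test (eq. 2) can be sketched in a few lines of Python. The function name `n_per_group_means` and the example inputs (SD of 1.0, difference of 0.5) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a 2-sided z-test comparing two means (eq. 2)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, 2-sided level alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile with cumulative prob. 1 - beta
    return 2 * sd**2 * (z_alpha + z_beta) ** 2 / delta**2

# Hypothetical inputs: common SD of 1.0, difference to detect of 0.5.
n = n_per_group_means(sd=1.0, delta=0.5)
print(f"unrounded n = {n:.2f}, rounded up = {ceil(n)} per group")
```

Rounding up is a conservative choice made here; the slides do not state a rounding rule.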



Randomized, age-matched
Healthy post-menopausal Chinese women
within 10 years of menopause onset
Exclusion criteria
◦ Regular participation in exercise
◦ Hormone replacement therapy or drug treatment
affecting bone density
◦ Hypo- or hyper-parathyroidism, hypo- or hyperthyroidism, renal or liver disease
◦ History of fractures
◦ BMI over 30
Arch Phys Med Rehabil 2004; 85:717-22



Intervention: Supervised TCC exercise (Yang
style) 50 minutes a day, 5 times a week, for
12 months
Control: Retained sedentary lifestyle
Primary outcome: Change in bone mineral
density over 12 months
◦ Areal BMD at lumbar spine and proximal femur
measured by dual x-ray absorptiometry (DXA)
◦ Volumetric BMD in distal tibia measured by
multislice peripheral quantitative computed
tomography (pQCT)

Null hypothesis
◦ Rate of bone mineral loss is the same in both study
arms.

Alternative hypothesis
◦ Rate of bone mineral loss is different (i.e., lower) in
the intervention (TCC) group.

Level of the test: 0.05 (2-sided)
Power: 80%
Mean bone loss in control group: 2.8%
◦ Average annual trabecular bone loss in previous study in
same population
Mean bone loss in intervention group: 1.4%
◦ 50% reduction
Standard deviation in each group
◦ Based on previous study, ~same as mean
▪ 3.0% in control group, 1.5% in intervention group (say)
◦ Compute pooled SD=2.37%

Dropout: 25% in one year
▪ $\sigma^2$ = common variance in each group = $2.37^2 = 5.62$
▪ $z_{1-\alpha/2}$ = critical value for 2-sided level $\alpha$ test = 1.96
▪ $z_{1-\beta}$ = value of a standard normal variable with
cumulative probability equal to $1-\beta$ (power) = 0.842
▪ $\Delta_A$ = difference corresponding to alternative
hypothesis = 1.4
$$n = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta_A^2} = \frac{2(5.62)(1.96 + 0.842)^2}{1.4^2} = 45 \text{ per group}$$
$45 = 0.75 \times 60$, so enroll 60 per group to account for 25% dropout
Actual enrollment n=132 total
https://stattools.crab.org/
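
Two side steps in this example, the pooled SD and the dropout adjustment, can be checked with a short sketch. It assumes the two groups are of equal size when pooling, and it takes the 45 completers per group from the calculation above as given; the function names are hypothetical.

```python
from math import ceil, sqrt

def pooled_sd(sd_a, sd_b):
    """Pooled SD for two equal-sized groups: root of the average variance."""
    return sqrt((sd_a**2 + sd_b**2) / 2)

def inflate_for_dropout(n_completers, dropout_rate):
    """Enrollment needed so that n_completers remain after dropout."""
    return ceil(n_completers / (1 - dropout_rate))

# SDs assumed on the slide: 3.0% (control) and 1.5% (intervention).
print(f"pooled SD = {pooled_sd(3.0, 1.5):.2f}%")

# 45 completers per group, 25% dropout in one year.
print(f"enroll {inflate_for_dropout(45, 0.25)} per group")
```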



When a response variable is categorical, a
chi-square test of independence is often used
to compare two groups.
When there are only 2 categories, this is the
same as testing for a difference in
proportions.
Need to specify the response proportion in
the control group and
◦ The response proportion in the intervention group,
or
◦ The odds ratio
$$n = \frac{\left[z_{1-\alpha/2}\sqrt{2\bar{\pi}(1-\bar{\pi})} + z_{1-\beta}\sqrt{\pi_c(1-\pi_c) + \pi_t(1-\pi_t)}\right]^2}{(\pi_c - \pi_t)^2}$$
$$n' = \frac{n}{4}\left[1 + \sqrt{1 + \frac{4}{n\,|\pi_c - \pi_t|}}\right]^2$$
$\pi_c$ = probability of event in control group
$\pi_t$ = probability of event in intervention group
$\bar{\pi}$ = average probability of event
$n'$ = number needed in each group
Epidemiol Rev, 2002; 24(1):39-53 (eq. 7B, 7C)




Study aim: test an outreach and counseling
intervention to reduce cervical cancer
incidence & mortality in low income women
Setting: Highland General Hospital (HGH)
Time frame: 3 years
Outcome measure: proportion of women who
received initial follow-up at Highland within 6
months of an abnormal Pap test
Prev Med 2005; 41: 741-8

Null hypothesis
◦ Rate of follow-up of abnormal Pap tests is the same
in both study arms.

Alternative hypothesis
◦ Rate of follow-up of abnormal Pap tests is different
(i.e., greater) in the intervention group.




Assume 60% follow-up in control group
based on previous research
Assume 75% follow-up in intervention group,
a clinically important difference achieved in
similar interventions
To detect this difference at the 0.05 level (2-sided) with 80% power: n=165 per arm
No loss to follow-up—outcome ascertained
through medical records
$$n = \frac{\left[1.96\sqrt{2(0.675)(0.325)} + 0.842\sqrt{(0.6)(0.4) + (0.75)(0.25)}\right]^2}{(0.60 - 0.75)^2} = 152$$
$$n' = \frac{152}{4}\left[1 + \sqrt{1 + \frac{4}{152\,|0.60 - 0.75|}}\right]^2 = 165$$
$\pi_c$ = probability of event in control group = 0.60
$\pi_t$ = probability of event in intervention group = 0.75
$\bar{\pi}$ = average probability of event = 0.675
$n'$ = number needed in each group = 165
https://stattools.crab.org/
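
The calculation above can be reproduced with a minimal Python sketch of eq. 7B and 7C. The function name is hypothetical. One assumption differs from the slide: the unrounded n is passed to the continuity correction and both results are rounded up at the end, whereas the slide plugs in the rounded 152.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_proportions(p_control, p_treat, alpha=0.05, power=0.80):
    """Return (n, n_corrected) per group for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2      # average probability of event
    diff = abs(p_control - p_treat)
    # Eq. 7B: sample size without continuity correction.
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    ) ** 2
    n = numerator / diff**2
    # Eq. 7C: continuity-corrected sample size.
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return n, n_corrected

n, n_corrected = n_per_group_proportions(p_control=0.60, p_treat=0.75)
print(f"n = {ceil(n)}, n' = {ceil(n_corrected)} per arm")
```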






The log rank test is often used to compare
two survival curves.
Most sample size calculations assume an
exponential survival distribution.
$S(t) = e^{-\lambda t}$, where
$t$ = time,
$S(t)$ = probability of survival to time $t$, and
$\lambda$ = hazard rate = risk of an event per time
unit




Hazard rate: number of events per unit of person-time
(e.g., per 100 person-years)
Median survival time = $\log_e(2)$/(hazard rate)
Hazard rate = $\log_e(2)$/(median survival time)
Hazard rate = $-\log_e(S(t))/t$,
where $S(t)$ = probability of surviving to time $t$
= expected proportion without an event by $t$
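
These conversions follow directly from the exponential model and can be sketched as small helper functions. The function names and the 2-year median used as input are hypothetical.

```python
from math import exp, log

def hazard_from_median(median_time):
    """Hazard rate implied by a median survival time (exponential model)."""
    return log(2) / median_time

def median_from_hazard(hazard):
    """Median survival time implied by a hazard rate."""
    return log(2) / hazard

def hazard_from_survival(surv_prob, t):
    """Hazard rate implied by the proportion still event-free at time t."""
    return -log(surv_prob) / t

def survival(hazard, t):
    """S(t) = exp(-hazard * t)."""
    return exp(-hazard * t)

lam = hazard_from_median(2.0)  # hypothetical median survival of 2 years
print(f"hazard = {lam:.3f} events per person-year")
print(f"S(2 years) = {survival(lam, 2.0):.2f}")
```

The hazard rate is expressed in the same time unit as the median, so a median in years gives events per person-year.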
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,[\phi(\lambda_C) + \phi(\lambda_I)]}{(\lambda_I - \lambda_C)^2}$$
where
$$\phi(\lambda) = \frac{\lambda^2}{1 - \left[e^{-\lambda(T - T_0)} - e^{-\lambda T}\right]/(\lambda T_0)}$$
$n$ = number per group
$\lambda_I$ = hazard rate in intervention group
$\lambda_C$ = hazard rate in control group
$T$ = total time of trial (first entry to end of study)
$T_0$ = recruitment time (first entry to last entry)
$$D = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{p(1-p)\left(\ln(\lambda_C/\lambda_I)\right)^2}$$
where
$D$ = number of events required to detect the hazard
ratio with power $1-\beta$ at level $\alpha$ (2-sided)
$\lambda_I$ = hazard rate in intervention group
$\lambda_C$ = hazard rate in control group
$p$ = proportion of participants in the control group


Primary research goal: Determine whether performing
surgery of the primary tumor followed by systemic therapy
improves survival in a certain patient population,
compared with systemic therapy only.
Patient population: Patients with synchronous unresectable
metastases of colorectal cancer and few or absent
symptoms

Primary outcome: Overall survival

Study design: Multi-center randomized phase III trial.
BMC Cancer 2014; 14:741

Null hypothesis
◦ Overall survival is not affected by surgery of the
primary tumor before systemic therapy in this
patient population.

Alternative hypothesis
◦ Surgery of the primary tumor improves overall
survival in this patient population.




Level of the test: 0.05 (2-sided)
Power: 80%
Median survival in control group: 13 months
Median survival in intervention group: 19
months
◦ Minimal difference to justify a surgical procedure



Recruitment period: 30 months
Minimum follow-up: 8 months
Total sample size: 360
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,[\phi(\lambda_C) + \phi(\lambda_I)]}{(\lambda_I - \lambda_C)^2}$$
where
$$\phi(\lambda) = \frac{\lambda^2}{1 - \left[e^{-\lambda(T - T_0)} - e^{-\lambda T}\right]/(\lambda T_0)}$$
$\alpha$ = 0.05; $z_{1-\alpha/2}$ = 1.96; $\beta$ = 0.20; $z_{1-\beta}$ = 0.842
$\lambda_I$ = hazard rate in intervention group = ln(2)/(19/12) = 0.438
$\lambda_C$ = hazard rate in control group = ln(2)/(13/12) = 0.640
hazard ratio $\lambda_C/\lambda_I$ = 19/13 = 1.46
$T$ = total time of trial (first entry to end of study) = 38/12 = 3.167
$T_0$ = recruitment time (first entry to last entry) = 2.5
$\phi(\lambda_C)$ = 0.607; $\phi(\lambda_I)$ = 0.351; $2n$ = 368; required # of events = 218
https://stattools.crab.org/
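
The numbers in this example can be reproduced with a short sketch of the two time-to-event formulas. Variable and function names are hypothetical. Two assumptions are made: equal allocation (p = 0.5) in the events formula, and rounding to the nearest integer, which matches the figures on the slide.

```python
from math import exp, log
from statistics import NormalDist

def phi(lam, total_time, recruit_time):
    """phi(lambda) from the sample size formula for exponential survival."""
    accrual_term = (
        exp(-lam * (total_time - recruit_time)) - exp(-lam * total_time)
    ) / (lam * recruit_time)
    return lam**2 / (1 - accrual_term)

z_sum_sq = (NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)) ** 2

lam_c = log(2) / (13 / 12)   # control: median survival 13 months, in years
lam_i = log(2) / (19 / 12)   # intervention: median survival 19 months
T = 38 / 12                  # 30 months recruitment + 8 months minimum follow-up
T0 = 30 / 12                 # recruitment period in years

phi_c, phi_i = phi(lam_c, T, T0), phi(lam_i, T, T0)
n = z_sum_sq * (phi_c + phi_i) / (lam_i - lam_c) ** 2   # per group
events = z_sum_sq / (0.5 * 0.5 * log(lam_c / lam_i) ** 2)

print(f"phi_C = {phi_c:.3f}, phi_I = {phi_i:.3f}")
print(f"total sample size 2n = {round(2 * n)}, events required = {round(events)}")
```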




𝛼𝛼(level): larger → smaller sample size
1-𝛽𝛽 (power): larger → larger sample size
Variance: larger → larger sample size
◦ Binary variable: 𝜋𝜋 (probability of event) = 0.5 has
largest variance
Difference to detect: larger → smaller sample
size
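
These directions can be checked numerically with the z-test formula (eq. 2) by changing one input at a time. The baseline values below (SD 1.0, difference 0.5) are hypothetical and serve only to show the direction of each effect.

```python
from statistics import NormalDist

def n_per_group(sd, delta, alpha, power):
    """Per-group sample size for a 2-sided z-test on two means (eq. 2)."""
    z = NormalDist().inv_cdf
    return 2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2

baseline = {"sd": 1.0, "delta": 0.5, "alpha": 0.05, "power": 0.80}
scenarios = {
    "baseline": {},
    "larger alpha (0.10)": {"alpha": 0.10},
    "larger power (0.90)": {"power": 0.90},
    "larger SD (1.5)": {"sd": 1.5},
    "larger difference (0.75)": {"delta": 0.75},
}
results = {
    name: n_per_group(**{**baseline, **change})
    for name, change in scenarios.items()
}
for name, n in results.items():
    print(f"{name:26s} n per group = {n:.1f}")

# Variance of a binary variable, pi * (1 - pi), peaks at pi = 0.5.
binary_variance = {pi: pi * (1 - pi) for pi in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(binary_variance)
```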


Problem: Sometimes the sample size
required is too large.
Solutions:
◦ Be content to detect with less power (allow more
type II error).
◦ Increase the level of the test (allow more type I
error).
◦ Pick a more extreme alternative.
Level   Power   % Response in Intervention Group
                60%      65%
5%      90%     538      239
5%      80%     407      182
10%     80%     325      146

Parameters used to estimate sample size are
estimates
◦ Often based on small studies




Effectiveness of the intervention
◦ May be based on a different population
◦ May be overestimated
Inclusion and exclusion criteria may change
Control group participants may do better
than expected
Mathematical models for sample size
calculations are approximate



www.statpages.org
www.swogstat.org/statoolsout.html
https://stattools.crab.org/