Sample Size & Power An Introduction to Clinical Trials (7)

October 4, 2007
Inkyung Jung, Ph.D.
Background
• Quantitative properties of clinical trial designs
  – Appropriate precision or power
  – Best sample size
  – Optimal study duration
• Concern over sample size and power is important, especially for comparative treatment efficacy (CTE) and safety and activity (SA) studies
Methods for Determining Sample Size
• Frequentist approaches:
  – Confidence intervals
  – Hypothesis testing
• Likelihood ratio based methods
• Bayesian approaches
Quantitative Design Parameters in Clinical Trials
• Power: 1 − β
• β level: Type II error probability
• α level: Type I error probability
• Likelihood ratio: Relative strength of evidence
• Sample size: Number of experimental subjects
• Effect size: Treatment difference expressed as a number of standard deviations
• Number of events: Number of experimental subjects with the outcome
• Study duration: Interval from the beginning of the trial to the end of follow-up
• Percent censoring: Percentage of study participants left without an event by the end of follow-up
• Allocation ratio: Ratio of sample sizes in the treatment groups
• Accrual rate: New subjects entered per unit of time
• Loss to follow-up: Rate at which study participants are lost before the outcome can be observed
• Follow-up period: Interval from the end of accrual to the end of the study
• Δ: Smallest treatment effect of interest based on clinical considerations
Principles
• Precision
  – Indirect specification through the CI (absolute or relative width of the CI)
• Power
  – Power of a statistical hypothesis test
• Evidence
  – Likelihood ratio (LR): relative strength of evidence
Sample Size & Power
• The calculations are approximations.
  – The equations are based on approximations
  – Predicting the number of subjects depends on guesses about parameters
• The relationship between power/precision and sample size is quadratic.
  – Sample size increases as the square of the s.d. of the treatment difference and as the square of the sum of the normal quantiles for the type I and II error rates
Translational Trials
• Small, usually fewer than 20 subjects
• The size can be motivated formally
• Issue: reliably estimating the mean of some measurement
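Translational Trials
Measurements are samples from $N(\mu, \sigma^2)$; $\bar{X}$ is an unbiased estimator (UE) of $\mu$, and $|\bar{X} - \mu|$ is the absolute error. We require
$$\Pr\left(|\bar{X} - \mu| \le d\right) \ge 1 - \alpha$$
Standardizing,
$$\Pr\left(|\bar{X} - \mu| \le d\right) = \Pr\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le \frac{d}{\sigma/\sqrt{n}}\right) = \Phi\left(\frac{d}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{-d}{\sigma/\sqrt{n}}\right) = 2\Phi\left(\frac{d}{\sigma/\sqrt{n}}\right) - 1 \ge 1 - \alpha$$
$$\Phi\left(\frac{d}{\sigma/\sqrt{n}}\right) \ge 1 - \alpha/2 \;\Rightarrow\; \frac{d}{\sigma/\sqrt{n}} \ge Z_{1-\alpha/2}, \quad \Phi(Z_\gamma) = \gamma$$
$$n \ge \left(Z_{1-\alpha/2}\,\frac{\sigma}{d}\right)^2$$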
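The bound n ≥ (Z_{1−α/2} σ/d)² can be evaluated directly. The sketch below uses Python's standard-library `statistics.NormalDist` for Φ and its inverse; the function names and the values σ = 1, d = 0.5 are illustrative assumptions, not taken from the slides. It also checks that the resulting n actually achieves Pr(|X̄ − μ| ≤ d) ≥ 1 − α.

```python
import math
from statistics import NormalDist


def n_for_mean(sigma: float, d: float, alpha: float = 0.05) -> int:
    """Smallest n with Pr(|Xbar - mu| <= d) >= 1 - alpha when X ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
    return math.ceil((z * sigma / d) ** 2)


def coverage(n: int, sigma: float, d: float) -> float:
    """Pr(|Xbar - mu| <= d) = 2 * Phi(d / (sigma / sqrt(n))) - 1."""
    return 2 * NormalDist().cdf(d / (sigma / math.sqrt(n))) - 1


n = n_for_mean(sigma=1.0, d=0.5)
print(n, round(coverage(n, 1.0, 0.5), 4))  # 16 0.9545
```

Rounding up with `math.ceil` keeps the inequality satisfied; one fewer subject (n = 15) would fall just short of 95%.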
Translational Trials
Bernoulli trials: p is the success probability, r the number of successes, n the number of trials, and $r \sim B(n, p)$. We require
$$\Pr\left(\left|\frac{r}{n} - p\right| \le d\right) \ge 1 - \alpha$$
Using the normal approximation with $\sigma^2 = p(1-p)$,
$$n \ge \left(Z_{1-\alpha/2}\,\frac{\sqrt{p(1-p)}}{d}\right)^2$$
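A minimal sketch of the Bernoulli version, assuming the normal approximation above. The function name and example values are hypothetical. Because p(1 − p) is largest at p = 0.5, that guess gives the most conservative n when nothing is known about p.

```python
import math
from statistics import NormalDist


def n_for_proportion(p: float, d: float, alpha: float = 0.05) -> int:
    """Smallest n with Pr(|r/n - p| <= d) >= 1 - alpha (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(n_for_proportion(0.5, 0.1))  # 97 (worst case, p = 0.5)
print(n_for_proportion(0.2, 0.1))  # 62
```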
Safety and Activity Studies
• Single cohort study: the treatment effect is compared to a standard
• Estimate the unconditional probability of benefit (or lack of benefit), and thus form a basis for deciding whether or not to investigate the treatment in a larger, lengthier, more expensive trial with an internal control (i.e., a randomized trial)
Safety and Activity Studies
• Goal of SA studies: to estimate a clinical endpoint with a specified precision
  – e.g., average blood or tissue levels of a drug, the proportion of patients responding, population failure rates
  – Dichotomous outcome: summarized as a proportion
Dichotomous Outcome
100(1 − α)% CI for p:
$$\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \quad \Phi(Z_{1-\alpha/2}) = 1 - \alpha/2$$
p unknown: substitute $\hat{p}$ for p.
Not accurate for extreme values of $\hat{p}$ and small sample sizes.
Example: Consider a trial in which patients with esophageal cancer are treated with chemotherapy prior to surgical resection. A complete response is defined as the absence of macroscopic tumor at the time of surgery. We suspect that this might occur 35% of the time and would like the 95% CI of our estimate to be ±15%.
$$0.15 = 1.96\sqrt{\frac{0.35(1 - 0.35)}{n}} \;\Rightarrow\; n = \frac{1.96^2 \times 0.35 \times 0.65}{0.15^2} \approx 38.8, \text{ so } n = 39$$
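The sketch below reproduces the esophageal cancer example and then illustrates the warning about extreme p̂ and small n. It computes the exact coverage of the Wald interval p̂ ± Z_{1−α/2}√(p̂(1 − p̂)/n) by summing binomial probabilities over every outcome whose interval contains the true p. Function names and the (n = 20, p = 0.05) case are illustrative assumptions.

```python
import math
from statistics import NormalDist


def wald_n(p_guess: float, half_width: float, alpha: float = 0.05) -> int:
    """n so that Z * sqrt(p(1-p)/n) equals the desired half-width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / half_width ** 2)


def wald_coverage(n: int, p: float, alpha: float = 0.05) -> float:
    """Exact Pr(p-hat +/- Z*sqrt(p-hat(1-p-hat)/n) contains p) under Bin(n, p)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    cover = 0.0
    for r in range(n + 1):
        p_hat = r / n
        hw = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - hw <= p <= p_hat + hw:
            cover += math.comb(n, r) * p ** r * (1 - p) ** (n - r)
    return cover


print(wald_n(0.35, 0.15))  # 39, as in the example
print(round(wald_coverage(20, 0.05), 3))  # about 0.64, far below the nominal 0.95
```

With n = 20 and p = 0.05, the outcome r = 0 alone has probability 0.95²⁰ ≈ 0.36 and yields a zero-width interval at 0, which is one reason the exact and Bayesian limits below are preferred in this setting.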
Exact Binomial Confidence Limits
• Chance of r or fewer successes in n trials (lower tail probability):
$$\Pr[X \le r] = \sum_{k=0}^{r} \binom{n}{k} p^k (1-p)^{n-k}$$
• Lower and upper bounds of p for a 100(1 − α)% CI:
$$\frac{\alpha}{2} = \sum_{k=r}^{n} \binom{n}{k} p_L^k (1-p_L)^{n-k} \;\Rightarrow\; p_L, \qquad \frac{\alpha}{2} = \sum_{k=0}^{r} \binom{n}{k} p_U^k (1-p_U)^{n-k} \;\Rightarrow\; p_U$$
• See Tables 11.2 and 11.3 (p.261 & p.262)
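The two defining equations have no closed form in general, but each tail sum is monotone in p, so bisection on [0, 1] solves them. This is a minimal standard-library sketch (the helper names are hypothetical). It uses the conventions p_L = 0 when r = 0 and p_U = 1 when r = n, where one tail equation has no solution.

```python
import math


def binom_cdf(r: int, n: int, p: float) -> float:
    """Pr[X <= r] for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def _solve(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(r: int, n: int, alpha: float = 0.05):
    # Lower limit: Pr[X >= r | p_L] = alpha/2, which increases with p.
    p_lower = 0.0 if r == 0 else _solve(lambda p: 1 - binom_cdf(r - 1, n, p), alpha / 2, True)
    # Upper limit: Pr[X <= r | p_U] = alpha/2, which decreases with p.
    p_upper = 1.0 if r == n else _solve(lambda p: binom_cdf(r, n, p), alpha / 2, False)
    return p_lower, p_upper


print(exact_binomial_ci(0, 10))   # upper limit is 1 - 0.025 ** (1 / 10)
print(exact_binomial_ci(14, 39))
```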
Bayesian Binomial CI
Using a uniform prior for p, the posterior distribution function for p is
$$F(p) = \frac{\int_0^p u^r (1-u)^{n-r}\,du}{\int_0^1 u^r (1-u)^{n-r}\,du} = \frac{(n+1)!}{r!\,(n-r)!}\int_0^p u^r (1-u)^{n-r}\,du$$
$$\frac{\alpha}{2} = \frac{(n+1)!}{r!\,(n-r)!}\int_0^{p_L} u^r (1-u)^{n-r}\,du = \sum_{k=r+1}^{n+1}\binom{n+1}{k} p_L^k (1-p_L)^{n+1-k}$$
$$\frac{\alpha}{2} = \frac{(n+1)!}{r!\,(n-r)!}\int_{p_U}^{1} u^r (1-u)^{n-r}\,du = \sum_{k=0}^{r}\binom{n+1}{k} p_U^k (1-p_U)^{n+1-k}$$
• See Tables 11.4 & 11.5 (p.264 & 265)
• A Bayesian approach can use prior information
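The binomial-sum identities mean the posterior CDF can be evaluated with Bin(n + 1, p) tail sums, with no numerical integration needed. Below is a sketch under the uniform-prior assumption; the helper names are hypothetical, and bisection is used only because F(p) is increasing in p.

```python
import math


def binom_cdf(r: int, n: int, p: float) -> float:
    """Pr[Y <= r] for Y ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def bayes_uniform_ci(r: int, n: int, alpha: float = 0.05):
    """Equal-tailed posterior interval for p under a uniform prior."""

    def post_cdf(p: float) -> float:
        # F(p) = sum_{k=r+1}^{n+1} C(n+1, k) p^k (1-p)^(n+1-k) = 1 - Pr[Y <= r]
        return 1 - binom_cdf(r, n + 1, p)

    def solve(target: float) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(100):  # bisection; post_cdf increases with p
            mid = (lo + hi) / 2
            if post_cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return solve(alpha / 2), solve(1 - alpha / 2)


print(bayes_uniform_ci(0, 10))
```

For r = 0 the posterior CDF is 1 − (1 − p)ⁿ⁺¹, so the limits reduce to 1 − (1 − α/2)^{1/(n+1)} and 1 − (α/2)^{1/(n+1)}, which makes a convenient check. Unlike the exact limits, the lower bound is strictly positive even when no successes are observed.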
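Likelihood-based Approach
$$L(p) = p^k (1-p)^{n-k}$$
$$e^{\Lambda} = \frac{p_1^k (1-p_1)^{n-k}}{p_0^k (1-p_0)^{n-k}}, \qquad \Lambda = k\log\frac{p_1}{p_0} + (n-k)\log\frac{1-p_1}{1-p_0}$$
Setting $k = n p_1$ (the expected number of successes if $p_1$ is true) and solving for n:
$$n = \frac{\Lambda}{p_1\log\left(\dfrac{p_1/(1-p_1)}{p_0/(1-p_0)}\right) + \log\dfrac{1-p_1}{1-p_0}} = \frac{\Lambda}{p_1\log\dfrac{p_1}{p_0} + (1-p_1)\log\dfrac{1-p_1}{1-p_0}}$$
• See Table 11.6 (p.267)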
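A sketch of the likelihood-based size. It assumes k = n·p₁ successes are observed and that the target evidence is specified as a likelihood ratio e^Λ. The value 32 and the proportions 0.2 vs. 0.4 are illustrative choices, not values from the slides. The check plugs the chosen n back into Λ.

```python
import math


def n_for_binomial_lr(p0: float, p1: float, lr: float = 32.0) -> int:
    """n such that, with k = n * p1 successes, L(p1) / L(p0) reaches lr."""
    lam = math.log(lr)  # Lambda
    per_obs = p1 * math.log(p1 / p0) + (1 - p1) * math.log((1 - p1) / (1 - p0))
    return math.ceil(lam / per_obs)


def log_lr(k: float, n: int, p0: float, p1: float) -> float:
    """Lambda = k log(p1/p0) + (n - k) log((1 - p1)/(1 - p0))."""
    return k * math.log(p1 / p0) + (n - k) * math.log((1 - p1) / (1 - p0))


n = n_for_binomial_lr(0.2, 0.4)
print(n, round(log_lr(n * 0.4, n, 0.2, 0.4), 3))  # n = 34
```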
CI for a Mean
• When the treatment effect of interest is the mean of a distribution
• Assume the sample mean has a normal distribution
  – Need to specify both the mean and the s.d.
  – No bounded range for the mean and s.d.
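CI for a Mean
$\hat{\mu}$: estimated mean from n observations
$$100(1-\alpha)\%\ \text{CI}: \quad \hat{\mu} \pm Z_{1-\alpha/2}\,\sigma/\sqrt{n}$$
If our tolerance for the half-width of the CI is $w = Z_{1-\alpha/2}\,\sigma/\sqrt{n}$,
$$n = \left(\frac{Z_{1-\alpha/2}\,\sigma}{w}\right)^2$$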
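CI for a Mean
Precision can be expressed relative to σ or μ:
$$\text{If } w = w'\sigma:\quad n = \left(\frac{Z_{1-\alpha/2}\,\sigma}{w'\sigma}\right)^2 = \left(\frac{Z_{1-\alpha/2}}{w'}\right)^2$$
$$\text{If } w = w''\mu:\quad n = \left(\frac{Z_{1-\alpha/2}\,\sigma}{w''\mu}\right)^2 = \left(\frac{Z_{1-\alpha/2}}{w''}\right)^2\left(\frac{\sigma}{\mu}\right)^2 \quad (\mu \ne 0)$$
w', w'': desired tolerance expressed as a fraction of the s.d. and of the mean, respectively
σ/μ: coefficient of variation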
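The relative forms need only Z_{1−α/2}, the fractional tolerance, and (for w'') a guess at the coefficient of variation. This is a sketch with hypothetical function names; the tolerances 0.5 s.d. and 10% of the mean, and CV = 0.3, are illustrative.

```python
import math
from statistics import NormalDist


def n_relative_to_sd(w_sd: float, alpha: float = 0.05) -> int:
    """Half-width w = w' * sigma:  n = (Z / w')^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z / w_sd) ** 2)


def n_relative_to_mean(w_mean: float, cv: float, alpha: float = 0.05) -> int:
    """Half-width w = w'' * mu:  n = (Z / w'')^2 * (sigma / mu)^2, mu != 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z / w_mean) ** 2 * cv ** 2)


print(n_relative_to_sd(0.5))            # 16
print(n_relative_to_mean(0.1, cv=0.3))  # 35
```

Expressing precision this way avoids having to guess σ and μ separately: only their ratio matters for w''.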
CI for Event Rates
• Time-to-event measurements with censoring (death, recurrence, or overall failure rate)
• Common in cancer trials
  – e.g., some new drugs may not shrink tumors but might still improve survival
CI for Event Rates
• Assume
  – accrual is constant at rate a per unit time over some interval T
  – a period of continued f/u is used to observe additional events
  – the failure rate is constant over time (exponential event times)
  – there are no losses to f/u
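CI for Event Rates
Estimated failure rate:
$$\hat{\lambda} = \frac{d}{\sum_i t_i}$$
d: number of events or failures; the sum is over all f/u times $t_i$
Approximate CI for λ:
$$\hat{\lambda} \pm Z_{1-\alpha/2}\frac{\hat{\lambda}}{\sqrt{d}} \quad \text{or} \quad \log(\hat{\lambda}) \pm \frac{Z_{1-\alpha/2}}{\sqrt{d}}$$
If w is the desired half-width of the CI expressed as a fraction of λ,
$$Z_{1-\alpha/2}\frac{\lambda}{\sqrt{d}} = w\lambda \;\Rightarrow\; d = \left(\frac{Z_{1-\alpha/2}}{w}\right)^2$$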
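Note that the required number of events depends only on Z_{1−α/2} and w, not on λ itself. The sketch below computes d and, for completeness, the log-scale interval exp(log λ̂ ± Z/√d). Function names, the 25% tolerance, and the data (50 events in 1000 person-years) are hypothetical.

```python
import math
from statistics import NormalDist


def events_for_rate_precision(w: float, alpha: float = 0.05) -> int:
    """Events d with Z * lambda / sqrt(d) = w * lambda, i.e. d = (Z / w)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z / w) ** 2)


def rate_ci(d: int, total_followup: float, alpha: float = 0.05):
    """lambda_hat = d / sum(t_i), with a CI built on the log scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lam = d / total_followup
    half = z / math.sqrt(d)  # half-width of the CI for log(lambda)
    return lam, lam * math.exp(-half), lam * math.exp(half)


print(events_for_rate_precision(0.25))  # 62 events
print(rate_ci(50, 1000.0))              # hypothetical data
```

The log-scale interval is asymmetric on the original scale and never dips below zero, which the symmetric form can do when d is small.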
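Likelihood-based Approach
Assuming a normal model for log(λ),
$$L(\lambda) = \exp\left(-\frac{\left(\log(\hat{\lambda}) - \log(\lambda)\right)^2}{2/d}\right)$$
λ: true hazard, $\hat{\lambda}$: observed hazard
$$\Lambda = \frac{\left(\log(\hat{\lambda}) - \log(\lambda)\right)^2}{2/d} \;\Rightarrow\; d = \frac{2\Lambda}{\left(\log(\hat{\lambda}) - \log(\lambda)\right)^2} = \frac{2\Lambda}{\left(\log(\Delta)\right)^2}$$
where $\Delta = \hat{\lambda}/\lambda$.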
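A one-line sketch of d = 2Λ/(log Δ)², with Λ taken as the log of a target likelihood ratio. The ratio 32 is an illustrative assumption. For Δ = 2 the result is 10/ln 2 ≈ 14.4, which gives a simple hand check.

```python
import math


def events_for_hazard_lr(delta: float, lr: float = 32.0) -> int:
    """d = 2 * Lambda / (log Delta)^2, with Lambda = log(lr)."""
    return math.ceil(2 * math.log(lr) / math.log(delta) ** 2)


print(events_for_hazard_lr(2.0))  # 15
print(events_for_hazard_lr(1.5))  # 43
```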
Comparative Trials
• An approach based on a planned hypothesis test
• Convenient and frequently used for determining sample size
• H0: equivalence between the treatments
• Alternative value chosen to be the smallest difference of clinical importance between treatments
• Size is planned to yield a high power (1 − β) at a pre-specified α-level
Type I and II error rates
• Convention: two-sided α-level at 0.05 and 80 or 90% power
• In practice: should be chosen to reflect the consequences of making the particular type of error
Comparison of Two Means
• Treatment comparison: testing the difference of the estimated means of two groups
• μ1 & μ2: true means of the two groups; σ: s.d. of the measurement
• Δ = μ1 − μ2; H0: Δ = 0 vs. Ha: Δ ≠ 0
• Reject the null if $|\hat{\Delta}| > c = Z_{1-\alpha/2}\,\sigma_{\hat{\Delta}}$
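Comparison of Two Means
Want to have a power of 1 − β at Δ (> 0):
$$1 - \beta = \Pr\left[\hat{\Delta} > c \mid \Delta\right] = \Pr\left[\frac{\hat{\Delta} - \Delta}{\sigma_{\hat{\Delta}}} > \frac{c - \Delta}{\sigma_{\hat{\Delta}}} \,\middle|\, \Delta\right]$$
$$\frac{c - \Delta}{\sigma_{\hat{\Delta}}} = Z_\beta = -Z_{1-\beta}, \quad \Phi(Z_\beta) = \beta$$
$$-Z_{1-\beta} = \frac{Z_{1-\alpha/2}\,\sigma_{\hat{\Delta}} - \Delta}{\sigma_{\hat{\Delta}}} = Z_{1-\alpha/2} - \frac{\Delta}{\sigma_{\hat{\Delta}}} \;\Rightarrow\; Z_{1-\alpha/2} + Z_{1-\beta} = \frac{\Delta}{\sigma_{\hat{\Delta}}}, \quad \sigma_{\hat{\Delta}} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$\frac{1}{n_1} + \frac{1}{n_2} = \frac{\Delta^2}{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}$$
$$n_1 = r n_2 \;\Rightarrow\; n_2 = \frac{r+1}{r}\,\frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(\Delta/\sigma)^2}$$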
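The final expression for n₂ translates directly into code. Below is a sketch assuming the normal (z) approximation used in the derivation, with power = 1 − β so that Z_{1−β} = Φ⁻¹(power). The function name and the standardized effect sizes are illustrative. The second call shows the quadratic relationship: halving Δ/σ roughly quadruples n.

```python
import math
from statistics import NormalDist


def n2_two_means(delta: float, sigma: float, alpha: float = 0.05,
                 power: float = 0.90, r: float = 1.0) -> int:
    """Size of group 2 (with n1 = r * n2) for a two-sided test of mu1 - mu2 = 0."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
    z_beta = nd.inv_cdf(power)           # Z_{1-beta}, since power = 1 - beta
    n2 = (r + 1) / r * (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2
    return math.ceil(n2)


print(n2_two_means(0.5, 1.0))        # 85 per group
print(n2_two_means(0.25, 1.0))       # 337
print(n2_two_means(0.5, 1.0, r=2))   # 64, with n1 = 2 * 64 = 128
```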
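Likelihood-based Approach
$$L(X \mid \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
$$e^{\Lambda} = \frac{\exp\left(-\sum_{i=1}^{n}(x_i - \mu_a)^2/2\sigma^2\right)}{\exp\left(-\sum_{i=1}^{n}(x_i - \mu_b)^2/2\sigma^2\right)} = \exp\left(\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[(x_i - \mu_b)^2 - (x_i - \mu_a)^2\right]\right)$$
$$\Lambda = \frac{\mu_a - \mu_b}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu_{ab}) = \frac{n(\mu_a - \mu_b)(\bar{x} - \mu_{ab})}{\sigma^2}, \quad \mu_{ab} = (\mu_a + \mu_b)/2$$
$$n = \frac{\Lambda\sigma^2}{(\mu_a - \mu_b)(\bar{x} - \mu_{ab})}$$
If the data arise from $\mu_a$, so that $\bar{x} \approx \mu_a$ and $\bar{x} - \mu_{ab} \approx (\mu_a - \mu_b)/2$, this becomes $n = 2\Lambda\sigma^2/(\mu_a - \mu_b)^2$.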
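A sketch of n = 2Λσ²/(μ_a − μ_b)². It assumes the observed mean lands at μ_a and that Λ is set from a target likelihood ratio (32 here, an illustrative choice). The function name and the values σ = 1, μ_a − μ_b = 0.5 are hypothetical.

```python
import math


def n_for_means_lr(diff: float, sigma: float, lr: float = 32.0) -> int:
    """n = 2 * Lambda * sigma^2 / (mu_a - mu_b)^2, assuming xbar is close to mu_a."""
    return math.ceil(2 * math.log(lr) * sigma ** 2 / diff ** 2)


n = n_for_means_lr(0.5, 1.0)
# Lambda achieved when xbar = mu_a: n * diff^2 / (2 * sigma^2)
print(n, n * 0.5 ** 2 / 2)  # 28 3.5
```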
Dichotomous Responses
Success    Treatment A    Treatment B
Yes        a              b
No         c              d
• Comparing the proportion of successes or failures: a/(a+c) vs. b/(b+d)
• Fisher's exact test or chi-square test with or without continuity correction
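Dichotomous Responses
• χ² test without continuity correction, with allocation $n_1 = r n_2$:
$$n_2 = \frac{\left(Z_{1-\alpha/2}\sqrt{(r+1)\bar{\pi}(1-\bar{\pi})} + Z_{1-\beta}\sqrt{\pi_1(1-\pi_1) + r\pi_2(1-\pi_2)}\right)^2}{r\Delta^2}, \quad \bar{\pi} = \frac{r\pi_1 + \pi_2}{r+1}, \quad \Delta = \pi_1 - \pi_2$$
When r = 1, a common simplification that uses the alternative-hypothesis variance in both terms gives
$$n_2 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\left(\pi_1(1-\pi_1) + \pi_2(1-\pi_2)\right)}{\Delta^2}$$
(see Tables 11.12 & 11.13)
• χ² test with continuity correction:
$$n_2^* = \frac{n_2}{4}\left(1 + \sqrt{1 + \frac{2(r+1)}{r n_2 |\Delta|}}\right)^2$$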
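A sketch of the uncorrected and continuity-corrected χ² sizes. It assumes n₁ = r·n₂, so that the pooled proportion is (rπ₁ + π₂)/(r + 1) and the alternative variance term is π₁(1 − π₁) + rπ₂(1 − π₂). The correction is applied to the unrounded n₂ before rounding up, which is a design choice rather than something the slides specify. Proportions 0.4 vs. 0.2 are illustrative.

```python
import math
from statistics import NormalDist


def n2_two_proportions(pi1: float, pi2: float, alpha: float = 0.05, power: float = 0.90,
                       r: float = 1.0, continuity: bool = False) -> int:
    """Size of group 2 (n1 = r * n2) for the chi-square comparison of two proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    delta = abs(pi1 - pi2)
    pbar = (r * pi1 + pi2) / (r + 1)  # pooled proportion under H0
    term_null = z_alpha * math.sqrt((r + 1) * pbar * (1 - pbar))
    term_alt = z_beta * math.sqrt(pi1 * (1 - pi1) + r * pi2 * (1 - pi2))
    n2 = (term_null + term_alt) ** 2 / (r * delta ** 2)
    if continuity:
        n2 = n2 / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (r * n2 * delta))) ** 2
    return math.ceil(n2)


print(n2_two_proportions(0.4, 0.2))                   # 109 per group
print(n2_two_proportions(0.4, 0.2, continuity=True))  # larger, about 119
```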
Hazard Comparison
• With event-time endpoints, it is common to compare the ratio of hazards (vs. H0: hazard ratio = 1)
• Power depends on the number of events (recurrences or deaths)
• Distinguish between the number of patients placed on study and the number of events required for the trial to have the intended properties
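Parametric Approach (Exponential)
• If event times are exponentially distributed, the MLE of the hazard λ is
$$\hat{\lambda} = \frac{d}{\sum t_i}; \quad d \text{ is the number of uncensored observations, } t_i \text{ are the f/u times}$$
$$2d\lambda/\hat{\lambda} \sim \chi^2(2d)$$
For two groups with $\Delta = \lambda_1/\lambda_2$ and $\hat{\Delta} = \hat{\lambda}_1/\hat{\lambda}_2$, dividing each χ² variable by its degrees of freedom gives
$$\frac{(2d_1\lambda_1/\hat{\lambda}_1)/(2d_1)}{(2d_2\lambda_2/\hat{\lambda}_2)/(2d_2)} = \frac{\Delta}{\hat{\Delta}} \sim F(2d_1, 2d_2)$$
• Can be used to construct tests and CIs. With $F_{2d_1, 2d_2, \gamma}$ denoting the γ quantile,
$$100(1-\alpha)\%\ \text{CI}: \quad \hat{\Delta}\,F_{2d_1, 2d_2, \alpha/2} < \Delta < \hat{\Delta}\,F_{2d_1, 2d_2, 1-\alpha/2}$$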
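Other Parametric Approaches
• log($\hat{\Delta}$) is approximately normally distributed with mean log(Δ) (0 under the null) and variance $1/d_1 + 1/d_2$; with equal numbers of events in the two groups ($d_1 = d_2 = D/2$) the variance is 4/D, so
$$D = \frac{4(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(\log\Delta)^2}; \quad D: \text{total number of events required}$$
• More general form:
$$\frac{1}{d_1} + \frac{1}{d_2} = \frac{(\log\Delta)^2}{(Z_{1-\alpha/2} + Z_{1-\beta})^2}$$
Using $r = d_2/d_1$,
$$d_1 = \frac{r+1}{r}\,\frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(\log\Delta)^2}$$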
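A sketch of both event-count formulas. It assumes log(Δ̂) is approximately normal with variance 1/d₁ + 1/d₂. Function names are hypothetical, and the hazard ratio 1.75 is chosen to match the nonparametric example that follows.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)  # Z_{1-alpha/2} + Z_{1-beta}


def total_events(hr: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """D = 4 (Z_{1-alpha/2} + Z_{1-beta})^2 / (log HR)^2, equal events per group."""
    return math.ceil(4 * _z_sum(alpha, power) ** 2 / math.log(hr) ** 2)


def events_group1(hr: float, r: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """d1 = (r + 1)/r * (Z_{1-alpha/2} + Z_{1-beta})^2 / (log HR)^2, with r = d2/d1."""
    return math.ceil((r + 1) / r * _z_sum(alpha, power) ** 2 / math.log(hr) ** 2)


print(total_events(1.75))            # 135
print(events_group1(1.75, r=1))      # 68 in each group
```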
Nonparametric Approaches
No parametric assumptions about the distribution of event times:
$$D = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2(\Delta + 1)^2}{(\Delta - 1)^2}; \quad \text{total number of events needed on the study}$$
Example: To detect a hazard ratio of 1.75 as being statistically significantly different from 1.0 using a two-sided 0.05 α-level test with 90% power requires
$$\frac{(1.96 + 1.282)^2(1.75 + 1)^2}{(1.75 - 1)^2} \approx 141 \text{ events}$$
Suppose 30% of subjects will remain event free at the end of the trial, so only 70% of subjects contribute events:
$$n = \frac{141}{1 - 0.3} \approx 202$$
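The example can be reproduced as follows. The sketch uses exact normal quantiles rather than the rounded 1.96 and 1.282, and the function name is hypothetical. Strictly rounding the event count up would give 142 rather than 141; dividing the unrounded count by 0.7 still gives 202 subjects.

```python
import math
from statistics import NormalDist


def total_events_nonparametric(hr: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Unrounded D = (Z_{1-alpha/2} + Z_{1-beta})^2 (HR + 1)^2 / (HR - 1)^2."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z_sum ** 2 * (hr + 1) ** 2 / (hr - 1) ** 2


d = total_events_nonparametric(1.75)
n = d / (1 - 0.3)  # 30% of subjects assumed event free at the end of the trial
print(round(d, 1), math.ceil(n))  # 141.3 202
```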
Noninferiority
• H0: "the treatments are different"
• Ha: "the treatments are the same"
• Naturally one-sided
• Roles of α and β are reversed
• The sample size depends strongly on the quantitative definition of equivalence
Noninferiority
• For testing a difference of two proportions using the χ² test without continuity correction:
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\left(\pi_1(1-\pi_1) + \pi_2(1-\pi_2)\right)}{\Delta^2}$$
• For testing equivalence of two proportions:
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\left(\pi_1(1-\pi_1) + \pi_2(1-\pi_2)\right)}{\left(\delta - (\pi_1 - \pi_2)\right)^2}$$
if we declare two proportions equivalent when $\pi_1 - \pi_2 \le \delta$
• Other approaches: confidence intervals, likelihood methods
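A sketch of the equivalence formula. It mirrors the Z_{1−α/2} written above, so a design that wants a strictly one-sided level α should pass 2α. The function name, the common true proportion 0.8, and the margins are illustrative. The second call shows how strongly n depends on the margin δ.

```python
import math
from statistics import NormalDist


def n_equivalence(pi1: float, pi2: float, margin: float,
                  alpha: float = 0.05, power: float = 0.90) -> int:
    """n = (Z_{1-alpha/2} + Z_{1-beta})^2 [pi1(1-pi1) + pi2(1-pi2)] / (delta - (pi1 - pi2))^2."""
    if margin <= pi1 - pi2:
        raise ValueError("margin must exceed the expected difference pi1 - pi2")
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    variance = pi1 * (1 - pi1) + pi2 * (1 - pi2)
    return math.ceil(z_sum ** 2 * variance / (margin - (pi1 - pi2)) ** 2)


print(n_equivalence(0.8, 0.8, margin=0.10))  # 337
print(n_equivalence(0.8, 0.8, margin=0.05))  # 1345: halving the margin roughly quadruples n
```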
ES Trials
• Large safety trials (postmarketing surveillance)
• Intended to uncover and accurately estimate the frequency of uncommon side effects
• Size depends on how low the event rate of interest is and how powerful the study needs to be
Poisson Distribution
Assume: the study population is large, the probability of an event is small, and all subjects are followed for approximately the same length of time.
The probability of observing exactly r events:
$$\Pr[D = r] = \frac{(\lambda m)^r e^{-\lambda m}}{r!}; \quad \lambda: \text{event rate}, \; m: \text{cohort size}$$
The chance of observing r or fewer events:
$$\Pr[D \le r] = \sum_{k=0}^{r}\frac{(\lambda m)^k e^{-\lambda m}}{k!}$$
Example. The chance of seeing at least one event in the Poisson distribution:
$$\beta = 1 - \Pr[D = 0] = 1 - e^{-\lambda m} \;\Rightarrow\; m = -\log(1-\beta)/\lambda$$
If λ = 0.001 and β = 0.99, then m ≈ 4605.
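A sketch of the cohort-size calculation and the Poisson lower tail. Function names are hypothetical. The second line checks that a cohort of m subjects gives the stated 99% chance of at least one event. Rounding m up to an integer gives 4606.

```python
import math


def cohort_for_one_event(rate: float, prob: float) -> float:
    """m = -log(1 - prob) / rate, so that Pr[D >= 1] = prob for D ~ Poisson(rate * m)."""
    return -math.log(1 - prob) / rate


def poisson_cdf(r: int, mean: float) -> float:
    """Pr[D <= r] for D ~ Poisson(mean)."""
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(r + 1))


m = cohort_for_one_event(0.001, 0.99)
print(round(m, 1))                               # 4605.2
print(round(1 - poisson_cdf(0, 0.001 * m), 4))   # 0.99
```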
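Likelihood Approach
Relative evidence for an observed Poisson event rate λ vs. a hypothetical rate μ, where r events are observed with cohort size m:
$$e^{\Lambda} = \left(\frac{\lambda}{\mu}\right)^r e^{-m(\lambda - \mu)}, \qquad \Lambda = r\log\left(\frac{\lambda}{\mu}\right) - m(\lambda - \mu)$$
$$\Rightarrow\; m = \frac{r\log(\lambda/\mu) - \Lambda}{\lambda - \mu}$$
$$\lambda = r/m \;\Rightarrow\; m = \frac{\Lambda}{\lambda\log(\lambda/\mu) + (\mu - \lambda)}$$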
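A sketch of the last formula, taking r = λm and Λ = log of a target likelihood ratio. The rates 0.002 vs. 0.001 and the ratio 32 are illustrative assumptions. Plugging m back into Λ = r log(λ/μ) − m(λ − μ) recovers the target.

```python
import math


def cohort_for_poisson_lr(lam: float, mu: float, lr: float = 32.0) -> float:
    """m = Lambda / (lam * log(lam / mu) + (mu - lam)), taking r = lam * m."""
    return math.log(lr) / (lam * math.log(lam / mu) + (mu - lam))


m = cohort_for_poisson_lr(0.002, 0.001)  # about 8972 subjects
r = 0.002 * m                            # expected number of events
recovered = r * math.log(0.002 / 0.001) - m * (0.002 - 0.001)
print(round(m), recovered)  # recovered Lambda equals log(32), about 3.466
```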
Other Considerations
• Cluster randomization requires an increased sample size
• It is possible to perform a simple cost optimization using unbalanced allocation and information about the relative cost of the two treatment groups
• Increase the sample size to compensate for nonadherence
• Simulation is a powerful and flexible design alternative
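As an illustration of simulation as a design tool, the sketch below estimates power by Monte Carlo for a two-sided two-sample z-test with known σ. The test, the function name, the seed, and the 2000 replicates are assumptions for the example. With 85 subjects per group and Δ/σ = 0.5, the estimate should land near the 90% implied by the closed-form two-means formula, and near α when Δ = 0. The same loop can be adapted to designs with no convenient closed form.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_power(n_per_group: int, delta: float, sigma: float = 1.0,
                    alpha: float = 0.05, reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo power of a two-sided z-test (sigma known) comparing two means."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n_per_group)  # s.d. of the difference in sample means
    rejections = 0
    for _ in range(reps):
        mean_a = fmean(rng.gauss(delta, sigma) for _ in range(n_per_group))
        mean_b = fmean(rng.gauss(0.0, sigma) for _ in range(n_per_group))
        if abs(mean_a - mean_b) / se > z_crit:
            rejections += 1
    return rejections / reps


print(simulated_power(85, 0.5))  # close to 0.90
print(simulated_power(85, 0.0))  # close to alpha = 0.05
```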
Computer Programs
• Power and Sample Size (PASS) by NCSS software
• nQuery Advisor by Statistical Solutions
• SAS v9.1: POWER procedure (PROC POWER)
• And others…
Power Curves
[Figure: power curves for a two-sample t-test of the group mean difference at α-level = 0.05]
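Curves of this kind can be approximated numerically. The sketch below uses the normal approximation to the two-sample test (not the exact t distribution, so it is slightly optimistic for small n). It counts both rejection regions and assumes equal group sizes. The function name and the grid of n values are illustrative.

```python
import math
from statistics import NormalDist


def power_two_means(n_per_group: int, delta: float, sigma: float = 1.0,
                    alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided two-sample test, equal group sizes."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * math.sqrt(2 / n_per_group))  # Delta / sigma_Deltahat
    # Probability of landing in either rejection region.
    return nd.cdf(shift - z) + nd.cdf(-shift - z)


for n in (10, 20, 40, 85, 120):
    print(n, round(power_two_means(n, delta=0.5), 3))
```

Where the curve is steep, a small change in n or Δ/σ moves power a lot; on the flat upper part, the same change barely matters.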
Summary
• Sample size can be motivated by precision, power, or relative evidence
• Size quantifications are useful and necessary for designing trials
• A hypothesis-testing framework is often adopted for sample size considerations
• Important design parameters: α-level, number of events, accrual rate and duration, losses to f/u, allocation ratio, total study duration, and the smallest treatment difference of clinical importance
Summary
• Phase I trials
  – The sample size needed for the trial is usually an outcome of the study
  – The exact sample size usually cannot be specified in advance
Summary
• Developmental studies (SA trials)
  – Look for evidence of treatment efficacy
  – A fixed sample size
  – The sample size can be determined as a consequence of the precision required to estimate the response, success, or failure rate
  – When faced with evidence of low efficacy, investigators wish to stop an SA trial as early as possible
Summary
• CTE trials
  – Sample size and power depend on the particular test statistic used to compare the treatment groups
  – Sample size increases as the type I and II error rates decrease
  – Sample size decreases as the treatment difference increases or as the variance of the treatment difference decreases
Summary
• CTE trials
  – For event-time studies, the currency of design is the number of events required to detect a particular hazard ratio
  – Nonadherence with assigned treatment may increase the required sample size dramatically
Summary
• Noninferiority designs
  – Often require very large sample sizes
  – Require a relatively narrow confidence interval (high precision), which increases the sample size compared with superiority designs
Summary
• Statistical simulation may be a useful way to study power and sample size
• Specialized, flexible computer programs are necessary for performing the required calculations efficiently
• Depending on the shape of the power curve, small changes in parameters can have large or small effects on power against a fixed alternative