Session Slides

Biostatistics Case Studies 2015
Session 2:
Sample Size & Power for Inequality and
Equivalence Studies II
Youngju Pak, PhD.
Biostatistician
ypak@labiomed.org
What did we learn in Session 1?
Info Needed for Study Size: Comparing Means (Inequality test)
$$N = \frac{2\,SD^2\,(1.96 + 0.842)^2}{\Delta^2}$$
1. Effect Δ (clinically meaningful difference)
2. Subject variability (SD)
3. Type I error (z = 1.96 for two-sided α=0.05; 2.58 for α=0.01)
4. Power (z = 0.842 for 80% power; 1.645 for 95% power)
Free sample size calculations:
www.stat.uiowa.edu/~rlenth/Power
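As a rough check on the formula above, the calculation can be sketched in a few lines of Python. The function name and arguments are illustrative only, and the sketch uses the normal approximation, so it may differ by a subject or two from software that works with the t-distribution. The inputs Δ=1 and SD=3.5 are the values used for the IOP example later in these slides.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_inequality(delta, sd, alpha=0.05, power=0.80):
    """Per-group N for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for two-sided alpha = 0.05
    z_power = z.inv_cdf(power)          # 0.842 for 80% power
    return ceil(2 * sd**2 * (z_alpha + z_power) ** 2 / delta**2)


# IOP example: detect a difference of 1 with SD = 3.5
print(n_per_group_inequality(delta=1.0, sd=3.5))
```

This approximation gives 193 per group; the slides quote N=2•194, which is plausibly the t-distribution result from the software.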
Case Study
Ophthalmology 2006; 113:70-76.
Abstract
Primary Outcome and Study Size
Primary Outcome - Page 72 middle of column 1:
Study Size - Page 72 bottom of column 1:
Needs
Consensus
PI’s Gamble
Testing inequality vs. equivalence
• Hypotheses for testing inequality:
Ha: | mean(treatment) – mean(control) | ≠ 0
H0: | mean(treatment) – mean(control) | = 0
• Hypotheses for testing equivalence:
Ha: δ1 < mean(trt1) – mean(trt2) < δ2
H0: mean(trt1) – mean(trt2) ≤ δ1 or mean(trt1) – mean(trt2) ≥ δ2
(Non-inferiority tests only one side of this H0.)
Graphical presentation of equivalence test
• With our regular t-tests, to conclude there is a substantial difference you must observe a difference large enough to conclude it is not due to sampling error.
• To conclude there is not a substantial difference, you must observe a difference small enough to reject the possibility that the closeness is due to sampling error from distributions centered on large effects.
Non-Inferiority Study
• Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care.
• The new treatment is assumed inferior to the standard and the study is designed to show overwhelming evidence that it is at least nearly as good, i.e., non-inferior. It usually has other advantages, e.g., oral vs. inj.
• A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority.
• The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used.
How to determine Sample Size?
• For the IOP study, we have
– Ha: mean IOP change uf – mean IOP change f < 1.5
– H0: mean IOP change uf – mean IOP change f ≥ 1.5
Thus, we are only interested in the upper limit of the difference → non-inferiority → one-sided t-test.
• Thus we reject H0 if Signal/Noise < some critical value.
• But N for a non-inferiority test requires more complicated parameters, such as the non-centrality parameter of the t-distribution (the Two One-Sided T-tests procedure is usually used for the equivalence test).
Let’s run the software from
www.stat.uiowa.edu/~rlenth/Power
• Information you will need
– Equivalence margin
• Non-inferiority margin (NIM) = 1.5 for the IOP study
– Assumed mean difference in change of IOP between the two groups: usually assumed to be zero, but 0.5 is assumed for the IOP study
– SD of changes of IOP = 3.5
– α (usually set to 2.5%, one-sided), since the confidence level of the confidence interval is (100 – 2 × α)%
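The inputs listed above are enough for a back-of-the-envelope version of the non-inferiority calculation. The sketch below is a normal approximation, not the non-central t computation that the software performs, and the function and argument names are illustrative. The key difference from the inequality formula is that the denominator uses the distance between the margin and the assumed true difference.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_noninferiority(margin, assumed_diff, sd, alpha=0.025, power=0.80):
    """Per-group N for a one-sided non-inferiority test of two means
    (normal approximation; alpha is one-sided)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_power = z.inv_cdf(power)
    distance = margin - assumed_diff  # room between the assumed truth and the margin
    return ceil(2 * sd**2 * (z_alpha + z_power) ** 2 / distance**2)


# IOP study inputs: NIM = 1.5, assumed difference = 0.5, SD = 3.5
print(n_per_group_noninferiority(margin=1.5, assumed_diff=0.5, sd=3.5))
```

With these inputs the approximation gives 193 per group, close to the N=2•194 quoted in the slides. Assuming a zero true difference instead of 0.5 would shrink the required N considerably, which is why the assumed difference matters so much.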
Sample size for IOP study
Three-dimensional power curve for a non-inferiority test
How do we determine if the fixed method is non-inferior to the unfixed method?
Primary Outcome: IOP reduction
D = Duf – Df, where Duf = mean IOP reduction with unfixed therapy and Df = mean IOP reduction with fixed therapy
Regardless of study aim (to prove treatments equivalent or to prove them different), inference can be based on the 95% CI for D (= Duf – Df), i.e., the plausible “true (population) values for D”.
Typical superiority/inferiority study: compare the 95% CI to 0.
Non-inferiority study: compare the 95% CI to δ2, a pre-specified margin of equivalence (1.5 here).
Typical Analysis: Inferiority or Superiority
[Not used in this paper]
H0: Duf – Df = 0
H1: Duf – Df ≠ 0
Aim: H1 → therapies differ
α = 0.05 & N=2•194
Power = 80% when Δ=1, SD=3.5
[Figure: 95% CIs for D = Duf – Df (“true (population) values for D”) compared with 0. Outcomes shown: fixed is inferior; fixed is superior; no difference detected.]
Typical Analysis: Inferiority Only
[Not used in this paper]
H0: Duf – Df ≤ 0
H1: Duf – Df > 0
Aim: H1 → fixed is inferior
α = 0.025 & N=2•194
Power = 80% when Δ=1, SD=3.5
( α = 0.05 → N=2•153 )
[Figure: 95% CIs for Duf – Df (“true (population) values for D”) compared with 0. Outcomes shown: fixed is inferior; inferiority not detected.]
Non-Inferiority
[As in this paper]
H0: Duf – Df ≥ 1.5
H1: Duf – Df < 1.5
Aim: H1 → fixed is non-inferior
α = 0.025 & N=2•194
Power = 80% when Δ=0.5, NIM=1.5
[Figure: 95% CIs for Duf – Df (“true (population) values for D”) compared with 0 and 1.5. Outcomes shown: fixed is non-inferior; fixed is inferior; non-inferiority not detected.]
Inferiority and Non-Inferiority
[Figure: 95% CIs for Duf – Df (“true (population) values for D”) compared with 0 and 1.5. Outcomes shown: fixed is non-inferior; fixed is inferior; neither is detected; fixed is “non-clinically” inferior.]
Observed Results: D^uf = 9.0, D^f = 8.7, D^ = 0.3, 95% CI = -0.1 to 0.7
The CI lies below 1.5: fixed is non-inferior.
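The CI-based reading of these outcomes can be written as a small decision rule. The sketch below is one interpretation of the figure, not code from the paper: the function name and the exact boundary handling are assumptions. It treats "inferior" as a CI lying entirely above 0 and "non-inferior" as a CI lying entirely below the margin.

```python
def classify_ci(lower, upper, margin=1.5):
    """Classify a 95% CI for D = Duf - Df against 0 and the non-inferiority margin."""
    inferior = lower > 0           # CI entirely above 0
    non_inferior = upper < margin  # CI entirely below the margin
    if inferior and non_inferior:
        return "non-clinically inferior"
    if inferior:
        return "inferior"
    if non_inferior:
        return "non-inferior"
    return "neither detected"


# Observed IOP result: 95% CI = -0.1 to 0.7, margin = 1.5
print(classify_ci(-0.1, 0.7))
```

For the observed interval the rule returns "non-inferior", matching the conclusion on the slide.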
Conclusions: General
• “Negligibly inferior” would be a better term than non-inferior.
• All inference can be based on confidence intervals.
• Pre-specify the comparisons to be made. You cannot test for both non-inferiority and superiority.
• Power the study for only one or for multiple comparisons, e.g., non-inferiority and inferiority. Power can be different for different comparisons.
• Very careful consideration must be given to the choice of the margin of equivalence (1.5 here). You can be risky and gamble on what the expected differences will be (0.5 here), but the study is worthless if others in the field would find your margin too large.
FDA Guidelines :
• http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm202140.pdf
Where:
M1 = full effect of the active control compared with the test drug
M2 = NI margin
Self-Quiz
1. Give an example in your specialty area for a superiority
/inferiority study. Now modify it to an equivalence study.
Now modify it to a non-inferiority study.
2. T or F: The main point about non-inferiority studies is that
we are asking whether a treatment is as good or better
vs. worse than another treatment, so it uses a one-sided
test.
3. Power for a typical superiority test is the likelihood that
you will declare treatment differences (p<0.05) if
treatments really differ by some magnitude Δ. Explain
what power means for a non-inferiority study.
4. T or F: Last-value-carried-forward is a good way to
handle drop-outs in a non-inferiority study. Explain.
5. T or F: In a non-inferiority study, you should first test for non-inferiority with a confidence interval, and then use a t-test to test for superiority, but only if non-inferiority was established at the first step.
6. What is the meaning of the equivalence margin, and how do
you determine it?
7. Suppose the primary outcome for a study is a serum inflammatory marker. If its assay is poor (low reproducibility), then it is more difficult to find treatment differences in a typical superiority/inferiority study than for a better assay, due to this noise. Would it be easier or more difficult to find non-inferiority with this assay, compared to a better assay?
8. Does the assumed treatment difference (0.5 here) for power
calculations have the same meaning as the difference used
for power calculations in a typical superiority/inferiority
study?
Self-Quiz
1. Give an example in your specialty area for a superiority
/inferiority study. Now modify it to an equivalence study.
Now modify it to a non-inferiority study.
Self-Quiz
1. Answer
Vaccine Testing:
Superiority: New candidate vaccine vs. placebo
Equivalence: Antigen potency between two manufacturing
plants or lots.
Non-Inferiority: New candidate vaccine vs. old one.
Self-Quiz
2. T or F: The main point about non-inferiority studies is that
we are asking whether a treatment is as good or better
vs. worse than another treatment, so it uses a one-sided
test.
Self-Quiz
2. Answer
False. That is a feature of these studies, but not their
distinguishing feature. They and equivalence studies are
used to try to prove sameness, as opposed to typical
studies that try to prove differences.
Self-Quiz
3. Power for a typical superiority test is the likelihood that
you will declare treatment differences (p<0.05) if
treatments really differ by some magnitude Δ. Explain
what power means for a non-inferiority study.
Self-Quiz
3. Answer
Power for a non-inferiority study is the likelihood that you will declare one treatment (A) to be no worse than the other treatment (B) by more than a pre-specified magnitude δ, i.e., accept Ha, if the treatments really differ by some Δ. Of course, Δ is less than δ, and is often 0.
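To make this definition concrete, the sketch below computes approximate power for the IOP design. It uses a normal approximation with a known SD, and the function and argument names are illustrative rather than taken from the slides or the software.

```python
from math import sqrt
from statistics import NormalDist


def power_noninferiority(n_per_group, sd, margin, true_diff, alpha=0.025):
    """Approximate chance that the upper CI limit for Duf - Df falls below the margin,
    given the true difference (normal approximation; alpha is one-sided)."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    z_alpha = z.inv_cdf(1 - alpha)
    return z.cdf((margin - true_diff) / se - z_alpha)


# IOP design: N = 194 per group, SD = 3.5, margin delta = 1.5, true Delta = 0.5
print(round(power_noninferiority(194, 3.5, 1.5, 0.5), 3))
# Same design if the treatments are truly identical (Delta = 0)
print(round(power_noninferiority(194, 3.5, 1.5, 0.0), 3))
```

The first call returns roughly 80%, in line with the design. The second shows how much power rises when the true difference is 0, since the truth then sits further from the margin.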
Self-Quiz
4. T or F: Last-value-carried-forward is a good way to
handle drop-outs in a non-inferiority study. Explain.
Self-Quiz
4. Answer
False. LVCF biases results toward less of a difference if the true difference is expected to increase over time. This makes typical superiority studies conservative, but in a non-inferiority study it increases the chance of falsely “proving” the aim.
True if the difference is expected to decrease over time.
Self-Quiz
5. T or F: In a non-inferiority study, you should first test for
non-inferiority with a confidence interval, and then use a
t-test to test for superiority, but only if non-inferiority was
established at the first step.
Self-Quiz
5. Answer
False. You must specify the superiority test a priori in order to have a legitimate claim of proving it (beyond a reasonable (5%) doubt). The stated sequential strategy will only allow you to claim an observed result, without a statement about its certainty.
Self-Quiz
6. What is the meaning of the equivalence margin, and
how do you determine it?
Self-Quiz
6. Answer
The equivalence margin is the maximum difference
between treatments that is considered to be negligible or
unimportant.
It must be pre-specified in order to prove equivalence or non-inferiority to that degree, rather than just noting it as an observation. Thus, it is ideally determined by peer agreement or FDA concurrence prior to starting the study.
Self-Quiz
7. Suppose the primary outcome for a study is a serum inflammatory marker. If its assay is poor (low reproducibility), then it is more difficult to find treatment differences in a typical superiority/inferiority study than for a better assay, due to this noise. Would it be easier or more difficult to find non-inferiority with this assay, compared to a better assay?
Self-Quiz
7. Answer
It would still be more difficult to show the aim, non-inferiority here, since CIs will be wider, but the noise will not introduce bias toward either treatment.
Generally, poorer study conduct is penalized in superiority studies and rewarded in non-inferiority studies, but that is not true for this kind of measurement error.
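The effect of assay noise on CI width can be illustrated with a short calculation. This is a sketch under simplifying assumptions (equal group sizes, a common known SD, normal approximation); the function name and the SD of 5.0 for the noisier assay are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n_per_group, confidence=0.95):
    """Half-width of the CI for a difference in two means (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd * sqrt(2 / n_per_group)


# Same N, two assay qualities: SD = 3.5 vs. a hypothetical noisier SD = 5.0
print(round(ci_half_width(3.5, 194), 2))
print(round(ci_half_width(5.0, 194), 2))
```

The noisier assay widens the interval symmetrically around the estimate, so it becomes harder to push the upper limit below the margin, but the estimate itself is not shifted toward either treatment.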
Self-Quiz
8. Does the assumed treatment difference (0.5 for the IOP study) for power calculations have the same meaning as the difference to be detected for a power calculation in a typical superiority/inferiority study?
Self-Quiz
8. Answer
No. Here, it is our best estimate of true treatment
differences.
For superiority studies, the difference is ideally the
minimal difference that is “clinically relevant”, not the
expected difference, closer in meaning to the
equivalence margin here. In practice, it is the smallest
difference that logistics, money, time, and effort will
allow us to detect with specified certainty.