On the Determination of Overall p-Values in Adaptive

Testing and Estimation Procedures in
Multi-Armed Designs with Treatment
Selection
Gernot Wassmer, PhD
Institut für Medizinische Statistik, Informatik und Epidemiologie
Universität zu Köln
ADDPLAN GmbH
Adaptive Design KOL Lecture Series, August 14th, 2009
Introduction
 Confirmatory adaptive designs are a generalization of group sequential
designs, where - in interim analyses - confirmatory analysis is performed
under control of the Type I error rate and data dependent changes of design
are allowed.
 Three particular applications
– Sample size reassessment
– Treatment arm selection
– Subset selection (“enrichment designs”)
 This talk shows
– how to reach a test decision in an adaptive multi-armed trial with
treatment selection at interim
– how to calculate confidence intervals and overall p-values
Confirmatory adaptive designs
can be based on
– the combination testing principle
– the conditional error approach
Combination testing principle
Combination of p-values with a specific combination function
(Bauer, 1989; Bauer & Köhne, 1994)
Inverse normal method: The test decision is based on
$$Z_k^* = \frac{w_1 \Phi^{-1}(1 - p_1) + \dots + w_k \Phi^{-1}(1 - p_k)}{\sqrt{w_1^2 + \dots + w_k^2}},$$
where the weights $w_k$ are prefixed
(Lehmacher & Wassmer, 1999).
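As a minimal sketch of the inverse normal method, the combination statistic can be computed from stage-wise one-sided p-values as follows. The function name and the example p-values and weights are hypothetical and chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: Phi and its inverse


def inverse_normal_combination(p_values, weights):
    """Z_k* = sum_i w_i * Phi^{-1}(1 - p_i) / sqrt(sum_i w_i^2)."""
    if len(p_values) != len(weights):
        raise ValueError("need one prefixed weight per stage-wise p-value")
    numerator = sum(
        w * _STD_NORMAL.inv_cdf(1.0 - p) for p, w in zip(p_values, weights)
    )
    return numerator / sqrt(sum(w * w for w in weights))


# Hypothetical one-sided p-values from two stages, equal prefixed weights.
z_star = inverse_normal_combination([0.10, 0.02], [1.0, 1.0])
print(round(z_star, 3))
```

The weights have to be fixed before the data are seen; only the p-values may depend on adaptations made at interim.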
The conditional error approach
Plan a trial with reasonable (optimum) design, including sample size
calculation and timing of interim analyses.
Calculate the conditional Type I error rate a(x1,…,xk) at any time during
the course of the trial
a(x1,…,xk) = conditional probability, under H0, of rejecting
H0 in one of the subsequent stages, given x1,…,xk
x1,…,xk: data up to stage k
Remainder of the trial can be defined as a test at level a(x1,…,xk)
where the design of this test is arbitrary.
Müller & Schäfer (2001): “CRP principle”
Brannath, Posch & Bauer (2002): “Recursive testing principle”
The situation

Consider many-to-one comparisons, e.g., G treatment arms and
one control, normal case.

Throughout this talk, we consider one-sided testing.

In an interim stage a treatment arm is selected based on data
observed so far.

Not only selection procedures, but also other adaptive strategies
(e.g., sample size reassessment) can be performed.

Application within “Adaptive seamless designs” using the
combination testing principle
Sources for alpha inflation

Interim analyses

Sample size reassessment

Multiple arms
The proposed adaptive procedure fulfils the regulatory requirements for the
analysis of adaptive trials in that it strongly controls the prespecified Type I
error rate.
This procedure will be based on the application of the closed test procedure
together with combination tests (e.g., Bauer & Kieser, 1999; Hellmich, 2001;
Posch et al., 2005; Bretz et al., 2009).
Other approaches: Thall et al., 1988; Follmann et al., 1994; Stallard and
Todd, 2003; Stallard and Friede, 2008.
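Closed testing procedure
[Figure: closed system of hypotheses. Stage I: $H_0^1 \cap H_0^2 \cap H_0^3$; $H_0^1 \cap H_0^2$, $H_0^1 \cap H_0^3$, $H_0^2 \cap H_0^3$; $H_0^1$, $H_0^2$, $H_0^3$. Stage II: $H_0^S$ (remaining entries marked “?”).]
Simple “trick”: Tests of intersection hypotheses are formally performed as
tests for $H_0^S$.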
…
Closed testing procedure
At the first interim analysis, consider a test statistic for $H_0^1 \cap H_0^2 \cap H_0^3$,
e.g., the test statistic
$$Z_1 = \max(Z_{11}, Z_{12}, Z_{13}),$$
where $Z_{1i}$ denotes the first stage t test statistic for $H_0^i$, $i = 1, 2, 3$.
That is, compute Dunnett’s adjusted p-value for each intersection
hypothesis; critical values are according to
$$\Phi_D(c_{\alpha,G}) = \int_{-\infty}^{\infty} \prod_{i=1}^{G} \Phi\!\left(\frac{\lambda_i x + c_{\alpha,G}}{\sqrt{1 - \lambda_i^2}}\right) \phi(x)\,dx = 1 - \alpha,$$
where $\lambda_i = \sqrt{n_i / (n_0 + n_i)}$, and $\Phi$ and $\phi$ denote the standard normal cdf
and its density, respectively.
Or compute the p-value using Dunnett’s t distribution.
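The Dunnett critical value defined by the integral above has no closed form, but it can be approximated numerically. The following is a sketch only, assuming known variance (normal rather than t distribution), Simpson's rule on a truncated range, and bisection; all function names and the balanced allocation in the example are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal cdf (PHI.cdf) and density (PHI.pdf)


def dunnett_cdf(c, lambdas, grid=2000, lim=8.0):
    """P(max_i Z_i <= c) under the global null, Simpson's rule on [-lim, lim]."""
    h = 2.0 * lim / grid  # grid must be even for Simpson's rule
    total = 0.0
    for k in range(grid + 1):
        x = -lim + k * h
        prod = 1.0
        for lam in lambdas:
            prod *= PHI.cdf((lam * x + c) / sqrt(1.0 - lam * lam))
        weight = 1 if k in (0, grid) else (4 if k % 2 else 2)
        total += weight * prod * PHI.pdf(x)
    return total * h / 3.0


def dunnett_critical_value(alpha, lambdas, lo=0.0, hi=6.0, tol=1e-8):
    """Bisection for c with dunnett_cdf(c) = 1 - alpha (assumes alpha < 0.5)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dunnett_cdf(mid, lambdas) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dunnett_p_value(z_max, lambdas):
    """Adjusted p-value for the observed maximum statistic z_max."""
    return 1.0 - dunnett_cdf(z_max, lambdas)


G = 3
lambdas = [sqrt(0.5)] * G  # balanced allocation n_i = n_0 (assumption)
c = dunnett_critical_value(0.025, lambdas)
print(round(c, 3))
```

For G = 1 the integral reduces to $\Phi(c)$, so the routine should return the usual normal quantile; this is a convenient sanity check.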
Let
$$C(p, q) = \frac{w_1 \Phi^{-1}(1 - p) + w_2 \Phi^{-1}(1 - q)}{\sqrt{w_1^2 + w_2^2}}.$$
Test decision for the second stage:
$H_0^S$ is rejected if
$$\min_{J \ni S} C(p_J, q_S) \ge u_2,$$
where $p_J$ is the p-value of the Dunnett test for testing $\bigcap_{i \in J} H_0^i$,
$q_S$ is the second stage p-value for the selected treatment arm,
and $u_2$ is the critical value for the second stage.
This is the use of the inverse normal method for the Dunnett
test situation.
Simple shortcut:
If the treatment arm with the largest test statistic is selected, it suffices to combine
the test for $H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3$ with the test for $H_0: \mu_0 = \mu_3$.
Example S = 3
[Figure: Stage I tests all intersection hypotheses ($\mu_0 = \mu_1 = \mu_2 = \mu_3$; $\mu_0 = \mu_1 = \mu_2$; $\mu_0 = \mu_1 = \mu_3$; $\mu_0 = \mu_2 = \mu_3$; $\mu_0 = \mu_1$; $\mu_0 = \mu_2$; $\mu_0 = \mu_3$); Stage II tests only $\mu_0 = \mu_3$.]
$H_0^3$ can be rejected if all combination tests exceed the critical value $u_2$.
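The second-stage decision for the selected arm can be sketched as a loop over all intersection hypotheses containing it. This sketch assumes a two-stage design without early stopping (so that $u_2 = \Phi^{-1}(1 - \alpha)$), known variance, balanced allocation, equal weights, and zero-based arm indices; the function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def dunnett_p(z_max, n_arms, lam=sqrt(0.5), grid=2000, lim=8.0):
    """Dunnett p-value 1 - P(max of n_arms statistics <= z_max), Simpson's rule."""
    h = 2.0 * lim / grid
    total = 0.0
    for k in range(grid + 1):
        x = -lim + k * h
        weight = 1 if k in (0, grid) else (4 if k % 2 else 2)
        inner = PHI.cdf((lam * x + z_max) / sqrt(1.0 - lam * lam)) ** n_arms
        total += weight * inner * PHI.pdf(x)
    return 1.0 - total * h / 3.0


def combine(p, q, w1=1.0, w2=1.0):
    """Inverse normal combination of a first and a second stage p-value."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    return (w1 * PHI.inv_cdf(1.0 - p) + w2 * PHI.inv_cdf(1.0 - q)) / sqrt(
        w1 * w1 + w2 * w2
    )


def reject_selected(z1, selected, q_selected, u2):
    """Closed test: every J containing the selected arm must exceed u2."""
    others = [i for i in range(len(z1)) if i != selected]
    stats = []
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            J = (selected,) + extra
            p_J = dunnett_p(max(z1[i] for i in J), len(J))
            stats.append(combine(p_J, q_selected))
    return min(stats) >= u2, min(stats)


# Hypothetical data: the third arm has the largest first-stage statistic.
u2 = PHI.inv_cdf(0.975)
rejected, min_stat = reject_selected([1.0, 1.5, 2.2], 2, 0.01, u2)
print(rejected)
```

When the arm with the largest first-stage statistic is selected, the minimum is attained at the global intersection hypothesis, which is the shortcut described above.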
…
Properties of the Procedure
• Choice of tests for intersection hypotheses is free, i.e., you might select,
e.g., Dunnett’s, Bonferroni’s, Simes’ or Sidak’s test.
• The procedure may become inconsonant and, hence, conservative,
i.e., you can reject the global hypothesis, but no single hypothesis
(Friede and Stallard, 2008).
• A hypothesis can be rejected at a later stage even if it was not selected for
the current stage (and not rejected before). This can happen if, e.g., the
test statistic for the global hypothesis exceeds u2 in the second stage but
not u1 in the first stage, and the test statistic for the de-selected
hypothesis exceeds u1 in the first stage.
An alternative procedure
(König et al., 2008)
Compute conditional error at first stage:
$$C_D(c_{\alpha,G}, z_1) = 1 - \int_{-\infty}^{\infty} \prod_{i=1}^{G} \Phi\!\left(\frac{\lambda_i \sqrt{1 - t_1}\, x - \sqrt{t_1}\, z_{1i} + c_{\alpha,G}}{\sqrt{(1 - t_1)(1 - \lambda_i^2)}}\right) \phi(x)\,dx,$$
where $t_1$ denotes the information at the interim stage, and $\lambda_i = \sqrt{n_i / (n_0 + n_i)}$.
In the second stage, perform a
• Conditional second-stage Dunnett test
• Separate second-stage Dunnett test
at conditional level $C_D(c_{\alpha,G}, z_1)$.
This is the application of the CRP principle (Müller & Schäfer, 2001).
It assumes the variance to be known.
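A numerical sketch of the conditional error of the Dunnett test may help to see how it behaves. It assumes known variance and writes the conditional error as one minus the conditional probability that no final statistic exceeds the critical value; the function name, Simpson integration, and the example inputs (including the approximate critical value 2.35 for G = 3) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def conditional_error(c, z1, t1, lambdas, grid=2000, lim=8.0):
    """Conditional probability, given first-stage statistics z1, that the
    final Dunnett test with critical value c rejects. Requires 0 <= t1 < 1."""
    h = 2.0 * lim / grid
    total = 0.0
    for k in range(grid + 1):
        x = -lim + k * h
        prod = 1.0
        for lam, z in zip(lambdas, z1):
            num = lam * sqrt(1.0 - t1) * x - sqrt(t1) * z + c
            prod *= PHI.cdf(num / sqrt((1.0 - t1) * (1.0 - lam * lam)))
        weight = 1 if k in (0, grid) else (4 if k % 2 else 2)
        total += weight * prod * PHI.pdf(x)
    return 1.0 - total * h / 3.0


# Hypothetical interim data for G = 3 arms, balanced allocation,
# interim at half of the information; 2.35 is an approximate critical value.
lams = [sqrt(0.5)] * 3
ce = conditional_error(2.35, [2.0, 1.0, 0.5], 0.5, lams)
print(round(ce, 4))
```

With $t_1 = 0$ the expression collapses to the unconditional level, and for G = 1 it reduces to $1 - \Phi((c - \sqrt{t_1} z_{11}) / \sqrt{1 - t_1})$, which gives two simple checks.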
A comparison shows that
• the conditional second-stage Dunnett test
– performs best but is hardly better if a treatment arm selection was performed
(cf. Friede and Stallard, 2008)
– is identical with the conventional Dunnett test if no adaptations were performed
– becomes complicated if, e.g., allocation is not constant or variance is unknown
• the inverse normal technique
– is not optimum but enables early stopping and more general adaptations
– is straightforward if, e.g., allocation is not constant or variance is unknown
Overall p-values
• Defined as the smallest significance level for which the test results yield rejection of the
considered (single) hypothesis
• Repeated overall p-values can be calculated at any stage of the trial.
• That is,
$$p_k^g \le \alpha \iff H_0^g \text{ can be rejected at stage } k$$
• p-values account for the step-down nature of the closed testing principle
and are completely consistent with the test decision.
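For the simplest case, the overall p-value of the selected hypothesis can be sketched as the largest combination p-value over all intersection hypotheses containing it. The sketch assumes a two-stage design without early stopping, equal weights, and Bonferroni intersection tests (chosen here only to keep the code short); with interim stopping boundaries the calculation is more involved. Names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def combine(p, q, w1=1.0, w2=1.0):
    """Inverse normal combination of a first and a second stage p-value."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    return (w1 * PHI.inv_cdf(1.0 - p) + w2 * PHI.inv_cdf(1.0 - q)) / sqrt(
        w1 * w1 + w2 * w2
    )


def overall_p_value(p1, selected, q_selected):
    """Largest combination p-value over all J containing the selected arm,
    using Bonferroni-adjusted first-stage p-values for each J."""
    others = [i for i in range(len(p1)) if i != selected]
    worst = 0.0
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            J = (selected,) + extra
            p_J = min(1.0, len(J) * min(p1[i] for i in J))
            worst = max(worst, 1.0 - PHI.cdf(combine(p_J, q_selected)))
    return worst


# Hypothetical first-stage p-values for three arms; the third arm is selected.
p_overall = overall_p_value([0.20, 0.10, 0.01], 2, 0.01)
print(p_overall <= 0.025)
```

Under these assumptions the selected hypothesis is rejected at level $\alpha$ exactly when the returned value does not exceed $\alpha$, which mirrors the consistency property stated above.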
Overall confidence intervals
• Confidence intervals based on stepwise testing are difficult to construct.
This is a specific feature of multiple testing procedures and not of
adaptive testing.
• Posch et al. (2005) proposed to construct confidence intervals based on
the single step adjusted overall p-values. These can also be applied for
the conditional Dunnett test.
• These repeated confidence intervals (RCIs) are not, in general, consistent with the
test decision. It might happen that, e.g., a hypothesis is rejected but the lower
bound of the CI is smaller than 0.
• They can be provided for each step of the trial.
• In general, they may fail to become narrower for increasing sample size
(e.g., if Bonferroni or Simes intersection tests are used).
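Illustration
• Two-stage design with G treatment arms
• Selection of treatment arm with highest response, no efficacy stop at interim
• Bonferroni (or Simes) correction is used for first stage
• Lower bound $lb_j$ of 95% confidence intervals for effect $d_j = \mu_j - \mu_0$ of selected
treatment arm at second stage is calculated through
$$lb_j = \max\{d_j : \Phi^{-1}(1 - \min\{1, G \cdot p_1^j(d_j)\}) + \Phi^{-1}(1 - p_2^j(d_j)) \ge \sqrt{2} \cdot 1.96\},$$
where
$$p_i^j(d_j) = 1 - \Phi\!\left(\frac{\bar{x}_j^i - \bar{x}_0^i - d_j}{\sigma} \sqrt{\frac{n_0^i n_j^i}{n_0^i + n_j^i}}\right), \quad i = 1, 2,$$
with $\sigma$ the common standard deviation, taken as known.
It is easy to see that, because the combination test can reject only if $G \cdot p_1^j(d_j) < 1$,
$$lb_j \le \bar{x}_j^1 - \bar{x}_0^1 - \sigma\, \Phi^{-1}\!\left(1 - \frac{1}{G}\right) \sqrt{\frac{n_0^1 + n_j^1}{n_0^1 n_j^1}},$$
and, analogously,
$$ub_j \ge \bar{x}_j^1 - \bar{x}_0^1 + \sigma\, \Phi^{-1}\!\left(1 - \frac{1}{G}\right) \sqrt{\frac{n_0^1 + n_j^1}{n_0^1 n_j^1}}.$$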
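The lower confidence bound of the illustration can be found numerically, since the combination statistic decreases as the hypothesized effect grows. The following sketch assumes a known common standard deviation `sigma`, at least two treatment arms, equal weights, and Bonferroni adjustment at the first stage; function names and the summary data are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist

PHI = NormalDist()


def stage_p(d, mean_trt, mean_ctl, n_trt, n_ctl, sigma):
    """One-sided stage-wise p-value for H0: effect <= d."""
    z = (mean_trt - mean_ctl - d) / sigma * sqrt(n_ctl * n_trt / (n_ctl + n_trt))
    return 1.0 - PHI.cdf(z)


def combination(d, stage1, stage2, n_arms, sigma):
    """Unscaled inverse normal combination with Bonferroni at stage 1."""
    p1 = min(1.0, n_arms * stage_p(d, *stage1, sigma))
    if p1 >= 1.0:
        return -inf  # Phi^{-1}(0): rejection is impossible
    p1 = max(p1, 1e-15)
    p2 = min(max(stage_p(d, *stage2, sigma), 1e-15), 1.0 - 1e-15)
    return PHI.inv_cdf(1.0 - p1) + PHI.inv_cdf(1.0 - p2)


def lower_bound(stage1, stage2, n_arms, sigma, z_crit=1.96, tol=1e-10):
    """Largest d whose combination still reaches sqrt(2) * z_crit (bisection).
    Also returns the first-stage cap on the lower bound. Needs n_arms >= 2."""
    threshold = sqrt(2.0) * z_crit
    mean_trt, mean_ctl, n_trt, n_ctl = stage1
    se1 = sigma * sqrt((n_ctl + n_trt) / (n_ctl * n_trt))
    cap = mean_trt - mean_ctl - PHI.inv_cdf(1.0 - 1.0 / n_arms) * se1
    hi, lo, step = cap, cap - se1, se1
    while combination(lo, stage1, stage2, n_arms, sigma) < threshold:
        step *= 2.0  # widen the bracket until the combination test rejects
        lo -= step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if combination(mid, stage1, stage2, n_arms, sigma) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo, cap


# Hypothetical summaries: (mean treatment, mean control, n treatment, n control).
stage1 = (0.50, 0.00, 50, 50)
stage2 = (0.45, 0.00, 100, 100)
lb, cap = lower_bound(stage1, stage2, n_arms=3, sigma=1.0)
print(round(lb, 3), round(cap, 3))
```

The second returned value is the first-stage cap from the inequality above: however large the second stage, the lower bound cannot move past it when Bonferroni adjustment is used.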
Summary
• The adaptive procedures fulfil the regulatory requirements for the analysis
of adaptive trials in that they control the prespecified Type I error rate. For
regulatory purposes, the class of envisaged decisions after stage 1 should
be stated in the protocol.
• The “rules” for adaptation and stopping for futility
– need not be pre-specified
– may depend on all interim data including secondary and
safety endpoints
– can make use of Bayesian principles integrating all information
available, also external to the study
– should be evaluated (e.g., via simulations) and the preferred version
recommended, e.g., in the DMC charter
• Software ADDPLAN MC is available for designing and analyzing these trials
References
• Bauer, P. (1989). Multistage testing with adaptive designs (with Discussion). Biometrie und Informatik in Medizin und Biologie 20, 130–148.
• Bauer, P., Köhne, K. (1994). Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029–1041.
• Bauer, P., Kieser, M. (1999). Combining different phases in the development of medical treatments within a single trial. Statistics in Medicine 18, 1833–1848.
• Brannath, W., Posch, M., Bauer, P. (2002). Recursive combination tests. J. Amer. Stat. Ass. 97, 236–244.
• Follmann, D. A., Proschan, M. A., Geller, N. L. (1994). Monitoring pairwise comparisons in multi-armed clinical trials. Biometrics 50, 325–336.
• Friede, T., Stallard, N. (2008). A comparison of methods for adaptive treatment selection. Biometrical J. 50, 767–781.
• Hellmich, M. (2001). Monitoring clinical trials with multiple arms. Biometrics 57, 892–898.
• König, F., Brannath, W., Bretz, F., Posch, M. (2008). Adaptive Dunnett tests for treatment selection. Statistics in Medicine 27, 1612–1625.
• Lehmacher, W., Wassmer, G. (1999). Adaptive sample size calculations in group sequential trials. Biometrics 55, 1286–1290.
• Müller, H.H., Schäfer, H. (2001). Adaptive group sequential designs for clinical trials, combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57, 886–891.
• Posch, M., König, F., Branson, M., Brannath, W., Dunger-Baldauf, C., Bauer, P. (2005). Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statistics in Medicine 24, 3697–3714.
• Posch, M., Wassmer, G., Brannath, W. (2008). A note on repeated p-values for group sequential designs. Biometrika 95, 253–256.
• Stallard, N., Friede, T. (2008). A group-sequential design for clinical trials with treatment selection. Statistics in Medicine 27, 6209–6227.
• Stallard, N., Todd, S. (2003). Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine 22, 689–703.
• Thall, P.F., Simon, R., Ellenberg, S.S. (1988). Two-stage selection and testing designs for comparative clinical trials. Biometrika 75, 303–310.