
(Summaries written by Laura Collett, Leeds Institute of Clinical Trials Research, University of Leeds)
Trials with partial nesting, presentation by Chris Roberts
Some trials require comparisons between two interventions or treatments given in different ways, where one arm involves clustering and the other does not; for example, group therapy versus a drug. This makes the clustering structure incomplete, which has implications for the analysis.
Problems highlighted in the analysis of trials with partially nested designs included between-treatment heteroscedasticity of continuous outcome measures, and consistency and small-sample bias with binary outcome measures.
Parameterisations of linear mixed effects models, including random intercept, random coefficient and heteroscedastic models, were summarised, along with their non-nested and nested total variances and the resulting estimates of the ICC.
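One common parameterisation of such a model, in notation of my own rather than necessarily that used in the presentation, places a random effect only where the clustering exists (a random coefficient on the treatment indicator):

```latex
y_{ij} = \mu + \tau T_{ij} + u_j T_{ij} + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2),
\qquad \mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2},
```

where T_{ij} indicates the clustered arm, so the group effect u_j contributes to the total variance of that arm only; heteroscedastic variants additionally allow the residual variance to differ between arms.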
Two trials using partially nested designs were discussed, including estimates of effect size, standard error and ICC for different models.
A simulation study comparing random effects models, GEEs and Satterthwaite's t-test was carried out with the clustered arm as 8 groups of size 6 and the individual arm as 48 individuals. The ratio of individual treatment variance to group variance was set between 0.5 and 2.5, and 7,300 simulations were used to estimate test size with a precision of 0.005 (approximately the number of replicates giving a 95% confidence interval of half-width 0.005 around a true size of 0.05).
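A minimal sketch of one such replicate, with assumptions of my own (the ICC, total variances, and a simple cluster-means analysis standing in for the exact model comparisons in the presentation), might look like this in Python:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_replicate(icc=0.05, var_ratio=2.0, n_groups=8, group_size=6, n_ind=48):
    # Clustered arm: total variance 1, split into group and residual parts.
    u = rng.normal(0, np.sqrt(icc), n_groups)                   # group effects
    e = rng.normal(0, np.sqrt(1 - icc), n_groups * group_size)  # residuals
    y_clus = np.repeat(u, group_size) + e
    # Individual arm: variance set by the individual/group variance ratio.
    y_ind = rng.normal(0, np.sqrt(var_ratio), n_ind)
    # One simple analysis: Welch (Satterthwaite) t-test, cluster means vs individuals.
    means = y_clus.reshape(n_groups, group_size).mean(axis=1)
    return stats.ttest_ind(means, y_ind, equal_var=False).pvalue

# Empirical size of a nominal 5% test under the null of no treatment effect.
size = np.mean([one_replicate() < 0.05 for _ in range(7300)])
print(size)
```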
Graphical summaries were shown of the difference in ICC between the random intercept, random coefficient, heteroscedastic and GEE models as the variance ratio (individual/nested) changed. The same models, with the addition of Satterthwaite's t-test, were compared for the empirical size of a 5% level test as the variance ratio increased; GEE models had an empirical level of around 0.075, and the level of the random intercept model decreased as the variance ratio increased.
The conclusion on partial nesting and heteroscedasticity in continuous outcomes is that where the total variance of the un-clustered arm is greater, the ICC will be increased in random intercept models and decreased in random coefficient and GEE models. The issue is analogous to the choice between the t-test and Satterthwaite's t-test when comparing two means with unequal variances: where there is reason to believe the variances are unequal, Satterthwaite's t-test should be used.
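As a small illustration of that analogy (with made-up data rather than anything from the presentation), the two tests differ only in whether the variances are pooled:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0, 1.0, 30)  # lower-variance sample
b = rng.normal(0, 2.0, 30)  # higher-variance sample

# The pooled-variance t-test assumes equal variances; the Welch-Satterthwaite
# version estimates each variance separately and adjusts the degrees of freedom.
print(stats.ttest_ind(a, b, equal_var=True))
print(stats.ttest_ind(a, b, equal_var=False))
```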
For partially nested binary data, parameterisations of logistic models, including logistic GEEs with robust standard errors, logistic random intercept and logistic random coefficient models, were summarised, along with an adjusted test for proportions incorporating the design effect. The estimates of the marginal proportions were summarised for each model in the clustered and non-clustered arms. Graphical representations of the log odds ratio, when the proportions for the clustered and non-clustered arms were equal and as that proportion increased, were shown for different values of the ICC based on different cluster sizes and numbers of simulations. Graphical representations of the small-sample bias in the null treatment effect on the log odds, as the proportion increased, were discussed for different simulation sample sizes. Models based on simulations comparing 10 clusters of size 10 with 100 individuals were also compared for their empirical test size; most models were biased for smaller and larger proportions in the lower and upper 2.5% tails respectively, with the likelihood ratio test (LRT) performing best of the models fitted.
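A hedged sketch of one of these analyses, fitting a logistic GEE with robust standard errors to partially nested binary data in Python's statsmodels (the data-generating values are illustrative; individuals in the un-clustered arm are treated as singleton clusters):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Clustered arm: 10 clusters of size 10; un-clustered arm: 100 singletons.
cluster = np.concatenate([np.repeat(np.arange(10), 10), 100 + np.arange(100)])
treat = np.concatenate([np.ones(100), np.zeros(100)])
u = rng.normal(0, 0.5, 10)  # cluster-level random effects, clustered arm only
linpred = -0.5 + 0.3 * treat + np.concatenate([np.repeat(u, 10), np.zeros(100)])
y = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

df = pd.DataFrame({"y": y, "treat": treat, "cluster": cluster})
gee = sm.GEE.from_formula("y ~ treat", groups="cluster", data=df,
                          family=sm.families.Binomial(),
                          cov_struct=sm.cov_struct.Exchangeable())
print(gee.fit().summary())  # robust (sandwich) standard errors by default
```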
Stata commands such as 'sampsi' and 'clsampsi' for estimating power and sample size for clustered trial designs were described, and 'clsampsi' was compared, in terms of bias, against the empirical power of different models under different methods of calculation.
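The essential calculation, sketched below under assumptions of my own rather than as the algorithm 'clsampsi' implements, inflates the variance contribution of the clustered arm only by the design effect 1 + (m - 1) × ICC:

```python
from scipy.stats import norm

def n_per_arm_partially_nested(delta, sd, icc, cluster_size,
                               alpha=0.05, power=0.9):
    """Approximate n per arm (equal allocation) for a continuous outcome."""
    de = 1 + (cluster_size - 1) * icc             # design effect, clustered arm
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    # Var(diff) = sd^2 * de / n + sd^2 / n, so n = sd^2 * (de + 1) * z^2 / delta^2
    return sd ** 2 * (de + 1) * z ** 2 / delta ** 2

print(n_per_arm_partially_nested(delta=0.5, sd=1.0, icc=0.05, cluster_size=6))
```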
The use of unequal randomisation in favour of the clustered arm to reduce bias was discussed, and a ratio to achieve maximum power was put forward, with graphical representations of empirical test sizes between methods with different ratios of unequal randomisation describing their performance in terms of bias. Note that unequal randomisation can have a detrimental effect on power in individually randomised trials, so should be avoided there.
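One simple way to see why the clustered arm benefits from a larger share, though not necessarily the argument made in the presentation, is to minimise the variance of the effect estimate for a fixed total sample size; a Neyman-style allocation suggests a ratio near the square root of the design effect:

```python
import numpy as np

def var_diff(ratio, total_n=200, sd=1.0, icc=0.05, cluster_size=6):
    # Variance of the mean difference when only one arm is clustered.
    de = 1 + (cluster_size - 1) * icc
    n_clus = total_n * ratio / (1 + ratio)   # clustered-arm share
    n_ind = total_n - n_clus
    return sd ** 2 * de / n_clus + sd ** 2 / n_ind

ratios = np.linspace(0.5, 3.0, 251)
best = ratios[np.argmin([var_diff(r) for r in ratios])]
print(best, np.sqrt(1 + (6 - 1) * 0.05))     # empirical optimum vs sqrt(DE)
```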
Design and implementation considerations for a Cluster Randomised Trial in
general practice using balanced incomplete blocks, presentation by Amanda
Farrin
Once research has been conducted, its findings need to be implemented in general practice. Research can sometimes produce a great deal of waste, whether from addressing and incorporating inappropriate questions and designs, from inaccessible results, or from bias introduced during interpretation.
The aim of some types of research is to minimise the gap between research and
practice, as dissemination of quality research is essential but not sufficient to ensure
correct and applicable implementation of interventions. This is especially evident in
primary care settings, as it is unfeasible to invent implementation strategies for every
new guideline, so adaptable and hence generalisable strategies are needed.
ASPIRE (Action to Support Practices Implementing Research Evidence) aims to
develop and evaluate an adaptable intervention package to target implementation of
‘high impact’ clinical practice recommendations in general practice; and to focus on
recommendations where a measurable change in clinical practice can lead to
significant patient benefit.
ASPIRE is a cluster randomised controlled trial set in general practices in West
Yorkshire to evaluate an adaptable intervention package for two high impact
recommendations. The trial will use GP routine data in order to identify whether and
how interventions improve patient care, outcomes and cost-effectiveness.
The trial is a randomised controlled trial, as this is the optimal design for evaluating behaviour change. It also uses a cluster design, as the inclusion of the patient and clinician levels is necessary in implementing clinical practice strategies and avoids contamination from individual clinician behaviour; randomisation is at the level of the practice and outcomes are at the level of the patient.
The trial also incorporates a balanced 2x2 incomplete block design in which half of the 60 practices are randomised to the intervention for recommendation 1 while acting as controls for recommendation 2, and the other half to the intervention for recommendation 2 while acting as controls for recommendation 1. This design has been deemed appropriate for the setting to simultaneously minimise the Hawthorne effect (the non-specific effects arising through trial participation) and maximise power and efficiency.
Considering the use of two recommendations, the sample size is based on two embedded RCTs, requiring 12,000 records from 60 practices, with a minimum of 100 records per practice per recommendation. This provides 90% power to detect a 15% difference in adherence rates at a 2.5% significance level (to adjust for the use of two outcome comparisons), assuming control adherence of 55% and an ICC of 0.1 (thought to be relatively conservative). A median effect size of 9% has been observed in other guideline implementation studies when implementing recommendations in practices with greater scope for improvement.
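As a back-of-envelope check of my own, not the trial's exact calculation, the standard two-proportion sample size inflated by the design effect 1 + (m - 1) × ICC is broadly consistent with these numbers:

```python
from scipy.stats import norm

p0, p1 = 0.55, 0.70          # control adherence and target (15% difference)
alpha, power = 0.025, 0.90   # 2.5% level, here taken as two-sided
m, icc = 100, 0.1            # records per practice, assumed ICC

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_raw = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
n_adj = n_raw * (1 + (m - 1) * icc)   # design-effect inflation
print(round(n_raw), round(n_adj))     # roughly 252 and 2751 per arm
```

At 100 records per practice, roughly 2,750 records per arm implies about 28 practices per arm, within the 60 practices available.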
Design challenges that were highlighted were the assumptions used in calculating the sample size, and the choice of recommendations to evaluate. Issues such as cluster size, sample size adjustment for unequal clusters, the coefficient of variation, and the design effect were raised. In addition, the assumptions about the increase in adherence were highlighted, given the high levels of adherence and the variation in adherence apparent for some recommendations. In terms of the choice of recommendations, it was discussed how to identify 'high impact' recommendations based on various criteria. Moreover, for a 2x2 incomplete block design using two recommendations, independence of the recommendations must be assumed, although some recommendations intrinsically overlap.
Another discussion point highlighted the way interventions were implemented and whether opt-in or opt-out consent processes were preferable given the mechanism of the recommendation; opt-out was deemed the more pragmatic choice following group discussion.
Conclusions centred on the need to keep it simple: minimise bias, maximise generalisability, and test assumptions as thoroughly as possible using real data. Incomplete block designs can overcome some of the challenges of implementing research in general practice.
Statistical considerations for the design and analysis of non-inferiority trials,
presentation by David Gillespie
Standard superiority trials are set up in order to detect whether or not a treatment is
different from a control in terms of a pre-specified endpoint, but if there is insufficient
evidence to conclude a difference, this does not imply that the treatment and the
control are the same, as you cannot ‘accept’ the null hypothesis of no difference.
Despite this, you may wish to obtain evidence that a treatment is ‘no worse’ than the
control, in terms of one pre-specified endpoint, given favourable evidence that other
factors are better for that treatment, such as toxicity or cost. Given these points, you
may wish to conduct a non-inferiority trial, in order to test whether a new treatment is
‘no worse’ according to a pre-specified margin.
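In practice, non-inferiority is usually judged against a confidence interval: the new treatment is declared non-inferior if the confidence bound in the unfavourable direction stays within the margin. A minimal sketch with illustrative data and margin (higher outcomes assumed better):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
new = rng.normal(0.0, 1.0, 200)      # outcomes on the new treatment
control = rng.normal(0.0, 1.0, 200)  # outcomes on the control
margin = 0.3                         # pre-specified non-inferiority margin

diff = new.mean() - control.mean()
se = np.sqrt(new.var(ddof=1) / len(new) + control.var(ddof=1) / len(control))
lower = diff - stats.norm.ppf(0.975) * se   # lower bound, two-sided 95% CI
print("non-inferior" if lower > -margin else "non-inferiority not shown")
```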
The choice of non-inferiority margin is usually a combination of statistical and clinical input. Considerations include the trade-off between the margin of acceptability in efficacy and other benefits such as fewer side-effects; the requirement that the margin should always be smaller than the difference observed when comparing the treatment with a placebo; and the risk of bio-creep (the process whereby one treatment found to be non-inferior subsequently becomes the standard, then another treatment is found non-inferior to it, and so on, until the efficacy of the 'standard treatment' is eroded).
As the non-inferiority margin is almost always smaller than the corresponding clinically important difference used in superiority trials, a larger sample size is needed to detect it: assuming a true mean difference of zero, a four-fold increase in sample size is required when the non-inferiority margin is half the size of the clinically important difference.
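This follows from the standard approximation for a continuous outcome (a textbook formula, not specific to the presentation), in which the required n per arm scales with the inverse square of the margin:

```latex
n_{\text{per arm}} \approx \frac{2\sigma^{2}\,(z_{1-\alpha} + z_{1-\beta})^{2}}{\delta_{NI}^{2}},
\qquad
\delta_{NI} \mapsto \tfrac{1}{2}\,\delta_{NI} \;\Longrightarrow\; n \mapsto 4n.
```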
When analysing a non-inferiority trial, the current recommendation regarding the analysis set is to perform both intention-to-treat and per-protocol analyses; if both give evidence of non-inferiority, then non-inferiority can be concluded.
Other methods of analysis that can be used include randomisation-based efficacy, complier average causal effect, and one-sided or two-sided confidence intervals. In addition, if the original design was testing for non-inferiority but the analysis provides evidence of superiority, then superiority can be concluded without concerns over multiplicity and power, because the non-inferiority design's smaller margin means a larger sample size is already available than the larger clinically important difference would require. The reverse, however, is not usually appropriate, unless it is considered a priori and the sample size is enlarged accordingly.
Discussion points included the input of a statistician in defining a non-inferiority margin, the choice of confidence interval (including the appropriate level and its one- or two-sided nature), and the choice of analysis set.