(Summaries written by Laura Collett, Leeds Institute of Clinical Trials Research, University of Leeds)

Trials with partial nesting, presentation by Chris Roberts

Some trials compare two interventions or treatments given in different ways, where one arm involves clustering and the other does not; for example, group therapy versus a drug. The clustering structure is therefore incomplete, which has implications for the analysis. Problems highlighted in the analysis of partially nested designs included between-treatment heteroscedasticity for continuous outcome measures, and consistency and small-sample bias for binary outcome measures.

Parameterisations of linear mixed effects models, including random intercept, random coefficient and heteroscedastic models, were summarised, together with the total variance in the nested and non-nested arms and the estimate of the ICC (a parameterisation consistent with this is sketched below). Two trials using partially nested designs were discussed, including estimates of effect size, standard error and ICC under the different models.

A simulation study comparing random effects models, GEEs and Satterthwaite's t-test was carried out, with the clustered arm comprising 8 groups of size 6 and the individual arm comprising 48 individuals. The ratio of individual-arm variance to group-arm variance was varied between 0.5 and 2.5, and 7,300 simulations were used to estimate test size with a precision of 0.005. Graphical summaries showed how the estimated ICC differed between the random intercept, random coefficient, heteroscedastic and GEE models as the variance ratio (individual/nested) changed. The same models, with the addition of Satterthwaite's t-test, were compared on the empirical size of a nominal 5% level test as the variance ratio increased: the GEE models had an empirical level of around 0.075, and the level of the random intercept model decreased as the variance ratio increased.

The conclusion for partial nesting with heteroscedastic continuous outcomes is that, where the total variance of the unclustered arm is greater, the ICC will be increased in random intercept models and decreased in random coefficient and GEE models. The issue is analogous to the choice between the t-test and the Satterthwaite t-test when comparing two means with unequal variances: where there is reason to believe the variances are unequal, the Satterthwaite t-test should be used (a small simulation sketch also follows below).

For partially nested binary data, parameterisations of logistic models, including logistic GEEs with robust standard errors, logistic random intercept and logistic random coefficient models, were summarised, along with an adjusted test for proportions incorporating the design effect. The estimated marginal proportions in the clustered and non-clustered arms were summarised for each model. Graphs of the log odds ratio when the proportions in the clustered and non-clustered arms were equal, and as that common proportion increased, were shown for different values of the ICC, based on different cluster sizes and numbers of simulations. Graphs of the small-sample bias in the log odds for a null treatment effect, as the proportion increased and for different simulation sample sizes, were also discussed. Models based on simulations comparing 10 clusters of size 10 with 100 individuals were compared on empirical test size: most of the models were biased for smaller and larger proportions in the lower and upper 2.5% tails respectively, with the likelihood ratio test performing best of the models fitted.
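A common parameterisation consistent with the models summarised above (the talk's exact specification may differ) writes, for patient $i$ in cluster $j$ of the clustered arm, and for patient $i$ in the unclustered arm,

$$ y_{ij} = \mu + \delta + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^{2}), \quad e_{ij} \sim N(0, \sigma_e^{2}), $$
$$ y_{i} = \mu + e_{i}, \qquad e_{i} \sim N(0, \sigma_e^{2}), $$

giving total variance $\sigma_u^{2} + \sigma_e^{2}$ in the nested arm, $\sigma_e^{2}$ in the non-nested arm, and ICC $\rho = \sigma_u^{2} / (\sigma_u^{2} + \sigma_e^{2})$. A heteroscedastic model relaxes the shared residual variance, allowing $e_i \sim N(0, \sigma_0^{2})$ in the unclustered arm.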
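The simulation set-up can be sketched as follows. The variance values are illustrative assumptions, and the cluster-mean analysis with Welch's unequal-variance t-test (whose degrees of freedom come from the Satterthwaite approximation) is one simple way to apply the recommended test, not necessarily the exact analysis used in the talk.

```python
import numpy as np
from scipy import stats

# Illustrative simulation of one partially nested trial, loosely
# mirroring the set-up described above: a clustered arm of 8 therapy
# groups of size 6 and an unclustered arm of 48 individuals.
# All variance values are assumptions for illustration only.
rng = np.random.default_rng(1)
n_groups, group_size, n_individual = 8, 6, 48
sigma_u, sigma_e = 0.5, 1.0    # between-group and within-group SDs
sigma_ind = 1.2                # SD in the unclustered arm (heteroscedastic)

# Clustered arm: a shared group effect u_j induces the ICC.
u = rng.normal(0.0, sigma_u, n_groups)
clustered = u[:, None] + rng.normal(0.0, sigma_e, (n_groups, group_size))
individual = rng.normal(0.0, sigma_ind, n_individual)  # null treatment effect

# Simple cluster-level analysis: compare group means with individual
# observations using Welch's unequal-variance t-test.
t, p = stats.ttest_ind(clustered.mean(axis=1), individual, equal_var=False)
print(f"t = {t:.3f}, p = {p:.3f}")
```

Repeating this over many replicates, and counting how often p falls below 0.05 under the null, gives the empirical test size examined in the simulation study.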
Stata commands such as 'sampsi' and 'clsampsi' for estimating power and sample size for clustered trial designs were described, and 'clsampsi' was compared with the empirical power of different models, under different methods of calculation, in terms of bias. The use of unequal randomisation in favour of the clustered arm to reduce bias was discussed, and a ratio giving maximum power was put forward, with graphical representations of empirical test size for the different methods under different unequal randomisation ratios to describe their performance in terms of bias. Note that unequal randomisation can have a detrimental effect on power in individually randomised trials, so should be avoided there.

Design and implementation considerations for a Cluster Randomised Trial in general practice using balanced incomplete blocks, presentation by Amanda Farrin

Once research has been conducted, it needs to be implemented in general practice. Research can produce considerable waste, whether from addressing inappropriate questions and designs, from inaccessibility of results, or from bias introduced during interpretation. Some research therefore aims to minimise the gap between research and practice, as dissemination of quality research is essential but not sufficient to ensure correct and applicable implementation of interventions. This is especially evident in primary care settings, where it is infeasible to devise an implementation strategy for every new guideline, so adaptable and hence generalisable strategies are needed.

ASPIRE (Action to Support Practices Implementing Research Evidence) aims to develop and evaluate an adaptable intervention package to target implementation of 'high impact' clinical practice recommendations in general practice, focusing on recommendations where a measurable change in clinical practice can lead to significant patient benefit. ASPIRE is a cluster randomised controlled trial set in general practices in West Yorkshire, evaluating an adaptable intervention package for two high impact recommendations. The trial will use routine GP data to identify whether and how interventions improve patient care, outcomes and cost-effectiveness. A randomised controlled trial is the optimum design for evaluating behaviour change. A cluster design is used because both patient and clinician levels are involved in implementing clinical practice strategies, and to avoid contamination from individual clinician behaviour; randomisation is at the level of the clinician and outcome is at the level of the patient. The trial also incorporates a balanced 2x2 incomplete block design: half of the 60 practices are randomised to the intervention for recommendation 1 and the control for recommendation 2, and the other half to the control for recommendation 1 and the intervention for recommendation 2. This design was deemed appropriate for the setting as it simultaneously minimises the Hawthorne effect (the non-specific effects arising through trial participation) and maximises power and efficiency. With two recommendations, the sample size is based on two embedded RCTs, requiring 12,000 records from 60 practices, with a minimum of 100 records per practice per recommendation. This provides 90% power to detect a 15% difference in adherence rates at the 2.5% significance level (adjusting for the two outcome comparisons), assuming control adherence of 55% and an ICC of 0.1 (thought to be relatively conservative); a sketch of the underlying design-effect calculation follows below.
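The following sketch shows the standard two-proportion sample-size formula inflated by a cluster design effect. The inputs echo the figures quoted above (taking the 2.5% level as a two-sided alpha), but this is an illustration of the method, not a reproduction of the trial's actual calculation, which would also allow for unequal cluster sizes and attrition.

```python
from math import ceil
from scipy.stats import norm

# Assumed inputs echoing the quoted figures: control adherence 55%,
# 15% absolute difference, 90% power, 2.5% two-sided significance,
# ICC 0.1, 100 records per practice per recommendation.
p0, p1 = 0.55, 0.70
alpha, power = 0.025, 0.90
icc, cluster_size = 0.1, 100

z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
n_unadjusted = (z_a + z_b) ** 2 * (p0*(1-p0) + p1*(1-p1)) / (p1 - p0) ** 2

deff = 1 + (cluster_size - 1) * icc   # design effect for equal clusters
n_per_arm = ceil(n_unadjusted * deff)
print(n_unadjusted, deff, n_per_arm)  # ~252 unadjusted; deff 10.9; ~2751
```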
A median effect size of 9% has been observed in other guideline implementation studies when implementing recommendations in practices with greater scope for improvement.

Design challenges highlighted were the assumptions used in calculating the sample size and the choice of recommendations to evaluate. Issues such as cluster size, sample size adjustment for unequal cluster sizes, the coefficient of variation and the design effect were raised. In addition, assumptions about the increase in adherence were questioned, given the high levels of adherence, and the variation in adherence, apparent for some recommendations. On the choice of recommendations, it was discussed how to identify 'high impact' recommendations against various criteria. Furthermore, the 2x2 incomplete block design with two recommendations requires the recommendations to be independent, whereas some recommendations intrinsically overlap. Another discussion point was how the interventions were implemented, and whether opt-in or opt-out consent processes were preferable given the mechanism of the recommendation; opt-out was deemed the more pragmatic choice following group discussion. The conclusions were to keep it simple, to minimise bias and maximise generalisability, and to test assumptions as thoroughly as possible using real data. Incomplete block designs can overcome some of the challenges of implementing research in general practice.

Statistical considerations for the design and analysis of non-inferiority trials, presentation by David Gillespie

Standard superiority trials are set up to detect whether or not a treatment differs from a control on a pre-specified endpoint. If there is insufficient evidence to conclude a difference, this does not imply that the treatment and the control are the same, as the null hypothesis of no difference cannot be 'accepted'. Nevertheless, one may wish to obtain evidence that a treatment is 'no worse' than the control on one pre-specified endpoint, given favourable evidence that the treatment is better on other factors, such as toxicity or cost. In that case a non-inferiority trial can be conducted, to test whether the new treatment is 'no worse' according to a pre-specified margin (a sketch of the standard confidence-interval test follows below). The choice of non-inferiority margin usually combines statistical and clinical input. Considerations include the trade-off between the margin of acceptable loss in efficacy and other benefits such as fewer side-effects, and the requirement that the margin should always be smaller than the difference observed when comparing the treatment with placebo. There is also the risk of bio-creep: one treatment is found to be non-inferior and subsequently becomes the standard, then another treatment is in turn found to be non-inferior, and so on, until the efficacy of the 'standard treatment' is eroded.
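A minimal sketch of the usual confidence-interval approach for two proportions: non-inferiority is concluded if the lower confidence bound for (new minus control) lies above the negative of the margin. All counts and the margin here are hypothetical.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical data: successes/total in the new and control arms,
# with a pre-specified non-inferiority margin of 10 percentage points.
margin = 0.10
x_new, n_new, x_ctl, n_ctl = 280, 400, 290, 400
p_new, p_ctl = x_new / n_new, x_ctl / n_ctl

diff = p_new - p_ctl
se = sqrt(p_new*(1-p_new)/n_new + p_ctl*(1-p_ctl)/n_ctl)
lower = diff - norm.ppf(0.975) * se   # one-sided 2.5% test via a 95% CI
print(f"difference = {diff:.3f}, lower bound = {lower:.3f}")
print("non-inferior" if lower > -margin else "non-inferiority not shown")
```

Here the observed difference is -0.025 with lower bound about -0.088, which lies above -0.10, so non-inferiority would be concluded at this margin.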
As the non-inferiority margin is almost always smaller than the corresponding clinically important difference used in superiority trials, the required sample size is larger: a four-fold increase is needed when the non-inferiority margin is half the size of the clinically important difference, assuming a true mean difference of zero (see the worked equation below). When analysing a non-inferiority trial, the current recommendation is to perform both intention-to-treat and per-protocol analyses; if both give evidence of non-inferiority, then non-inferiority can be concluded. Other methods of analysis include randomisation-based efficacy estimators, complier average causal effect, and the use of one-sided or two-sided confidence intervals. In addition, if the original design tested for non-inferiority but the analysis gives evidence of superiority, then superiority can be concluded without concerns over multiplicity or power, because the sample size, driven by the smaller non-inferiority margin, is more than adequate for detecting the larger clinically important difference. The reverse, however, is not usually appropriate, unless considered a priori with the sample size enlarged accordingly. Discussion points included the statistician's input in defining a non-inferiority margin, the choice of confidence interval, including the appropriate level and its one- or two-sided nature, and the choice of analysis set.
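The four-fold figure follows from the standard normal-approximation formula. Assuming a common standard deviation $\sigma$, a true mean difference of zero, one-sided level $\alpha$ and power $1-\beta$, the per-arm sample size for a non-inferiority margin $\delta$ is

$$ n = \frac{2\sigma^{2}\,(z_{1-\alpha} + z_{1-\beta})^{2}}{\delta^{2}}, $$

so replacing $\delta$ with $\delta/2$ multiplies $n$ by four.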