Archive Analysis: On EEF Trials
School of Education
Adetayo Kasim, ZhiMin Xiao, and Steve Higgins

Outline
• Introduction
• Design Methods
  – Simple Randomised Trial (SRT)
  – Multi-Site Trial (MST)
  – Cluster Randomised Trial (CRT)
• Estimation Framework
  – Frequentist versus Bayesian Approach
• Discussion

School of Education EEF Conference

Introduction
• This presentation is based on the data released by the FFT in December 2014. We present here 15 effect sizes from 11 randomised trials, which involve three design specifications:

  Design   # Trials   # Interv.
  SRT      7          10
  MST      2          3
  CRT      2          2

• The goal of this presentation is to facilitate discussion around the analysis of educational trials using different analytical models.

Simple Randomised Trials
• These trials randomised children using simple randomisation, without acknowledging the nested structure of pupils within schools.
• We analysed the data to investigate whether:
  – there is any difference between Cohen's d and Hedges' g.
  – simple randomisation results in zero correlation within schools, i.e., an Intra-Cluster Correlation (ICC) of zero.

Simple Randomised Trials
[Figure: effect sizes from SRTs only, for interventions I to X (legend: g, d, D; axis label: Hedges' g), with a panel for ICC.]

Multi-Site Trials
• These trials performed randomisation within schools in order to account for differences in effect between schools and the nested structure of pupils within schools.
• We analysed the data to investigate:
  – whether randomisation within schools removes ICC.
  – fixed versus random effects using multilevel modelling (MLM).

Multi-Site Trials

                   No Interaction                              With Interaction
  Interv.          Fixed (95% CI)        MLM (95% CI)          Fixed   MLM (95% CI)
  1       Within   0.32 (0.07, 0.70)     0.31 (0.10, 0.66)     -       0.31 (0.11, 0.84)
          Total    -                     0.28 (0.08, 0.51)     -       0.28 (0.07, 0.50)
          ICC      -                     0.17                  -       0.17
  2       Within   0.40 (0.18, 0.75)     0.40 (0.21, 0.74)     -       0.41 (0.26, 1.00)
          Total    -                     0.32 (0.13, 0.51)     -       0.32 (0.13, 0.52)
          ICC      -                     0.37                  -       0.39
  3       Within   0.08 (-0.07, 0.24)    0.08 (-0.08, 0.24)    -       0.08 (-0.10, 0.25)
          Total    -                     0.08 (-0.08, 0.24)    -       0.07 (-0.10, 0.23)
          ICC      -                     0.01                  -       0.05

Multi-Site Trials
• One advantage of the fixed effect model is that it does not require a minimum number of schools per treatment arm. However, it relies on a strong assumption of no treatment-by-school interaction, an assumption we cannot verify because most studies are not sufficiently powered to detect such interactions.

Multi-Site Trials
• The multilevel model is more robust than the fixed effect model because treatment-by-school interaction is specified as random effects. It will always result in a single effect size estimate per outcome. However, MLM may be unsuitable for studies with a small number of clusters. For Gaussian data, a minimum of five clusters per treatment arm has been recommended for MLM.

Cluster Randomised Trials
• Randomisation to treatment is implemented at school level. All pupils in the same school receive the same intervention.
• We used different sources of variability to calculate the probabilities of observing a certain effect size given the data we happened to observe.
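
To make the three sources of variability concrete, the sketch below standardises the same treatment-control difference by within-school, between-school, and total variance. It is only an illustration under stated assumptions, not the model used in this analysis: it uses a simple one-way ANOVA (method of moments) decomposition rather than a multilevel or Bayesian fit, it assumes balanced data (equal school sizes and the same number of schools per arm), and the function names and pupil scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def mean_squares(schools):
    """Within- and between-school mean squares for one trial arm.

    `schools` is a list of lists of pupil scores. This sketch assumes
    every school has the same number of pupils (balanced data).
    """
    k = len(schools)        # number of schools in this arm
    n = len(schools[0])     # pupils per school
    if any(len(s) != n for s in schools):
        raise ValueError("this sketch assumes equal school sizes")
    school_means = [mean(s) for s in schools]
    arm_mean = mean(school_means)
    ss_within = sum((y - m) ** 2
                    for s, m in zip(schools, school_means) for y in s)
    ss_between = n * sum((m - arm_mean) ** 2 for m in school_means)
    return ss_within / (k * (n - 1)), ss_between / (k - 1), n


def crt_effect_sizes(treated, control):
    """Effect sizes standardised by within, between, and total variance."""
    msw_t, msb_t, n_t = mean_squares(treated)
    msw_c, msb_c, n_c = mean_squares(control)
    if n_t != n_c or len(treated) != len(control):
        raise ValueError("this sketch assumes a balanced design")
    # With equal degrees of freedom in both arms, pooling is a plain average.
    msw = (msw_t + msw_c) / 2
    msb = (msb_t + msb_c) / 2
    within = msw
    between = max((msb - msw) / n_t, 0.0)   # truncate negative estimates at 0
    total = within + between
    diff = (mean([y for s in treated for y in s])
            - mean([y for s in control for y in s]))
    return {
        "within": diff / sqrt(within),
        # Undefined when the between-school variance estimate is zero.
        "between": diff / sqrt(between) if between > 0 else None,
        "total": diff / sqrt(total),
        "icc": between / total,
    }


# Hypothetical scores: two schools per arm, three pupils per school.
example_treated = [[11, 13, 15], [13, 15, 17]]
example_control = [[8, 10, 12], [10, 12, 14]]
print(crt_effect_sizes(example_treated, example_control))
```

Because total variance is the sum of the within and between components, the total-variance effect size can never exceed the other two in absolute value. When the ICC is below 0.5, the between-school component is the smaller of the two, so dividing by it gives the largest effect size.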
Cluster Randomised Trials
[Figure: two panels plotting Probability (y-axis) against "Effect size ≥ x" (x-axis), with one curve per variance source: Within, Between, Total.]

Cluster Randomised Trials
• Using total variability is the most conservative approach and is less likely to result in false positives than using within or between variability.
• Within variance is sometimes preferred to ensure comparability across studies. However, it could also lead to false positives if there is substantial between-school variability.
• Between variability is very prone to false positives. It should be used with caution.

Frequentist versus Bayesian Methods
• There is a general concern about inference based on non-random samples, because the validity of the standard errors is in doubt. This is perhaps one reason why many choose Bayesian inference over classical frequentist methods.
• We compared results from three approaches: Bayesian, frequentist with non-parametric bootstrapping, and classical frequentist with standard errors.

Frequentist versus Bayesian Methods
Effect sizes from MST and CRT using Bayesian and frequentist methods

  Interv.   Bayesian (95% CRI)     Bootstrap (95% QCI)    Freq.SE (95% CI)
  1         0.07 (-0.00, 0.14)     0.07 (0.00, 0.14)      0.07 (-0.13, 0.28)
  2         0.27 (0.02, 0.53)      0.28 (0.08, 0.51)      0.28 (-0.01, 0.57)
  3         0.32 (0.10, 0.54)      0.32 (0.13, 0.51)      0.32 (-0.00, 0.64)
  4         0.69 (0.16, 1.22)      0.69 (0.42, 0.92)      0.67 (0.07, 1.28)
  5         0.08 (-0.08, 0.25)     0.08 (-0.08, 0.24)     0.08 (-0.11, 0.27)

Discussion
• Should simple randomisation be used in educational trials?
• Fixed or random effect model for multi-site trials?
• Total or within variance for effect size calculation?
• Should a bootstrapped confidence interval or a Bayesian credible interval be used to quantify uncertainty in effect size estimation?

Thank You
School of Education
Adetayo Kasim, ZhiMin Xiao, and Steve Higgins