Archive Analysis: On EEF Trials 3rd July 2015

Archive Analysis: On EEF Trials
School of Education
Adetayo Kasim, ZhiMin Xiao, and Steve Higgins
Outline
• Introduction
• Design Methods
– Simple Randomised Trial (SRT)
– Multi-Site Trial (MST)
– Cluster Randomised Trial (CRT)
• Estimation Framework
– Frequentist versus Bayesian Approach
• Discussion
EEF Conference
Introduction
• This presentation is based on the data released by the
FFT in December 2014. We present here 15 effect sizes
from 11 randomised trials, which involve three design
specifications:
Design   # Trials   # Interv.
SRT      7          10
MST      2          3
CRT      2          2
• The goal of this presentation is to facilitate discussion
around analysis of educational trials using different
analytical models.
Simple Randomised Trials
• These trials randomised children using simple
randomisation without acknowledging the nested
structure of pupils within schools.
• We analysed the data to investigate if:
– there is any difference between Cohen’s d and Hedges’ g.
– simple randomisation results in zero correlation within
schools, i.e., Intra-Cluster Correlation (ICC) equals zero.
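The first comparison can be illustrated with a minimal sketch. It assumes Cohen's d is the mean difference divided by the pooled sample standard deviation, and Hedges' g is d multiplied by the usual approximate small-sample correction 1 - 3/(4*df - 1). The slides do not state which formulas were used, and the scores below are made up for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(treat, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n_t, n_c = len(treat), len(control)
    pooled_var = ((n_t - 1) * variance(treat)
                  + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treat) - mean(control)) / math.sqrt(pooled_var)

def hedges_g(treat, control):
    """Cohen's d shrunk by the approximate small-sample correction factor."""
    df = len(treat) + len(control) - 2
    return cohens_d(treat, control) * (1 - 3 / (4 * df - 1))

# Hypothetical scores, not taken from any EEF trial.
treat = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 12.0, 9.0, 13.0, 11.0]
d = cohens_d(treat, control)
g = hedges_g(treat, control)
print(round(d, 3), round(g, 3))
```

Under these assumptions the correction factor approaches 1 as the sample grows, so d and g differ noticeably only in small trials.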
Simple Randomised Trials
[Figure: Effect sizes from SRTs only, for interventions I to X. Labels recoverable from the plot: d, g, ICC, and an axis labelled Hedges' g.]
Multi-Site Trials
• These trials performed randomisation within schools in
order to account for differences in effect between
schools and the nested structure of pupils within
schools.
• We analysed the data to investigate:
– if randomisation within schools removes ICC.
– fixed versus random effects using multilevel modelling
(MLM).
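One way to check whether any within-school correlation remains is to estimate the ICC directly from pupil scores grouped by school. The sketch below uses the one-way ANOVA estimator. This is an assumption: the slides do not say how the ICC was computed, and it was presumably derived from the MLM variance components. The function name and data are hypothetical, and treatment arm is ignored.

```python
from statistics import mean

def anova_icc(groups):
    """One-way ANOVA estimate of the ICC from a list of per-school score lists."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    # Between-school and within-school sums of squares.
    ss_between = sum(n * (mean(g) - grand) ** 2 for n, g in zip(sizes, groups))
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective school size; reduces to the common size when schools are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)

# Hypothetical scores for three schools with clearly different means.
schools = [[10.0, 11.0, 12.0], [14.0, 15.0, 16.0], [20.0, 21.0, 22.0]]
icc = anova_icc(schools)
print(round(icc, 3))
```

This estimator can return a negative value when schools differ less than chance alone would suggest. Such values are commonly truncated to zero.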
Multi-Site Trials
                  No Interaction                           With Interaction
Interv.           Fixed (95% CI)       MLM (95% CI)        Fixed   MLM (95% CI)
1        Within   0.32 (0.07, 0.70)    0.31 (0.10, 0.66)   -       0.31 (0.11, 0.84)
         Total    -                    0.28 (0.08, 0.51)   -       0.28 (0.07, 0.50)
         ICC      -                    0.17                -       0.17
2        Within   0.40 (0.18, 0.75)    0.40 (0.21, 0.74)   -       0.41 (0.26, 1.00)
         Total    -                    0.32 (0.13, 0.51)   -       0.32 (0.13, 0.52)
         ICC      -                    0.37                -       0.39
3        Within   0.08 (-0.07, 0.24)   0.08 (-0.08, 0.24)  -       0.08 (-0.10, 0.25)
         Total    -                    0.08 (-0.08, 0.24)  -       0.07 (-0.10, 0.23)
         ICC      -                    0.01                -       0.05
Multi-Site Trials
• One advantage of the fixed effect model is that it
does not require a minimum number of schools per
treatment arm. However, it relies on a strong
assumption of no treatment-by-school interaction, an
assumption we cannot verify because most studies
are not powered enough to detect such interactions.
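A minimal sketch may make the no-interaction assumption concrete. In a least-squares model with one intercept per school and a single treatment indicator, the treatment coefficient equals a weighted average of the within-school mean differences, with weights n_t * n_c / (n_t + n_c). A single common effect is therefore imposed on every school. The function name and data are hypothetical, and the result is an unstandardised difference, not the effect size reported in the slides.

```python
from statistics import mean

def fixed_effect_difference(schools):
    """Treatment coefficient from a school-fixed-effects model with no interaction.

    `schools` is a list of (treated_scores, control_scores) pairs, one per school.
    """
    num = 0.0
    den = 0.0
    for treated, control in schools:
        n_t, n_c = len(treated), len(control)
        weight = n_t * n_c / (n_t + n_c)  # larger, more balanced schools count more
        num += weight * (mean(treated) - mean(control))
        den += weight
    return num / den

# Hypothetical data: two schools with different within-school differences.
schools = [
    ([12.0, 14.0], [10.0, 12.0]),    # difference 2.0, weight 1.0
    ([20.0, 22.0, 24.0], [21.0]),    # difference 1.0, weight 0.75
]
estimate = fixed_effect_difference(schools)
print(round(estimate, 3))
```

The two schools here show different differences, yet the model returns a single pooled value. That is exactly the situation in which an undetected treatment-by-school interaction would be hidden.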
Multi-Site Trials
• The multilevel model is more robust than the fixed
effect model because treatment-by-school interaction
is specified as random effects. It will always result in a
single effect size estimate per outcome. However,
MLM may be unsuitable for studies with a small number
of clusters. For Gaussian data, a minimum of five clusters
per treatment arm has been recommended for MLM.
Cluster Randomised Trials
• Randomisation to treatment is implemented at school
level. All pupils in the same school receive the same
intervention.
• We used different sources of variability to calculate the
probabilities of observing a certain effect size given the
data we happened to observe.
Cluster Randomised Trials
[Figure: two panels, each plotting Probability (vertical axis) against x for "Effect size ≥ x" (horizontal axis), with separate curves for Within, Between, and Total variance.]
Cluster Randomised Trials
• Using total variability is the most conservative approach
and least likely to result in false positives compared to
the use of within or between variability.
• Within variance is sometimes preferred to ensure
comparability across studies. However, it could also
lead to false positives if there is substantial
between-school variability.
• Between variability is very prone to false positives. This
should be used with caution!!
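The ordering of the three choices follows from simple arithmetic. The sketch below assumes the effect size is the same mean difference divided by the square root of the chosen variance, and that total variance is the sum of the within and between components. The function name and the numbers are hypothetical.

```python
import math

def effect_sizes(mean_diff, var_within, var_between):
    """Standardise one mean difference by within, between, and total variance."""
    var_total = var_within + var_between
    return {
        "within": mean_diff / math.sqrt(var_within),
        "between": mean_diff / math.sqrt(var_between),
        "total": mean_diff / math.sqrt(var_total),
    }

# Hypothetical variance components: most variation lies within schools.
es = effect_sizes(mean_diff=2.0, var_within=16.0, var_between=4.0)
print({name: round(value, 3) for name, value in es.items()})
```

Because the total variance is at least as large as either component, the total-variance effect size is never larger in magnitude than the other two. It is the conservative choice for that reason.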
Frequentist versus Bayesian Methods
• There is a general concern about inference based on
non-random samples, because the validity of standard
errors is then in question. This is perhaps one reason
why many choose Bayesian inference over classical
frequentist methods.
• We compared results from three approaches, namely,
Bayesian, frequentist with non-parametric
bootstrapping, and classical frequentist with standard
errors.
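The bootstrap approach can be sketched as follows. This is a minimal illustration, not the authors' procedure. It resamples pupils with replacement within each arm, recomputes a standardised mean difference each time, and takes the 2.5% and 97.5% quantiles of the results. For multi-site or cluster trials the resampling would need to respect the school structure, which this sketch does not do. The function names, seed, and scores are hypothetical.

```python
import math
import random
from statistics import mean, quantiles, variance

def std_mean_diff(treat, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n_t, n_c = len(treat), len(control)
    pooled_var = ((n_t - 1) * variance(treat)
                  + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treat) - mean(control)) / math.sqrt(pooled_var)

def bootstrap_quantile_ci(treat, control, n_boot=2000, seed=1):
    """95% quantile interval from a pupil-level non-parametric bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = []
    for _ in range(n_boot):
        t = rng.choices(treat, k=len(treat))      # resample with replacement
        c = rng.choices(control, k=len(control))
        draws.append(std_mean_diff(t, c))
    cuts = quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

# Hypothetical scores, not taken from any EEF trial.
treat = [14.0, 16.0, 12.0, 18.0, 15.0, 17.0, 13.0, 19.0, 16.0, 15.0]
control = [12.0, 13.0, 11.0, 15.0, 14.0, 10.0, 13.0, 12.0, 16.0, 11.0]
low, high = bootstrap_quantile_ci(treat, control)
print(round(low, 2), round(high, 2))
```

The interval comes directly from the empirical distribution of the resampled effect sizes, so it does not rely on an analytic standard error.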
Frequentist versus Bayesian Methods
Effect sizes from MST and CRT using Bayesian and frequentist methods
Interv.   Bayesian (95% CRI)    Bootstrap (95% QCI)   Freq.SE (95% CI)
1         0.07 (-0.00, 0.14)    0.07 (0.00, 0.14)     0.07 (-0.13, 0.28)
2         0.27 (0.02, 0.53)     0.28 (0.08, 0.51)     0.28 (-0.01, 0.57)
3         0.32 (0.10, 0.54)     0.32 (0.13, 0.51)     0.32 (-0.00, 0.64)
4         0.69 (0.16, 1.22)     0.69 (0.42, 0.92)     0.67 (0.07, 1.28)
5         0.08 (-0.08, 0.25)    0.08 (-0.08, 0.24)    0.08 (-0.11, 0.27)
Discussion
• Should simple randomisation be used in Educational
trials?
• Fixed or random effect model for multisite trials?
• Total or within variance for effect size calculation?
• Should bootstrapped confidence interval or Bayesian
credible interval be used to quantify uncertainty in
effect size estimation?
Thank You
Adetayo Kasim, ZhiMin Xiao, and Steve Higgins