Session 4: Analysis and reporting
Steve Higgins (Chair), Paul Connolly, Stephen Gorard

Analysis of Randomised Controlled Trials (RCTs)
Paul Connolly, Centre for Effective Education, Queen's University Belfast
Conference of EEF Evaluators: Building Evidence in Education Training Day, 11 July 2013

Main Analysis of Simple RCT
• These slides provide an introductory overview of one approach to analysing RCTs
• Assume we are dealing with a continuous outcome variable that is broadly normally distributed
• Three variables:
– Pre-test score "score1" (centred so that its mean = 0)
– Post-test score "score2"
– Group membership "intervention" (coded 0 = control group; 1 = intervention group)
• Basic analysis via linear regression (see the sketch following these slides):
predicted score2 = b0*constant + b1*intervention + b2*score1

Main Analysis of Simple RCT
predicted score2 = b0*constant + b1*intervention + b2*score1
• b0 = adjusted mean post-test score for those in the control group
• b0 + b1 = adjusted mean post-test score for those in the intervention group
• Estimate standard deviations for the post-test mean scores using the s.d. of predicted score2 for the control and intervention groups separately*
• The significance of b1 is the significance of the difference between the post-test mean scores of the intervention and control groups
• Effect size (Cohen's d) = b1 / (s.d. of predicted score2)
• 95% confidence interval for the effect size = [b1 ± 1.96*(standard error of b1)] / (s.d. of predicted score2)

*Most statistical software packages provide the option of creating a new variable comprising the predicted scores from the model. This new variable is the one to use to estimate standard deviations for adjusted post-test scores.

Exploratory Analysis of Mediating Effects for RCT
• Take the example of gender differences (variable "boy", coded 0 = girls; 1 = boys)
• Analysis via an extension of the basic linear regression model:
predicted score2 = b0*constant + b1*intervention + b2*score1 + b3*boy + b4*boy*intervention
• The significance of b4 indicates whether there is evidence of an interaction effect (i.e. in this case that the intervention has differential effects for boys and girls)
• The same approach applies when the contextual variable is continuous rather than binary (as it is here)

Exploratory Analysis of Mediating Effects for RCT
predicted score2 = b0*constant + b1*intervention + b2*score1 + b3*boy + b4*boy*intervention
• Use the model to estimate adjusted mean post-test scores*:
– b0 = girls in the control group
– b0 + b3 = boys in the control group
– b0 + b1 = girls in the intervention group
– b0 + b1 + b3 + b4 = boys in the intervention group
• Estimate standard deviations by calculating the s.d. of predicted score2 for each subgroup separately

*When dealing with a continuous contextual variable, it is often still useful to calculate adjusted mean post-test scores to illustrate any interaction effects found. This can be done by using the model to predict the adjusted post-test mean scores for participants in the control and intervention groups whose score on the contextual variable concerned is one standard deviation below the mean, and then doing the same for those whose score is one standard deviation above the mean.
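The two models above can be illustrated with a minimal sketch in Python using pandas and statsmodels. The slides do not prescribe any software, so this is an assumption-laden illustration: the data frame df, the function names and the choice of the overall s.d. of the predicted scores to standardise b1 are this example's own, while the variable names score1, score2, intervention and boy follow the slides.

```python
import pandas as pd
import statsmodels.formula.api as smf


def analyse_simple_rct(df: pd.DataFrame) -> dict:
    """Main analysis: predicted score2 = b0 + b1*intervention + b2*score1."""
    df = df.copy()
    df["score1"] = df["score1"] - df["score1"].mean()  # centre pre-test so its mean = 0

    model = smf.ols("score2 ~ intervention + score1", data=df).fit()
    b1 = model.params["intervention"]      # adjusted difference in post-test means
    se_b1 = model.bse["intervention"]      # standard error of b1
    p_b1 = model.pvalues["intervention"]   # significance of the group difference

    # Predicted (adjusted) post-test scores; s.d.s reported by group as in the slides.
    df["pred_score2"] = model.predict(df)
    sd_by_group = df.groupby("intervention")["pred_score2"].std()
    sd_pred = df["pred_score2"].std()      # s.d. used here to standardise b1

    d = b1 / sd_pred                       # effect size as defined in the slides
    ci = ((b1 - 1.96 * se_b1) / sd_pred,   # 95% CI for the effect size
          (b1 + 1.96 * se_b1) / sd_pred)
    return {"b1": b1, "p": p_b1, "d": d, "95% CI": ci, "sd_by_group": sd_by_group}


def analyse_subgroup_effects(df: pd.DataFrame):
    """Exploratory analysis: add boy and boy:intervention to the basic model."""
    model = smf.ols("score2 ~ intervention + score1 + boy + boy:intervention",
                    data=df).fit()
    # The p-value of the interaction term indicates whether there is evidence
    # that the intervention has differential effects for boys and girls.
    return model.params, model.pvalues["boy:intervention"]
```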
Extending the Analysis
• For trials with binary or ordinal outcome measures, the same approach can be used but with generalised linear regression models (see the sketch at the end of this section):
– Binary logistic regression (binary outcomes)
– Ordered logistic regression (ordinal outcomes)
• For cluster randomised trials (with >30 clusters), the same models can be used but extended to two-level models
• For quasi-experimental designs, either:
– the same models as above, adding a number of additional covariates (all centred) to control for pre-test differences, or
– propensity score matching
• For repeated-measures designs, the above can also be extended using multilevel models, with observations (level 1) clustered within individuals (level 2)

Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?

Analysis
Stephen Gorard
s.a.c.gorard@durham.ac.uk
http://www.evaluationdesign.co.uk/

What is N?
• How many cases were assessed for eligibility?
• How many of those assessed did not participate, and for what reasons (not meeting criteria, refused, etc.)?
• How many then agreed to participate?
• How many were allocated to each group (if relevant)?
• How many were lost or dropped out after agreeing to participate (and after allocation to a group, if relevant)?
• How many were analysed, and why were any further cases excluded from the analysis?

An example of reporting problems with a sample
In total, 314 individual Year 7 pupils took part in the study. 157 pupils were assigned to treatment and 157 to control. The sample included students from a disadvantaged background (eligible for free school meals), those with a range of learning disabilities (SEN) and those for whom English was a second language. By the final analysis six students had dropped out or could not be included in the gain score analysis. One took the pre-test (repeatedly) but his school were unable to record the score. His post-test score was 78, and he would have been in the control. Five others took the pre-test but did not sit the post-test. One left the school and could not be traced, initially scored 78 and would have been in treatment. One left the school and their new school was not able to arrange the post-test, initially scored 64 and would have been control. One changed schools, one could not get their score saved at pre-test, one refused to cooperate and one was persistently absent at post-test (perhaps excluded). Although this loss of data, and the reduction of the sample to 308 pupils, is unfortunate, there is no specific reason to believe that this dropout was biased or favoured one group over the other.

Pupils allocated to groups but with no gain score, and reason for omission

Allocation        Pre-test score   Post-test score   Reason
Treatment group   78               -                 Left school, not traced
Treatment group   73               -                 Long-term sick during post-test
Control           74               -                 Left school, new school would not test
Control           75               -                 Withdrawn, personal reasons
Control           -                70                Pre-test not recorded, technical reasons
Control           73               -                 Permanently excluded by school

Source: Gorard, S., Siddiqui, N. and See, B.H. (2013) Process and summative evaluation of the Switch-On literacy transition programme, Report to the Education Endowment Foundation

Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
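As a hedged illustration of the first of these extensions and of the two-level approach, here is a sketch using simulated data with statsmodels. The simulated data frame, the binary outcome passed and the cluster identifier school are invented for this example; they are not part of the original slides.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated example data: 40 schools, half allocated to the intervention.
rng = np.random.default_rng(1)
n = 800
school = rng.integers(0, 40, n)
treated_schools = rng.choice(np.arange(40), size=20, replace=False)
df = pd.DataFrame({
    "school": school,
    "intervention": np.isin(school, treated_schools).astype(int),
    "score1": rng.normal(0, 1, n),
})
df["score2"] = 50 + 3 * df["intervention"] + 5 * df["score1"] + rng.normal(0, 8, n)
df["passed"] = (df["score2"] > 50).astype(int)  # an illustrative binary outcome

# Binary outcome: binary logistic regression with the same covariates.
logit_model = smf.logit("passed ~ intervention + score1", data=df).fit()
print(logit_model.summary())

# Cluster randomised trial: a two-level (random intercept) model with pupils
# (level 1) nested within schools (level 2).
mlm = smf.mixedlm("score2 ~ intervention + score1", data=df,
                  groups=df["school"]).fit()
print(mlm.summary())
```

Ordered logistic regression, propensity score matching and repeated-measures multilevel models follow the same general pattern but are not shown here.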
Calculating effect sizes and the Toolkit meta-analysis – implications for evaluators
Steve Higgins
s.e.higgins@durham.ac.uk
School of Education, Durham University
EEF Evaluators Conference, June 2013

Sutton Trust/EEF Teaching and Learning Toolkit
• Comparative evidence
• Aims to identify 'best buys' for schools
• Based on meta-analysis
http://educationendowmentfoundation.org.uk/toolkit

What is meta-analysis?
• A way of combining the results of quantitative research
• To accumulate evidence from smaller studies
• To compare the results of similar studies – consistency
• To investigate patterns of association in the findings of different studies – explaining variation
• 'Surveys' research studies

Why meta-analysis?
• Cumulative – synthesis of evidence
• Based on the size of effects and confidence intervals rather than significance testing – patterns in the data
• Identifying and understanding variation helps develop explanatory models

What is an "effect size"?
• A standardised way of looking at difference
• Different methods of calculation:
– Binary (risk difference, odds ratio, risk ratio)
– Continuous: correlational (Pearson's r) or standardised mean difference (d, g, Δ)
• The difference between the control and intervention groups as a proportion of the dispersion of scores:
(intervention group mean score - control group mean score) / standard deviation of scores

Examples of Effect Sizes (see the sketch after these slides)
ES = 0.2
• "Equivalent to the difference in heights between 15 and 16 year old girls"
• 58% of the control group are below the mean of the experimental group
• Probability you could guess which group a person was in = 0.54
• Change in the proportion above a given threshold: from 50% to 58%, or from 75% to 81%
ES = 0.8
• "Equivalent to the difference in heights between 13 and 18 year old girls"
• 79% of the control group are below the mean of the experimental group
• Probability you could guess which group a person was in = 0.66
• Change in the proportion above a given threshold: from 50% to 79%, or from 75% to 93%

The rationale for using effect sizes
• Traditional quantitative reviews focus on statistical significance testing:
– Highly dependent on sample size
– A null finding does not carry the same "weight" as a significant finding
• Meta-analysis focuses on the direction and magnitude of the effects across studies:
– From "Is there a difference?" to "How big is the difference?" and "How consistent is the difference?"
– Direction and magnitude are represented by the "effect size"

Issues and challenges in meta-analysis
• Conceptual:
– Reductionist – the answer is 0.42
– Comparability – apples and oranges
– Atheoretical – 'flat-earth'
• Technical:
– Heterogeneity
– Publication bias
– Methodological quality

Comparative meta-analysis
• Theory testing
• Emphasises practical value
• Incorporate EEF findings in new Toolkit meta-analyses

Ability grouping
• Slavin 1990b (secondary low attainers): -0.06
• Lou et al. 1996 (on low attainers): -0.12
• Kulik & Kulik 1982 (secondary - all): 0.10
• Kulik & Kulik 1984 (elementary - all): 0.07

Meta-cognition and self-regulation strategies
• Abrami et al. 2008: 0.34
• Haller et al. 1988: 0.71
• Klauer & Phye 2008: 0.69
• Higgins et al. 2004: 0.62
• Chiu 1998: 0.67
• Dignath et al. 2008: 0.62
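The figures quoted in the "Examples of Effect Sizes" slides above can be reproduced from the normal distribution, assuming broadly normal score distributions. The sketch below is illustrative only; the function name is invented and scipy is an assumed dependency.

```python
from scipy.stats import norm


def interpret_effect_size(d: float, baseline: float = 0.50) -> dict:
    """Translate a standardised mean difference d into the figures quoted above."""
    u3 = norm.cdf(d)                     # share of the control group below the experimental mean
    threshold = norm.ppf(1 - baseline)   # threshold (in control s.d. units) that `baseline` of controls exceed
    above = norm.cdf(d - threshold)      # share of the intervention group above that same threshold
    guess = norm.cdf(d / 2)              # chance of correctly guessing a person's group from their score
    return {"U3": round(u3, 2), "above_threshold": round(above, 2), "guess": round(guess, 2)}


# d = 0.2 gives roughly 58%, 58%/81% and 0.54; d = 0.8 gives 79%, 79%/93% and 0.66,
# matching the examples in the slides.
for d in (0.2, 0.8):
    print(d, interpret_effect_size(d, 0.50), interpret_effect_size(d, 0.75))
```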
Calculating effect sizes
• The difference between the two means, expressed as a proportion of the standard deviation (computed in the sketch at the end of this section):
ES = (Me - Mc) / SD
• Cohen's d
• Glass's Δ
• Hedges' g

Reporting effect sizes: RCTs
• Post-test standardised mean difference with confidence intervals
• A fixed-effect analysis is fine for individual randomisation, but not for clusters…
• For cluster analysis: multilevel modelling (MLM) or an equivalent measure
• Other comparisons: matched designs, regression discontinuity
http://www.cem.org/evidence-based-education/effect-size-calculator

Discussion task
• What analyses are you intending to undertake?
• How do you plan to calculate effect size(s)?
• Which statistical techniques:
1. are you confident to undertake?
2. would you be happy to advise other evaluation teams on?
3. would you appreciate advice and/or support with?

Key requirement: be explicit…
• Describe analysis decisions (e.g. intention-to-treat (ITT) analysis and missing data)
• Report clusters separately
• Submit the complete data-set in case a different analysis is required for comparability

References, further readings and information

Books and articles
• Borenstein, M., Hedges, L.V., Higgins, J.P.T. & Rothstein, H.R. (2009) Introduction to Meta-Analysis (Statistics in Practice). Oxford: Wiley-Blackwell.
• Chambers, E.A. (2004) An introduction to meta-analysis with articles from the Journal of Educational Research (1992-2002). Journal of Educational Research, 98, 35-44.
• Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291.
• Cooper, H.M. (2009) Research Synthesis and Meta-Analysis: A Step-by-Step Approach (4th edition). London: SAGE Publications.
• Cronbach, L.J., Ambron, S.R., Dornbusch, S.M., Hess, R.O., Hornik, R.C., Phillips, D.C., Walker, D.F. & Weiner, S.S. (1980) Toward Reform of Program Evaluation: Aims, Methods, and Institutional Arrangements. San Francisco, CA: Jossey-Bass.
• Eldridge, S. & Kerry, S. (2012) A Practical Guide to Cluster Randomised Trials in Health Services Research. London: Wiley-Blackwell.
• Glass, G.V. (2000) Meta-analysis at 25. Available at: http://glass.ed.asu.edu/gene/papers/meta25.html (accessed 9/9/08).
• Lipsey, M.W. & Wilson, D.B. (2001) Practical Meta-Analysis. Applied Social Research Methods Series (Vol. 49). Thousand Oaks, CA: SAGE Publications.
• Torgerson, C. (2003) Systematic Reviews and Meta-Analysis (Continuum Research Methods). London: Continuum Press.

Websites
• What is an effect size?, by Rob Coe: http://www.cemcentre.org/evidence-based-education/effect-size-resources
• The meta-analysis of research studies: http://echo.edres.org:8080/meta/
• The Meta-Analysis Unit, University of Murcia: http://www.um.es/metaanalysis/
• The PsychWiki: Meta-analysis: http://www.psychwiki.com/wiki/Meta-analysis
• Meta-Analysis in Educational Research: http://www.dur.ac.uk/education/meta-ed/

Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
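The three standardised mean differences named above can be computed from group summary statistics as follows. This is an illustrative sketch using numpy: the function and the example numbers are invented, while the formulae are the standard ones given in, for example, Borenstein et al. (2009) in the reference list above.

```python
import numpy as np


def standardised_mean_differences(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Return Cohen's d, Glass's delta and Hedges' g for two independent groups."""
    pooled_sd = np.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                        / (n_e + n_c - 2))
    d = (mean_e - mean_c) / pooled_sd      # Cohen's d: pooled standard deviation
    delta = (mean_e - mean_c) / sd_c       # Glass's delta: control-group standard deviation
    j = 1 - 3 / (4 * (n_e + n_c - 2) - 1)  # small-sample correction factor
    g = j * d                              # Hedges' g: bias-corrected d
    return d, delta, g


def ci_for_d(d, n_e, n_c):
    """Approximate 95% confidence interval for a standardised mean difference."""
    var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
    se = np.sqrt(var_d)
    return d - 1.96 * se, d + 1.96 * se


# Example: post-test means of 105 (intervention) and 100 (control), both s.d. 15,
# with 50 pupils per group (illustrative numbers only).
d, delta, g = standardised_mean_differences(105, 15, 50, 100, 15, 50)
print(d, delta, g, ci_for_d(d, 50, 50))
```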
Interpreting and Reporting Findings and Managing Expectations
Paul Connolly, Centre for Effective Education, Queen's University Belfast
Conference of EEF Evaluators: Building Evidence in Education Training Day, 11 July 2013

Interpreting Findings
• Findings:
– only relate to the outcomes measured
– represent the effects of the programme compared with what those in the control group currently receive
– usually only relate to the sample recruited (and are thus context- and time-specific)
• Dangers of:
– 'fishing exercises' characterised by post-hoc decisions to consider other outcomes and/or differences in effects for different sub-groups
– hypothesising about the causes of the effects (or the reasons for non-effects)

Reporting Findings
• Being clear:
– Option of using adjusted post-test scores
– Conversion of findings into effect sizes that are more readily understandable (e.g. the 'improvement index')
• Being transparent:
– Identify outcomes at the beginning and stick to these; register the trial
– Report methods fully (CONSORT statement)
• Being tentative:
– Acknowledge limitations
– Move from evidence of "what works" to evidence of "what works for specific pupils, in a particular context and at a particular time"

Example: Adjusted post-test scores
Source: Connolly, P., Miller, S. & Eakin, A. (2010) A Cluster Randomised Controlled Trial Evaluation of the Media Initiative for Children: Respecting Difference Programme. Belfast: Centre for Effective Education (p. 31). See: http://www.qub.ac.uk/research-centres/CentreforEffectiveEducation/Publications/

Example: Improvement index
• Take the effect size and convert it to Cohen's U3 index (either by using statistical tables or effect size calculators online)
• The improvement index represents the increase/decrease in the percentile rank of an average student in the intervention group (assuming that at pre-test they are at the 50th percentile)
• An effect size of 0.30 corresponds to a U3 of 62%, i.e. the intervention is likely to result in an average student in the intervention group being ranked 12 percentile points higher than the average student in the control group (who would remain at the 50th percentile)
• Other examples (see the sketch at the end of this document):
– 0.10: 4 percentile points
– 0.20: 8 percentile points
– 0.40: 16 percentile points
– 0.50: 19 percentile points

Managing Expectations
• Regular and ongoing communication is the key
• Importance of logic models and agreement of outcomes with programme developers/providers at the outset:
– Careful consideration of the intervention and its associated activities, and a clear link between these and the expected outcomes
– Ensure outcomes are domain-specific
• Include sufficient time to discuss findings with programme developers/providers:
– Talk through possible interpretations
– Discuss further potential analyses (but be clear that these are exploratory)

Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?

Group discussion and feedback
Tables will be arranged by theme. Evaluators should move to the table with a theme which either they are able to contribute expertise on or which they are struggling with.
Tables should discuss:
• What are the key issues or questions for evaluators?
• What are the solutions?
• How can the EEF help?
Feedback from tables.
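Relating back to the improvement index slide above, here is a minimal sketch of the conversion: an effect size is translated into Cohen's U3 with the normal CDF and expressed as percentile points gained by the average intervention pupil. scipy is an assumed dependency; the figures printed match those quoted in the slides.

```python
from scipy.stats import norm


def improvement_index(effect_size: float) -> float:
    """Percentile-point gain for an average intervention pupil starting at the 50th percentile."""
    u3 = norm.cdf(effect_size) * 100  # Cohen's U3 as a percentile rank
    return u3 - 50                    # gain in percentile points over the control-group average


for es in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(es, round(improvement_index(es)))  # 4, 8, 12, 16, 19, as in the slides
```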