Ofsted: Part of the Problem or Part of the Solution?
Robert Coe, Durham University
Association of Colleges Annual Conference, 19 November 2013

Ofsted: Part of the Problem or Part of the Solution?
Both
The case for evidence and rigour
Accountability dos and don’ts
Problems with judgement and classroom observation
What could be improved?

Evidence and rigour in the search for real improvement
www.cem.org/attachments/publications/ImprovingEducation2013.pdf

(Updated from Coe, 2007)

School ‘improvement’ often isn’t
School would have improved anyway
– Those willing to improve will (misattributed to intervention)
– Chance variation (esp if start low)
Poor outcome measures
– Perceptions of those who worked hard at it
– No assessment of pupil learning
Poor evaluation designs
– Weak evaluations more likely to show positive results
– Improved intake mistaken for impact of intervention
Selective reporting
– Dredging for anything positive (within a study)
– Only success is publicised
(Coe, 2009, 2013)

Impact vs cost
www.educationendowmentfoundation.org.uk/toolkit
[Figure: interventions plotted by effect size (months gain, 0 to 8) against cost per pupil (£0 to £1000). Regions labelled “Most promising for raising attainment”, “May be worth it” and “Small effects / high cost”. Interventions shown: Feedback, Meta-cognitive, Peer tutoring, Homework (Secondary), Collaborative, Early Years, 1-1 tuition, Behaviour, Small gp tuition, Phonics, Parental involvement, ICT, Social, Individualised learning, Summer schools, Mentoring, Homework (Primary), Performance pay, Aspirations, Ability grouping, Smaller classes, After school, Teaching assistants.]

Monitoring the quality of teaching
Classroom observation
– Much harder than you think!
– Multiple observations/ers, trained and QA’d
Progress in assessments
– Quality of assessment matters
Student ratings
– Extremely valuable, if done properly

Accountability

Accountability cultures
Trust                Distrust
Confidence           Fear
Challenge            Threat
Supportive           Competitive
Improvement-focus    Target-focus
Problem-solving      Image presentation
Long-term            Quick fix
Genuine quality      Tick-list quality
Evaluation           Sanctions

Ways to avoid gaming
Choose measures that are genuinely aligned with what is valued (& hard to distort)
State general aims, but be vague/flexible about specific targets/measures
Actively look for (and publicise) gaming and unintended consequences; encourage whistleblowing on counter-productive gaming
Build in loophole-closing mechanisms (eg to realign credit with difficulty/value)
Combine statistical measures with face-to-face observation & judgement
Measure a wide range of outcomes
Look at distributions, not just thresholds
(Bevan & Hood, 2006; Bird et al., 2005; Smith 1995; Fitz-Gibbon 1997)

Problems with judgement and classroom observation

Do We Know a Successful Teacher When We See One?
Filmed lessons (or short clips) of effective (value-added) and ineffective teachers shown to
– School Principals and Vice-Principals
– Teachers
– Public
Some agreement among raters, but unable to identify effective teaching
No difference between education experts and others
Training in CLASS did help a bit
(Strong et al 2011)

Obvious – but not true

Why do we believe we can spot good teaching?
We absolutely know what we like
– Strong emotional response to particular behaviours/styles is hard to over-rule
We focus on observable proxies for learning
– Learning is invisible
Preferences for particular pedagogies are widely shared, but evidence/understanding of their effectiveness is limited
We think learning depends on what the teacher does
We assume that if you can do it you can spot it
We don’t believe observation can miss so much

Poor Proxies for Learning
Students are busy: lots of work is done (especially written work)
Students are engaged, interested, motivated
Students are getting attention: feedback, explanations
Classroom is ordered, calm, under control
Curriculum has been ‘covered’ (ie presented to students in some form)
(At least some) students have supplied correct answers (whether or not they really understood them or could reproduce them independently)

Hamre et al (2009)

Simons & Chabris (1999)

“We generally recommend that observers have some classroom experience. However, we sometimes find that individuals with the most classroom experience have the greatest difficulty becoming certified CLASS observers. Experienced teachers or administrators often have strong opinions about effective teaching practice. The CLASS requires putting those opinions aside, at least while using the CLASS, to attend to and score specific, observable teacher-child interactions.” (Hamre et al 2009, p35)

“Becoming a certified CLASS observer requires attending a two-day Observation Training provided by a certified CLASS trainer and passing a reliability test.
The reliability test consists of watching and coding five 15-minute classroom video segments online … Trainings with a CLASS certified trainer result in 60-80% of trainees passing the first reliability test … CLASS Observation recertification requirements include annually taking and passing a reliability test.” (Hamre et al 2009, p37-8)

In the EPPE 3-11 study, observers had 12 days of training and achieved an inter-rater reliability of 0.7. (Sammons et al 2006, p56)

Reliability
Probability that 2nd rater disagrees
1st rater gives    %      Best case (r = 0.7)    Worst case (r = 0.24)
Outstanding        12%    51%                    78%
Good               55%    31%                    43%
Req. Impr.         29%    46%                    64%
Inadequate         4%     62%                    90%
Overall                   39%                    45%
Percentages based on simulations

Validity
Probability value-added data disagrees
1st rater gives    %      Best case (r = 0.4)    Worst case (r = -0.3)
Outstanding        12%    71%                    96%
Good               55%    40%                    45%
Req. improv.       29%    59%                    79%
Inadequate         4%     83%                    >99%
Overall                   51%                    63%
Percentages based on simulations

Part of the solution
Accountability is here to stay
It should definitely include site visits and classroom observation
Recent policy changes and statements from Ofsted are positive

Requires Improvement
Ofsted must demonstrate that all inspectors are able to interpret complex data
Ofsted should use a validated protocol for lesson observation, with appropriate training
Ofsted should demonstrate the validity of all aspects of inspectors’ judgements
There should be ongoing, transparent, independently verified processes for QA
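
A note on the Reliability slide: its percentages are described as “based on simulations”, but the slides do not say how those simulations were set up. The sketch below shows one way such figures could be produced, under assumptions that are not from the slides: each rater's judgement is a latent score, the two raters' scores are bivariate normal with correlation r, and grade boundaries are placed so that the grade shares match the slide (4%, 29%, 55%, 12%). The function names and the sample size are illustrative, and the results need not match the slide exactly.

```python
import random
from statistics import NormalDist

# Grade shares taken from the slide, ordered from lowest to highest grade.
GRADES = ["Inadequate", "Req. Impr.", "Good", "Outstanding"]
SHARES = [0.04, 0.29, 0.55, 0.12]


def grade_cutoffs(shares):
    """Latent-score thresholds that reproduce the marginal grade shares."""
    cuts, cumulative = [], 0.0
    for share in shares[:-1]:
        cumulative += share
        cuts.append(NormalDist().inv_cdf(cumulative))
    return cuts


def to_grade(score, cuts):
    """Index of the grade band that a latent score falls into."""
    return sum(score > c for c in cuts)


def disagreement_rates(r, n=200_000, seed=1):
    """Estimate P(2nd rater's grade differs | 1st rater's grade) and overall."""
    rng = random.Random(seed)
    cuts = grade_cutoffs(SHARES)
    given = [0] * len(GRADES)
    differ = [0] * len(GRADES)
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # Assumption: second score is bivariate normal with correlation r.
        z2 = r * z1 + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        g1, g2 = to_grade(z1, cuts), to_grade(z2, cuts)
        given[g1] += 1
        differ[g1] += g1 != g2
    per_grade = {GRADES[i]: differ[i] / given[i] for i in range(len(GRADES))}
    return per_grade, sum(differ) / n


if __name__ == "__main__":
    for r in (0.7, 0.24):
        per_grade, overall = disagreement_rates(r)
        print(f"r = {r}")
        for name in reversed(GRADES):
            print(f"  {name:<12} {per_grade[name]:.0%}")
        print(f"  {'Overall':<12} {overall:.0%}")
```

The same function could be run with the correlations on the Validity slide (r = 0.4 and r = -0.3), treating the second score as the value-added measure rather than a second rater; that reading is also an assumption.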