Assessment in the Workplace
Dr Gavin Johnson
Consultant Gastroenterologist, UCLH
Senior Lecturer in Medical Education, UCL

Objectives
1) Discuss the evolution of workplace-based assessment (WPBA)
2) Argue the pros and cons of WPBA
3) Improve the utility of WPBA
4) Evaluate the utility of WPBA
5) Appreciate the changing role of WPBA in 2012

Why assess doctors?
• Public confidence
– Scepticism about the profession’s ability to self-regulate (Smith 1998)
– Better measures of quality of practice (Scally 1998)
• Evidence of competence/to inform progression
– Tomorrow’s Doctors (GMC 1998, 2003)
• To drive learning
• To improve trainee confidence
• To withstand legal challenges (Van der Vleuten 2000; Tweed and Miola 2001)

The Metro Front Page 2011

Assessment – Miller 1990
DOES (performance): MSF, ACAT
SHOWS HOW (competence): OSCE, mini-CEX
KNOWS HOW (knowledge): best-answer MCQ
KNOWS (knowledge): true/false MCQ

WPBA – the origins
• Chart-stimulated recall – ABEM, from 1983
• Multisource feedback – business and industry, from 1993
• mini-CEX – Norcini, from 1995

WPBA tools: MSF (multisource feedback), DOPS (direct observation of procedural skills), CbD (case-based discussion), mini-CEX (mini clinical evaluation exercise)

Curriculum domains and the assessments mapped to them
• Clinical judgement & decision-making – CbD, ACAT (acute care assessment tool)
• Communication skills – mini-CEX, MSF
• Knowledge – KBA (knowledge-based assessment)
• Teaching skills – TO (teaching observation)
• Interpretative skills – CbD
• Clinical skills – mini-CEX
• Team work – MSF
• Audit – AA (audit assessment)
• Procedural skills – DOPS

✓ WPBA ✗

✓ Pros
• In vivo – assesses higher up Miller’s pyramid
• Educational impact (facilitates feedback)
• Drives learning
• Gathers evidence:
– to inform decision-making
– to re-sample borderline trainees
• Practical/cheap

✓ Educational Impact – CbD comments
• “Very helpful to receive constructive feedback on outpatient encounters + letter written to GP.”
• “Helpful to receive structured feedback on consultation in outpatient
clinic”
• “Valuable exercise covering ground not previously covered in other assessments.”
• “Useful assessment. A useful way to document conversations and assessments taking place on a daily basis.”

✓ Feedback – Kolb 1984

✗ Cons
• Time – for trainee and assessor
• Space – appropriate areas for discussion
• Conflict – turning supervisors into assessors
• Reliability – calibrating assessors, faculty development
• Validity – being used incorrectly/en masse
• Formative assessments being used to make summative decisions

“The profession is rightly suspicious of the use of reductive ‘tick-box’ approaches to assess the complexities of professional behaviour, and widespread confusion exists regarding the standards, methods and goals of individual assessment methods…This has resulted in widespread cynicism about WBA within the profession, which is now increasing”

Improving and Evaluating WPBA

Van der Vleuten (1996)
$\text{Utility} = R_w \times V_w \times I_w \times P_w$
• Reliability – are the scores reproducible?
• Validity – does it measure the knowledge, skills and attitudes it was designed to cover?
• Educational impact – does the assessment drive learning?
• Practicality/cost – is the assessment feasible and acceptable?
w = weighting, depending on context

Validity
• To improve:
– Match assessments to objectives
– Pilot
– Collaborate in the development of the assessment
• To measure:
– Question trainees and assessors
– Correlation between similar performance traits within an assessment
– Correlation between different assessments measuring similar traits, e.g.
CbD and ACAT
– Do scores improve over time?

Reliability
• To improve:
– Train assessors
– Use grounded descriptions of performance
– Increase the number of assessments
– Increase the number of assessors
• To measure:
– Gather a large number of assessments
– Generalisability theory

Ask a stupid question, you’ll get a stupid answer: Construct alignment improves the performance of WPBA
J Crossley, GJ Johnson, JR Booth, WB Wade
Medical Education 2011

ACAT ratings, CMT 2008–9 – Overall Clinical Judgement (n = 13,977)
Rating                       Proportion
Well below expectations      0.0%
Below expectations           0.0%
Borderline                   0.1%
Meets expectations           18.8%
Above expectations           55.0%
Well above expectations      26.0%

Hypothesis
‘WPBA reliability improves when the assessor’s rating uses anchor statements based on clear descriptors of performance, rather than a scale based on what was expected by the assessor’

Methods (1)
• RCP nationwide electronic portfolio
• WPBA form carried both old and new scales
• Data extracted and anonymised
• ‘Real-world’ assessments
• All years of higher specialty training
• Generalisability theory used to calculate reliability for both old and new rating scales

Methods (2): ACAT anchor statements
• Below level expected during Foundation Programme
– Trainee required frequent supervision to assist in almost all clinical management plans and/or time management
• Performed at the level expected at completion of Foundation Programme / early Core Training
– Trainee required supervision to assist in some clinical management plans and/or time management
• Performed at the level expected on completion of Core Training / early Higher Training
– Supervision and assistance needed for complex cases; competent to run the acute care period with senior support
• Performed at the level expected during Higher Training
– Very little supervising consultant input needed; competent to run the acute care period with occasional senior support
• Performed at the level expected for completion of Higher Training
– Able to practise independently and provide senior supervision for the acute care period

Results (1)
• mini-CEX, n = 3185
• CbD, n = 4513
• ACAT, n = 3235

Results (2): mini-CEX
Number of mini-CEXs          3     6     9     12
R coefficient – old rating   0.55  0.71  0.78  0.83
R coefficient – new rating   0.77  0.87  0.91  0.93

Results (3): CbD
Number of CbDs               3     6     9     12
R coefficient – old rating   0.48  0.65  0.73  0.78
R coefficient – new rating   0.73  0.84  0.89  0.92

Results (4): ACAT
Number of ACATs              3     6     9     12
R coefficient – old rating   0.21  0.35  0.44  0.52
R coefficient – new rating   0.36  0.53  0.63  0.70

Conclusion from study
• The reliability of WPBA improves significantly when the rating for Overall Performance is based on the stage of training (with descriptive anchor statements) rather than on a scale based on ‘what was expected’ by the assessor

Feasibility
• To improve:
– Length of assessments
– Number of required assessments
– Embed in the working day
– Facilitate the process, e.g. handheld devices
• To measure:
– Question trainees and assessors: questionnaires, focus groups, interviews
– Assessment form data: duration, satisfaction

Educational Impact
• To improve:
– Faculty development
– Find time!
– Encourage completion of free-text boxes (reflective practice)
– Discuss at appraisal
• To measure:
– Question trainees and assessors
– Evaluate the quality of free-text entries

Challenges 2012
• Too many WPBAs
• Ratings removed
• Only ‘anchor statements’
• Difficult to use to inform progression
• Legal challenges

Where do we go?
• Clarity – purpose and benefits
• Train the assessors
• Use formatively only – is reliability then irrelevant?
• Educational supervision
• Progression needs to be the opinion of an ‘expert’ and evidence-based
• ARCP decisions need to stand up to legal scrutiny

Conclusions
• Boom…to bust?
• There are established benefits
– Educational impact
• Consensus needed on how summative decisions are reached
– But this must be evidence-based
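The Van der Vleuten (1996) utility model from the “Improving and Evaluating WPBA” slides is multiplicative, so a very weak score on any one component drags overall utility towards zero however strong the others are. The sketch below makes that concrete. The slides do not say how components are scored or how the weighting w is applied; here we assume (hypothetically) that each component is a score between 0 and 1 and that w acts as an exponent, so a weight above 1 makes shortfalls in that component count for more. The function name `utility` and the component keys are illustrative only.

```python
from __future__ import annotations

# Hypothetical sketch of Van der Vleuten's multiplicative utility model.
# ASSUMPTIONS (not stated on the slides): each component is scored in [0, 1]
# and the context weighting w is applied as an exponent on that component.

COMPONENTS = ("reliability", "validity", "impact", "practicality")


def utility(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Multiply score ** weight across the four components (default weight 1)."""
    result = 1.0
    for name in COMPONENTS:
        score = scores[name]
        weight = weights.get(name, 1.0)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must lie in [0, 1], got {score}")
        if weight < 0:
            raise ValueError(f"{name} weight must be non-negative, got {weight}")
        result *= score ** weight
    return result


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the talk.
    scores = {"reliability": 0.8, "validity": 0.7, "impact": 0.9, "practicality": 0.6}
    print(round(utility(scores, {}), 4))  # equal weights
    # A low-stakes, formative context might down-weight reliability.
    print(round(utility(scores, {"reliability": 0.5}), 4))
    # An infeasible tool has zero utility, however reliable it is.
    print(utility({**scores, "practicality": 0.0}, {}))
```

Reading w as an exponent is only one interpretation. A weighted sum would be simpler but would lose the property the multiplicative form is built around: a component scoring zero makes the whole assessment useless.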
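The Results tables show reliability climbing as more assessments are pooled. The study derived these figures with generalisability theory. As a simpler cross-check (a swapped-in technique, not the authors’ analysis), the Spearman-Brown prophecy formula $R_k = kR_1 / (1 + (k-1)R_1)$ projects reliability when the number of assessments is multiplied by k; for a single-facet design it gives the same projection as a G-theory decision study. Taking each tabulated value at 3 assessments as the starting point, the sketch below reproduces every other tabulated value to within 0.01. It also inverts the formula to estimate how many assessments would be needed to reach a target reliability. The 0.8 target and the helper names are assumptions for illustration.

```python
from __future__ import annotations

import math

# Sketch: project WPBA reliability with the Spearman-Brown prophecy formula.
# This is a swapped-in illustration, not the generalisability analysis used in
# the study, and it assumes the assessments are parallel (interchangeable).


def spearman_brown(r: float, k: float) -> float:
    """Reliability after multiplying the number of assessments by factor k."""
    return k * r / (1 + (k - 1) * r)


# Reliability at 3 assessments, taken from the Results tables.
BASE_AT_3 = {
    "mini-CEX old": 0.55, "mini-CEX new": 0.77,
    "CbD old": 0.48, "CbD new": 0.73,
    "ACAT old": 0.21, "ACAT new": 0.36,
}


def project(r3: float, counts: tuple[int, ...] = (3, 6, 9, 12)) -> list[float]:
    """Project reliability from 3 assessments to each count, rounded to 2 dp."""
    return [round(spearman_brown(r3, n / 3), 2) for n in counts]


def assessments_needed(r3: float, target: float) -> int:
    """Smallest number of assessments whose projected reliability reaches target."""
    k = target * (1 - r3) / (r3 * (1 - target))  # Spearman-Brown solved for k
    # Subtract a tiny tolerance so floating-point noise cannot add an extra one.
    return math.ceil(3 * k - 1e-9)


if __name__ == "__main__":
    for scale, r3 in BASE_AT_3.items():
        print(f"{scale:13s} {project(r3)}  needed for R >= 0.8: "
              f"{assessments_needed(r3, 0.8)}")
```

On these projections the new anchor-based ACAT scale would need roughly half as many assessments as the old one to reach 0.8 (22 versus 46), which is one way of reading the study’s conclusion. The estimate inherits the parallel-assessments assumption, so treat it as indicative only.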