Development of Exercises for Basic Surgical Skills Assessment
Niyant Patel, James Robbins, Mario Villalba Jr., Daryl Reid, and Charles Shanley
Department of Surgery, William Beaumont Hospital, Royal Oak, Michigan

Changes in Operative Experience
- The 80 hour workweek
- Resident autonomy
- Specialized centers
- Minimally invasive surgery

Uniformly Used Methods of Assessment
- Operative logs
- Faculty evaluations
- In-training examination scores

Goals and Objectives
- To develop low fidelity exercises for basic, open surgical skills
- To demonstrate construct validity
- To establish interrater reliability
- To show internal consistency of the test

Definitions
- Construct validity: extent to which a test discriminates between various levels of expertise
- Interrater reliability: extent of agreement between two or more independent raters
- Internal consistency: correlation of parts of a test with each other

Model Development
- Low fidelity
- Reproducible
- Portable
- Focused on components of basic skills

Model Development
- The five exercises included in this study had face validity*
- All exercises were limited by time
  - Promote efficiency
  - Accentuate differences
* Face validity: resemblance to real life situations

Exercises 1 & 2: Needle Driving
- 30 targets
- 4 x 2 inch label

Exercise 1: Needle Driving
- The needle was placed directly through the target and out the sides

Exercise 2: Needle Driving (blind)
- The needle was placed through the sides and out the target

Exercises 1 & 2: Needle Driving
Metrics recorded
- Accuracy of each target
- Time (limit 300 s)
Score = (Red x 7.46) + (White x 1.95) + Blue - Miss + ((300 - time) x (total completed / 30))

Accuracy Scoring
- Red
- White

Exercise 3: Needle Transferring
- 30 needles, 3 different sizes
- Pick up with forceps, transfer to needle driver, place into sponge
Metrics recorded
- Number transferred
- Number dropped
- Time (limit 150 s)
Score = (transferred x 2) - dropped + ((150 - time) x (needles attempted / 30))

Exercise 4: Fine Forceps Use
- Threading of beads onto monofilament with forceps
Metrics recorded
- Number threaded
- Time (limit 150 s)
Score = beads threaded

Exercise 5: Knot Tying
- 4 knots
- Any type or technique
Metrics recorded
- Secure knots in appropriate place
- Time (limit 150 s)
Score = (knots x 10) + ((150 - time) x (total completed / 4))

Testing and Scoring
- Forty volunteers: general surgical residents and attending surgeons
- All participants were scored by an evaluator and independently scored themselves
- Normalization of scores to the highest score for that exercise: score / high score x 100

Construct Validity
Discrimination between 2 levels of expertise: novice and proficient

Evaluator Scoring
  Exercise                      Novice (24)   Proficient (16)   p-value
  1 - Needle driving            35 (14)       59 (15)           <0.01
  2 - Needle driving (blind)    44 (22)       62 (20)           0.01
  3 - Needle transferring       42 (8)        67 (16)           <0.01
  4 - Fine Forceps use          50 (22)       59 (14)           0.14
  5 - Knot tying                45 (26)       87 (9)            <0.01

Values are means (standard deviation). Analysis by Mann-Whitney U test.
Novice: junior residents (postgraduate year level 1-3)
Proficient: senior residents and attendings (postgraduate year level 4 and above)

Interrater Reliability
Extent of agreement between self-scoring and scoring by evaluators

  Exercise                      Self-scoring   Evaluator scoring   Difference   p-value
  1 - Needle driving            51 (18)        45 (19)             6.8 (12)     <0.01
  2 - Needle driving (blind)    47 (24)        51 (22)             -3.9 (13)    0.07
  3 - Needle transferring       52 (17)        52 (17)             0            1
  4 - Fine Forceps use          54 (19)        54 (19)             0            1
  5 - Knot tying                62 (30)        61 (29)             0.6 (4)      0.32

Values are means (standard deviation). Analysis by paired t-test.
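The scoring formulas for Exercises 1 to 5 and the normalization step from the Testing and Scoring slide can be written out as a minimal Python sketch. The function and parameter names are illustrative and do not come from the presentation. The slides do not say exactly how "total completed" and "needles attempted" were counted, so the sketch takes them as explicit inputs rather than guessing.

```python
# Minimal sketch of the slide formulas; all names here are hypothetical.

def needle_driving_score(red, white, blue, miss, time_s, completed):
    """Exercises 1 and 2: weighted accuracy plus a time bonus (limit 300 s, 30 targets)."""
    accuracy = red * 7.46 + white * 1.95 + blue - miss
    time_bonus = (300 - time_s) * (completed / 30)
    return accuracy + time_bonus


def needle_transfer_score(transferred, dropped, time_s, attempted):
    """Exercise 3: transfers minus drops plus a time bonus (limit 150 s, 30 needles)."""
    return transferred * 2 - dropped + (150 - time_s) * (attempted / 30)


def fine_forceps_score(beads_threaded):
    """Exercise 4: the score is the number of beads threaded."""
    return beads_threaded


def knot_tying_score(knots, time_s, completed):
    """Exercise 5: 10 points per secure knot plus a time bonus (limit 150 s, 4 knots)."""
    return knots * 10 + (150 - time_s) * (completed / 4)


def normalize(scores):
    """Scale raw scores for one exercise so the highest score becomes 100."""
    high = max(scores)
    return [s / high * 100 for s in scores]


if __name__ == "__main__":
    # Made-up raw values, for illustration only (not study data).
    raw = [
        needle_driving_score(red=10, white=15, blue=5, miss=0, time_s=240, completed=30),
        needle_driving_score(red=5, white=10, blue=10, miss=5, time_s=300, completed=30),
    ]
    print([round(s, 1) for s in normalize(raw)])
```

In every formula the time bonus is scaled by the fraction of the task completed, so finishing early counts in full only when all targets, needles, or knots were attempted.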
Internal Consistency
Correlation of parts of the test with each other

[Bar chart: alpha coefficient for self-scoring and evaluator scoring, shown overall and with the Fine Forceps use exercise excluded. Data labels: 0.85, 0.83, 0.78, 0.75. Reference lines mark the "highly reliable" and "adequate" values.]

Limitations
- Lack of a significant difference in scores for the forceps use exercise may be the result of a type II error
- Despite trying to focus on specific components, our exercises likely test multiple skills
- Only 5 exercises were formally evaluated

Summary
- Develop low fidelity exercises for the assessment of basic, open surgical skills
- Discriminate between two levels of expertise, establishing construct validity
- Agreement between raters, demonstrating interrater reliability and the ability to self-evaluate
- Correlation between the 5 exercises, demonstrating internal consistency, improved with the exclusion of the forceps use exercise

Future Directions
- Establishment of other forms of validity and reliability
- Development of other exercises to make a comprehensive set
- Demonstrate evidence of improvement with practice
- Use of sophisticated technology

Conclusion
These data provide evidence of validity, reliability and consistency for a series of low fidelity exercises with self-evaluation metrics.

Thank you for your time

Current Methods of Assessment
- Operative logs [1, 2]
- Faculty evaluations [2]
- In-training examination scores [3]

1. Cuschieri, A., et al., What do master surgeons think of surgical competence and revalidation? Am J Surg, 2001. 182(2): p. 110-6.
2. Reznick, R.K., Teaching and testing technical skills. Am J Surg, 1993. 165(3): p. 358-61.
3. Scott, D.J., et al., Evaluating surgical competency with the American Board of Surgery In-Training Examination, skill testing, and intraoperative assessment. Surgery, 2000. 128(4): p. 613-22.
Definitions
- Face validity: resemblance to real life situations
- Content validity: domain that is being measured is actually being measured
- Concurrent validity: correlation of results with the gold standard for that domain

Definitions
- Predictive validity: ability to predict future performance
- Test-retest reliability: consistency of trainee performance on different occasions

Construct Validity
Discrimination between 2 levels of expertise: novice and proficient

                                Self-scoring                              Evaluator scoring
  Exercise                      Novice (24)   Proficient (16)   p-value   Novice (24)   Proficient (16)   p-value
  1 - Needle driving            46 ± 17       60 ± 17           0.02      39 ± 16       66 ± 17           <0.01
  2 - Needle driving (blind)    35 ± 15       65 ± 23           <0.01     45 ± 22       63 ± 20           0.01
  3 - Needle transferring       42 ± 8        67 ± 16           <0.01     42 ± 8        67 ± 16           <0.01
  4 - Fine Forceps use          50 ± 22       59 ± 14           0.14      50 ± 22       59 ± 14           0.14
  5 - Knot tying                45 ± 26       88 ± 9            <0.01     45 ± 26       87 ± 9            <0.01

Values are means ± standard deviation. Analysis by Mann-Whitney U test.

Internal Consistency

[Bar chart: Cronbach's alpha coefficient for self-scoring and evaluator scoring, shown overall and with each of exercises 1 to 5 removed in turn. Reference lines mark the "highly reliable" and "adequate" values.]
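The internal consistency chart reports Cronbach's alpha overall and with each exercise removed. A minimal sketch of that calculation follows, using the standard formula alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores). The function names and the example data are hypothetical, and this is not the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores for the same n participants."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per participant across all items.
    totals = [sum(per_participant) for per_participant in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_removed(items):
    """Alpha recomputed with each item left out in turn, in item order."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


if __name__ == "__main__":
    # Made-up normalized scores: 3 exercises x 4 participants (not study data).
    scores = [
        [40, 55, 70, 90],
        [35, 60, 65, 85],
        [50, 45, 60, 55],
    ]
    print(round(cronbach_alpha(scores), 2))
    print([round(a, 2) for a in alpha_if_removed(scores)])
```

If alpha rises when an exercise is left out, that exercise correlates less with the others. This is the pattern the deck describes for the Fine Forceps use exercise.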