Development of Exercises for Basic Surgical Skills Assessment

Niyant Patel, James Robbins, Mario Villalba Jr.,
Daryl Reid, and Charles Shanley
Department of Surgery
William Beaumont Hospital,
Royal Oak, Michigan
Changes in Operative
Experience

The 80 hour workweek

Resident Autonomy

Specialized Centers

Minimally Invasive
Surgery
Uniformly Used Methods of
Assessment

Operative Logs

Faculty Evaluations

In-training
Examination scores
Goals and Objectives

To develop low fidelity exercises for basic,
open surgical skills

To demonstrate construct validity

To establish interrater reliability

To show internal consistency of the test
Definitions

Construct validity
 Extent to which a test discriminates between
various levels of expertise

Interrater reliability
 Extent of agreement between two or more
independent raters

Internal consistency
 Correlation of parts of a test with each other
Model Development




Low fidelity
Reproducible
Portable
Focused on components of basic skills
Model Development

The five exercises included in this study had face
validity*

All exercises were limited by time
 Promote efficiency
 Accentuate differences
* Face validity - Resemblance to real life situations
Exercises 1 & 2
Needle Driving


30 targets
4 x 2 inch label
Exercise 1
Needle Driving

The needle was placed directly through
the target and out the sides
Exercise 2
Needle Driving (blind)

The needle was placed through the
sides and out the target
Exercises 1 & 2
Needle Driving

Metrics recorded
 Accuracy of each
target
 Time (limit 300s)
Score = (Red x 7.46)+ (White x 1.95) + Blue - Miss + ((300-time) x (total completed/30))
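The needle driving formula can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, and it assumes that Red, White, Blue, and Miss are counts of targets in each accuracy category, that time is measured in seconds, and that "total completed" is the number of targets attempted out of 30.

```python
def needle_driving_score(red, white, blue, miss, time_s, completed,
                         time_limit=300, n_targets=30):
    """Score for Exercises 1 and 2, following the slide's formula.

    red, white, blue, miss: assumed to be counts of targets per accuracy category.
    time_s: time taken in seconds (capped by the 300 s limit).
    completed: number of targets completed out of n_targets.
    """
    # Accuracy term: weighted hits minus misses.
    accuracy = red * 7.46 + white * 1.95 + blue - miss
    # Time bonus: unused seconds, scaled by the fraction of targets completed.
    time_bonus = (time_limit - time_s) * (completed / n_targets)
    return accuracy + time_bonus


# Hypothetical participant: all 30 targets completed in 240 seconds.
example = needle_driving_score(red=10, white=10, blue=5, miss=5,
                               time_s=240, completed=30)
```

Scaling the time bonus by the fraction completed means that a participant who stops early gains little from the unused time.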
Accuracy Scoring
Red
Accuracy Scoring
White
Exercise 3
Needle Transferring

30 needles, 3 different sizes

Pick up with forceps, transfer
to needle driver, place into
sponge

Metrics recorded



Number transferred
Number dropped
Time (limit 150s)
Score = (transferred x 2) – dropped + ((150-time) x (needles attempted/30))
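A minimal Python sketch of the needle transferring score, under stated assumptions: the names are hypothetical, time is in seconds, and "needles attempted" is passed in separately because the slide does not say how it relates to the transferred and dropped counts.

```python
def needle_transfer_score(transferred, dropped, time_s, attempted,
                          time_limit=150, n_needles=30):
    """Score for Exercise 3, following the slide's formula."""
    # Two points per needle transferred, one point lost per needle dropped.
    base = transferred * 2 - dropped
    # Unused seconds count only in proportion to the needles attempted.
    time_bonus = (time_limit - time_s) * (attempted / n_needles)
    return base + time_bonus


# Hypothetical participant who ran out of time after attempting 22 needles.
slow = needle_transfer_score(transferred=20, dropped=2, time_s=150, attempted=22)
# Hypothetical participant who transferred all 30 needles in 120 seconds.
fast = needle_transfer_score(transferred=30, dropped=0, time_s=120, attempted=30)
```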
Exercise 4
Fine Forceps use

Threading of beads
onto monofilament
with forceps

Metrics recorded
Number threaded
 Time (limit 150s)

Score = Beads Threaded
Exercise 5
Knot Tying

4 knots

Any type or technique

Metrics recorded


Secure knots in
appropriate place
Time (limit 150s)
Score = (knots x 10)+ ((150-time) x (total completed/4))
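The knot tying score has the same shape and can be sketched the same way. The names are hypothetical; the sketch assumes "knots" means the number of secure knots in the appropriate place and "total completed" means the number of knots finished out of 4.

```python
def knot_tying_score(secure_knots, time_s, completed,
                     time_limit=150, n_knots=4):
    """Score for Exercise 5, following the slide's formula."""
    # Ten points per secure, correctly placed knot.
    base = secure_knots * 10
    # Unused seconds, scaled by the fraction of the 4 knots completed.
    time_bonus = (time_limit - time_s) * (completed / n_knots)
    return base + time_bonus


# Hypothetical participant: 4 secure knots in 90 seconds.
quick = knot_tying_score(secure_knots=4, time_s=90, completed=4)
# Hypothetical participant: 2 secure knots when the time limit was reached.
timed_out = knot_tying_score(secure_knots=2, time_s=150, completed=2)
```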
Testing and Scoring

Forty Volunteers

general surgical residents and attending
surgeons

All participants were scored by an evaluator
and independently scored themselves

Normalization of scores to the highest score
for that exercise

score/high score x 100
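The normalization step can be illustrated with a short Python sketch. The function name and the sample scores are hypothetical, not study data.

```python
def normalize_scores(raw_scores):
    """Rescale one exercise's raw scores so the highest score becomes 100."""
    high = max(raw_scores)
    if high <= 0:
        # Assumption: normalization only makes sense for a positive top score.
        raise ValueError("highest score must be positive")
    return [score / high * 100 for score in raw_scores]


# Hypothetical raw scores for one exercise.
normalized = normalize_scores([50, 100, 75])
```

Normalizing each exercise separately puts all five on a common 0 to 100 scale, which makes their means comparable across exercises.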
Construct Validity
Discrimination between 2 levels of expertise: novice and proficient
Evaluator Scoring
Exercises                    Novice (24)   Proficient (16)   p-value
1 - Needle driving           35 (14)       59 (15)           <0.01
2 - Needle driving (blind)   44 (22)       62 (20)           0.01
3 - Needle transferring      42 (8)        67 (16)           <0.01
4 - Fine Forceps use         50 (22)       59 (14)           0.14
5 - Knot tying               45 (26)       87 (9)            <0.01
Values are means (standard deviation). Analysis by Mann-Whitney U test.
Novice - Junior residents (Postgraduate year level 1-3)
Proficient - Senior residents and attendings (Postgraduate year level 4 and
above)
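For readers unfamiliar with the Mann-Whitney U test named above, the statistic itself can be computed directly from its definition. The sketch below is illustrative only: the function name and the sample scores are hypothetical, it is not the authors' analysis code, and it stops at the U statistic (a p-value would normally come from a statistics package).

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which a value from group_a exceeds a value
    from group_b; tied pairs count as one half.
    """
    u_a = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    # The two statistics always sum to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical normalized scores, not taken from the study.
novice = [30, 40, 45, 50]
proficient = [55, 60, 45, 70]
u_novice, u_proficient = mann_whitney_u(novice, proficient)
```

A U value near zero for the novice group means that almost every proficient participant outscored almost every novice, which is the pattern a test with construct validity should show.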
Interrater Reliability
Extent of agreement between self-scoring and scoring by evaluators
Exercises                    Self-scoring   Evaluator scoring   Difference   p-value
1 - Needle driving           51 (18)        45 (19)             6.8 (12)     <0.01
2 - Needle driving (blind)   47 (24)        51 (22)             -3.9 (13)    0.07
3 - Needle transferring      52 (17)        52 (17)             0            1
4 - Fine Forceps use         54 (19)        54 (19)             0            1
5 - Knot tying               62 (30)        61 (29)             0.6 (4)      0.32
Values are means (standard deviation). Analysis by paired t-test.
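The paired comparison can be sketched in Python as below. This is an illustration with hypothetical names and made-up scores, not the authors' analysis; it returns the mean difference, its standard deviation, and the t statistic, and leaves the p-value to a statistics package.

```python
import math
import statistics


def paired_t(self_scores, evaluator_scores):
    """Return (mean difference, SD of differences, t statistic) for paired scores."""
    diffs = [s - e for s, e in zip(self_scores, evaluator_scores)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        # Identical differences for every participant: t is undefined.
        raise ValueError("all paired differences are equal")
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t_stat


# Hypothetical paired scores for four participants on one exercise.
mean_d, sd_d, t_stat = paired_t([50, 60, 70, 80], [48, 57, 69, 74])
```

The guard matters for exercises where self-scores and evaluator scores are identical for every participant, since the differences then have no variance.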
Internal Consistency
Correlation of parts of the test with each other
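[Bar chart: alpha coefficient (y-axis, 0.6 to 0.9) for self-scoring and evaluator scoring, shown overall and with the fine forceps use exercise excluded. Reference lines mark the "highly reliable" and "adequate" values. Labelled values on the chart include 0.83 and 0.78.]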
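Internal consistency of this kind is usually summarized with Cronbach's alpha, which compares the variance of each exercise with the variance of the participants' total scores. The sketch below uses the standard formula; the function names and sample data are hypothetical, and it is not the authors' analysis code.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, each holding one exercise's scores for the
    same participants in the same order.
    """
    k = len(items)
    # Sum of the sample variances of the individual exercises.
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    # Variance of each participant's total score across exercises.
    totals = [sum(row) for row in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_without(items, index):
    """Alpha recomputed with one exercise left out."""
    return cronbach_alpha(items[:index] + items[index + 1:])


# Hypothetical scores: 3 exercises, 4 participants.
scores = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
alpha_all = cronbach_alpha(scores)
alpha_first_two = alpha_without(scores, 2)
```

Recomputing alpha with one exercise left out is how the effect of excluding a single exercise, such as fine forceps use, can be checked.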
Limitations

Lack of a significant difference in scores for
the forceps use exercise may be the result
of a type II error

Despite trying to focus on specific
components, our exercises likely test
multiple skills

Only 5 exercises were formally evaluated
Summary

Develop low fidelity exercises for the
assessment of basic, open surgical skills

Discriminate between two levels of expertise
establishing construct validity

Agreement between raters demonstrating
interrater reliability and the ability to self-evaluate

Correlation between the 5 exercises
demonstrating internal consistency
 improved with the exclusion of the forceps
use exercise
Future Directions

Establishment of other forms of validity and
reliability

Development of other exercises to make a
comprehensive set

Demonstrate evidence of improvement with
practice

Use of sophisticated technology
Conclusion

These data provide evidence of
validity, reliability and consistency for
a series of low fidelity exercises with
self-evaluation metrics
Thank you for your time
Current Methods of
Assessment

Operative Logs [1, 2]

Faculty Evaluations [2]

In-training
Examination scores [3]

1. Cuschieri, A., et al., What do master surgeons think of surgical competence and revalidation? Am J Surg, 2001. 182(2): p. 110-6.
2. Reznick, R.K., Teaching and testing technical skills. Am J Surg, 1993. 165(3): p. 358-61.
3. Scott, D.J., et al., Evaluating surgical competency with the American Board of Surgery In-Training Examination, skill testing, and intraoperative assessment. Surgery, 2000. 128(4): p. 613-22.
Definitions



Face validity
 Resemblance to real life situations
Content validity
 Domain that is being measured is
actually being measured
Concurrent validity
 Correlation of results with the gold
standard for that domain
Definitions


Predictive validity
 Ability to predict future performance
Test-retest reliability
 Consistency of trainee performance
on different occasions
Construct Validity
Discrimination between 2 levels of expertise: novice and proficient

                             Self-scoring                                Evaluator Scoring
Exercises                    Novice (24)   Proficient (16)   p-value    Novice (24)   Proficient (16)   p-value
1 - Needle driving           46 ± 17       60 ± 17           0.02       39 ± 16       66 ± 17           <0.01
2 - Needle driving (blind)   35 ± 15       65 ± 23           <0.01      45 ± 22       63 ± 20           0.01
3 - Needle transferring      42 ± 8        67 ± 16           <0.01      42 ± 8        67 ± 16           <0.01
4 - Fine Forceps use         50 ± 22       59 ± 14           0.14       50 ± 22       59 ± 14           0.14
5 - Knot tying               45 ± 26       88 ± 9            <0.01      45 ± 26       87 ± 9            <0.01

Values are means ± standard deviation. Analysis by Mann-Whitney U test.
Internal consistency
[Chart: Cronbach's alpha coefficient (y-axis, 0.5 to 0.9) for self-scoring and evaluator scoring, shown overall and with each of exercises 1 to 5 removed. Reference lines mark the "highly reliable" and "adequate" values.]