Comparative Research on Training Simulators in Emergency Medicine: A Methodological Review
Matt Lineberry, Ph.D.
Research Psychologist, NAWCTSD
matthew.lineberry@navy.mil
Medical Technology, Training, & Treatment (MT3), May 2012

Credits and Disclaimers
• Co-authors
  – Melissa Walwanis, Senior Research Psychologist, NAWCTSD
  – Joseph Reni, Research Psychologist, NAWCTSD
• These are my professional views, not necessarily those of NAWCTSD, NAVMED, etc.

Objectives
• Motivate conduct of comparative research in simulation-based training (SBT) for healthcare
• Identify challenges evident from past comparative research
• Promote more optimal research methodologies in future research

Cook et al. (2011) meta-analysis in JAMA:
“…we question the need for further studies comparing simulation with no intervention (ie, single-group pretest-posttest studies and comparisons with no-intervention controls). …theory-based comparisons between different technology-enhanced simulation designs (simulation vs. simulation studies) that minimize bias, achieve appropriate power, and avoid confounding… are necessary”

Issenberg et al. (2011) research agenda in SIH:
“…studies that compare simulation training to traditional training or no training (as is often the case in control groups), in which the goal is to justify its use or prove it can work, do little to advance the field of human learning and training.”

Moving forward: comparative research
• How do varying degrees and types of fidelity affect learning?
• Are some simulation approaches or modalities superior to others? For what learning objectives? Which learners? Tasks? Etc.
• How do cost and throughput considerations affect the utility of different approaches?

Where are we now?
• Searched for peer-reviewed studies comparing training effectiveness of simulation approaches and/or measured practice on human patients for emergency medical skills
• Searched PubMed and CINAHL
  – mannequin, manikin, animal, cadaver, simulat*, virtual reality, VR, compar*, versus, and VS
• Exhaustively searched Simulation in Healthcare
• Among identified studies, searched references forward and backward

Reviewed studies
• 17 studies met criteria
• Procedure trained:
  – Predominantly needle access (7 studies). 4 airway adjunct, 3 TEAM, 2 FAST, etc.
• Simulators compared:
  – Predominantly manikins, VR systems, and part-task trainers

Reviewed studies
• Design:
  – Almost entirely between-subjects (16 of 17)
• Trainee performance measurement:
  – 7 were post-test only; all others included pre-tests
  – Most (9 studies) use expert ratings; also: knowledge tests (7), success/failure (6), and objective criteria (5)
  – 6 studies tested trainees on actual patients
  – 6 tested trainees on one of the simulators used in training

Apparent methodological challenges
1. Inherently smaller differences between conditions and, consequently, underpowered designs
2. An understandable desire to “prove the null”, but inappropriate approaches to testing equivalence
3. Difficulty measuring or approximating the ultimate criterion: performance on the job

Challenge #1: Detecting “small” differences
• Cook et al. (2011) meta:
  – Differences in outcomes of roughly 0.5-1.2 standard deviations, favoring simulation-based training over no simulation.
• Comparative research should expect smaller differences than these.
• HOWEVER, small differences can have great practical significance if they…
  – correspond to important outcomes (e.g., morbidity or mortality),
  – can be exploited widely, and/or
  – can be exploited inexpensively.

The power of small differences…
• Physicians' Health Study:
  – Aspirin trial halted prematurely due to obvious benefit for heart attack reduction
  – Effect size: r = .034
  – Of 22k participants, 85 fewer heart attacks in the aspirin group

…and the tyranny of small differences
• Probability to detect differences (power) decreases exponentially as effect size decreases
• We generally can’t control effect sizes.
• Among other things, we can control:
  – Sample size
  – Reliability of measurement
  – Chosen error rates

Sample size
• Among reviewed studies, n ranges from 8 to 62; median n = 15.
• If n = 15, α = .05, true difference = 0.2 SDs, and measurement is perfectly reliable, probability of detecting the difference is only 13%
• RECOMMENDATION: Pool resources in multi-site collaborations to achieve needed power to detect effects (and estimate power requirements a priori)

Reliability of measurement
• Potential rater errors are numerous
• Typical statistical estimates can be uninformative (e.g., coefficient alpha, inter-rater correlations)
• If measures are unreliable, and especially if samples are also small, you’ll almost always fail to find differences, whether they exist or not

Reliability of measurement
• Among nine studies using expert ratings:
  – Only two used multiple raters for all participants
  – Six studies did not estimate reliability at all
  – One study reported an inter-rater reliability coefficient
  – Two studies reported correlations between raters’ scores
  – Both approaches make unfounded assumptions
  – Ratings were never collected on multiple occasions

Reliability of measurement
RECOMMENDATIONS:
1. Use robust measurement protocols, e.g., frame-of-reference rater training, multiple raters
2. For expert ratings, use generalizability theory to estimate and improve reliability
G-theory respects a basic truth:
• “Reliability” is not a single value associated with a measurement tool
• Rather, it depends on how you conduct measurement, who is being measured, the type of comparison for which you use the scores, etc.
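
To illustrate the point that reliability depends on how measurement is conducted, here is a minimal sketch of a one-facet generalizability analysis in which trainees are fully crossed with raters. The ratings are invented purely for illustration, the function and variable names are hypothetical, and occasions are left out as a facet to keep the example short. It shows the standard persons-by-raters variance decomposition, not the analysis used in any reviewed study.

```python
from statistics import mean

# Invented example ratings: rows are trainees, columns are raters.
TOY_RATINGS = [
    [7, 6, 8],
    [5, 4, 6],
    [9, 8, 9],
    [4, 4, 5],
    [6, 5, 7],
]


def variance_components(ratings):
    """Estimate variance components for a fully crossed persons x raters
    design with one score per cell, using ANOVA mean squares."""
    n_p, n_r = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    person_means = [mean(row) for row in ratings]
    rater_means = [mean(row[j] for row in ratings) for j in range(n_r)]

    ms_person = n_r * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_rater = n_p * sum((m - grand) ** 2 for m in rater_means) / (n_r - 1)
    ss_resid = sum(
        (ratings[i][j] - person_means[i] - rater_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Negative estimates are truncated at zero, a common convention.
    return {
        "person": max((ms_person - ms_resid) / n_r, 0.0),
        "rater": max((ms_rater - ms_resid) / n_p, 0.0),
        "residual": ms_resid,
    }


def g_coefficient(components, n_raters, absolute=False):
    """Forecast reliability of a mean score over n_raters raters.

    absolute=False gives the coefficient for relative (rank-order) decisions;
    absolute=True also counts rater severity differences as error."""
    error = components["residual"] / n_raters
    if absolute:
        error += components["rater"] / n_raters
    return components["person"] / (components["person"] + error)


if __name__ == "__main__":
    vc = variance_components(TOY_RATINGS)
    for k in (1, 2, 3, 5):
        print(k, round(g_coefficient(vc, k), 3),
              round(g_coefficient(vc, k, absolute=True), 3))
```

The same instrument yields a different coefficient for each number of raters and for each type of decision, which is the sense in which “reliability” is not a single value.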

G-theory process, in a nutshell
1. Collect ratings, using an experimental design to expose sources of error (e.g., have multiple raters give ratings, on multiple occasions)
2. Use ANOVA to estimate magnitude of errors
3. Given results from step 2, forecast what reliability will result from different combinations of raters, occasions, etc.

Weighted scoring
• Two studies used weighting schemes: more points associated with more critical procedural steps
  – Can improve both reliability and validity
• RECOMMENDATION: Use task analytic procedures to identify criticality of subtasks; weight scores accordingly

Selecting error rates
• Why do we choose p = .05 as the threshold for statistical significance?

Relative severity of errors
• Type I error (α = .05): “Simulator X is more effective than Simulator Y” (but really, they’re equally effective)
  – Potential outcome: Largely trivial; both are equally effective, so erroneously favoring one does not affect learning or patient outcomes
• Type II error (β = 1 - power; e.g., 1 - .80 = .20): “Simulators X and Y are equally effective” (but really, Simulator X is superior)
  – Potential outcome: Adverse effects on learning and patient outcomes if Simulator X is consequently underutilized

Relative severity of errors
• RECOMMENDATION: Particularly in a new line of research, adopt an alpha level that rationally balances inferential errors according to their severity

Cascio, W. F., & Zedeck, S. (1983). Open a new window in rational research planning: Adjust alpha to maximize statistical power. Personnel Psychology, 36, 517-526.
Murphy, K. (2004). Using power analysis to evaluate and improve research. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (Chapter 6, pp. 119-137). Malden, MA: Blackwell.
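
One way to make the alpha recommendation concrete is sketched below. It assumes a two-group comparison with n trainees per group, a normal approximation to the test statistic, and a simple loss of the form cost_type1 × α + cost_type2 × β. The cost weights, the function names, and the inputs d = 0.5 with n = 62 are hypothetical choices for illustration and are not taken from the cited references.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_group(d, n_per_group, alpha, two_sided=True):
    """Approximate power to detect a standardized difference d between two
    equal-sized groups (normal approximation, so slightly optimistic for
    small samples)."""
    shift = d * (n_per_group / 2) ** 0.5
    if two_sided:
        z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
        return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(shift - z_crit)


def balanced_alpha(d, n_per_group, cost_type1=1.0, cost_type2=1.0):
    """Grid-search the one-sided alpha (0.001 to 0.499) that minimizes
    cost_type1 * alpha + cost_type2 * beta for the assumed effect size."""
    def weighted_error(alpha):
        beta = 1 - power_two_group(d, n_per_group, alpha, two_sided=False)
        return cost_type1 * alpha + cost_type2 * beta

    candidates = [i / 1000 for i in range(1, 500)]
    return min(candidates, key=weighted_error)


if __name__ == "__main__":
    # Hypothetical inputs: d = 0.5 SDs, 62 trainees per group.
    print(balanced_alpha(0.5, 62))                  # equal error costs
    print(balanced_alpha(0.5, 62, cost_type2=2.0))  # Type II twice as costly
    # Inputs from the "Sample size" slide, assuming n = 15 per group.
    print(power_two_group(0.2, 15, 0.05, two_sided=False))
```

When Type II errors are weighted more heavily, the search returns a larger alpha. For d = 0.2 and n = 15 per group, the one-sided approximation gives power of roughly 0.14, close to the 13% quoted under “Sample size”. Whether that figure came from a one-sided test is an assumption.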

Challenge #2: Proving the null
• Language in studies often reflects desire to assert equivalence
  – e.g., different simulators are “reaching parity”
• Standard null hypothesis statistical testing (NHST) does not support this assertion
  – Failure to detect effects should prompt reservation of judgment, not acceptance of the null hypothesis

Which assertion is more bold?
• “Sim X is more effective than Sim Y”
• “Sims X and Y are equally effective”
[Figure: each assertion is shown on a number line running from “Y favored” through 0 to “X favored”]

Proving the null
• Possible to prove the null:
  – Set a region of practical equivalence around zero
  – Evaluate whether all plausible differences (e.g., 95% confidence interval) fall within the region
• RECOMMENDATION:
  – Avoid unjustified acceptance of the null
  – Use strong tests of equivalence when hoping to assert equivalence
  – Be explicit about what effect size you would consider practically significant, and why

Challenge #3: Getting to the ultimate criterion
• The goal is not test performance but job performance; “the map is not the terrain”
• Typical to test demonstration of procedures, often on a simulator
  – Will trainees perform similarly on actual patients, under authentic work conditions?
  – Do trainees know when to execute the procedure?
  – Are trainees willing to act promptly?

e.g.: Roberts et al. (1997)
• No differences detected in rate of successful laryngeal mask airway placement for manikin vs.
  manikin-plus-live-patient training
  – However: Confidence very low, and only increased with live-patient practice
• “…if a nurse does not feel confident enough… the patient will initially receive pocket-mask or bag-mask ventilation, and this is clearly less desirable”
• Issue of willingness to act decisively

Criterion relevance
• RECOMMENDATION: Where possible, use criterion testbeds that correspond highly to actual job performance
  – Assess performance on human patients/volunteers
  – Replicate performance-shaping factors (not just environment)
  – Test knowledge of indications and willingness to act

What if patients can’t be used?
• Using simulators as the criterion testbed introduces potential biases
  – e.g., train on cadaver or manikin; test on a different manikin

A partial solution: Crossed-criterion design
• Advantages
  – Mitigates bias
  – Allows comparison of generalization of learning from each training condition
• Disadvantages
  – Precludes pre-testing, if pre-test exposure to each simulator is sufficiently lengthy to derive learning benefits

Conclusions
• “The greatest enemy of a good plan is the dream of a perfect plan”
• All previous comparative research is to be lauded for pushing the field forward
• Concrete steps can be taken to maximize the
  theoretical and practical value of future comparative research
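
Returning to the equivalence-testing recommendation under Challenge #2, the region-of-practical-equivalence check can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented, the function name and the margin values are hypothetical, and the confidence interval uses a normal approximation, whereas a t-based interval would be more appropriate for samples this small.

```python
from statistics import NormalDist, mean, stdev

# Invented post-test scores for two training conditions.
SCORES_X = [70, 72, 68, 71, 69, 73, 70, 71]
SCORES_Y = [71, 70, 69, 72, 70, 72, 69, 71]


def equivalence_check(group_x, group_y, margin, confidence=0.95):
    """Return (ci_low, ci_high, equivalent) for the mean difference X - Y.

    'equivalent' is True only if the whole confidence interval lies inside
    the region of practical equivalence (-margin, +margin)."""
    diff = mean(group_x) - mean(group_y)
    std_err = (
        stdev(group_x) ** 2 / len(group_x) + stdev(group_y) ** 2 / len(group_y)
    ) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci_low, ci_high = diff - z * std_err, diff + z * std_err
    return ci_low, ci_high, (-margin < ci_low and ci_high < margin)


if __name__ == "__main__":
    # The margin must be justified in advance as the smallest difference
    # that would matter in practice; 5.0 and 1.0 are placeholders.
    print(equivalence_check(SCORES_X, SCORES_Y, margin=5.0))
    print(equivalence_check(SCORES_X, SCORES_Y, margin=1.0))
```

With the same data, a generous margin supports a claim of equivalence while a strict one does not, which is why the margin has to be stated and defended explicitly.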