COMPARING THE ACCURACY OF PERFORMING DIGITAL AND PAPER CHECKLISTS USING A FEEDBACK INTERVENTION PACKAGE DURING NORMAL WORKLOAD CONDITIONS IN SIMULATED FLIGHT
William Rantz
Western Michigan University
Department of Aviation Sciences
March 16, 2009

Overview
• Rationale & Purpose
• Location & Duration
• Frasca Flight Training Device
• Participants
• Paper-Digital Checklists & Flight Pattern
• Dependent Variables
• Independent Variables & Integrity of IVs
• Experimental Design
• Results
• Inferential Results
• Discussion & Future Research

Rationale
• Most common error cited in LOSA data
• Observational data: 54% of errors (Helmreich et al., 2001)
• Contributing factor to numerous accidents
• Improper configuration of aircraft (NTSB, 1969, 1975, 1982, 1988a, 1988b, 1989, 1990, 1997, 1998, 2001, 2002, 2003a, 2003b, 2004a, 2004b, 2006, 2007a, 2007b, 2008a, 2008b, 2008c, 2008d)
• Improper use of checklist (Adamski & Stahl, 1997; Degani, 1992, 2002; Diez, Boehm-Davis, & Holt, 2003; Federal Aviation Administration [FAA], 1995, 2000; Lautmann and Gallimore, 1987; Turner, 2001)

Rationale
• Digital checklist importance
• Assertion: errors to which paper checklists are prone are reduced by the digital format
    • Items skipped
    • Losing place when distracted
    • Incorrect switch selected
    • Item incorrectly confirmed complete
    • Excessive psychomotor workload fumbling with paper
    • Unreadable text due to low illumination
    • Subsequent checklist accomplished before critical flight phase
    • All checklists omitted
• Enhanced flow of branched sequences (abnormal checklists)
• These errors have not been experimentally confirmed (Boorman, 2001)
• Improved checklist performance using graphic feedback
• Normal workload in simulated environment
• Paper checklists (Rantz, Dickinson, Sinclair, Van Houten, 2009)

Purpose
• To examine whether pilots would complete airplane digital checklists more accurately when they received post-flight graphic and verbal feedback
• No prior study has compared the accuracy of traditional paper checklists against emergent digital checklists
• Only the second study in aviation to attempt to increase checklist accuracy using experimental manipulation of IVs

Location & Duration
• The simulation laboratory is located in a hangar at WMU's Aviation Education Center in Battle Creek, MI
• Data collection took approximately 64 sessions, July 29, 2008 through March 11, 2009
• 256 flight trials

Frasca 241 Flight Training Device: Cirrus SR20

Frasca 241 Instructor Station: Cirrus SR20

Cirrus SR20 Primary Flight Display (PFD)

Cirrus SR20 Multi Function Display (MFD)

Cirrus SR20 Multi Function Display (MFD) Checklist

Observation Area

Participants
• 6 WMU flight students
• Males
• Private Pilot Certificate
• 125 minimum flight hours
• Instrument rated
• Average 186 hours total flight time
• Average 80 hours total time in Cirrus aircraft or FTD

Digital Flight Checklist

Paper Flight Checklist

Null Hypotheses
1) There is no intervention effect with either paper or digital checklists. Tested at both the individual and group level.
2) There is no difference between paper and digital checklist accuracy during all phases.

Main Dependent Variable
• The number of paper checklist items completed correctly per flight
• The number of digital checklist items completed correctly per flight

Secondary Dependent Variable 1
• The percentage of total errors for each of the eight flight segments using the paper checklist during each experimental phase (baseline, feedback, and reversal) per participant
• The percentage of total errors for each of the eight flight segments using the digital checklist during each experimental phase (baseline, feedback, and reversal) per participant

Secondary Dependent Variable 2
• The percentage of baseline trials in which participants performed each of the paper checklist items incorrectly
• The percentage of baseline trials in which participants performed each of the digital checklist items incorrectly

Secondary Dependent Variable 3
• The percentage of baseline trials in which participants omitted paper checklist items
• The percentage of baseline trials in which participants omitted digital checklist items

Secondary Dependent Variable 4
• Percentage of timing errors in which participants performed paper checklist segments too early or too late
• Percentage of timing errors in which participants performed digital checklist segments too early or too late

Experimental Phases
• Baseline
    • Only technical feedback of flight performance was given
• Checklist Graphic Feedback & Vocal Praise
    • Technical feedback of flight performance
    • Graphic feedback on the total number of checklist items completed correctly per flight
    • Graphic feedback on the number of items completed correctly, completed incorrectly, and omitted for each of the eight flight segments per flight
    • Vocal praise for any improvement
• Reversal
    • Only technical feedback of flight performance was given
• 60-90 Day Data Probe
    • No feedback was given

Technical Feedback

Graphic Feedback Received by Participant: Total Checklist Items Correct
[Figure: Participant 03, items completed correctly. Y-axis: number of checklist items (0-70); x-axis: trials (1-27). Closed data points represent the digital checklist; open data points represent the paper checklist.]

Graphic Feedback Received by Participant: Flight Segments (Total-Correct-Incorrect-Omitted)
[Figure: P6S7T1D, checklist items per segment (0-70). Series: total items, correct, incorrect, omitted. Segments: Before Takeoff, Normal Takeoff, Climb, Cruise, Descent, Before Landing, After Landing, Shut Down, Total.]

Inter-Observer Agreement
• Inter-observer agreement for correct and incorrect items averaged 95%, with a range of 79% to 100%.
• Inter-observer agreement for omitted items averaged 97%, with a range of 63% to 100%.

Integrity of the IV
• Technical flight and checklist feedback were read from prepared scripts
• Participants were asked to initial the technical flight diagrams and the checklist feedback graphs and return them to the experimenter
• Integrity of IV = 100%

Experimental Design
• A multiple baseline with reversal design across paired individuals
• Initial phase change occurred when performance was judged as stable upon visual inspection
• Reversal phase change occurred when 3 consecutive (paper or digital) trials exceeded 95% items correct
• Probe phase occurred 60-90 days after the last reversal trial

Results
• Individually and as a group, all participants using both paper and digital checklists increased performance accuracy over baseline when post-flight checklist feedback and praise were added
• Performance remained near perfect during intervention withdrawal
• Improvements declined slightly within the 60-90 day probe period
• Results were statistically significant

Results-Figure 5 continued
• Average percentage of paper checklist items completed correctly increased from 38% during the baseline phase to 90% during the intervention phase
• Average percentage of digital checklist items completed correctly increased from 39% during the baseline phase to 89% during the intervention phase

Results-Figure 5 continued
• The average
percentage of paper checklist items completed correctly was nearly 100% during the return-to-baseline condition
• The average percentage of digital checklist items completed correctly was 99% during the return-to-baseline condition

Results-Figure 5 continued
• The average percentage of paper checklist items completed correctly was 97% during the probe condition
• The average percentage of digital checklist items completed correctly was 96% during the probe condition
• 3% decrement in paper performance after 2-3 months
• 4% decrement in digital performance after 2-3 months

Results: Participants 01-06 - Items Completed Correctly
[Six figures, one per participant (01 through 06). Y-axis: number of checklist items (0-70); x-axis: trials (1-22). Closed data points represent the digital checklist; open data points represent the paper checklist.]

Results-Paper Checklist Segments-Figure 6
• During all flights using paper checklists, 2,451 total errors were observed
• The average percentage of paper checklist segment errors was highest for the normal takeoff segment (75%, range = 56.67%-84.44%)
• The average percentage of paper checklist segment errors was lowest for the after landing segment (50%, range = 0%-100%)
• Please see Figure 6 handout

Results-Digital Checklist Segments-Figure 7
• During all flights using digital checklists, 2,562 total errors were observed
• The average percentage of digital checklist segment errors was highest for the normal takeoff segment (71%, range = 56.67%-88.89%)
• The average percentage of digital checklist segment errors was lowest for the after landing segment (50%, range = 1.85%-100%)
• Please see Figure 7 handout

Results-Figure 6 & 7 Summation
• Generally, the percentage of errors by flight segment for both paper and digital checklists varied across participants and flight segments
• Errors decreased considerably for all participants during intervention
• Errors were very low during reversal
• Errors increased slightly after 60 days

Results-Paper Checklist Total Incorrect Items-Table 1
• Percentages of incorrect items that are 50% or greater are shaded for each participant. Also, the checklist item name is shaded if the percentage of error was 50% or greater for four or more participants
• The highest frequency of errors across nearly all participants occurred for nine items in the before takeoff segment: ALTERNATOR, PITOT HEAT, NAV LIGHT, LANDING LIGHT, ANNUNCIATOR LIGHT, VOLTAGE, PITOT HEAT, NAV LIGHT, AND LANDING LIGHT. (AVG 86%, n = 9)
• Four items in the normal takeoff segment were problematic for nearly all participants: POWER LEVER, ENGINE PARAMETERS, BRAKES, ELEVATOR CONTROL.
(AVG 92%, n = 4)
• Please see Table 1 handout

Results-Digital Checklist Total Incorrect Items-Table 2
• Percentages that are 50% or greater are shaded for each participant. Also, the checklist item name is shaded if the percentage of error was 50% or greater for four or more participants
• The highest frequency of errors across nearly all participants occurred for the same nine items in the before takeoff segment: ALTERNATOR, PITOT HEAT, NAV LIGHT, LANDING LIGHT, ANNUNCIATOR LIGHT, VOLTAGE, PITOT HEAT, NAV LIGHT, AND LANDING LIGHT. (AVG 91%, n = 9)
• Four items in the normal takeoff segment were problematic for nearly all participants: POWER LEVER, ENGINE PARAMETERS, BRAKES, ELEVATOR CONTROL. (AVG 84%, n = 4)
• Please see Table 2 handout

Results-Paper Checklist Total Omitted Items-Table 3
• Percentages of omitted items that are 50% or greater are shaded for each participant. Also, the checklist item name is shaded if the percentage of omission was 50% or greater for four or more participants
• There were no shaded omission errors for any checklist item, either paper or digital
• There were no shaded omission errors for any participant, either paper or digital
• No omitted item resulted in a crash or incident
• Only a random selection of items across many segments was omitted by P1 (AVG 19%), P2 (AVG 13%), and P4 (AVG 18%)
• Please see Table 3 handout

Results-Digital Checklist Total Omitted Items-Table 4
• Percentages of omitted digital checklist items that are 50% or greater are shaded for each participant.
• Also, the digital checklist item name is shaded if the percentage of omission was 50% or greater for four or more participants
• There were no shaded omission errors for any checklist item, either paper or digital
• There were no shaded omission errors for any participant, either paper or digital
• No omitted item resulted in a crash or incident
• P1 omitted a high percentage (AVG 24%) of items during the CRUISE, DESCENT, and AFTER LANDING segments
• P4 omitted a high percentage (24%) randomly across several segments
• P2 omitted a high percentage (15%) of the same items cited as done incorrectly with both the paper and digital checklists during the before takeoff segment (ALTERNATOR, PITOT HEAT, NAV LIGHT, LANDING LIGHT, ANNUNCIATOR LIGHT, VOLTAGE, PITOT HEAT, NAV LIGHT, AND LANDING LIGHT)
• Please see Table 4 handout

Results-Paper Checklist Segment Timing Errors in Baseline-Figure 8
• Descent (43%)
• Before Landing (39%)
• Climb (38%)
• Please see Figure 8 handout

Results-Digital Checklist Segment Timing Errors in Baseline-Figure 9
• Before Landing (42%)
• Descent (39%)
• Climb (22%)
• Please see Figure 9 handout

Individual Inferential Statistical Analysis
• General time-series intervention regression modeling: Huitema and McKean (1998, 2000a, 2000b) and McKnight, McKean, and Huitema (2000)
• Bootstrap-based time-series regression method estimates parameters of each individual's behavior
• AVG Baseline, Beta2, Beta3, AVG remaining Intervention, AVG Reversal, AVG Probe

Beta Source
[Figure: Participant 04, items completed correctly. Y-axis: number of checklist items (0-70); x-axis: trials (1-22). Closed data points represent the digital checklist; open data points represent the paper checklist.]

Group Inferential Statistical Analysis
• The purpose of this analysis was to provide an overall evaluation of the effects of the interventions for the group of six pilots over time
• Parameter estimates of each individual's behavior from the preceding time-series regressions are used as DVs
• One-sample t-test
used to evaluate the hypothesis (no difference in intervention or phase changes)
• Ho: AVG mean of Beta1 = 0

Checklist Inferential Statistical Analysis
• Difference between paper and digital raw scores
• Double bootstrap-based time-series regression method used to evaluate the hypothesis of zero difference between digital and paper feedback
• Based on an analysis not reported here, the difference does not vary as a function of subject or phase
• Non-significant results
• Avg mean paper - Avg mean digital = 1.18 (slight advantage for paper); t = 1.78, p = .0776

Recap of Results
• Both paper and digital checklist errors were reduced or eliminated during the intervention phase
• Omission errors were eliminated during reversal and probe
• Timing errors for both paper and digital segments were eliminated during reversal and probe
• Performance improvement was maintained during reversal for both checklists
• Slight performance decrement during the post-study probe period for both checklists
• Dramatic individual intervention effects and phase changes for both paper and digital checklists
• Results are consistent for the group of participants over time
• Differences between paper and digital checklists are not statistically significant (cannot reject the null hypothesis)

Possible Confounding Variables
These variables could account for variability in pilot performance:
• Flight Training Device experience level
• Recency of flight experience
• Recency of flight in aircraft type
• Fatigue/stress
• Recent practice with paper or digital checklists

Limitations
• Limited timeline of semester
• Transferability to other simulator platforms
• Transferability to actual flight training
• Intervention components could not be partialed out (graph vs. vocal praise)
• Only "normal" workload was used during trials

Future Research
• Replicating the current study with increased workload conditions such as weather, airport traffic, and other distractions
• Determining how long gains in checklist accuracy would continue in the absence of post-flight feedback and praise
• Determining how long gains in checklist accuracy would continue when the detail of post-flight feedback and praise of checklist items is modified
• Replicating the current study and ascertaining whether checklist compliance transfers to actual flight
• Investigating the nature of the rule changes and whether accurate checklist use would generalize to actual flight
• Curricular modifications regarding formal checklist instruction and assessment
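
The reversal rule stated under Experimental Design (three consecutive trials exceeding 95% items correct) can be illustrated with a minimal Python sketch. The function name, its parameters, and the total item count used in the example are hypothetical and are not taken from the study. The sketch also assumes the rule is applied to one checklist format's trial series at a time.

```python
from typing import Sequence


def reversal_criterion_met(
    items_correct: Sequence[int],
    total_items: int,
    run_length: int = 3,
    threshold: float = 0.95,
) -> bool:
    """Return True once `run_length` consecutive trials exceed `threshold`.

    `items_correct` holds the number of checklist items completed correctly
    on each trial, in order. `total_items` is the number of items on the
    checklist (a hypothetical value here; the slides do not state it).
    """
    streak = 0
    for correct in items_correct:
        if correct / total_items > threshold:
            streak += 1
            if streak >= run_length:
                return True
        else:
            # Any trial at or below the threshold restarts the count.
            streak = 0
    return False


# Hypothetical series for a 60-item checklist (not study data).
print(reversal_criterion_met([20, 58, 59, 60], total_items=60))
print(reversal_criterion_met([58, 59, 40, 60], total_items=60))
```

In the first hypothetical series the last three trials all exceed 95%, so the criterion is met. In the second, the low third trial resets the run.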