Using Systems Thinking to Improve Safety in Radiation Therapy
Prof. Nancy G. Leveson
Aeronautics and Astronautics / Engineering Systems
MIT

To understand and prevent accidents, we must consider the system as a whole.

And so these men of Hindustan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right
And all were in the wrong.
John Godfrey Saxe (1816-1887)

Facts about Accidents
• Almost never have single causes
  – “Root cause seduction”
  – Accidents are complex processes
• Usually involve flaws in
  – Engineered equipment
  – Operator behavior
  – Management decision making
  – Safety culture
  – Regulatory oversight

Human Error as a Cause
• ALL accidents are caused by “human error” (except “acts of God,” like hurricanes)
• Almost always there is:
  – Operator “error”
  – Flawed management decision making
  – Flaws in the physical design of equipment
  – Safety culture problems
  – Regulatory deficiencies
  – Etc.

Do Operators Really Cause Most Accidents?
• When we say human error, we usually mean “operator error”
• Hindsight bias
• Operator error vs. design error

Hindsight Bias (Sidney Dekker, 2009)
“should have, could have, would have”

Overcoming Hindsight Bias
• Assume nobody comes to work to do a bad job.
  – Assume they were doing reasonable things given the complexities, dilemmas, tradeoffs, and uncertainty surrounding them.
  – Simply finding and highlighting people’s mistakes explains nothing.
  – Saying what they did not do or what they should have done does not explain why they did what they did.
• Investigation reports should explain
  – Why it made sense for people to do what they did, rather than judging them for what they allegedly did wrong, and
  – What changes will reduce the likelihood of it happening again.
[Cartoon caption: “Fumbling for his recline button, Ted unwittingly instigates a disaster.”]

Operator Error: Traditional View
• Operator error is the cause of most incidents and accidents
• So do something about the operator involved (suspend, retrain, admonish)
• Or do something about operators in general
  – Marginalize them by putting in more automation
  – Rigidify their work by creating more rules and procedures

Operator Error: System View
• Operator error is a symptom, not a cause
• All behavior is affected by the context (system) in which it occurs
• To do something about error, we must look at the system in which people work:
  – Design of equipment and of the interface with equipment
  – Usefulness of procedures
  – Existence of goal conflicts and production pressures
  – Etc.

[Figure: Mental Models]

Procedures
• Cannot guarantee safety
• Safety comes from people being skillful in judging when and how to apply them.
• Old view: Safety improvements come from organizations telling people to follow procedures and enforcing this.
• New view: Safety improvements come from organizations monitoring and understanding the gap between procedures and practice.

An Engineering View of Safety: Overview
• Safety is a control problem
  – Accidents occur when the system design does not enforce constraints on safe behavior
• Safety must be designed into a system
  – Engineering relies on modeling and analysis of the system design to:
    • Identify physical behavior that can lead to accidents
    • Identify where the design makes human errors likely
    • Design or redesign the system to prevent accidents

Safety as a Control Problem
• Goal: Design an effective control structure that eliminates or reduces adverse events.
• Controls may be:
  – Physical design
  – Processes
  – Social (cultural, policy, individual self-interest)
  [Need more than just checklists]
• Engineers use a proactive approach
  – Predict and manage adverse effects
  – Through modeling and analysis (identify scenarios that can lead to accidents)

STAMP
• A change in emphasis: from “prevent failures” to “enforce safety constraints on system behavior”
• Losses (accidents) are the result of complex dynamic processes in which the safety constraints are not enforced by the safety control structure
• Most major accidents arise from a slow migration of the entire system toward a state of high risk
  – Need to control and detect this migration
© Copyright Nancy Leveson, Aug. 2006

[Figure: Example Safety Control Structure]

Every Controller Contains a Process Model
[Figure: a Controller, holding a Model of the Process, issues Control Actions to the Controlled Process and receives Feedback from it]
• Accidents occur when the model of the process is inconsistent with the real state of the process and the controller provides inadequate control actions
• Feedback channels are critical
  – Design
  – Operation

A Systems Engineering Approach to Radiation Therapy Safety
• Identify system hazards
• Establish system safety requirements to reduce the occurrence and/or consequences of hazards
• Ensure they are enforced by or implemented in the safety control structure
• Do a hazard analysis
  – Identify unsafe control actions
  – Identify the causes of the unsafe control actions
• Eliminate or control hazards
• Establish risk management controls and procedures

Identify Accidents and Hazards
Accident:
  – Injury or death of patient related to treatment
  – Injury or death of staff or visitor
  – Equipment damaged
  – Environmental damage?
Hazard:
  H1. Overdose
  H2. Underdose
  H3. Inadequate fractionation
  H4. Non-patient exposure to radiation
  H5. Equipment stress

Highest Level Safety Constraint (Requirement)
“Process of care” must not be compromised
  – Patients, staff, and visitors must not be exposed to an unhealthy dose of radiation
  – Equipment must not be stressed beyond documented design limits

RT Hazards and Safety Constraints
H1: Patient tissues receive more dose than clinically desirable
  SC1: The system must be able to prevent delivery of a higher than clinically desirable dose.
H2: Patient tumor receives less dose than clinically desirable
  SC2: The system must be able to deliver sufficient dose to treat the tumor.
H3: Patient treatment is improperly fractionated
  SC3: Each fraction must not exceed TBD Gy and must not be delivered TBD’ hours after the previous one without the treatment plan being reevaluated.
H4: Non-patient (esp. personnel) is unnecessarily exposed to radiation
  SC4: The system must be able to prevent unnecessary exposure of personnel and non-patients to radiation.
H5: Equipment is subject to unnecessary stress
  SC5: The system must prevent excessive equipment exposure to radiation.

System Safety Requirements
• A complete, formally documented, effective, and safe clinical treatment plan must be created for each patient undergoing radiation treatment. A radiation oncologist must select and formally approve the plan ultimately selected for treatment.
  – Changes to the treatment plan must be evaluated and approved by a radiation oncologist.
  – Standard operating processes must be provided that have been evaluated for safety and effectiveness. If SOPs are tailored, they must be evaluated and approved by
  – Immobilization treatment devices must provide accurate treatment delivery and must not restrict the treatment techniques.
  – Radiation safety guidelines (ACR/ASTRO and NRC) must be followed when therapy uses unencapsulated radionuclides.
  – The dosimetry treatment plan must administer the intended dose of radiation to the target volume while minimizing radiation exposure to normal tissues.
  – A pretreatment quality assurance program must be in place and followed for every patient. The QA program must provide for checking the accuracy of both the dose calculation and the data used for treatment.

System Safety Requirements
• Verification and documentation of the accuracy of treatment delivery (conformance to the original or latest clinical and dosimetric plans) must be provided, including management of organ movement.
• Modification of the initial treatment plan (to adjust for changes) must be approved by a radiation oncologist.
• Equipment must be calibrated and maintained according to AAPM guidelines and applicable state and federal regulations concerning radiation treatment delivery technology.
  – Procedures must be created and followed to ensure that any possible sign of impending machine malfunction is quickly recognized and diagnosed, and that any necessary corrective or reparative action is taken before the machine is used to deliver a clinical treatment to a patient.
  – All radioactive sources must be carefully controlled and monitored at least to the extent required by regulatory agencies.
• The radiation oncologist, along with other members of the team, must review and manage ongoing treatment to ensure that it is effective and safe.

System Safety Requirements
• Follow-up evaluation and care must be provided to manage acute and chronic morbidity resulting from treatment.
  – A process must be established to monitor for unexpected morbidity, tumor relapse, [etc.], to identify any possible safety problems during treatment and to identify measures that might reduce the risk of toxicity for future patients. All suspicious findings must be thoroughly investigated and resolved.
• [Patients must receive an appropriate level of medical, emotional, and psychological care during and after treatment]
• Staff must be protected from accidental radiation exposure.
• Appropriate arrangements must be made for emergency patients.
• Procedures must be evaluated periodically to ensure they are being followed and, if they are not, to determine why. The information must be used to improve the procedures.
• All emergency equipment and safety devices must be operational at all times during hazardous operations.

System Safety Requirements
• Management of change procedures must include hazard analysis for any planned change to individual treatment plans and to the facility itself, including any safety-critical equipment.
• Procedures must be in place to identify and remediate any unplanned changes over time in behavior within the system or its environment that can affect system hazards.
• Reporting systems must be created that follow Just Culture principles.
  – Leadership must make all staff feel comfortable raising safety concerns, and rewarded for doing so, without fear of reprimand or reprisal.
  – All members of the team must be empowered to be active participants in improving the safety of clinical processes.
  – Trends and migration toward states of higher risk must be identified, and effective procedures must be created to disseminate this information to all staff and to provide corrective measures.

System Safety Requirements
• Procedures must be in place to identify and thoroughly investigate all serious or potentially serious incidents. Recommendations must be implemented to eliminate or mitigate all identified factors contributing to the adverse events. Follow-up must be provided to ensure that recommendations have been implemented and are effective. Lessons learned must be documented and disseminated.
• A process must be established to evaluate the safety (identify hazards) associated with any equipment purchased from vendors or created in the hospital. The thoroughness and quality of the vendor’s hazard analysis and design for safety must be a major criterion in selecting a vendor. A two-way communication channel must be established to provide ongoing communication about errors, incidents, and potential hazards.
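One way a hospital might act on the requirement that trends and migration toward higher-risk states be identified is to track leading indicators over time and flag those that are rising. The sketch below is illustrative only: the indicator names, the monthly counts, the six-period window, and the 0.5-per-period threshold are all hypothetical, and a least-squares slope is just one simple trend test chosen for the example; the requirements above do not prescribe a method.

```python
from dataclasses import dataclass


@dataclass
class IndicatorSeries:
    """Periodic counts for one leading indicator (names here are hypothetical)."""
    name: str
    counts: list[float]


def slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0  # a trend needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def rising_indicators(series: list[IndicatorSeries], window: int = 6,
                      threshold: float = 0.5) -> list[str]:
    """Names of indicators whose slope over the last `window` periods exceeds `threshold`."""
    return [s.name for s in series if slope(s.counts[-window:]) > threshold]


# Hypothetical monthly counts from an incident-reporting system.
indicators = [
    IndicatorSeries("treatments started before plan verification", [1, 1, 2, 3, 4, 6]),
    IndicatorSeries("spurious interlock alarms", [5, 4, 5, 4, 5, 4]),
]
print(rising_indicators(indicators))  # ['treatments started before plan verification']
```

A flagged indicator is a prompt for investigation and corrective measures, not a verdict; the window and threshold would have to be tuned to the reporting volume of a real department.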
System Safety Requirements
• The hospital must establish the safety of integrated systems purchased from multiple vendors and of the introduction of new equipment into the total clinical environment. Sophisticated hazard analysis methods must be used to identify potential safety concerns about individual equipment or about the integrated equipment and operational environment.
  – Hazard logs must be created, maintained, and used in the investigation of adverse events and in periodic performance audits to ensure that hazards are being adequately controlled and that the staff is sufficiently educated about the hazards involved in their jobs. Leading indicators of increasing risk must be identified and monitored.
  – All staff must be educated on the hazards of their job responsibilities and the equipment they operate.
  – Hazard analysis must include the analysis of human–automation interaction. Design methods must be used to minimize any potential human errors and HMI hazards, including investigating and reducing the frequency of spurious alarms and providing error messages and indications of safe operating limits for any potentially hazardous operation.

System Safety Requirements
• Safety-related decisions must be independent of cost and efficiency concerns. Conflicts must be identified, and transparent resolution procedures must be created and followed to resolve them.
• The hospital must have a documented safety policy and a documented safety management plan. The policy and plan must be periodically reviewed, updated, and communicated to staff. Conformance with the safety policy and safety management plan must be monitored.
• The hospital must create and maintain a comprehensive safety information system.
• Robust feedback channels must be provided to enhance risk awareness among those with responsibilities related to the safety of patients and staff.
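The hazard logs required above can also carry the traceability from each hazard (H1 to H5) to its safety constraint (SC1 to SC5) and to the controls that enforce it, which is what a periodic audit needs to check. A minimal sketch, assuming a hypothetical `HazardLogEntry` record and `audit` function; the control names and the incident ID are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HazardLogEntry:
    """Hypothetical hazard-log record linking a hazard to its constraints and controls."""
    hazard_id: str
    description: str
    constraint_ids: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)
    open_incidents: list[str] = field(default_factory=list)


def audit(log: list[HazardLogEntry]) -> list[str]:
    """Periodic-audit sketch: report hazards that are not traced to a safety
    constraint, not enforced by any control, or linked to unresolved incidents."""
    findings = []
    for entry in log:
        if not entry.constraint_ids:
            findings.append(f"{entry.hazard_id}: no safety constraint")
        if not entry.controls:
            findings.append(f"{entry.hazard_id}: no enforcing control")
        for incident in entry.open_incidents:
            findings.append(f"{entry.hazard_id}: unresolved incident {incident}")
    return findings


log = [
    HazardLogEntry("H1", "Patient tissues receive more dose than clinically desirable",
                   ["SC1"], ["pretreatment QA", "beam-on interlock"]),
    HazardLogEntry("H4", "Non-patient (esp. personnel) is unnecessarily exposed to radiation",
                   ["SC4"]),
    HazardLogEntry("H5", "Equipment is subject to unnecessary stress",
                   ["SC5"], ["calibration per AAPM guidelines"], ["INC-017"]),
]
for finding in audit(log):
    print(finding)
# H4: no enforcing control
# H5: unresolved incident INC-017
```

Keeping the log as structured records rather than free text is a design choice that lets the same data serve incident investigation, audits, and staff education.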
Trace System Safety Requirements to Safety Control Structure
• All personnel must have and maintain a minimum level of knowledge and training. [pointers to Board Certification Process, Education process including continuing education, responsible official for training at UCSD, each staff component] [includes initial orientation, education, credentialing, continuing education, and periodic evaluation]
• Completion of any component of care must be appropriately documented in the patient record
• Patients must be evaluated to determine if treatment is recommended
  – Patient evaluation must be conducted by a qualified physician in consultation with other team members.

[Figure: High-Level Control Structure for Gantry 2. Components: Treatment Definition; Treatment Delivery (Patient Preparation, Beam Creation and Delivery); Patient. Flows: therapeutic requirements; treatment specifications (fraction definition, target positioning information, steering file); capability upgrade requests; QA results; patient physiognomy changes; patient well-being; patient health outcome (delayed).]

[Figure: Zooming into Treatment Definition (Treatment Definition – D1). Components: Tumor Board, Medical Doctor, Medical Physicist, Treatment Planning Software, Imaging Facility (CT/MRI), Steering File Generator. Control actions and feedback include: approve patient; request therapy slot for patient; define tumor volume; specify treatment doses; propose and approve treatment plan; define field direction; combine CT and MRI images; calculate dose distribution; map body; define fields (direction, energy, intensity); cure evaluation and prognosis (delayed). Output to Treatment Delivery: steering file with treatment specification (fraction definition, patient positioning information, beam properties).]

[Figure: Zooming into Treatment Delivery (Treatment Delivery – D1). Components: Treatment Definition, PROSCAN Design Team, Operations Management, Maintenance, Operators, Medical Team, PROSCAN facility (physical actuators and sensors, automated controllers), Patient. Control actions and feedback include: treatment specifications; QA results; capability upgrade requests; problem reports, incidents, change requests, performance audits; revised operating procedures, work orders, resources; software revisions, hardware modifications and replacements, test results; room clear; start treatment; interrupt treatment; patient position; sensor info; panic button.]

[Figure: Beam and Patient Alignment Control (Treatment Delivery – D2). Components: Operation Management, Medical Team, Local Operator (choice of steering file, manual corrections), TCS, GPPS, sweeper magnets, strip chambers, Gantry + Table motors, encoders and potentiometers, CT imaging, patient and fixation devices. Four control loops (Loop1 to Loop4) relate beam location at the detector, Gantry + Table position in the room referential, and patient position on the table.]

Process Attributes: Beam & Patient Alignment Process Model
Process variable: possible values
• Personnel: close / not close
  – “Personnel” is to be understood as “non-patient” (can include visitors, family members...).
  – “Close” is to be understood as “potentially leading to detrimental radiation exposure”: “close” = close to the beamline or inside the treatment room; “not close” = not close to the beamline and outside the treatment room.
• Patient readiness (esp. position): no patient / ready / not ready
  – “Ready”: patient is in the treatment room, at the treatment point, in the correct position, and ready for dose delivery.
  – “Not ready”: patient is in the treatment room but not ready for dose delivery (e.g. incorrect position).
• Treatment plan ID: right / wrong / none
  – “Right”/“Wrong” refer to whether the correct treatment plan has been selected and loaded for the patient awaiting treatment.
  – “None”: no treatment plan has been loaded.
• Equipment readiness: ready / not ready
  – “Ready”/“Not ready”: with respect to treatment start.
• Mastership status: master / not master
  – “Not master”: other areas have the power to control beamline elements.
• Facility mode: therapy / non therapy
  – “Therapy”: facility is configured for patient treatment application. All the patient safety and machine interlocks are enabled and remote operator control is disabled.

Example: Operator Starting Treatment
[Figure: control structure with Operator and Nurse (inputs: previous progress information, daily plan and updates), Therapy Delivery System, beamline controllers, actuators, and sensors, and the patient. Control actions: load steering file, start treatment. Feedback: beam characteristics, treatment progress, configuration, actuator settings, status, irradiation at patient, well-being.]
• System hazards: …
• Controller: Area Operator
• Control actions:
  1.2 Load steering file
  1.3 Start treatment
• STPA Step 1: identify unsafe control actions
• STPA Step 2: identify unsafe scenarios that lead to the unsafe control actions

Example Unsafe Control Actions (1)
• Treatment is started while personnel are in the room (↑H-R4)
• Treatment is started while the patient is not ready to receive treatment (↑H-R1, H-R2). Note: This includes “wrong patient position”, “patient feeling unwell”, etc.
• Treatment is started when there is no patient at the treatment point (↑H-R2, H-R3)
• Treatment is started with the wrong treatment plan (↑H-R1, H-R2)
• Treatment is started without a treatment plan having been loaded (↑H-R1, H-R2)

Example Unsafe Control Actions (2)
• Treatment is started while the beamline is not ready to receive the beam (↑H-R1, H-R5)
• Treatment is started while not having mastership (↑H-R1, H-R2, H-R4)
• Treatment is started while the facility is in non-treatment mode (e.g. experiment or troubleshooting mode) (↑H-R1, H-R2)
• Treatment start command is issued after treatment has already started (↑H-R1, H-R2)
• Treatment start command is issued after treatment has been interrupted, without the interruption having been adequately recorded or accounted for (↑H-R1, H-R2)
• Treatment does not start while everything else is otherwise ready (↑H-R1, H-R2)

Hazard Causal Scenarios (Causes of Unsafe Control Actions)
UCA4: Treatment is started with wrong treatment plan
1. (missing input) – no treatment file is available and the TDS loads a previously used one
2. (wrong input) – error in treatment planning, and the treatment file is incorrect
3. (wrong input) – operator loads the file from a previous fraction
4. (distorted transmission) – changes to the daily plan are not correctly communicated to or understood by the operator
5. (actuator failure) – GUI fails to transmit the new steering file and the TDS uses the previously loaded one
6. …
Also: inadequate feedback (sensor failure, wrong sensor calibration, …), external perturbations, etc.

Causal Scenarios
• Scenario 1 - The operator was expecting the patient to have been positioned, but table positioning was delayed compared to plan (e.g. because of delays in patient preparation or patient transfer to the treatment area, or because of unexpected delays in beam availability or technical issues being handled by other personnel without proper communication with the operator).
• Controls:
  – Provide the operator with direct visual feedback of the gantry coupling point, and require a check that the patient has been positioned before starting treatment (M1).
  – Provide a physical interlock that prevents beam-on unless the table is positioned according to plan.

Example Causal Scenarios (2)
• Scenario 2 - The operator is asked to turn the beam on outside of a treatment sequence (e.g. because the design team wants to troubleshoot a problem) but inadvertently starts treatment and does not realize that the facility proceeds with reading the treatment plan.
• Controls:
  – Reduce the likelihood that non-treatment activities have access to treatment-related input by creating a non-treatment mode, used for QA and experiments, in which the facility does not read treatment plans that may have been loaded previously (M2);
  – Make the procedures for starting treatment (including button design, if pushing a button is what starts treatment) sufficiently different from non-treatment beam-on procedures that confusion is unlikely.

Organizational Aspects of Risk
• The example so far focuses on the physical level
• There are also requirements and control responsibilities at the management level to satisfy the system safety requirements
• Can identify unsafe control actions and causal scenarios at higher levels of the control structure (perform a risk analysis) and build in controls to prevent them
• Behavior and control structures change over time
  – Prevent migration to higher levels of risk
  – Detect when it occurs

Additional information in: Nancy Leveson, Engineering a Safer World: Systems Thinking Applied to Safety, MIT Press, January 2012
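To make the STPA example concrete, the beam and patient alignment process variables and the “Start treatment” unsafe control actions can be encoded as rules and checked before beam-on. This is a hypothetical sketch, not the PROSCAN implementation: the `ProcessModel` fields mirror the process model variables listed earlier, the two extra flags (`treatment_in_progress`, `interruption_unrecorded`) are assumptions added to cover the timing-related UCAs, and plain strings stand in for a real sensor interface. Evaluating the rules on the sensed state rather than on the operator’s belief illustrates the STAMP point that accidents occur when a controller’s process model diverges from the real process, as in Scenario 1.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProcessModel:
    """Hypothetical encoding of the beam & patient alignment process model.

    String values follow the process-variable list; the last two flags are
    assumptions added to cover the timing-related unsafe control actions.
    """
    personnel: str            # "close" | "not close"
    patient: str              # "no patient" | "ready" | "not ready"
    treatment_plan_id: str    # "right" | "wrong" | "none"
    equipment: str            # "ready" | "not ready"
    mastership: str           # "master" | "not master"
    facility_mode: str        # "therapy" | "non therapy"
    treatment_in_progress: bool = False
    interruption_unrecorded: bool = False


# (condition making "Start treatment" unsafe, short label, hazards it can lead to)
START_TREATMENT_UCAS = [
    (lambda m: m.personnel == "close", "personnel in room", ["H-R4"]),
    (lambda m: m.patient == "not ready", "patient not ready", ["H-R1", "H-R2"]),
    (lambda m: m.patient == "no patient", "no patient at treatment point", ["H-R2", "H-R3"]),
    (lambda m: m.treatment_plan_id == "wrong", "wrong treatment plan", ["H-R1", "H-R2"]),
    (lambda m: m.treatment_plan_id == "none", "no treatment plan loaded", ["H-R1", "H-R2"]),
    (lambda m: m.equipment == "not ready", "beamline not ready", ["H-R1", "H-R5"]),
    (lambda m: m.mastership == "not master", "no mastership", ["H-R1", "H-R2", "H-R4"]),
    (lambda m: m.facility_mode == "non therapy", "non-treatment mode", ["H-R1", "H-R2"]),
    (lambda m: m.treatment_in_progress, "treatment already started", ["H-R1", "H-R2"]),
    (lambda m: m.interruption_unrecorded, "interruption not recorded", ["H-R1", "H-R2"]),
]


def unsafe_start_reasons(model: ProcessModel) -> list[tuple[str, list[str]]]:
    """List every UCA condition that holds in the given state, with its hazards."""
    return [(label, hazards) for cond, label, hazards in START_TREATMENT_UCAS if cond(model)]


def beam_on_permitted(sensed: ProcessModel) -> bool:
    """Interlock-style check: permit beam-on only if no unsafe condition holds
    in the *sensed* state (not in the operator's belief about the state)."""
    return not unsafe_start_reasons(sensed)


# Scenario 1: the operator believes the patient is positioned, but table
# positioning was delayed, so the sensed patient state is still "not ready".
operator_belief = ProcessModel("not close", "ready", "right", "ready", "master", "therapy")
sensed_state = replace(operator_belief, patient="not ready")

print(unsafe_start_reasons(operator_belief))  # []
print(beam_on_permitted(sensed_state))        # False
print(unsafe_start_reasons(sensed_state))     # [('patient not ready', ['H-R1', 'H-R2'])]
```

These rules cover only the “provided when unsafe” and timing UCAs; the “not provided” UCA (treatment does not start while everything else is ready) would need a separate check, and the causal scenarios still have to be addressed by controls such as M1, M2, and the physical interlock.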