Introduction to High Reliability Organizations
Ralph T. Soule

What is Reliability?
• "Reliability depends on the lack of unwanted, unanticipated, and unexplainable variance in performance" – Eric Hollnagel, 1993

Examples of High Reliability Organizations
• Nuclear power-generation plants
• Naval aircraft carriers
• Chemical production plants
• Offshore drilling rigs
• Air traffic control systems
• Incident command teams
• Wildland firefighting crews
• Hospital ER/intensive care units

Low Reliability

Example of Low Reliability: Tenerife
• Precursors
  – Small airport; overtaxed air controllers speaking English; not accustomed to large planes; visibility obscured by fog
  – KLM pilot in a hurry to complete the mission before the crew's duty time was expended; spent most of his time running simulators as an instructor; not challenged by his crew
  – Too much information passed too fast, speakers "stepping" on each other
• Sequence of events
  1. The planes are parked at the end of runway 12, with the KLM in front of the Pan Am.
  2. The KLM has made it to the end of runway 30 and is ready for takeoff.
  3. The Pan Am has passed the first taxiway.
  4. The Pan Am has passed the second taxiway.
  5. The Pan Am has missed the third taxiway, where it is supposed to exit. The KLM begins to take off.
  6. The Pan Am tries to get off the runway but is hit by the KLM. 583 people die.

"Swiss cheese" model of accident causation
[Figure: layered defenses with holes that occasionally line up – Reason, J. BMJ 2000;320:768-770. © 2000 BMJ Publishing Group Ltd.]
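Note: the figure's idea can be made concrete with a small, purely illustrative calculation (not part of the original slides). The Python sketch below assigns each defensive layer a hypothetical, independent probability of having a "hole" on a given operation and estimates how often the holes line up to let an accident trajectory through. The independence assumption is exactly what the later "Dangerous Defenses" slide calls into question.

import random

# Illustrative sketch only: the per-layer "hole" probabilities below are
# hypothetical, not taken from the slides or from Reason's paper.
layer_hole_probs = [0.05, 0.02, 0.10, 0.08]

def accident_occurs(rng):
    # An accident trajectory gets through only if every layer has a hole at the same time.
    return all(rng.random() < p for p in layer_hole_probs)

def estimate_accident_rate(trials=1_000_000, seed=0):
    rng = random.Random(seed)
    return sum(accident_occurs(rng) for _ in range(trials)) / trials

analytic = 1.0
for p in layer_hole_probs:
    analytic *= p

print(f"Analytic probability of all holes lining up: {analytic:.1e}")   # 8.0e-06
print(f"Simulated accident rate:                     {estimate_accident_rate():.1e}")

Even with individually leaky layers, the joint alignment is rare, which is part of why, as the next slides argue, people stop paying attention.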
So Why Do Smart People Do Dangerous Things?
• Excessive Workload
  – Physical and cognitive effort involved in task performance
• Lack of Situation Awareness
  – What's going on?
  – What's likely to happen next?
  – What will happen if I take a particular action?
• Excessive Stress, Fatigue, Uncertainty, etc.
  – Impacts perceptual-motor performance, decision-making, etc.

So Why Do Smart People Do Dangerous Things?
• Diminished Attention
  – Too much to attend to at once (overload)
  – Too little to attend to for too long (underload)
• Poor Teamwork and Communication
  – Often due to poor layout of the workspace and/or poor layout of the command and communication structure

Common reasons why defensive weaknesses are not detected and repaired
• People involved tend to forget to be afraid
• Bad events are rare in well-defended systems, so few people have direct experience with them

Common reasons why defensive weaknesses are not detected and repaired
• Production demands are immediate and attention-grabbing, whereas safe operations generate a constant – and hence relatively uninteresting – non-event outcome
  – Reliability is invisible: there is nothing much to pay attention to
  – If people see nothing happening, it is easy to presume nothing is happening
• If nothing is happening and they continue to operate as they have, they get fooled into thinking that nothing will ever happen

Dangerous Defenses
• Defense in depth
  – Redundancy makes systems more complex
    • Since some types of errors are not immediately recognizable, operators/maintainers will not learn from them as readily
    • Accumulated errors can allow "holes" in defensive systems to line up and permit the passage of an accident trajectory
  – Unforeseen common-mode failures can make it possible for errors to affect more than one layer simultaneously (see the numeric sketch after the next slide)
  – Changing procedures to prevent the "last" problem can make the procedures more complex and introduce unforeseen failure modes

Common reasons why defensive weaknesses are not detected and repaired
• Conclusion
  – If eternal vigilance is the price of freedom, then chronic unease is the price of safety
  – HROs are hard to sustain when the thing about which one is uneasy either has not happened or happened a long time ago
  – Accidents do not occur because people gamble and lose; they occur because people do not believe that the accident that is about to occur is possible
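Note: to make the "Dangerous Defenses" point concrete, the back-of-the-envelope comparison below uses made-up probabilities (not from the slides) to show how a single common-mode event that defeats every layer at once can dominate the accident rate, no matter how much redundancy is stacked up.

# Illustrative comparison with hypothetical numbers: why common-mode failures
# undermine defense in depth even when each individual layer looks very reliable.

p_layer = 0.01          # assumed chance that any one layer fails on a given demand
n_layers = 3            # defense in depth: three nominally redundant layers
p_common_mode = 0.001   # assumed chance of an event that disables all layers at once

p_independent_only = p_layer ** n_layers
p_with_common_mode = p_common_mode + (1 - p_common_mode) * p_independent_only

print(f"All layers fail independently:  {p_independent_only:.2e}")   # 1.00e-06
print(f"Including common-mode failures: {p_with_common_mode:.2e}")   # ~1.00e-03

Under these assumed numbers, the common-mode path is roughly a thousand times more likely than all layers failing independently, which is why adding more redundant layers alone does little for overall safety.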
High Reliability
• Operators and managers of high reliability organizations assume that each day could be a bad day and act accordingly
• Regarding safety information, they:
  – Actively seek it
  – Messengers are trained and rewarded
  – Responsibility is shared
  – Failures lead to far-reaching reforms
  – New ideas are welcomed

5 Habits of Highly Reliable Organizations
• Don't be tricked by your success
  – Preoccupation with failure
• Defer to experts on the front line
  – Sensitivity to operations
• Let unexpected circumstances provide your solution
  – Resilience means having a steady head
• Embrace complexity
  – Reluctance to simplify
• Anticipate, but also anticipate your limits
  – You can prepare for unexpected events, within limits

Characteristics of HROs
• Safety-oriented culture
• Operations are a team effort
• Communications are highly valued
• Always prepared for the unexpected
• Multidisciplinary review of near-misses and adverse outcomes

Characteristics of HROs
• Strong, well-defined leadership
• Safety-oriented culture
• Operations are a team effort
• Flattened hierarchy for safety
• Communications revered
• Emergencies and the unexpected are rehearsed
• Place value on the "near miss"
• Multidisciplinary review
• Top brass (senior management) endorses
• Substantial budget devoted to training

How Different Organizations Handle Safety Information
• Pathologic culture
  – Don't want to know; messengers are shot; responsibility is shirked; failure is punished or concealed; new ideas are actively discouraged
• Bureaucratic culture
  – May not find out; messengers are listened to... if they arrive; responsibility is compartmentalized; failures lead to local repairs; new ideas often present problems
• Generative culture
  – Actively seek it; messengers are trained and rewarded; responsibility is shared; failures lead to far-reaching reforms; new ideas are welcomed

Tools
• Commander's Intent
  – Intended to help people read your mind if they run into uncertainty about how to carry out orders under "field" conditions
  – Situation (Here is what I think we face)
  – Task (Here is what I think we should do)
  – Intent (Here is why)
  – Concerns (Here is what we should keep our eye on)
  – Calibration (Now, talk to me about your questions and concerns)
• Pre-mortems
  – Tool for anticipating the unexpected

Summary
• High reliability is a process, not a result
• Safety is a dynamic non-event
• Error management has two components:
  – Error reduction and
  – Error containment
• Engineering a safety culture
  – Reporting culture -> providing the data for navigation
  – Just culture -> trust; deciding when there is fault
  – Flexible culture -> adjust
  – Learning culture -> plan, do, check, act
• If eternal vigilance is the price of freedom, then chronic unease is the price of safety

References
• The Logic of Failure, Dietrich Dörner
• Intuition at Work, Gary Klein
• The Vulnerable System: An Analysis of the Tenerife Air Disaster, Karl E. Weick
• The 1996 Mount Everest Climbing Disaster: The Breakdown of Learning in Teams, D. Christopher Kayes
• Into Thin Air, Jon Krakauer, Outside Online
• Managing the Risks of Organizational Accidents, James Reason
• Managing the Unexpected, Karl E. Weick and Kathleen M. Sutcliffe
• Inviting Disaster, James Chiles
• http://www.highreliability.org/