Human Error: From Taking Risk to Running Risk
Prof. Patrick Hudson
Centre for Safety Studies, Department of Psychology, Leiden University

Introduction - Structure
• Two Types of Risk
• Case studies
  – Piper Alpha & Herald of Free Enterprise
• Human Error
• The Organisational Accident Model
• Examining the sources of risks
• Case study DAL 39
• Solutions to human error
• What to look for
• Conclusion

Where am I coming from?
• Psychology
  – Why do people do what they do?
• Human error
  – How can people get things so wrong?
• Oil and Gas industry, Aviation & Medicine
  – Extremely high-hazard industries
• The organisational model of accidents
  – Reason's Swiss Cheese Model

What is safety all about?
• Preventing harm to people
• Safeguarding assets
• Protecting the environment
• Preserving reputation
• If things didn't go wrong it would be easy
• Safety and profits are both about risk management

Managing risks
• Safety is about managing risks to people, the environment, etc. What risks do you take?
• The alternative is to run the risks and hope for the best. Can we run the risks?
• What happens to companies that run risks?
  – The best make profits, the worst go bankrupt
• So we need a risk management process, and we need to understand the types of risk and where they come from

Risks
• We can distinguish two ways to approach risk
• We take a risk
  – We can decide the return is worth it
• We run a risk
  – We can become victims if things go wrong
• People who take risks are not always the same as those who run them

Case Study Piper Alpha
• A major disaster
• Changed the way the Oil and Gas industry operates
• Created the requirements for Safety Management Systems to be a 'living system' and Safety Cases to be a 'living document'
• Had legal effects as far away as Australia

Piper Alpha

Piper Alpha Disaster
• In July 1988 the Piper Alpha platform was destroyed, with 167 fatalities
• The immediate cause was leaking gas condensate
• The disaster was made worse by a total failure of defences
• By 1990 Occidental was out of business in the UK

The next morning

Why do accidents happen?
• Accidents are quite infrequent
• An accident is often seen as being caused by one or more individuals
• But:
  – In Piper Alpha the major problems were the platform design and the permit-to-work system
  – Piper Alpha had also been audited and passed by the regulator 7 days earlier

What were the risks?
• Many people died because they followed procedures
• The platform management failed to provide a safe workplace
• The regulator had failed to audit the system

Case Study Herald of Free Enterprise
• The Herald of Free Enterprise capsized outside Zeebrugge harbour
• The Assistant Bosun was asleep
• The bow doors were still open
• 186 people died

Herald of Free Enterprise
[Figure: chain of events leading to the capsize. Labels include: trimming problem, ship head down; high bow wave; acceleration; 5 minutes late; 15 minutes earlier; chief officer leaves G-deck; loading officer; master assumes ship ready; management; no checking system; door problem; bosun; assistant bosun asleep; doors open; no indication; capsize.]

Herald Analysis
• The assistant bosun was overworked
• The masters had asked for bow door indicators
• The management had refused on grounds of cost
• A Townsend Thoresen vessel left Dover with its bow doors open the very next day!

Active vs Latent Failures
• Analysis of disasters indicates the need to distinguish two types of human failure
• Active Failures: errors and violations that impact directly on the system and its victims
• Latent Failures: accidents waiting to happen

From Error to Underlying Cause
[Figure: unsafe acts (active errors) divide into unintended actions (slips, lapses) and intended actions (mistakes, violations); each is traced back to latent conditions such as planning, design, procedures, decisions, training, communication and accountability.]

Types of risk
• The individuals who make the active failures are frequently the ones running the risks
• Those accepting the latent failures are those who have taken the original risk
• They expect that all will go well
• Weaknesses in the system allow problems to happen
• The unsafe acts of individuals are the obvious human errors: running risks

The Causes of Incidents
• Triggers
• Defences
• Unsafe Acts
• Preconditions
• Underlying Causes
• Decisions made
[Diagram groups these into immediate causes and underlying causes]

Why do Accidents Happen?
• Equipment
  – Breakdowns
  – Doesn't work
• People
  – Incompetence
  – Sloppiness
  – Risk taking
• Organisation
  – Allowing failures to propagate
  – Accidents waiting to happen

Latent Conditions = Underlying Causes
• Latent conditions represent accidents waiting to happen
• Many problems can be found, e.g.:
  – Poor procedures (incorrect, unknown, out of date)
  – Bad design accepted
  – Commercial pressures not well balanced
  – Organisation incapable of supporting the operation
  – Maintenance poorly scheduled
• Latent conditions make errors more likely or their consequences worse
• Individuals are the recipients of somebody else's problems
• Taking a risk involves accepting latent conditions; running the risk involves becoming a recipient of those problems

Classifying Latent Conditions
• We can group underlying causes: the "whys"
• The "hows" refer to the immediate causes
• Underlying causes refer to the organisational level
• Concentrating on why means we no longer concentrate upon individuals
• The categories depend on what you are going to do with the information

Preconditions
• The reasons why an individual or group may make an error
• Preconditions influence the probability of error
• Individual differences have few effects (accident proneness does not exist)
• Preconditions that induce errors or make them more likely are the result of (failures of) control
• The question is: why are the preconditions for error present?
Preconditions II
• Haste
• Ignorance
• Design
• Unusual situations
• Fatigue
• Habit
• "Strong but Wrong"
• These are the symptoms of a deeper problem

Accident Causation Model
[Figure: fallible decisions → latent conditions → preconditions → unsafe acts → defences, with local triggers and environmental conditions acting on the sequence.]

[Figure: Reason's Swiss cheese model of accident causation. Hazards pass through successive layers of defences, barriers and safeguards; when the holes line up, the result is losses. Some holes are due to active failures, other holes are due to latent conditions.]

HSE Management
[Figure: hazard/risk, barriers or controls, work and undesirable outcome, with "taking risks" labelled at the hazard side and "running risks" at the work side.]

Shell's Bow-tie Concept
[Figure: events and circumstances arising from engineering, maintenance and operations activities lead through barriers to the hazard (an undesirable event with potential for harm or damage) and on to its consequences (harm to people and damage to assets or environment).]

Case Study DAL 39 Schiphol
• An example of multiple failures
• The criminal appeal found the 3 air traffic controllers guilty of an infringement
• There was no punishment (so no further appeal)
• Consider what the conventional and actual risks were
• Would you have spotted these?
• Would they appear in a conventional risk analysis?
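The last question on the DAL 39 slide touches on a weakness of conventional risk analysis that Reason's Swiss cheese model makes visible. The Python sketch below is an illustrative assumption, not part of the original talk: it gives each layer of defence a probability of having a "hole", multiplies those probabilities as an analysis that assumes independent layers would, and then adds a hypothetical shared latent condition that widens the holes in every layer at once. The function names and all numbers are invented for illustration.

```python
from math import prod


def breach_probability(hole_probs):
    """Chance that a hazard passes every layer of defence.

    ASSUMPTION: layers fail independently, so the per-layer
    probabilities of a hole simply multiply.
    """
    return prod(hole_probs)


def breach_with_latent_condition(hole_probs, degraded_probs, p_latent):
    """Hypothetical common-cause variant of the same calculation.

    With probability p_latent a shared latent condition (e.g. an
    unbriefed change to procedures) is present and every layer uses its
    degraded hole probability; otherwise the normal values apply.
    Within each case the layers are still treated as independent.
    """
    normal = (1 - p_latent) * breach_probability(hole_probs)
    degraded = p_latent * breach_probability(degraded_probs)
    return normal + degraded


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real analysis.
    layers = [0.1, 0.1, 0.1, 0.1]
    degraded = [0.5, 0.5, 0.5, 0.5]
    independent = breach_probability(layers)
    with_latent = breach_with_latent_condition(layers, degraded, p_latent=0.05)
    print(f"independent layers:    {independent:.2e}")  # 1.00e-04
    print(f"with latent condition: {with_latent:.2e}")  # 3.22e-03
```

With these invented figures, a latent condition that is present only 5% of the time raises the chance of a breach roughly thirtyfold, from about 1 in 10,000 to about 1 in 300. The point is qualitative: defences that look independent on paper can share a common weakness, which a calculation that simply multiplies layer probabilities will miss.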
DAL 39
• A Delta 767 aborted its take-off at Amsterdam Schiphol on discovering a 747 being towed across the runway
• Reduced visibility conditions (Phase B)
• The tower controller was in training, under the tower supervisor
• There was another trainee, and of the 11 people in the tower, five were changing over to rest
• The incident happened between the inbound and outbound morning peaks

DAL 39 continued
• The marshalling vehicle called in unexpectedly as Charlie-8, towing a KLM 747 from a parking apron
• Radio communications were unclear and C-8 did not state exactly where he was
• C-8 was given clearance
• The stopbar light control box confused everyone in the tower (it was a new addition)
• The controller, thinking that the tow had crossed successfully, gave DAL 39 clearance
• The DAL pilots saw the 747 and stopped in time

DAL 39 Initial Analysis
• Tow failed to report its exact position or destination
• Tow not announced in advance (as required by the procedures for Phase B)
• Assistant ATCo believed the tow was crossing from right to left (did not know that a tunnel was in use)
• Controllers completely unfamiliar with the new control box
• Ground radar pictures were set up to cover different arrival and departure runways, so the tow was not visible on one screen
• Controller was meshing the tow between both take-offs and landings
• The tow, given clearance 1 min 40 s earlier, started off once the stopbars went out

Why did all this happen - 1?
• Tow was in violation, but this appears to be routine
• No clear protocols for ground vehicles and no hazard analysis
• Different languages for aircraft (English) and ground vehicles (Dutch)
• Poor quality of ground radio
• Clearances appeared to be unlimited once given
• Tower supervisor was also the on-the-job (OTJ) trainer, in the middle of the rush hour
• Altered control box not introduced to ATC staff

Why did all this happen - 2?
• No briefings about alterations at Schiphol (it has been a building site for years)
• Too many trainees in the tower in rush hour under low visibility conditions
• Differences in the definition of low visibility between aerodrome and ATC
• No apparent management of the change in use of the S-Apron
• No operational audits by LVNL or Schiphol of practice, as opposed to paper
• Schiphol's design requires runway crossings and the use of multiple runways, for noise abatement reasons

The DAL 39 event scenario
[Figure: outcome: pilots see 747 and abort. Contributing factors: routine violation of tow take-off procedures; controller gives clearance without assurance of tow position; tower combining training and operations during difficult periods; tunnel brought into use without briefings; airport structure, following the airport's decision to change the airport structure.]

How can we manage errors?
• Risks refer to things that can go wrong
• Errors represent ways in which people can fail to control the hazards
• An inspector/auditor should be looking at two levels
  – Are the standards being adhered to?
  – Are the standards appropriate?
  – Have any hazards been missed or managed ineffectively?
Safety Management Cycle
[Figure: plan-do-check cycle with feedback. Elements: leadership and commitment; policy and strategic objectives; organisation, responsibilities, resources, standards and documentation; hazards and effects management; planning and procedures; implementation; monitoring; audit; management review; corrective action and improvement.]

Error Management
[Figure: six elements: identify, avoid, reduce, support, check, learn.]

Error management and inspection
• We can uncover problems from a wide range of sources of information
  – Accidents
  – Near misses
  – History
  – Brainstorming
• We can see if the best control methods are being applied
• If we leave everything to the individual, we have already created major problems

Error Management II
[Figure: the six stages (identify, avoid, reduce, support, check, learn) set against the questions what, why, how, who, where and when.]

What happened here?

Safety Management and Safety Culture
• The level of safety management is a function of the organisational safety culture
• Individuals may do their best, but that may not be enough
• Is the organisation organised and systematic?
• Are they satisfied with their performance, or do they feel they could do better?

The Evolution of Safety Culture
(moving up the ladder brings increasing informedness and increasing trust & accountability)
• Generative: "Safety is how we do business round here"
• Proactive: "We work on the problems that we still find"
• Calculative: "We have systems in place to manage all hazards"
• Reactive: "Safety is important, we do a lot every time we have an accident"
• Pathological: "Who cares as long as we're not caught"

The Edge
[Figure: return on capital invested, shown against stages labelled inherently safe, normally safe, safety management systems, safety culture and the edge, with a "no need" label; values shown are 6%, 10% and 15%.]

Conclusion
• When analysing risks you have to consider the whole range
  – From decisions to operate, etc., in certain ways
  – To decisions to act in certain ways
• When inspecting you have to examine the context, including yourself