Human Error From Taking Risk to Running Risk

Prof Patrick Hudson
Centre for Safety Studies
Department of Psychology
Leiden University
Introduction - Structure
• Two Types of Risk
• Case studies
– Piper Alpha & Herald of Free Enterprise
• Human Error
• The Organisational Accident Model
• Examining the sources of risks
• Case study DAL 39
• Solutions to human error
• What to look for
• Conclusion
Where am I coming from?
• Psychology
– Why do people do what they do?
• Human error
– How can people get things so wrong?
• Oil and Gas industry, Aviation & Medicine
– Extremely high hazard industries
• The organisational model of accidents
– Reason’s Swiss Cheese Model
What is safety all about?
• Preventing harm to people
• Safeguarding assets
• Protecting environment
• Preserving reputation
• If things didn’t go wrong it would be easy
• Safety and profits are about risk management
Managing risks
• Safety is about managing risks to people, the
environment etc - what risks do you take?
• The alternative is to run the risks and hope for the
best - can we run the risks?
• What happens to companies that run risks?
– The best make profits, the worst go bankrupt
• So, we need to have a risk management process - we
need understanding of the types of risk and where
they come from
Risks
• We can distinguish two ways to approach risk
• We take a risk
– We can decide the return is worth it
• We run a risk
– We can become victims if things go wrong
• People who take risks are not always the same
as those who run them
Case Study
Piper Alpha
• A major disaster
• Changed the way the Oil and Gas industry
operates
• Created the requirement for Safety
Management Systems and Safety Cases to
be a ‘living system’ and a ‘living document’
• Had legal effects as far as Australia
Piper Alpha Disaster
• In July 1988 the Piper Alpha platform was
destroyed with 167 fatalities
• The immediate cause was leaking gas
condensate
• The disaster was made worse by a total
failure of defences
• By 1990 Occidental was out of business in
the UK
The next morning
Why do accidents happen?
• Accidents are quite infrequent
• An accident is often seen as being caused by one
or more individuals
• But:
• In Piper Alpha the major problems were the
platform design and the permit to work system
• Piper Alpha had also been audited and passed by
the regulator 7 days earlier
What were the risks?
• Many people died because they followed
procedures
• The platform management failed to provide
a safe workplace
• The regulator had failed to audit the system
Case Study
Herald of Free Enterprise
• Herald of Free Enterprise sank outside
Zeebrugge harbour
• The Assistant Bosun was asleep
• The bow doors were still open
• 193 people died
Herald of Free Enterprise
[Diagram: chain of events leading to the capsize. Box labels: TRIMMING PROBLEM; SHIP HEAD DOWN; MANAGEMENT; HIGH BOW WAVE; NO CHECKING SYSTEM; 15 MINUTES EARLIER; 5 MINUTES LATE; ACCELERATION; CHIEF OFFICER LEAVES G-DECK; MASTER ASSUMES SHIP READY; LOADING OFFICER; DOOR PROBLEM; ASSISTANT BOSUN; BOSUN; ASSISTANT BOSUN ASLEEP; NO INDICATION; DOORS OPEN; CAPSIZE]
Herald Analysis
• The assistant bosun was overworked
• The masters had asked for indicators
• The management had refused on grounds of
cost
• A Townsend Thoresen vessel left Dover
with the bow doors open the next day!
Active vs Latent Failures
• Analysis of disasters indicates the need to
distinguish two types of human failure
• Active Failures - Errors and violations that
impact directly on the system and victims
• Latent Failures - Accidents waiting to
happen
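The active/latent distinction can be made concrete with a small tagging sketch. The following Python is only an illustration: the class and function names are hypothetical, the findings are paraphrased from the Herald slides, and the ACTIVE/LATENT label given to each finding is an assumption of this sketch rather than an official classification.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    ACTIVE = "active"   # errors and violations that impact directly
    LATENT = "latent"   # accidents waiting to happen


@dataclass(frozen=True)
class Finding:
    description: str
    failure_type: FailureType


# Findings paraphrased from the Herald slides. The labels are this
# sketch's assumption, not a classification stated in the source.
HERALD_FINDINGS = [
    Finding("Assistant bosun asleep, bow doors left open", FailureType.ACTIVE),
    Finding("Master assumes ship ready", FailureType.ACTIVE),
    Finding("No door indication for the master", FailureType.LATENT),
    Finding("Management refused indicators on grounds of cost", FailureType.LATENT),
    Finding("Assistant bosun overworked", FailureType.LATENT),
]


def group_by_type(findings):
    """Return a dict mapping each FailureType to its finding descriptions."""
    groups = {failure_type: [] for failure_type in FailureType}
    for finding in findings:
        groups[finding.failure_type].append(finding.description)
    return groups


if __name__ == "__main__":
    for failure_type, items in group_by_type(HERALD_FINDINGS).items():
        print(failure_type.value.upper())
        for item in items:
            print("  -", item)
```

Grouping the findings this way makes it visible that most of the list sits on the latent side, even though the active failures are the ones that attract attention first.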
From Error to Underlying Cause
[Diagram: Unsafe Acts (Active Errors) traced back to Latent Conditions]
• Unsafe Acts (Active Errors)
– Unintended Actions: Slips, Lapses
– Intended Actions: Mistakes, Violations
• Latent Conditions
– Planning, Design, Procedures, Decisions
– Training, Planning, Communication, Accountability
Types of risk
• The individuals making the active failures are
frequently running the risks
• Those accepting the latent failures are those who
have taken the original risk
• They expect that all will go well
• Weaknesses in the system allow problems to
happen
• The unsafe acts of individuals are the obvious
human errors - running risks
The Causes of Incidents
• Triggers
• Defences
• Unsafe Acts
• Preconditions
• Underlying Causes
• Decisions made
[Diagram labels grouping the list above: Immediate Causes; Underlying Causes]
Why do Accidents Happen?
• Equipment
– Breakdowns
– Doesn’t work
• People
– Incompetence
– Sloppiness
– Risk Taking
• Organisation
– Allowing failures to propagate
– Accidents waiting to happen
Latent Conditions = Underlying Causes
• Latent Conditions represent accidents waiting to happen
• Many problems are to be found. E.g.:
– Poor procedures (Incorrect, unknown, out of date)
– Bad design accepted
– Commercial pressures not well balanced
– Organisation incapable of supporting operation
– Maintenance poorly scheduled
• Latent conditions make errors more likely or the consequences worse
• Individuals are the recipients of somebody else’s problems
• Taking a risk involves accepting latent conditions, running the risk involves
becoming a recipient of those problems
Classifying Latent Conditions
• We can group underlying causes - Whys
• Hows refer to the immediate causes
• Underlying causes refer to the organisational
level
• Concentrating on why means we no longer
concentrate upon individuals
• The categories are dependent upon what you are
going to do with the information
Preconditions
• The reasons why an individual or group may make
an error
• Preconditions influence the probability of error
• There are few effects of individual differences
(accident proneness does not exist)
• Preconditions that induce or make errors more
likely are the result of (failure to) control
• The question is: Why are the preconditions for
error present?
Preconditions II
• Haste
• Ignorance
• Design
• Unusual situations
• Fatigue
• Habit
• “Strong but Wrong”
• These are the symptoms of a deeper problem
Accident Causation Model
[Diagram: Fallible Decisions; Latent Conditions; Preconditions; Unsafe Acts; Defences. Additional labels: Local triggers; Environmental conditions]
Reason’s Swiss cheese model of accident causation
[Diagram: Hazards pass through successive layers of defences, barriers, & safeguards to cause Losses. Some holes are due to active failures; other holes are due to latent conditions.]
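A rough numerical sketch can show why successive layers matter in the Swiss cheese picture. The Python below assumes, purely for illustration, that each layer has an independent probability of having a hole lined up with the hazard; the function names and the probabilities are hypothetical and do not come from the source. Real defences are rarely independent, which is exactly what latent conditions undermine.

```python
import math
import random


def breach_probability(hole_probs):
    """Probability that a hazard passes every layer, assuming the
    layers fail independently (a simplifying assumption)."""
    return math.prod(hole_probs)


def simulate(hole_probs, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    losses = sum(
        all(rng.random() < p for p in hole_probs) for _ in range(trials)
    )
    return losses / trials


if __name__ == "__main__":
    # Hypothetical numbers: three layers, each with a 10% chance of a hole.
    healthy = [0.1, 0.1, 0.1]
    # Latent conditions widen the holes in two of the three layers.
    degraded = [0.1, 0.5, 0.5]
    print("healthy: ", breach_probability(healthy))
    print("degraded:", breach_probability(degraded))
    print("simulated healthy:", simulate(healthy))
```

Under these assumed numbers, widening the holes in two layers raises the chance of a loss many times over, even though no single layer has been removed.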
HSE Management
[Diagram labels: Hazard/Risk; Taking risks; Barriers or Controls; WORK; Running risks; Undesirable outcome]
Shell’s Bow-tie Concept
[Diagram: bow-tie running from HAZARD through BARRIERS to CONSEQUENCES. Labels: Events and Circumstances; Undesirable event with potential for harm or damage; Harm to people and damage to assets or environment; Engineering activities; Maintenance activities; Operations activities]
Case Study
DAL 39 Schiphol
• An example of multiple failures
• The criminal appeal found that the 3 Air Traffic
Controllers were guilty of an infringement
• There was no punishment (so no further appeal)
• Consider what the conventional and actual risks were
• Would you have spotted these?
• Would they appear in a conventional risk analysis?
DAL 39
• A Delta 767 aborted take-off at Amsterdam
Schiphol on discovering a 747 being towed across
the runway
• Reduced visibility conditions (Phase B)
• The tower controller was in training, under the
tower supervisor
• There was another trainee in the tower, and of the 11
people present five were changing out to rest
• The incident happened between the inbound and
outbound morning peaks
DAL 39 continued
• The marshalling vehicle called in unexpectedly as
Charlie-8 with a towed KLM 747 from a parking
apron
• Radio communications were unclear and C-8 did not
state exactly where he was
• C-8 was given clearance
• The stopbar light control box confused everyone in
the tower (it was a new addition)
• The controller, thinking that the tow had crossed
successfully, gave DAL 39 clearance
• The DAL pilots saw the 747 and stopped in time
DAL 39 Initial Analysis
• Tow failed to report exact position or destination
• Tow not announced in advance (as per procedures for
phase B)
• Assistant ATCo believed tow from right to left (did not
know that a tunnel was in use)
• Controllers completely unfamiliar with new control box
• Ground radar pictures set up to cover different arrival and
departure runways meant tow not visible on one screen
• Controller was meshing the tow between both take-offs
and landings
• The tow, given clearance 1 min 40 s earlier, started off
once the stopbars went out
Why did all this happen - 1?
• Tow was in violation, but this appears to be routine
• No clear protocols for ground vehicles and no hazard
analysis
• Different language for aircraft (English) and ground
vehicles (Dutch)
• Poor quality of ground radio
• Clearances appeared to be unlimited once given
• Tower supervisor was also on-the-job (OTJ) trainer in the
middle of the rush hour
• Altered control box not introduced to ATC staff
Why did all this happen - 2?
• No briefings about alterations at Schiphol (It has
been a building site for years)
• Too many trainees in the tower in rush hour under
low visibility conditions
• Differences in definition of low visibility between
aerodrome and ATC
• No management apparent of the change in use of
the S-Apron
• No operational audits by LVNL or Schiphol, of
practice as opposed to paper
• Schiphol designed requiring crossing and the use
of multiple runways for noise abatement reasons
The DAL 39 event scenario
[Diagram labels:]
• Pilots see 747 and abort take-off
• Routine violation of tow procedures
• Tunnel brought into use without briefings
• Airport structure
• Airport decides to change airport structure
• Controller gives clearance without assurance of tow position
• Tower combining training and operations during difficult periods
How can we manage errors?
• Risks refer to things that can go wrong
• Errors represent ways in which people can fail to
control the hazards
• An inspector/auditor should be looking at two
levels
– Are the standards being adhered to?
– Are the standards appropriate?
– Have any hazards been missed or managed
ineffectively?
Safety Management Cycle
[Diagram: PLAN, DO, CHECK, FEEDBACK cycle with the following elements]
• Leadership and Commitment
• Policy and Strategic Objectives
• Organisation, Responsibilities, Resources, Standards & Documentation
• Hazards and Effects Management
• Planning and Procedures
• Implementation
• Monitoring
• Audit
• Management Review
• FEEDBACK: Corrective Action; Corrective Action and Improvement
Error Management
Avoid
Reduce
Learn
Identify
Support
Check
Error management and inspection
• We can uncover problems from a wide range of
sources of information
– Accidents
– Near misses
– History
– Brainstorming
• We can see if the best control methods are being
applied
• If we leave everything to the individual we have
already created major problems
Error Management II
[Diagram: questions asked at each stage of error management]
• Identify: What, Why, How
• Avoid: What, Why
• Reduce: What, Why, How, Who, Where, When
• Support: How, Who, Where, When
• Check: Who, Where, When
• Learn: What, Why, How, Who, Where, When
What happened here?
Safety Management and Safety Culture
• The level of safety management is a
function of the organisational safety culture
• Individuals may do their best, but that may
not be enough
• Is the organisation organised and
systematic?
• Are they satisfied with their performance, or
do they feel they could do better?
The Evolution of Safety Culture
• GENERATIVE: safety is how we do business round here
• PROACTIVE: we work on the problems that we still find
• CALCULATIVE: we have systems in place to manage all hazards
• REACTIVE: safety is important, we do a lot every time we have an accident
• PATHOLOGICAL: who cares as long as we’re not caught
[Diagram: ladder from PATHOLOGICAL (bottom) to GENERATIVE (top); arrows labelled Increasing Informedness and Increasing Trust & Accountability]
The Edge
[Diagram: Return on Capital Invested by operating zone]
• Inherently Safe: 6% (No need)
• Normally Safe: 10% (Safety Management Systems)
• The Edge: 15% (Safety Culture)
Conclusion
• When analysing risks you have to consider
the whole range
– From decisions to operate etc in certain ways
– To decisions to act in certain ways
• When inspecting you have to examine the
context, including yourself