
A Project to Develop a Handbook of DOE Operational Safety Event and Accident
Investigation Techniques
Concept White Paper
D. Pegram and E. Carnes
Office of Health, Safety and Security (HSS), Office of Corporate Safety Programs HS-31
Accident Prevention and Investigation Program
October 15, 2010
The objective of this project is to convert the existing DOE AI “Workbook” into a DOE
Technical Standard as defined in DOE-TSPP-5, dated July 1, 2009, and update the material to
include current thinking, methods, and approaches for analysis and the conduct of investigations.
The DOE AI Workbook has not been updated since 1999.
To accomplish this, HS-31 has formed a task team composed of HSS Subject Matter Experts,
EFCOG Contractor Subject Matter Experts, and the DOE Program Offices’ Accident Investigation
Points of Contact.
This white paper addresses the overall concepts to be introduced, the approach, and the focus of
the Handbook on Accident Prevention, Highly Reliable Organizations (HRO)/Organizational
Learning, and Human Performance Improvement (HPI).
This activity supports the establishment of a DOE Community of Interest that promotes the role
of stakeholders in owning, developing, and using the “Handbook” as a living document, to be
updated as knowledge matures from using the techniques.
Why Adopt the Principles of a High Reliability Organization (HRO) and Employ
Operational Safety Event and Accident Analysis?
“It’s impossible to solve significant problems using the same level of knowledge that created
them!” A. Einstein.
A key characteristic of successful technology-centered organizations is their ability to learn and
improve. The importance of quality organizational learning has been researched and written
about by W. Edwards Deming, Peter Senge, and a host of others. The Integrated Safety
Management Feedback and Improvement function envisions organizations that continually
monitor performance, identify deviations or questionable conditions, self-assess, and
use quality analysis to improve.
Operational Safety Event and Accident Analysis
Operational safety event and accident analysis is a process that uses a set of tools built through
consensus, academic disciplines, industry, application, and practice. These tools have been found
valuable in determining safety management system and human performance conditions,
deficiencies, and related causal factors, and can be used to prevent accidents.
Organizational/Operational Safety analysis is an essential component of learning, as is learning
from accidents and near misses. We often call this activity self-assessment, a familiar process
within DOE. Highly reliable organizations seek to learn from non-consequential deviations from
expectations. Treating small, unexpected things as possible precursors of declining conditions is
a characteristic of organizations that excel.
These organizations develop learning capabilities and learning analytical tools that span the full
range of operational performance from normal operations to accidents and all points in between.
DOE has compiled a suite of analytical tools and years of experience in accident investigation.
DOE’s accident investigation program and tools have served as models for many other
government accident investigation programs. The broader organizational learning applicability
of the investigation tools and their theoretical underpinnings is not, however, widely understood
within DOE.
The proposed new technical standard is intended to bring together the strengths of the existing AI
Workbook, experience gained in accident investigation and analyses of lower-level events, and
performance improvement insights from HROs and the DOE HPI applications into a single
reference document. The intent is to communicate how these tools and concepts may be used to
self-assess and improve across the full operational performance spectrum.
The High Reliability Organization (HRO)
A 2009 study on the evaluation of nuclear power organizations stresses key aspects of organizational
learning in safety-critical domains: typical self-assessments “…do not pay enough attention to the
low probability high consequence incidents. Those “obscure hazards” should be identified better
since they are also the probable causes of accidents after the more easily observed high
probability hazards have been controlled.
This requires going behind the surface levels and analyzing the hazards the organization initially
considers not significant. It also requires a good understanding of the technical features of the
systems as well as the social system (creating hazards through human action or inaction).
Further, going beyond the surface level of the organization requires adequate evaluation tools
combined with an ability to use them correctly.” (Teemu Reiman & Pia Oedewald: Evaluating
Safety Critical Organizations, VTT, 2009)
Organizational learning is a core competency of what are known as high reliability organizations
(HROs). Research on HROs was introduced into DOE ISM work through the 2004 DNFSB
document DNFSB/TECH-35, Safety Management of Complex, High-Hazard
Organizations. The DOE ISM Manual also discusses what DOE might learn from HROs:
“Experience and research with safety cultures and high-reliability organizations (HRO) over the
past ten or more years have raised new insights and deeper understanding relevant to the desired
work environment for effective safety management.”
High reliability concepts also generated a number of the concepts and techniques found in the
DOE Human Performance Improvement Program enhancements used to add further robustness
to earlier ISM implementation.

A High Reliability Organization (HRO) is one that, despite dealing with hazardous,
high-consequence operations, does so successfully and demonstrates a
trend of continuous safety performance improvement.

The HRO recognizes that near misses and adverse trends in safety performance are
indicators of the potential for high-consequence events.

We seek to look at work as it is “actually” performed, rather than how we “imagine” it to
be performed.

A key attribute of being an HRO is to learn from the organization’s mistakes.

Operational Safety/Accident Analysis tools, both those already developed and those
currently being documented and refined by the joint DOE/EFCOG project, will be used
to learn from areas of management system and performance concern, information-rich
events, occurrences, and accidents.
HRO requires a “Just Culture”

An environment that recognizes human potential for error and clearly and consistently
defines acceptable performance and safe behavior.

Recognition of fairness related to the identification and resolution of human performance
problems.

Distinction between honest mistakes and intentional shortcuts with respect to discipline.

Free flow of plant information across all levels of an organization.

High level of self-reporting
HRO Concepts
Accidents can be prevented by proper organizational design supported by a proactive learning
culture and operational safety management systems.
1) Manage the System, Not the Parts.
2) Introduce Stability - Reduce Variability in HRO System.
3) Foster a Culture of Reliability.
4) “The Learning Organization” learns and adapts as an organization.
5) Focus is on the organization, not individuals.
6) Recognition that everyone makes mistakes.
7) Encourage and reward personnel for disclosing errors, without adverse consequences.
The Role of Human Performance Improvement (HPI)
Human performance as it applies to the individual is a series of behaviors executed to
accomplish specific task objectives.
Organizationally, human performance is the aggregate system of processes, influences,
behaviors, and results that become manifest in the way work is conducted.
We accept that the greatest cause of human error is weaknesses in the organization’s management
systems, not lack of skill or knowledge.
Latent organizational weaknesses are hidden deficiencies in management control processes or
values that create workplace conditions that can provoke an error and/or degrade the integrity of
defenses.
HPI Event Analysis - A High Reliability Organization fixes the organization, not “the people”.

Review from two perspectives: how the people involved in the event performed (context) and
how the safety management system performed.

Evaluate the organization’s performance prior to and leading up to the event.

Recognize that the event or accident is the effect or symptom of embedded, latent
organizational deficiencies (deeper trouble in the organization) and is not a random
chance event.

If the management response to the event focuses only on the single apparent or root
cause, the other management system and human performance errors leading to contributing
causes will not be addressed.
The recommended approach and analytical tools for performing this safety management system
analysis will be collected in the handbook to be developed: DOE-HDBK-XXXX-2011,
HANDBOOK FOR DOE OPERATIONAL SAFETY EVENT ANALYSIS AND ACCIDENT
INVESTIGATION.
The basis for the Handbook will be the conversion of the existing Accident Investigation
Workbook into a DOE Technical Standard. The project will be a joint activity sponsored by
HSS, the PSO AI POCs, and EFCOG.
HPI investigation techniques are being added to our current event and accident investigation
processes. The value of investigating an event or accident using HPI is in identifying
organizational weaknesses not found by other methods. HPI’s broader view identifies human
performance and systemic causes in management systems that impact the organization, rather than
only addressing the areas needed to prevent the specific event from recurring.
The concepts and principles recommended are contained in: DOE-HDBK-1028-2009, June 2009,
DOE STANDARD HUMAN PERFORMANCE IMPROVEMENT HANDBOOK VOLUME 1:
CONCEPTS AND PRINCIPLES.
Examples of the Analytical Methods and Approaches recommended are contained in:
DOE-HDBK-1028-2009, June 2009, DOE STANDARD HUMAN PERFORMANCE
IMPROVEMENT HANDBOOK VOLUME 2: HUMAN PERFORMANCE TOOLS FOR
INDIVIDUALS, WORK TEAMS, AND MANAGEMENT.
HRO/HPI and ISM Integration Pathway
The HRO concept needs to be integrated into the site’s ISM System Description. Incorporating
HRO/HPI strengthens our ISM system. ISM System attributes need to include HRO/HPI
concepts, analysis methods, and feedback and improvement practices.
HRO/HPI Principles that complement ISM Core Functions and Guiding Principles are:
1) Each employee instinctively feels responsible for safety.
2) Leaders demonstrate commitment to safety.
3) Trust towards each other.
4) Decision-making reflects safety as the overriding priority.
5) An inquisitive operational safety attitude and behavior is essential.
6) A disciplined authorization basis system is in place to ensure all hazards are identified
and mitigated before work begins.
7) Organizational learning is embraced.
8) We openly examine our operations and solicit feedback on errors in defining work that
can lead to mistakes in analyzing hazards, and produce human performance errors.
9) We dig to determine the human performance errors in the presence of flawed defenses
that can have high consequences.
ISM - Define the Scope of Work / HPI - Determine the Context
Determine the location, day of week, time of day, whether the work is considered routine or
special, error precursors, latent organizational weaknesses, and the relationship to other
scheduled work. (Use the HPI Human Error Precursor List.)
ISM - Identify and Analyze the Hazards / HPI - Identify Critical Steps
Use Operational Safety Event and Accident Analysis Methods to identify where a human error,
if it has occurred or will occur, would result in an unwanted outcome.
ISM - Develop and Implement Hazards Controls / HPI - Integrate HPI Tools
Review the HPI tool box for the tools best suited to the given critical steps and integrate
them into the job safety hazard, task analysis, and work control processes.
ISM - Perform Work within Controls / HPI - Utilize HPI Tools
Implement enhanced training, pre-job briefings, and job aids that incorporate the use of Human
Performance Improvement Tools.
ISM - Provide Lessons Learned Feedback and Improvement / HPI - Capture and
Communicate Lessons Learned
Feedback information on the adequacy of Human Performance Tools is gathered; opportunities
for improving the definition and planning of work are identified and implemented. Safety
Management Systems, barriers, and defenses are evaluated for integration of lessons learned.
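The five ISM/HPI pairings above can be sketched as a simple lookup structure, for example as a starting point for a site checklist or work-planning template. This is only an illustrative sketch: the names IsmHpiStep, ISM_HPI_PATHWAY, and hpi_activity_for are hypothetical and do not come from any DOE tool or standard; only the pairing text is taken from this paper.

```python
from typing import NamedTuple, Optional


class IsmHpiStep(NamedTuple):
    """One pairing of an ISM core function with its HPI activity (hypothetical type)."""
    ism_function: str
    hpi_activity: str


# The five pairings, in the order this paper presents them.
ISM_HPI_PATHWAY = (
    IsmHpiStep("Define the Scope of Work", "Determine the Context"),
    IsmHpiStep("Identify and Analyze the Hazards", "Identify Critical Steps"),
    IsmHpiStep("Develop and Implement Hazards Controls", "Integrate HPI Tools"),
    IsmHpiStep("Perform Work within Controls", "Utilize HPI Tools"),
    IsmHpiStep(
        "Provide Lessons Learned Feedback and Improvement",
        "Capture and Communicate Lessons Learned",
    ),
)


def hpi_activity_for(ism_function: str) -> Optional[str]:
    """Return the HPI activity paired with an ISM core function, or None if unknown."""
    wanted = ism_function.strip().lower()
    for step in ISM_HPI_PATHWAY:
        if step.ism_function.lower() == wanted:
            return step.hpi_activity
    return None


if __name__ == "__main__":
    # Print the pathway as a numbered list.
    for number, step in enumerate(ISM_HPI_PATHWAY, start=1):
        print(f"{number}. ISM: {step.ism_function} -> HPI: {step.hpi_activity}")
```

Keeping the pairings in one ordered tuple preserves the sequence of the ISM core functions, so the same structure can drive both a printed checklist and a lookup by function name.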