Human Reliability Assessment 2703

Human Reliability
“The probability that a person will correctly perform
some system-required activity during a given time
period (assuming time is a limiting factor) without
performing any extraneous activity that can
degrade the system”
(Hollnagel, 2002)
Human Reliability Assessment (HRA)
Assessment of the impact of human errors on systems safety
and, if warranted, the specification of ways to reduce
human error impact and/or frequency.
HRA is far from being a precise science, but it is a useful means
of identifying and prioritizing safety vulnerabilities to human
error (HE), thereby reducing the frequency of accidents.
Hybrid area: - Engineering & reliability
- Psychology
- Ergonomics
Probabilistic Safety Analysis (PSA)
 Engineering approach
 Quantitative: expected accident frequencies are estimated
and then compared against predefined risk criteria.
 HRA must be incorporated into PSA if risk is to be
properly estimated…hence the need for the theoretical
framework (psychology and ergonomics)
HRA History
 Started early 1960s…expanded greatly since 1979. Why?
 Followed the same procedure as conventional reliability
analysis, with human tasks substituted for equipment failures
 Greater variability and interdependence for human
performance (‘human factor’)
 How can we capture this ‘variability’ and ‘interdependence’?
Understanding HRA
 The accident sequence being analyzed is represented as an
event tree (slide on next page).
 Nodes represent specific tasks/functions with 2
outcomes (success/failure)
 Engineering approaches (PRA/PSA) can calculate
failure probabilities in terms of material & process
information, but HRA must also account for the
“human factor” to determine if the human AS A
COMPONENT will fail.
Event tree structure
(Hollnagel, 2002)
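A minimal sketch of how an event tree of this kind can be evaluated: branch probabilities are multiplied along each path, assuming every node is an independent success/failure task. The task names and failure probabilities are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical tasks and failure probabilities (illustrative only).
tasks = {"detect alarm": 0.01, "diagnose fault": 0.05, "isolate valve": 0.02}

def event_tree_paths(failure_probs):
    """Enumerate every success/failure path and its probability,
    assuming the nodes are independent of one another."""
    names = list(failure_probs)
    paths = {}
    for outcome in product((True, False), repeat=len(names)):
        p = 1.0
        for name, ok in zip(names, outcome):
            p *= (1.0 - failure_probs[name]) if ok else failure_probs[name]
        paths[outcome] = p  # key: tuple of success flags, one per node
    return paths

paths = event_tree_paths(tasks)
p_all_success = paths[(True, True, True)]
p_any_failure = 1.0 - p_all_success
print(f"P(all tasks succeed)    = {p_all_success:.5f}")
print(f"P(at least one failure) = {p_any_failure:.5f}")
```

The independence assumption is the weak point for human tasks: as noted above, human performance shows greater variability and interdependence than equipment.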
Understanding HRA…cont (2)
 Traditional approach:
 Determine HEP (human error probability) using tables (from
collected data), HR models, or expert judgement
 Account for the influence of Performance Shaping Factors
 Calculate probability of an erroneous action using:
Understanding HRA…cont (3)
Using this formula we must make the following assumptions:
1.) Probability of failure can be determined for
specific types of actions independently of context.
2.) Effects of context are additive (various
performance conditions don’t influence each other).
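A minimal sketch of the traditional calculation, assuming a weighted-sum form P = HEP × Σ(PSF_k × W_k) + C. That form, the function name, and every number below are assumptions made for illustration; they are not taken from the slide.

```python
def adjusted_error_probability(hep, psfs, constant=0.0):
    """Adjust a nominal HEP by a weighted sum of PSF ratings.

    psfs is a list of (rating, weight) pairs. Each term is added
    independently of the others, which is the 'additive context'
    assumption: one performance condition never modifies another.
    """
    weighted_sum = sum(rating * weight for rating, weight in psfs)
    p = hep * weighted_sum + constant
    return min(max(p, 0.0), 1.0)  # keep the result a valid probability

# Hypothetical nominal HEP and PSF (rating, weight) pairs.
nominal_hep = 0.003
psfs = [(2.0, 0.5), (1.0, 0.3), (4.0, 0.2)]  # e.g. stress, training, interface
p_ea = adjusted_error_probability(nominal_hep, psfs)
print(f"Adjusted probability of erroneous action: {p_ea:.4f}")
```

Because the nominal HEP is looked up first and the PSFs only scale it afterwards, the sketch also reflects the first assumption: the base probability is fixed independently of context.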
Understanding HRA…cont (4)
As a result, several models have been used to improve
1.) Behavioral (Human Factors) models
Focus on simple manifestations (error modes)
Described in terms of omissions, commissions,
extraneous actions
Derive probability that specific errors will occur
Causal models very simple
Weak in accounting for context
Understanding HRA…cont (5)
2.) Information processing models
 Focus on internal mechanisms
 i.e. decision making, reasoning, etc.
 Explain flow of causes and effects through models
 Problems:
 Models often complex
 Limited predictive power (hypothetical basis)
 Little concern for quantification
 Context not considered explicitly
 Better suited for retrospective analysis than predictions
Understanding HRA…cont (6)
3.) Cognitive models
 Focus on relation between error modes and causes
 Models are (relatively) simple and context specific
 Premise:
Cognition is the reason why performance is efficient (or
Operator seen as acting in anticipation of future events
 Well suited for predictions and retrospective analysis
HRA Framework
 10 Steps with 3 MAIN GOALS
1.) Human error identification (What can go wrong?)
2.) Human error quantification (How often will a human
error occur?)
3.) Human error reduction (How can human error be
prevented from reoccurring or its impact on the system
be reduced?)
HRA Generic Methodology
Systematic way of approaching HRA logically
Will help ensure the problem is dealt with reliably
while minimizing error (biases)
Encompasses 10 steps from identifying the problem
to final documentation
Steps in the HRA
HRA Steps
1.) Defining the problem
Define precisely the problem and its setting in terms of the system goals and
overall forms of human-caused deviations from these goals.
2.) Task analysis
Define explicitly the data, equipment, behaviour, plans and interfaces used by the
operators to achieve system objectives, and identify factors affecting human
performance within tasks.
3.) Human error analysis
Identify all significant human errors affecting performance of the system and
find ways in which human errors can be recovered.
HRA Steps cont…
4.) Representation
Model human errors and recovery paths in a logical manner for quantitative
measurement (integrate human errors with hardware failures)
5.) Screening
Define the level of detail and effort with which the quantification will be conducted
by defining all significant human errors and interactions and ruling out
insignificant errors
6.) Quantification
Quantify human error probabilities and human error recovery probabilities (to
determine likelihood of success in achieving system goals)
HRA Steps cont…
7.) Impact assessment
Determine the significance of human reliability to achieving system goals,
decide if improvements in human reliability are required, and (if so) identify
the primary errors/factors negatively affecting the system.
8.) Error reduction
Identify error reduction mechanisms, assess the likelihood of error recovery,
and improve human performance in achieving system goals.
9.) Quality assurance
Ensure the enhanced system satisfactorily meets system performance
criteria NOW and in the future.
HRA Steps cont…
10.) Documentation
Detail all information necessary to allow the assessment to be
understandable, auditable, and reproducible.
1. Problem Definition
2 parts of defining a problem:
1) Identify the HR problem
2) Identify the HR context
- Once the HR problem is identified and defined within the
system context, discussions with designers, engineers
& operational managers should occur
- Define the system goals at the various levels where operator
action is required. This defines the higher goals the operator
was aiming for, and can reveal the operator's intentions
at the time of the event and the root of the problem
Problem Definition cont. (2)
 Must investigate the “safety culture” of a plant – this
can dramatically influence HR and is important to
defining the problem
 If HRA is being carried out as part of an overall risk
assessment the HR analyst will probably be given a set
of scenarios to assign risk and HE to.
 By the end of the process the problem should be
explicitly defined in its respective system context:
- Scenarios to be addressed
- Overall tasks required to achieve safety goals within each scenario
2. Task Analysis
Provide a complete, comprehensive description of
the tasks that have to be performed by the operator
to achieve system goals
- Many forms of TA, some notable ones include:
1. Sequential – chronological order of events
2. Hierarchical – considers tasks in a hierarchy
3. Tabular – dynamic situations (operator actions
during a power plant emergency)
Task Analysis cont. (2)
Methods of deriving info from task analysis:
 Interaction with all levels (operators, maintenance,
supervisors, managers, system designers, etc.)
Observation (structured & unstructured interviews)
Procedure analysis
Incident analysis
Walkthrough/explanation of procedures from
Examination of system documentation
Task Analysis cont. (3)
 Important not to completely rely on
procedures/operating instructions – practical, real life
procedures often differ
 Operator/employee knowledge (tacit knowledge)
gained through experience vital in the TA process.
3. Human Error Analysis (HEA)
Stage to identify all errors associated with the
Most critical part of HRA. WHY?...
- If significant errors are omitted, they will not appear in
the analysis, which may then seriously UNDERESTIMATE
the EFFECTS of human error on the system
3. HEA cont…(2)
Method example #1
Simplest approach to HEA…consider ‘external error modes’
1.) Error of omission:
Act omitted (not carried out)
2.) Error of commission:
Act carried out inadequately
Act carried out in wrong sequence
Act carried out too early/late
Error of quality (too little/too much)
3.) Extraneous error:
Wrong (unrequired) act performed
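A minimal sketch of how these external error modes could be applied systematically: every task step is paired with every error mode, giving a checklist of candidate errors for the analyst to review. The enum, the label strings, and the example steps are hypothetical names introduced for illustration.

```python
from enum import Enum

class ErrorMode(Enum):
    OMISSION = "error of omission"
    COMMISSION = "error of commission"
    EXTRANEOUS = "extraneous error"

# Each specific manifestation from the list above, mapped to its mode.
MANIFESTATIONS = {
    "act omitted (not carried out)": ErrorMode.OMISSION,
    "act carried out inadequately": ErrorMode.COMMISSION,
    "act carried out in wrong sequence": ErrorMode.COMMISSION,
    "act carried out too early/late": ErrorMode.COMMISSION,
    "error of quality (too little/too much)": ErrorMode.COMMISSION,
    "wrong (unrequired) act performed": ErrorMode.EXTRANEOUS,
}

def candidate_errors(steps):
    """Pair every task step with every manifestation to be considered."""
    return [
        (step, label, mode)
        for step in steps
        for label, mode in MANIFESTATIONS.items()
    ]

# Hypothetical task steps taken from a task analysis.
steps = ["open isolation valve", "confirm pressure reading"]
for step, label, mode in candidate_errors(steps):
    print(f"{step}: {label} [{mode.value}]")
```

Most of the generated pairs will be judged not credible and discarded, but enumerating them reduces the chance of omitting a significant error.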
3. HEA cont…
 Once the factors which influence HRA are identified
the next step is representing them in such a way to
indicate their effects on the system goals…
4. Representation
Visual representation of events/actions in a scenario
 Can be used to represent simple or complex failure
 Skill & proper knowledge of “tree” construction are
needed – trees can become extremely complex
and lose focus if not carefully put together
 A smaller scenario with a low number of errors may
not need or benefit from this type of representation
4. Representation cont…(2)
Fault Tree
 Typical representation of HE & its effect on a system is to use a
fault tree
 Logical structure that defines what events must occur in order for
an undesirable outcome to occur
 Undesirable event located at the top of the “tree” (most
2 different types of gate that allow events to proceed to the next
1) “OR” gate – The event above occurs if any of the events joined below it
by this gate occur
2) “AND” gate – The event above occurs only if all events joined below this gate occur
Fault Tree
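A minimal sketch of how the two gate types combine probabilities, assuming all basic events are statistically independent. The event names and numbers are hypothetical.

```python
from math import prod

def or_gate(probs):
    """Output occurs if ANY input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)

def and_gate(probs):
    """Output occurs only if ALL inputs occur (independent inputs assumed)."""
    return prod(probs)

# Hypothetical basic events: one human error, two hardware failures.
p_operator_misses_alarm = 0.01
p_alarm_hardware_fails = 0.001
p_backup_trip_fails = 0.005

# Alarm response fails if the operator misses it OR the hardware fails.
p_no_alarm_response = or_gate([p_operator_misses_alarm, p_alarm_hardware_fails])

# Top event requires the alarm response AND the automatic backup to fail.
p_top = and_gate([p_no_alarm_response, p_backup_trip_fails])
print(f"P(top event) = {p_top:.3e}")
```

Human errors enter the tree as basic events alongside hardware failures, which is how the representation step integrates the two.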
5. Screening
 Identifies where major effort in the quantification
analysis should be applied.
 Filters out tasks/scenarios which may have little
contribution to system failure
 Screening methods can inadvertently eliminate
important errors and interactions from the
analysis
 As a general rule, when applying any screening
technique – if in doubt, leave the human error in the
fault/event tree
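A minimal sketch of one possible screening rule, assuming each error has been given a conservative screening HEP and is dropped only when that value falls below a cutoff. The field names, the cutoff, and the example errors are hypothetical. The "in_doubt" flag implements the general rule above: anything the analyst is unsure about stays in.

```python
def screen(errors, cutoff):
    """Split errors into those kept for detailed quantification and
    those screened out. Each error is a dict with a conservative
    'screening_hep' and an 'in_doubt' flag set by the analyst."""
    kept, dropped = [], []
    for err in errors:
        if err["in_doubt"] or err["screening_hep"] >= cutoff:
            kept.append(err["name"])
        else:
            dropped.append(err["name"])
    return kept, dropped

# Hypothetical errors with conservative screening values.
errors = [
    {"name": "omit valve check", "screening_hep": 0.1, "in_doubt": False},
    {"name": "misread gauge", "screening_hep": 1e-4, "in_doubt": True},
    {"name": "skip log entry", "screening_hep": 1e-5, "in_doubt": False},
]
kept, dropped = screen(errors, cutoff=1e-3)
print("Quantify in detail:", kept)
print("Screened out:", dropped)
```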
6. Quantification
 Human reliability needs to be quantified as something that can
be compared across the HRA spectrum
 The metric for HRA quantification is Human Error Probability (HEP)
HEP = (# of errors occurred)/(# opportunities for error to occur)
 Expressed as a number between 0 and 1.
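A minimal sketch of the HEP ratio as a function, with input checks so the result always lies between 0 and 1. The function name and the example counts are hypothetical.

```python
def human_error_probability(errors_observed, opportunities):
    """HEP = number of errors / number of opportunities for error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= errors_observed <= opportunities:
        raise ValueError("errors must be between 0 and opportunities")
    return errors_observed / opportunities

# Hypothetical record: 3 errors observed in 1000 opportunities.
hep = human_error_probability(3, 1000)
print(f"HEP = {hep}")
```

The hard part in practice is the second argument: counting opportunities for error in a realistic task is far less obvious than counting the errors themselves.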
Little recorded industrial HEP data available because:
 Difficulty in estimating opportunities for error in realistic complex tasks
(denominator error)
 Confidentiality and unwillingness to publish poor performance data
 Lack of awareness regarding the usefulness of human error data (hence no
fiscal incentives)
There are lots of ways to
determine HEP
 For this course we keep it simple!
6. Quantification…cont.(2)
Problems with simulator data in determining HEPs:
 Personnel using simulators usually highly motivated and
know what’s on training curriculum
 Reliability of emergency training/responses on simulator
compared to the real situation (‘cognitive fidelity issue’).
Experiments are also usually controlled, investigating only
one or two variables → generalization risky!!!
Lack of ‘generalizability’ has led to:
 Non-data-dependent approaches
 i.e. Expert opinion/judgment
7) Impact Assessment
 System risk or reliability is calculated
 Compared to acceptable levels/standards established
 Each event is analyzed and classified into a fault tree
 Both HE & hardware/software analysis are taken into
account for the best combination to improve the
 If HE dominates error reduction methods must be
 If HE cannot be reduced to acceptable levels –
redesign of the system is necessary
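A minimal sketch of the comparison described above, assuming the system fails if either the human or the hardware contribution occurs and that the two are independent. The function name, the probabilities, and the acceptance criterion are hypothetical.

```python
def assess_impact(p_human, p_hardware, criterion):
    """Combine human and hardware failure probabilities (either one
    causes system failure) and compare against an acceptance criterion."""
    p_system = 1.0 - (1.0 - p_human) * (1.0 - p_hardware)
    return {
        "p_system": p_system,
        "acceptable": p_system <= criterion,
        "human_dominates": p_human > p_hardware,
    }

# Hypothetical values for one scenario.
result = assess_impact(p_human=5e-3, p_hardware=1e-4, criterion=1e-3)
if not result["acceptable"] and result["human_dominates"]:
    print("Risk too high and HE dominates: consider error reduction")
```

If error reduction cannot bring the human contribution low enough to meet the criterion, the same check points toward redesigning the system.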
8. Error Reduction
Not required if:
 Human reliability is adequate (what’s adequate?)
 It’s not the most effective means of achieving system
performance (other modifications more suitable)
 Not within scope of assessment
If required, then:
 Focus on reducing impact/frequency of human errors
 Implement a more general error reduction strategy
8. Error Reduction…cont (2)
Ways of reducing impact of critical errors on system
 Prevent errors through hardware/software changes
 Increase system tolerance
 Enhance error recovery
 Reduce error at source
8. Error Reduction…cont (3)
Additional considerations:
 Positive error reducing strategies should be factored
back into quantitative analysis
 Check that HEP(s) and overall system calculated risk
become acceptable
As part of the quality assurance phase:
 Provide ‘operational definition’ for each error reduction
 Ensure strategy is properly implemented and
maintained over time
9. Quality Assurance
 Effectiveness of error reduction mechanism
implementation should be ensured by:
Performance verification (at later stage)
Reliability/Validity Analysis (can be hard…why?)
 Continuous performance monitoring systems present
powerful quality assurance systems. Why?
Gradual performance standard degradation
Increased maintenance loading
Loss of personnel
Impromptu changes (since startup)
Increasing ‘retrofit’ changes
9. Quality Assurance cont…(2)
Long-term performance monitoring allows
Identifying WHEN in time results of HRA may
be outdated.
Signify need for further (or new) evaluation
Justifying acceptability of risk associated with
Avoid gradual deterioration of safety barriers.
 i.e. BHOPAL SYNDROME (India)
10. Documentation
Formally document all results of study
 Ensure auditability and justifiability of results
 Can provide database for future investigation and
Ensure assumptions and judgments included!!
 Aid new/unconnected personnel to understand
 Enable independent examination, updating, and
 Allows for learning from mistakes
Future Directions
1) Low technology risk
2) Cognitive errors and misdiagnosis
3) Management, organizational and sociotechnical
contributions to risk
Low Technology Risk
 HRA traditionally used for high risk, high technology
 HRA likes to focus on massive accidents that happen
less frequently – large consequences
 Not applied as much to high risk, low technology sectors
(e.g. mining), which have a larger number of
“small” accidents
 Can have very valuable applications to low technology
Cognitive Errors & Misdiagnosis
 Operators may misdiagnose a situation, fail to realize the
mistake, and continue to interpret feedback in light of the wrong diagnosis
 Matters can be made worse if the operator overrides a safety
system than if nothing was done at all
Management, Organizational &
Sociotechnical Contributions to Risk
 HRA should not only be applied to operators/workers
on the job site but also management and the
organizational design of the plant
 Bhopal, Challenger Shuttle & Chernobyl all involved
significant human error, BUT current HRA techniques
would not have detected the risk prior to the accidents
because the error was neither procedural nor diagnostic
Management, Organizational &
Sociotechnical Contributions to Risk
 Economics, time constraints, social pressures,
communication breakdowns, personality conflicts,
etc. all add pressures on a system and its safety