
Direct Cause vs Root Cause

“A Problem Solving Concept”
INCOSE Enchantment Chapter Meeting
March 14, 2007
Dr David E. Peercy
Sandia National Laboratories
Department 12341, Weapon System and Software Quality
Presentation Objective
Events have many potential “causes”. We tend to think of “causes” as related
mostly to “unwanted” events, but in effect every event that occurs has
“causes”: the reasons that the event occurs.
The objective of this short presentation/discussion is to gain a better
understanding of why it is important to understand the difference between
“direct” causes and “root” causes of events.
In so doing, we enhance our capability to influence a much larger class of
events – both in preventing unwanted events and ensuring wanted events
actually do occur.
Direct Cause vs Root Cause INCOSE Chapter Meeting
March 14, 2007
An Example of a Problem
USAF F-22A jets grounded by software glitch
Jeremy Epstein <jepstein@webmethods.com>, Fri, 23 Feb 2007 15:55:52 -0500
Navigational systems failed, planes forced to return to Hawaii [visually having to
follow their tankers to safety].
The problem turns out to be software (no surprise there). Fix created, "verified",
installed, and they're off again.
[Direct or Root Cause addressed?]
A spokesman for Lockheed Martin this week insisted that the navigation software
problem was minor. 'The issue was quickly identified in a matter of days and a fix
installed in the airplanes, which were flown successfully to Japan,' he said. 'There
are 87 of these exceptional fighters and they are out there performing
exceptionally well, and their pilots continue to fly them in new and greater ways.'
Examples to Test Our Understanding
RESOURCE: http://catless.ncl.ac.uk/Risks
Peter Neumann, moderator of the RISKS Forum (SRI International)
The RISKS site provides a voluminous list of risks, many of them computer/software related, with a primary interest in security and safety; summaries are provided with links to more detail.
Army Training Accident, June 2002
Friendly Fire Deaths, March 2002
Medical “Direct/Root” Cause Determinations
A Simple Example
Assume each of these factors is as described below:
e: car will not start
d: battery is dead
c: alternator does not function
b: alternator is well beyond its designed service life
a: car is not being maintained according to recommended service schedule
Direct Cause?
Intermediate Causes?
Root Cause?
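One possible reading of the questions above, as a minimal Python sketch. It assumes the five factors form a single linear chain (a leads to b, b to c, and so on), which the slide implies but does not state. The function name classify and the labels are illustrative only. In a real analysis, factor a might itself have deeper causes.

```python
# Causal chain from the example, ordered from the deepest known cause (a)
# to the observed event (e). The linear ordering is an assumption.
chain = [
    ("a", "car is not being maintained according to recommended service schedule"),
    ("b", "alternator is well beyond its designed service life"),
    ("c", "alternator does not function"),
    ("d", "battery is dead"),
    ("e", "car will not start"),
]


def classify(chain):
    """Label each factor in one linear cause-and-effect chain."""
    labels = {}
    last = len(chain) - 1
    for i, (key, _text) in enumerate(chain):
        if i == last:
            labels[key] = "event"               # the effect of interest
        elif i == last - 1:
            labels[key] = "direct cause"        # leads immediately to the event
        elif i == 0:
            labels[key] = "root cause"          # start of the chain as known so far
        else:
            labels[key] = "intermediate cause"
    return labels


labels = classify(chain)
for key, text in chain:
    print(f"{key}: {text} -> {labels[key]}")
```

Under these assumptions, d is the direct cause, b and c are intermediate causes, and a is the root cause.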
Error, Fault/Defect, Failure
Error
– a human action or lack of action that results in the inclusion of a fault in a
product or the way it is used
– the variance between expected and actual results
Fault/Defect
– an accidental condition that causes a product to fail to perform its required
function if encountered during operational use
Failure
– an event in which a product does not perform a required function within its
specified limits during operational use
ERROR may lead to FAULT/DEFECT
FAULT/DEFECT may lead to FAILURE
or
FAULT TOLERANCE may lead to NO FAILURE or REDUCED EFFECT
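The three terms can be illustrated with a small Python sketch. The functions average and average_tolerant are hypothetical and exist only for this illustration: the programmer's error (forgetting the empty-list case) leaves a fault in the code, the fault causes a failure only when it is encountered in use, and a fault-tolerant wrapper reduces the effect.

```python
def average(values):
    # FAULT: the programmer's ERROR was to forget the empty-list case,
    # so this line divides by zero when values is empty.
    return sum(values) / len(values)


def average_tolerant(values):
    """Fault-tolerant wrapper: the fault is still present, its effect is reduced."""
    try:
        return average(values)
    except ZeroDivisionError:
        return None  # report "no result" instead of crashing


# The fault is latent: no failure while the faulty path is not exercised.
ok_result = average([2, 4, 6])

# FAILURE: the fault is encountered during operational use.
try:
    average([])
    failed = False
except ZeroDivisionError:
    failed = True

# FAULT TOLERANCE: same fault, but no failure (reduced effect).
tolerant_result = average_tolerant([])

print(ok_result, failed, tolerant_result)
```

The wrapper returns None rather than a number such as 0.0, so that a missing result cannot be mistaken for a valid one.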
Direct Cause
Causes of events may be natural or man-made,
active or passive, initiating or permitting, obvious
or hidden.
Those causes that lead immediately to the effect
are often called direct or proximate causes.
Examples of direct/proximate causes:
Equipment
• Arced
• Leaked
• Over-loaded
• Over-heated
Human
• Pushed incorrect button
• Fell
• Dropped tool
• Connected wires
Root Cause
Direct causes often result from another set of
causes, which could be called intermediate
causes, and these may be the result of still other
causes.
When a chain of cause and effect is followed
from a known end-state, back to an origin or
starting point, root causes are found.
The process used to find root causes is called
root cause analysis --- systematic problem
solving.
A root cause is an initiating cause of a causal
chain which leads to an outcome or effect of
interest.
The Benefits of Problem Solving!
The usual purpose of attempting to find root
causes is to solve a problem that has actually
occurred, or to prevent a less serious problem
from escalating to an unacceptable level (e.g.,
near misses in aircraft safety).
The basic concept is that solving a problem by
addressing root causes is ultimately more
effective than merely addressing symptoms or
direct causes.
That is, a “class” of problems may be
solved/prevented by addressing root causes
rather than just direct causes.
Basic Process - Continue to Ask Why!
Continue to ask “why” until you have reached:
1. Direct, Intermediate, and Root cause(s) - including
all organizational factors that exert control over the
design, fabrication, development, maintenance,
operation, and disposal of the system.
2. A problem/cause that is not correctable by your
organization; it may be promoted to a higher
responsible organization.
3. Insufficient data to continue.
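The three stopping conditions above can be sketched as a loop in Python. This is a minimal illustration, not a prescribed method: the Cause class, its fields, and the function ask_why are all hypothetical, and whether a cause counts as a root cause is a judgment the analyst supplies. The sample data reuses the car example from earlier in the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cause:
    description: str
    why: Optional["Cause"] = None   # answer to "why?", if one is known
    is_root: bool = False           # analyst judges this to be a root cause
    correctable: bool = True        # False: outside our organization's control


def ask_why(event: Cause):
    """Keep asking "why?" until one of the three stopping conditions is met."""
    trail = [event.description]
    current = event
    while True:
        if current.is_root:
            return trail, "root cause reached"
        if not current.correctable:
            return trail, "not correctable here: promote to higher organization"
        if current.why is None:
            return trail, "insufficient data to continue"
        current = current.why
        trail.append(current.description)


# Sample data: the car example, with the maintenance lapse judged to be the root.
maintenance = Cause("car is not maintained per service schedule", is_root=True)
service_life = Cause("alternator is beyond its designed service life", why=maintenance)
alternator = Cause("alternator does not function", why=service_life)
battery = Cause("battery is dead", why=alternator)
car = Cause("car will not start", why=battery)

trail, reason = ask_why(car)
print(" <- ".join(trail))
print(reason)

# An event with no recorded answer to "why?" stops for lack of data.
_, unknown_reason = ask_why(Cause("unexplained event"))
print(unknown_reason)
```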
Why-Causal Tree
[Figure: why-causal tree. Starting from the Undesired Outcome, repeated WHY
questions branch through events that occurred (Event #1, Event #2), conditions
that existed or changed, and failed or exceeded barriers or controls. The
levels of the tree are labeled PROXIMATE CAUSES, INTERMEDIATE CAUSES, and
ROOT CAUSES.]
Example
[Figure: why-causal tree for a satellite mission failure. Boxes in the diagram:]
Lost High Speed Data Stream From Satellite (Mission Failure)
Thrusters Oriented Space Craft
Poor Line of Sight
Technician Used Wrong Method to Correct
Satellite Failed To Deploy Antenna
Power Supply Failed
Battery Dead
Installed Improperly
Beyond Shelf Limit
Root Cause is Much Deeper
Keep Asking Why
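A why-causal tree like this one can be held in a simple mapping from each effect to its immediate causes. The Python sketch below uses the box labels from the example, but the arrows of the original diagram did not survive extraction, so every edge in the mapping is an assumption made for illustration. The function open_ends is hypothetical; it lists the causes for which no further "why" has been recorded, which are the places to keep asking why rather than confirmed root causes.

```python
# Effect -> list of immediate causes. All edges are ASSUMED for illustration;
# the original diagram's arrows are not recoverable.
why = {
    "Lost high speed data stream from satellite (mission failure)": [
        "Poor line of sight",
        "Satellite failed to deploy antenna",
    ],
    "Poor line of sight": ["Thrusters oriented space craft"],
    "Thrusters oriented space craft": ["Technician used wrong method to correct"],
    "Satellite failed to deploy antenna": ["Power supply failed"],
    "Power supply failed": ["Battery dead"],
    "Battery dead": ["Installed improperly", "Beyond shelf limit"],
}


def open_ends(why, outcome):
    """Return causes reachable from outcome that have no recorded 'why' yet."""
    found, stack, seen = [], [outcome], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        causes = why.get(node, [])
        if not causes and node != outcome:
            found.append(node)      # nothing deeper recorded: keep asking why
        stack.extend(causes)
    return sorted(found)


outcome = "Lost high speed data stream from satellite (mission failure)"
ends = open_ends(why, outcome)
for cause in ends:
    print("Keep asking why:", cause)
```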
Potential Problem Analysis Tools
Failure Modes and Effects Analysis (FMEA)
– an inductive engineering technique used at the component
level to define, identify, and eliminate known and/or potential
failures, problems, and errors from the system, design,
process, and/or service before they reach the customer
Fault Tree Analysis (FTA)
– a deductive analytical technique used in reliability and
safety analyses, generally for complex dynamic
systems
Probabilistic Risk Assessment (PRA)
– PRA is a systematic, logical, and comprehensive discipline
that uses tools like FMEA, FTA, Event Tree Analysis (ETA),
Event Sequence Diagrams (ESD), Master Logic Diagrams
(MLD), Reliability Block Diagrams (RBD), and so forth to
quantify risk.
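As a small illustration of how a fault tree can be quantified, the Python sketch below combines basic-event probabilities through AND and OR gates. It assumes the basic events are statistically independent, which is a common simplification and not something the presentation states. The event names and all probabilities are hypothetical values chosen only to make the arithmetic visible.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities (illustrative values only).
p_primary_supply_fails = 0.01
p_backup_supply_fails = 0.05
p_mechanism_jams = 0.002

# Power is lost only if both supplies fail (AND gate).
p_no_power = and_gate([p_primary_supply_fails, p_backup_supply_fails])

# Top event: antenna does not deploy if power is lost OR the mechanism jams.
p_top = or_gate([p_no_power, p_mechanism_jams])

print(f"P(no power)  = {p_no_power:.6f}")
print(f"P(top event) = {p_top:.6f}")
```

Working backward from the top event to the basic events is what makes FTA deductive; FMEA would start from each component's failure modes and work forward to their effects.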
Summary
Direct Cause vs Root Cause
– Issue: level of problem solving
Problem Solving
– Direct Cause: objective is to solve an instance of a
potential class of problems
– Root Cause: objective is to solve a class of problems
– Both are useful
Analysis Methods
– Methods exist to analyze events – goal is to eliminate
occurrence of unwanted events and ensure wanted
events do occur
– FMEA, FTA, PRA
Q&A?
Examples
Army Training Accident
Incident
– Thu, 13 Jun 2002: two soldiers were killed in training at Ft
Drum. They were firing artillery shells, and were relying on
the output of the Advanced Field Artillery Tactical Data
System. When they forgot to enter the target altitude, the
system assumed an altitude of zero. (Ft Drum is at an elevation of 676 ft.)
Direct Cause
– Soldiers forgot to enter the target altitude
Potential Root Cause(s)
– Software defaulted to a valid altitude instead of flagging the missing entry
– Software/System analysis and modeling/testing inadequate
– Software requirements not adequately specified
– System CONOPS not adequate
– Soldier training inadequate
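The first potential root cause can be illustrated with a minimal Python sketch in which a missing entry is rejected instead of being replaced by a valid-looking value. The function and exception names are hypothetical and do not represent the actual fire-control software.

```python
from typing import Optional


class MissingInputError(ValueError):
    """Raised when a required input was never entered."""


def target_altitude_ft(entered: Optional[float]) -> float:
    """Return the entered target altitude, refusing to guess when it is missing."""
    # Compare against None explicitly: 0.0 is a legitimate altitude when the
    # operator really entered it, so a plain truthiness check would be wrong.
    if entered is None:
        raise MissingInputError("target altitude not entered")
    return entered


print(target_altitude_ft(676.0))    # value explicitly entered by the operator

try:
    target_altitude_ft(None)        # operator forgot to enter the altitude
    rejected = False
except MissingInputError as exc:
    rejected = True
    print("Refusing to compute:", exc)
```

The design choice is that "not entered" and "zero" are different states, and only the operator can turn the first into the second.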
Friendly Fire Deaths
Incident
– A U.S. Special Forces air controller was calling in GPS positioning from some sort
of battery-powered device. He had used the GPS receiver to calculate the
latitude and longitude of the Taliban position in minutes and seconds for an
airstrike by a Navy F/A-18. The bomber crew "required" the position in decimal
degrees. The crew did not have equipment to perform the minutes-seconds conversion themselves.
– The air controller had recorded the correct value in the GPS receiver when the
battery died. Upon replacing the battery, he called in the degree-decimal position
the unit was showing -- without realizing that the unit is set up to reset to its *own*
position when the battery is replaced.
– The 2,000-pound bomb landed on the air controller position, killing three Special
Forces soldiers and injuring 20 others.
Direct Cause
– Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew
Potential Root Cause(s)
– GPS system default was a valid position rather than an invalid one
– Lack of battery backup to hold values in memory during battery replacement
– Not equipping users to translate one coordinate system to another (reminiscent of
the Mars Climate Orbiter slamming into the planet when ground crews confused
English units with metric)
– Using a device with such flaws in a combat situation without adequate testing
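The coordinate translation mentioned above is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The Python sketch below shows it; the function name is hypothetical and the sample coordinates are arbitrary values, not positions from the incident.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus hemisphere (N, S, E, W) to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if hemisphere in ("S", "W") else value


# Arbitrary sample coordinates for illustration only.
lat = dms_to_decimal(34, 30, 36.0, "N")
lon = dms_to_decimal(69, 15, 0.0, "W")
print(f"{lat:.4f}, {lon:.4f}")
```

The range checks reject malformed input instead of silently converting it, in the same spirit as the first potential root cause listed above.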
Medical Direct/Root Cause
Example 1 - Questions?
Sentinel event
A patient was given the wrong medication and the patient experienced an
adverse reaction. As a result, the patient's length of stay was extended for
an additional 10 days.
Direct cause
The nurse who administered the medication did not compare the name on the
patient's armband to the name on the medication order. The nurse did not
follow the patient identification policy.
Root cause - thoughts?
Registration staff placed the wrong armband on the patient's arm to begin with.
Medical Direct/Root Cause
Example 2 - Questions?
Sentinel event
Doctor prescribes an anti-seizure drug (phenytoin) and the patient develops a
severe allergic reaction known as anaphylaxis. The symptoms were itching,
hives, swelling in the throat, wheezing, light-headedness from low blood
pressure, nausea, and abdominal cramping.
Direct cause
Patient is allergic to phenytoin.
Root cause - thoughts?
The doctor did not do a thorough background check on the patient's medical
history, or the patient did not inform the doctor of his/her previous medical
history.
Medical Direct/Root Cause
Example 3 - Questions?
Sentinel event
A Lasix drip was hung for the wrong patient. The two patients had the same
last name.
Direct cause
Interruption during medication administration. The nurse had a very heavy
patient assignment and skipped the double check of medication administration
with another RN.
Root cause - thoughts?
Missed the double-check process on patient identification and medication
administration. All hospital medication should be double checked by two
nurses.
Medical Direct/Root Cause
Example 4 - Questions?
Sentinel event
A patient slips and falls on a floor that is still slippery from being mopped
after another patient had an upset stomach.
Direct cause
No caution sign had been put down.
Root cause - thoughts?
The janitor was not able to put down caution signs before the patient walked
down the hall because he was interrupted by a cafeteria worker who needed him
to clean up a spill.