Root Cause Analysis / Kelly Smith-Lawless
DRAFT
June 8, 2012/ Version # 1
Root Cause Analysis
What it is/Why Important
Root cause analysis, simply put, is a careful examination of a particular situation to discern the
underlying reasons for a specific problem or variance. The BABOK defines this as a “structured
examination of the aspects of a situation to establish the root causes and resulting effects of the
problem”. Depending upon the rigorousness of the examination conducted, it is possible to identify
several layers of symptoms before reaching the underlying cause or causes of a particular situation.
When is it Used
Root cause analysis is most commonly associated with problem solving, although it can also be applied to
organizational analysis, variance analysis, process improvement and software bug fixing. Essentially,
whenever an outcome is less than ideal, it is generally possible to find one or more causal
relationships. Because some of the tools used to discern root causes can be subjective, identifying the
underlying contributing factor behind the variance is often a judgment call, especially when the system
undergoing evaluation is complex.
Questions to Consider
By carefully seeking out the root cause of a particular problem, and then applying some mitigation to
the root cause, problems generally go away. When only the symptoms of the problem are treated, the
underlying problem is likely to manifest itself in a new way rather than go away. Take, for instance, a
leaking roof: replacing the stained ceiling tiles treats the symptom, but the stains return until the leak
itself is repaired. Often a problem is not severe enough to warrant rigorous evaluation. In
deciding how deep or how quickly to dive for the root causes, here are some questions to consider:
• What are the consequences of this issue/problem? Is it front page headlines? Life-threatening or merely annoying?
• Is this a single occurrence or has it happened before?
• What is the probability of the situation occurring again?
• Were there events leading up to the problem/issue that could have served as an early warning signal?
• Was there a recent change prior to the occurrence which may have directly or indirectly facilitated its occurrence?
• Is this a system-wide issue or is it limited to a single office or department?
• Are there controls in place to detect this type of issue/problem?
Sources of Problems
Root causes can be wide-ranging. Often the cause is a series of small problems rather than one single problem.
The following list was adapted from Paul Wilson et al.’s book “Root Cause Analysis”, published in 1993.
Having a list of contributing factors can often help with identifying the actual root cause.
• Training (formal and informal)
• Management Methods (resource and schedule planning)
• Change Management (modifications to existing process)
• Communication (effective or not)
• External (factors outside of the control of the agency)
• Design (equipment and systems that support the work)
• Work Practices (methods used to achieve the task)
• Work Organization (organizing performance and sequence of tasks)
• Physical Conditions (factors impacting performance)
• Procurement (getting necessary resources)
• Documentation (instructions and procedures)
• Maintenance/Testing (including preventative maintenance)
• Man/Machine relationships (alarms and controls in place)
It is important to note that these potential root causes could be symptoms instead, and in some
situations, it is possible to have multiple root causes.
Methods
There are a number of tools that can be used to determine the root cause. Each is explained briefly
below, with a sample chart to illustrate the concept and guidance on construction and
interpretation. The tools profiled include:
• Fishbone diagram
• Ask why 5 times
• Check Sheets/Pareto Chart
• Interrelationship Diagram
• RPR (Rapid Problem Resolution)
Fishbone Diagram
This tool is also referred to as the Ishikawa Diagram or the Cause and Effect Diagram. It was named for
Kaoru Ishikawa, who pioneered TQM processes in the 1960s at the Kawasaki shipyards. It derives its
common name from the shape of the diagram, as shown in the figure below:
Sample Fishbone Diagram
[Figure: fishbone diagram with the Problem at the head and four branches labeled People, Plant, Procedures and Policies.]
To construct a Fishbone Diagram, start by placing the problem you are addressing near the eye of the fish.
From there, identify the primary causes of the problem. Typically, these are the 4 Ps or the 4 Ms, but
they can be whatever makes sense for the particular problem at hand. The 4 Ps are People, Procedure, Policy
and Plant. The 4 Ms are Man, Machinery, Methods and Material. A sample chart showing
why a cup of coffee “could” be bad is as follows:
Figure XX
Bad Coffee Fishbone Diagram
[Figure: fishbone diagram with “Bad Coffee” at the head and four main branches: People, Procedures, Equipment and Material. Causes legible on the branches include: rude, wrong fee, no training (listed twice), computer not updated, too much coffee, too much water, too many grounds, wrong size, brew time too long, coffee not hot enough, warmer not working, dirty basket, filter, outdated, lids don’t fit cup, dirty cups, packets wet, bad cream and bad sugar.]
These diagrams can easily be constructed with pen and paper, or with charting tools such as
Visio (see business processes/cause and effect diagram). To analyze the results, look for common
themes. Is something listed several times? In this instance, no training and poor quality inputs (e.g.
bad sugar, dirty cups, etc.) appear to be common themes to explore further. This is an excellent
tool to use in a group setting.
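The “is something listed several times?” check can also be sketched in a few lines of Python. This is an illustration only: the cause labels come from the Bad Coffee diagram, but which branch each cause sits on is an assumption made for this example, and the names fishbone and recurring_causes are hypothetical.

```python
from collections import Counter

# Hypothetical encoding of the Bad Coffee fishbone: branch -> causes.
# The branch assignments are assumed for illustration only.
fishbone = {
    "People": ["rude", "wrong fee", "no training"],
    "Procedures": ["too much water", "too many grounds", "no training"],
    "Equipment": ["warmer not working", "dirty basket", "brew time too long"],
    "Material": ["bad cream", "bad sugar", "dirty cups"],
}

def recurring_causes(diagram):
    """Return causes listed more than once anywhere in the diagram."""
    counts = Counter(cause for causes in diagram.values() for cause in causes)
    return {cause: n for cause, n in counts.items() if n > 1}

if __name__ == "__main__":
    print(recurring_causes(fishbone))
```

A cause that shows up on more than one branch is a candidate theme to explore further with the group; the count is a prompt for discussion, not a verdict.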
Ask Why 5 Times
While it is easy to jump to a solution, it is often more difficult to pinpoint why something occurred.
One of the most commonly used root cause analysis tools is referred to as the “5 Whys”. It is based
on the premise of continually asking why. Using the bad coffee from the previous illustration, you could
start with the apparent problem that the coffee is bad. By asking “why is the coffee bad?”, one of the
first responses could be that it’s weak. The next why would be “why is the coffee weak?”, and the reply
could be that not enough coffee was used. In asking “why was not enough coffee used?”, the reply could be
that we ran out. The asking of “why” continues until you get to possible root causes. To illustrate this
concept, see the figure below:
Figure XX
Root Cause Analysis Tree Diagram
[Figure: tree diagram. An Apparent Problem branches into Symptoms of the Problem, which branch in turn into Possible Root Causes, one of which is identified as the Actual Root Cause.]
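The chain of questions in the coffee example can be written down as a simple lookup that is followed until nobody has a further answer. The sketch below is illustrative only: the answers are the ones given in the coffee example, while the dictionary name answers, the function ask_why and the max_whys limit are hypothetical.

```python
# Each key is an observed effect; each value is the answer to "why?".
# The last answer has no entry of its own, so the chain stops there and
# that answer is treated as a possible root cause.
answers = {
    "the coffee is bad": "the coffee is weak",
    "the coffee is weak": "not enough coffee was used",
    "not enough coffee was used": "we ran out of coffee",
}

def ask_why(problem, answers, max_whys=5):
    """Follow the chain of 'why?' answers and return every step visited."""
    chain = [problem]
    for _ in range(max_whys):
        cause = answers.get(chain[-1])
        if cause is None:  # nobody can answer the next "why?"
            break
        chain.append(cause)
    return chain

if __name__ == "__main__":
    for depth, step in enumerate(ask_why("the coffee is bad", answers)):
        print("  " * depth + step)
```

A single dictionary models only one path through the tree; a real session usually produces several answers per question, which is why the figure branches.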
A real-life example of using this tool can be found in Washington DC at the Jefferson Memorial. The
National Park Service noted that this monument was deteriorating at a faster rate than other DC
monuments. By asking why 5 times, they were able to get at the root of the problem as follows:
• Why is the memorial deteriorating faster? Because it was being washed more frequently.
• Why was the monument being washed more frequently? Because there were a lot of bird droppings.
• Why were there more bird droppings on the monument? Because birds were very attracted to the monument.
• Why were birds more attracted to the Jefferson Memorial? Because of the number of fat spiders in and around the monument.
• Why are there a lot of spiders? Because of the number of insects that fly around the monument during evening hours.
• Why more insects? Because the monument’s illumination attracted more insects.
In evaluating various solutions to this problem (e.g. pesticides, special coatings, different light, etc.),
groups will identify different areas to focus on. In this particular case study, the Park Service chose to
turn on the lights an hour later every evening. This one change reduced the bird dropping problem by
90%.
When using this technique, it is possible to follow different paths and derive different solutions. Should
this occur, several factors can be considered when identifying the appropriate solution, such as what is
within the group’s ability to control. In the case of the Jefferson Memorial, they had the ability to
control the lighting and selected a no-cost option that addressed the problem.
This technique is also used for requirements elicitation, particularly when interviewing subject matter
experts. See the section Documenting and Managing Requirements.
Check Sheets/Pareto Chart
There is an old saying: “what gets measured gets done.” In the case of root cause analysis,
creating a simple check sheet to collect data from observations or occurrences and then
charting the results on a Pareto Chart can help pinpoint problem areas. In the absence of data, perceived
or apparent problems can often lead you down the wrong path. By observing and recording the frequency of
an occurrence for a specific period of time, it is possible to determine relative severity. See the figure below
for an example of a check sheet.
Figure XX
Secure Items Destroyed Check Sheet

Cashier  Paper Jam  Incorrect Info  Transaction Cancelled  Insufficient Funds /  Total
                    Entered         After Print            Credit Card Denied
A        5          2               2                      0                     9
B        10         5               3                      2                     20
C        3          2               0                      0                     5
D        3          1               1                      1                     6
Total    21         10              6                      3                     40
Constructing a check sheet is as simple as identifying the things you want to count and then
counting them as they occur. After a reasonable period of time, just count up the occurrences. In this
example, the errors identified point to paper jams (a problem with paper and equipment) and incorrect
information entered by the operator. For the paper jams, the cause could be the printer or it could be the
material you are trying to print on (weight, material, coating, etc.). To address this problem, it will be
necessary to run several trial tests to help discern what the true root cause is. For “incorrect info
entered”, the fix may be as simple as retraining cashier B or adding some behind-the-scenes edit checks to
look for common errors. In all instances, it is best to focus on items within your immediate control and
environment first, before trying to throw technology at the problem.
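When occurrences are logged electronically, a check sheet is easy to mimic in code. The following is a minimal sketch rather than a prescribed tool: the observations list is made-up sample data (it is not the data behind the figure), and tally_check_sheet is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical log: one (cashier, reason) pair per destroyed secure item.
observations = [
    ("A", "Paper Jam"), ("B", "Paper Jam"), ("B", "Incorrect Info Entered"),
    ("B", "Paper Jam"), ("C", "Paper Jam"), ("D", "Insufficient Funds"),
    ("A", "Incorrect Info Entered"), ("B", "Transaction Cancelled"),
]

def tally_check_sheet(log):
    """Count occurrences per cell, per cashier (row) and per reason (column)."""
    cells = Counter(log)
    by_cashier = Counter(cashier for cashier, _ in log)
    by_reason = Counter(reason for _, reason in log)
    return cells, by_cashier, by_reason

if __name__ == "__main__":
    cells, by_cashier, by_reason = tally_check_sheet(observations)
    for reason, count in by_reason.most_common():
        print(f"{reason}: {count}")
```

The row and column totals play the same role as the Total row and column on a paper check sheet: the column totals show which problem is most frequent, and the row totals show where it is concentrated.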
Once the data has been collected, one powerful tool you can use to document the results is the
Pareto Chart. The Pareto Chart displays the relative importance of problems or occurrences and is
based on the principle that 80% of the problems result from 20% of the causes. The 80-20
rule is named for the Italian economist Vilfredo Pareto, who noted that 80% of the land in Italy was owned
by 20% of the people. Applying the results from the check sheet above gives the sample diagram below:
Figure XX
Sample Pareto Chart
Reasons Secure Items Destroyed
[Figure: Pareto chart. Bars show the number of occurrences (left axis, 0 to 25): Paper Jam 21, Incorrect Info Entered 10, Transaction Cancelled 6, Insufficient Funds 3. A line shows the cumulative % of total occurrences (right axis, 0% to 120%): 53%, 78%, 92%, 100%.]
Note that the figure has two vertical axes. The one on the left gives the count
of occurrences, while the one on the right gives the cumulative % of total occurrences. By focusing
problem solving efforts on the categories with the largest volume, the total number of errors can be reduced significantly.
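The numbers behind the chart are straightforward to compute. The sketch below is an illustration rather than a required method: it takes the four category totals shown in the chart, sorts them from largest to smallest and accumulates the percentage of the total. The function name pareto_table is hypothetical.

```python
# Category totals as shown in the sample Pareto chart.
counts = {
    "Paper Jam": 21,
    "Incorrect Info Entered": 10,
    "Transaction Cancelled": 6,
    "Insufficient Funds": 3,
}

def pareto_table(counts):
    """Return (category, count, cumulative % of total) rows, largest first."""
    total = sum(counts.values())
    rows, running = [], 0
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for category, count in ordered:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows

if __name__ == "__main__":
    for category, count, cumulative in pareto_table(counts):
        print(f"{category:<24}{count:>4}{cumulative:>8.1f}%")
```

With these totals the cumulative percentages work out to 52.5, 77.5, 92.5 and 100, which the chart labels as whole percentages. The first two categories alone account for more than three quarters of the occurrences.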
Interrelationship Diagram
An interrelationship diagram is another valuable tool that helps to compare related issues in order to
determine which ones are driving forces (root causes) and which ones are being influenced by others
(symptoms). This exercise is best done in a group setting where you have a variety of perspectives. The
matrices can take some time to get through, but typically provide valuable insights once completed.
Using a list of symptoms/root causes, create a matrix (we are using a 5x5 example here), and then add 3
additional summary columns to the right.
For this example, we will look at causes of ineffective meetings. The 5 potential symptoms or root
causes are: lack of an agenda, lack of facilitation, wrong people at the meeting, airtime dominated by a
few and rehashing the same stuff. The symptoms will vary by group and organization, and
no doubt with group input it is possible to come up with more items. The small number here is meant to
demonstrate how to construct, facilitate and evaluate the results. A completed matrix is below:
Figure XX
Interrelationship Diagram
Ineffective Meetings

                                A    B    C    D    E    In   Out  Total
A. Lack of an agenda            X    ↑    -    -    ↑    0    2    2
B. Lack of facilitation         ←    X    ↑    ↑    ↑    1    3    4
C. Wrong people at meeting      -    ←    X    ←    -    2    0    2
D. Airtime dominated by a few   -    ←    ↑    X    ↑    1    2    3
E. Rehashing same stuff         ←    ←    -    ←    X    3    0    3
For each issue identified, ask the group about the impact of each item on the others. Starting with A. Lack
of an Agenda against B. Lack of Facilitation, ask the group: does A drive or influence B, or does B influence
A? Typically the two go together: if you have a facilitator, you often have an agenda. In this instance, an
“up” arrow is placed in row A, column B to show the impact of A on B, and an “in” arrow is placed in
row B, column A to show that B is influenced by A. Next, look at Lack of an Agenda and Wrong People at the
Meeting. While you “could” make a weak case that an agenda would make it obvious that you
have the wrong people at the meeting, there are other drivers for this, so in this case we will put in a
“-” dash signifying no relationship. Each pair receives a relationship mark in the matrix. Once
the matrix is completed, it is time to add things up. All arrows pointing inwards (items being influenced) are added
for each row and the sum is reported in the “In” column. All arrows pointing upwards (items
influencing) are added for each row and the sum is reported in the “Out” column, and then In and Out
are added together. In evaluating the results, look for the largest “Out” count as your root cause.
In this example, the lack of a facilitator leads to rehashing the same stuff (meeting after meeting). This
tool is also very good for determining critical processes, as well as root causes. Instead of listing
problems or issues, record all of your processes with letters and evaluate which processes influence or
directly impact other processes.
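The adding-up step lends itself to a short script. The following sketch is an illustration built on assumptions: the list drives is one reading of the completed matrix (each pair means “the first item influences the second”), chosen so that the In and Out totals match those in the figure, and the helper name score is hypothetical.

```python
from collections import Counter

items = {
    "A": "Lack of an agenda",
    "B": "Lack of facilitation",
    "C": "Wrong people at meeting",
    "D": "Airtime dominated by a few",
    "E": "Rehashing same stuff",
}

# (driver, driven) pairs: an assumed reading of the arrows in the matrix.
drives = [("A", "B"), ("A", "E"), ("B", "C"), ("B", "D"),
          ("B", "E"), ("D", "C"), ("D", "E")]

def score(items, drives):
    """Return {item: (in, out, total)} from a list of (driver, driven) pairs."""
    out_counts = Counter(driver for driver, _ in drives)
    in_counts = Counter(driven for _, driven in drives)
    return {key: (in_counts[key], out_counts[key],
                  in_counts[key] + out_counts[key])
            for key in items}

if __name__ == "__main__":
    scores = score(items, drives)
    root = max(scores, key=lambda key: scores[key][1])  # largest "Out"
    print("Likely root cause:", items[root], scores[root])
```

Recording each relationship once as a pair avoids the bookkeeping of marking both the “up” arrow and the matching “in” arrow by hand, so the In and Out columns always sum to the same number.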
Rapid Problem Resolution (RPR)
This technique was designed specifically to identify the root cause of IT problems. While it is aligned
with the ITIL Problem Management process, it requires that the problem be replicated, and the method is
designed to focus on a single symptom at a time until a root cause is identified. The method
comprises three steps: 1) Discover, 2) Investigate and 3) Fix. During the Discover phase, it is
important to obtain as much information about the problem as possible (what is the problem, when
does it occur, in what environment, how frequently, etc.) and to settle on the problem we are trying to
solve. The Investigate phase focuses on replicating the problem so that it is possible to
discern what is causing it. This requires developing and executing a diagnostic data
capture tool so that results can be obtained to identify what is causing the problem. Reviewing the
diagnostic data then makes it possible to trace where the problem occurs and determine the root
cause. Once the root cause is found, a fix needs to be developed and implemented, and the solution
verified.
In Summary
Root cause analysis is a critical component of problem solving. If you do not treat the root cause(s) of a
problem, it is likely that the problem will not go away. When only symptoms are treated, the problem often
manifests itself differently, presenting a new set of symptoms. Since the time and resources available to solve
problems vary, it is good to have several tools available for seeking out root causes.