Slaughter the PIGs - Risk Body of Knowledge

LEADING PIGS TO SLAUGHTER
Risk matrices and/or Probability x Impact Grids (PIGs), even what some call ‘heat maps’,
have been the subject of considerable debate in professional discussions and in a few
academic studies (Cox, 2008). They have their advocates, mostly on the grounds of ease of
understanding and traditional use, but they are usually ill-formed and ill-used. This paper
proposes that their use should be stopped immediately and replaced with simple, yet logically
and mathematically valid, estimates of the extent of consequences of risk events.
Contents
What are PIGs?
Why Should We Use Them?
Why Should We Not Use Them?
What Should We Do Instead: turning PIGs into BACON
References
What are PIGs?
There are many versions of PIGs. The one that appeared in Appendix E of AS 4360: 1999
Risk management (Standards Australia, 1999), given in Figure 1, seems to be the precursor to
most of the uses of this method for representing the “severity” or level of risk.
Figure 1. Risk matrix from AS 4360: 1999
There are many examples of “qualitative” scales for the likelihood of the risks (using the terms
as in 1999) or the extent of their consequences. In some examples, there are three intervals
on such scales; in others, there are as many as nine.
There are many variants of showing the combination of these scales to represent the need for
managerial action. In some, the scale intervals are multiplied; in others, they are added.
There are a few examples where the scales represent both negative and positive consequences.
In some cases, as advocated in HB436: 2004 (Standards Australia, 2004), there is a
consideration of the range of outcomes or the range of likelihoods for an event, but usually it
is expected that an event sits at one point on each of the scales.
Why Should We Use Them?
Julian Talbot provides the best summary of the arguments for, and against, PIGs in his blog
post (Talbot, 2013), What’s Right with Risk Matrices.
Table 1. Strengths and Weaknesses of PIGs (Talbot, 2013), emphasis added
Weaknesses (✗):
1. compare only a small fraction of randomly selected pairs of hazards
2. mistakenly assign identical ratings to quantitatively different risks
3. mistakenly assign higher qualitative ratings to quantitatively smaller risks, leading to worse-than-random decisions
4. mistakenly allocate resources, as effective allocation of resources to risk treatments cannot be based on the categories provided by risk matrices
5. categorizations of severity cannot be made objectively for uncertain consequences

Strengths (✓):
1. risk matrices are still one of the best practical tools that we have: widespread (and convenient)
2. promote robust discussion (the discussion often being more useful than the actual rating)
3. provide some consistency to prioritizing risks
4. help keep workshop participants on track
5. focus decision makers on the highest priority risks
6. present complex risk data in a concise visual fashion
7. prioritizing the allocation of resources is not the role of the risk matrix – that role belongs to the selection of risk treatments
8. any risk assessment tool can assign identical ratings to quantitatively different risks
9. no tool can consistently, correctly and unambiguously compare more than a small fraction of randomly selected pairs of hazards
10. if a risk is in the ‘High’ or the ‘Top 10’ list it requires attention, and whether it is third or fourth on the list is not likely to be significant
11. subjective decision making will always be a part of the risk assessment process no matter what tool is used
12. risk matrices are a tool which supports risk-informed decisions, not a tool for making decisions
13. last but not least, most of the flaws listed above only exist if risk matrices are used in isolation, which is rarely the case

[I am not sure about this last point. I have seen many cases when everything hung on the
presentation of a PIG to support a recommendation to take action.]
This summary brings out the main point about PIGs: they have been extensively used for
many years, appearing in several standards and textbooks about good risk management
practice, but often – if not always – misused.
There have been (very) few academic studies about the value or the application of PIGs.
Tony Cox is probably the researcher who has done the most in studying how to use risk
matrices. His somewhat classic analysis of the design of matrices (“What’s wrong with risk
matrices?”) was based upon their description in AS 4360: 1999 (not even the 2004 version,
with ISO 31000 published just after the article). His conclusions, however, are still valid
and appear throughout this paper.
Why Should We Not Use Them?
In my view, using a PIG is like giving a loaded revolver to a child or, to make my views clear
in another way: a fool with a tool is still a fool, and PIGs are a foolish tool. The problems with
PIGs, justifying their slaughter, are:
• Wrong scales
• Wrong combination
• Wrong use
Wrong scales
The probability (or likelihood or even plausibility) and the impact (or consequence) scales are
often expressed in words, using what the CIA has called “estimative words”. Unfortunately,
these words representing different points on the probability scale can have different meanings
to different people, leading to different interpretations of the level of risk.
The CIA has been struggling with the use of words such as ‘could’ or ‘might’ or ‘virtually
certain’ in their Intelligence Summaries since the ‘50s. Of course, it does not help that they do
not speak English well (they have ‘may’ and ‘might’ as points on the same scale, whereas we
all know that ‘might’ refers to probability and ‘may’ refers to permission).
Figure 2 shows one of their recent attempts to come up with a consistent use of such words.
Figure 2. CIA Estimative Words Scale
There have been several academic studies, some over 60 years old, of how people interpret
adjectives describing probability or extent. Some examples include Johnson (1973), as given
in Figure 3. Note the range in meaning assigned to the phrases, such as “Highly Probable”.
Figure 3. Verbal expressions of uncertainty (Johnson, 1973)
Similarly, thirty years ago, Mosteller and Youtz (1990) summarized over 20 studies of the
probabilities assigned to qualitative expressions, and found that some expressions varied
widely in how they were assigned numerical equivalents. For example, “sometimes” varied
from about .30 to .60; “possible” had a median probability of .47 and a mean of .55, but over
a range of .40 from highest to lowest.
Table 2. Example of Likelihood Scale

Likelihood                  | Description
Almost Certain              | Confident that it will occur at least once during the activity and has occurred on a regular basis previously.
Likely/Probable             | Plausible that it may occur at least once during the activity. It has occurred previously, but is not certain to occur.
Occasional/Possible         | There is potential that it may happen during the activity. Is sporadic but not uncommon.
Rare/Unlikely               | Out of the ordinary. It may occur at some time during the activity. Is uncommon but could occur at some time.
Highly Improbable/Doubtful  | Not likely to occur at any time during the activity, but is not impossible.
Given this sort of debate, does the scale given in Table 2 mean the same thing to every user?
How are the phrases used to describe likelihood converted into numbers in order to combine
with the consequence scale? (Actually, in this example, drawn from a source that shall remain
nameless in order to protect them from well-deserved ridicule, they are not combined in any
logical way, just used as labels for cells in a matrix.)
If the scales are fully ‘anchored’ by descriptive phrases and associated explicitly with
probabilities/plausibilities, then they could be used more reliably, but usually they are
expected to stand alone, subject to variable interpretations.
Another of the difficulties with scales is the choice of how many points they contain. There is
considerable psychometric research into the number of levels that are reliable yet precise
enough to discriminate between different levels of performance. The consensus is about 7.
Cox (2008) makes this point in his discussion of the need to have sufficient scale points to
allow for at least three ‘colours’ in the PIG. The risk level results are ‘weakly consistent’ if, and
only if, no ‘red’ cell shares an edge with a ‘green’ cell, and there is sufficient ‘betweenness’
(discrimination) if it is possible to pass through an intermediate colour between ‘red’ and
‘green’.
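
To make these conditions concrete, here is a minimal sketch in Python of a weak-consistency check (the grids and colour names are my own illustrative assumptions, not Cox’s notation): a matrix fails if any ‘red’ cell shares an edge with a ‘green’ cell.

    # A minimal weak-consistency check: no 'red' cell may share an edge
    # (not merely a corner) with a 'green' cell. Grids are lists of rows.
    def weakly_consistent(grid):
        rows, cols = len(grid), len(grid[0])
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != "red":
                    continue
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == "green":
                        return False
        return True

    # An amber band separates red from green, so this grid passes ...
    print(weakly_consistent([["amber", "red"], ["green", "amber"]]))   # True
    # ... but here a red cell sits directly beside a green one.
    print(weakly_consistent([["green", "red"], ["green", "green"]]))   # False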
The key aspect of a scale is its resolution power. You cannot tell one level of risk from another
if the impact scale is too coarse.
An example, from the same unnamed organization, of a set of consequence scales is given in
Table 3. How much difference is there between ‘multiple fatalities’ and ‘fatality or permanent
disability’? Going from $5M to $50M to $100M to $200M in large, uneven jumps means that
$6M is as ‘bad’ as $49M, while $51M is as ‘bad’ as $99M yet ‘much worse’ than $49M.
If the scale is too coarse then it is not possible to discriminate between desirable and
undesirable levels of performance. Such coarse scales often make it impossible to tell whether
a suggested risk response has actually reduced the level of risk. For the above scale, an
‘improvement’ in cost from $90M to $60M still results in a rating of ‘Serious’.
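
A small sketch of this binning effect, assuming cut-offs that mirror the $5M/$50M/$100M/$200M jumps described above (the band names follow Table 3; the function itself is hypothetical):

    # Coarse consequence bins hide real improvements: both $90M and $60M fall
    # in the same 'Serious' band, so the matrix cannot register the $30M gain.
    def consequence_band(cost_m):
        if cost_m >= 200:
            return "Catastrophic"
        if cost_m >= 100:
            return "Critical"
        if cost_m >= 50:
            return "Serious"
        if cost_m >= 5:
            return "Disruptive"
        return "Minor"

    print(consequence_band(90))   # Serious
    print(consequence_band(60))   # Serious -- the response looks worthless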
The third problem with the use of scales is that they should be measurable and at least
ordinal, not merely nominal. If they are nominal (only labels) then no mathematical
combination is possible. Even if the labels appear numeric, as in those on the backs of
footballers, they cannot be combined: there is no mean number for footballers, and they
cannot be added to give anything meaningful.
In Table 3, is it 28 days for one person or 1 day for 28 people? What is the difference between
‘long term loss’ and ‘temporary loss’? How do you measure ‘medium term damage’ compared
with ‘short term damage’, and how much ‘worse’ is it than ‘damage at regional level’?
Table 3. Example of Consequence Scales

Objectives/mission
- Catastrophic: Failure to achieve an essential strategic objective.
- Critical: Failure to achieve essential objective with significant strategic implications.
- Serious: Failure to achieve objective.
- Disruptive: Failure to achieve milestone with implications for business objectives.
- Minor: Minimal impact on objective.

Personnel/HR capability
- Catastrophic: Multiple fatalities. Impacts on critical business continuity requiring immediate significant HR restructure. Lack of key workforce capability.
- Critical: Fatality or permanent disability. Impacts on business continuity requiring immediate HR restructure.
- Serious: Temporary disability >28 days. Causes long term loss of critical skills, knowledge and productivity.
- Disruptive: Temporary disability <28 days; emergency treatment; admission to hospital. Causes temporary loss of critical skills, knowledge and productivity.
- Minor: Temporary injury/illness requiring non-emergency treatment at a medical or first aid facility. Causes minimal disruption to productivity.

Resources/capability (damage or loss of major resources)
- Catastrophic: Damage or loss of major resources that significantly reduces sustainability of corporate business. Affects >50% of resource allocation, or >$200M.
- Critical: Prevents delivery of an outcome for a protracted period. Impacts between 30% and 50% of resource allocation but <$200M.
- Serious: Requires amendments to training regimes. Disrupts outcomes. Impacts between 10% and 30% of resource allocation but <$100M.
- Disruptive: Disrupts Unit level training. Results in manageable delays in objectives. Impacts between 5% and 10% of resource allocation, <$50M.
- Minor: Can be resolved through Unit action but results in insignificant delays in organizational objectives. Impacts <5% of resource allocation but <$5M.

Reputation
- Catastrophic: Long term damage to Org or Group reputation.
- Critical: Medium term damage to Org or Group reputation.
- Serious: Short term damage to Org or Group reputation.
- Disruptive: Damage at regional level with isolated reports in regional and local media.
- Minor: Damage at local and/or Unit level with isolated media reports.

Environment and heritage
- Catastrophic: Damage may be irreparable or take more than two (2) years to remediate.
- Critical: Damage can only be remediated over an extended period of between 6 and 24 months, at significant cost.
- Serious: Damage requiring significant remediation during a period of between 3 and 6 months, at a high cost.
- Disruptive: Damage requiring remediation during a period of between 1 and 3 months, at moderate cost.
- Minor: Damage can be repaired by natural action within one year.
As advocated in HB436 Appendix C, a scale for consequences should:
• Have end points that cover the requisite variety (well, it does not actually use that
cybernetic term), in that they should cover the extremes in outcomes that are possible, from
upper extremes that are “remarkable” to lower extremes that are “trivial” or very poor.
• Cover measures that are ‘objective’ and tangible.
• Avoid relative measures, such as percentages (I am not sure about this, as sometimes it is
the percentage shift in performance that is the most meaningful, but you can see the point
that relative measures can hide whether the performance is good enough or not).
• Have a number of levels with the necessary resolution, even grouping more tightly
around the points where concerns could most lie.
If these scales are logarithmic (that is, each level in the scale is a factor of two or ten times
the previous level), then they can be precise near the most likely values but become too
spread out at the extremes to provide any effective resolution between risks.
Wrong combination
It is in the combination of the likelihood and consequence scales to assess the level of risk
that most of the errors occur.
There are various ‘usual’ ways of combining point estimates of likelihood and point estimates
of consequence into an estimate of risk extent or exposure or level. All of them have a strong
chance of providing misleading results.
The most common combination is the multiplication of the scale for likelihood by the scale for
extent. This combination is, in effect, an approximation of the ‘expected value’ of the level of
consequence.
Often the words used to express the scale levels show that they are distributed exponentially.
It is a mistake to treat these scales as if they were equally spaced intervals and to multiply
them to determine the risk level result, as shown in Table 4. The results can be unbalanced:
unlikely x moderate (2 x 2 on the scale, but $1,000 in effect) has the same scale result as
certain x low (4 x 1, but $10,000 in actual effect) or rare x critical (1 x 4, or $10,000). In the
translation into ‘extent of required managerial action’ (shown in the colours), these three
combinations lie in different bands when using absolute numbers but in the same band when
using scales, depending upon where the cut-offs lie.
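
A short sketch of the arithmetic behind these three cells (the scale values are those of Table 4 below; the code itself is illustrative):

    # Multiplying ordinal ratings ties cells whose underlying expected
    # values differ by a factor of ten.
    likelihood = {"rare": (1, 0.0001), "unlikely": (2, 0.001),
                  "likely": (3, 0.01), "certain": (4, 0.1)}
    consequence = {"low": (1, 100_000), "moderate": (2, 1_000_000),
                   "severe": (3, 10_000_000), "critical": (4, 100_000_000)}

    for lk, cq in [("unlikely", "moderate"), ("certain", "low"), ("rare", "critical")]:
        (lr, lp), (cr, cv) = likelihood[lk], consequence[cq]
        print(f"{lk} x {cq}: rating product = {lr * cr}, "
              f"expected value = ${lp * cv:,.0f}")
    # unlikely x moderate: rating product = 4, expected value = $1,000
    # certain x low:       rating product = 4, expected value = $10,000
    # rare x critical:     rating product = 4, expected value = $10,000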
Table 4. PIG using Absolute Numbers

                              Rare        Unlikely    Likely      Certain
                              1 (0.0001)  2 (0.001)   3 (0.01)    4 (0.1)
Critical  4 ($100,000,000)    $10,000     $100,000    $1,000,000  $10,000,000
Severe    3 ($10,000,000)     $1,000      $10,000     $100,000    $1,000,000
Moderate  2 ($1,000,000)      $100        $1,000      $10,000     $100,000
Low       1 ($100,000)        $10         $100        $1,000      $10,000
What should be done is to add the log likelihood to the log consequence and then use the
anti-log of the sum to determine the risk level result. Of course, these calculations are not
actually made, but the look-up table, as shown in Table 5, should reflect these effects.
7
Table 5. PIG using Logarithmic Combination of Ratings

                              Rare        Unlikely    Likely      Certain
                              1 (0.0001)  2 (0.001)   3 (0.01)    4 (0.1)
Critical  4 ($100,000,000)    4           8           12          16
Severe    3 ($10,000,000)     3           6           9           12
Moderate  2 ($1,000,000)      2           4           6           8
Low       1 ($100,000)        1           2           3           4
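
A minimal sketch of the log-addition rule (illustrative; the values are those of Table 4), showing that adding the logs and taking the anti-log simply recovers the absolute expected value, so the ordering of cells cannot be inverted:

    # Adding log-likelihood to log-consequence and taking the anti-log is
    # just multiplication of the absolute values, so the ranking of cells
    # matches Table 4 exactly.
    import math

    def risk_level(p, dollars):
        return 10 ** (math.log10(p) + math.log10(dollars))  # == p * dollars

    print(risk_level(0.001, 1_000_000))     # 1000.0   (unlikely x moderate)
    print(risk_level(0.1, 100_000))         # 10000.0  (certain x low)
    print(risk_level(0.0001, 100_000_000))  # 10000.0  (rare x critical)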
The second way is the most common when using the so-called quantitative approach. All you
do is multiply likelihood by consequence on an interval scale, such as 0-10, yielding a result
between 0 and 100. Alternatively, you multiply the likelihood (0-1) by the dollar value of the
consequence, to give a dollar amount. In both cases, you compare the result against a limit
established by the risk criteria process, reflecting the risk culture/attitude of the enterprise.
This result is influenced by the errors underlying the estimates and by the assumptions about
the distribution of the estimates.
Such operations are equivalent to a single point estimate of the extent of the consequence. It
is the same as a decision tree where one branch has the output of the consequence x likelihood
and the other branch has the output of 0 x (1-likelihood).
This combination actually ‘degrades’ the extent of the consequence. For example, if the scale
point for consequence covers $1,000-2,000 and the likelihood is ‘unlikely’, say 0.1 – 0.2 on
that scale point, then the combination is equivalent to an “expected value” of $100-400. If the
consequence scale point were to cover $4,000 – 5,000 and the likelihood interval were ‘very
likely’ (.9-1), then the combination is equivalent to $3,600-5,000. It is most uncertain
whether the managers setting the ‘acceptance’ levels are aware of the effect of this
multiplication.
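
The degradation is easy to see with simple interval arithmetic; the sketch below uses the ranges from the two examples above:

    # Multiplying a consequence range by a likelihood range 'degrades' the
    # consequence toward zero; ranges follow the two examples in the text.
    def interval_expected_value(p_lo, p_hi, c_lo, c_hi):
        # round() just tidies floating-point noise for display
        return (round(p_lo * c_lo), round(p_hi * c_hi))

    print(interval_expected_value(0.1, 0.2, 1_000, 2_000))   # (100, 400)
    print(interval_expected_value(0.9, 1.0, 4_000, 5_000))   # (3600, 5000)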
Some practitioners determine likelihood and impact both on a 0 – 1 scale and then combine
them using the formula likelihood + impact – likelihood x impact (the formula for the
probability that at least one of two independent events occurs). In this case, they are treating
likelihood as the chance that an event will happen and impact as the chance that the
consequence is at the given, fixed level. Again, it comes back to the difficulties of using point
estimates to represent what is a range of estimates. If the true underlying distributions are
not symmetrical or not normal, then the combination can be misleading.
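
For reference, a sketch of that combination rule (the inputs are hypothetical):

    # 'likelihood + impact - likelihood x impact' is the probability that at
    # least one of two independent events occurs; it silently treats the 0-1
    # impact score as if it were itself a probability.
    def combined_score(likelihood, impact):
        return likelihood + impact - likelihood * impact

    print(combined_score(0.3, 0.5))   # 0.65
    # The same 0.3 and 0.5 midpoints could come from a tight symmetric range
    # or a long-tailed one; a single point estimate cannot tell them apart.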
There are examples where the combinations have no mathematical basis at all. In Table 6,
from the same organization as before (where so many sins have been confounded), the
cells formed by the combinations are numbered sequentially, from 1 for the cell with the
lowest pair to 25 for the highest pair, as shown below. The colours, representing the risk
levels, then seem to be arbitrarily assigned to the cells.
Table 6. Example of Labelled Cells in a PIG

Likelihood/Impact            Catastrophic     Critical         Serious          Disruptive       Minor
Almost certain               1 - Extreme      2 - Extreme      5 - High         9 - Substantial  16 - Medium
Likely/probable              3 - High         4 - High         8 - Substantial  14 - Medium      21 - Low
Occasional/possible          6 - Substantial  7 - Substantial  12 - Medium      15 - Medium      23 - Low
Rare/unlikely                10 - Medium      11 - Medium      13 - Medium      20 - Low         24 - Low
Highly improbable/doubtful   17 - Low         18 - Low         19 - Low         22 - Low         25 - Low
As acknowledged in HB 436: 2013, in its Appendix C, it is acceptable to use risk matrices that
are ‘skewed’, in that either likelihood or, preferably, impact is emphasized in the combination.
If an organization is risk averse (or its senior decision-makers are) then almost every cell in
the ‘catastrophic’ or ‘critical’ columns could be coloured red, indicating the need for
immediate managerial action. This combination appears to be multiplicative, with a
weighting to the impact scale.
Julian Talbot presents an example of a PIG that illustrates another point about how to
represent the combination of likelihood and consequence or impact. Although Figure 4 is not
very clear, it does show that each consequence scale is associated with its own likelihood scale.
That is, the wording to represent the probability of reaching a level of consequence for people
is not the same as for the impact upon financial or reputation objectives.
Figure 4. Risk Matrix with multiple scales
Wrong use
Even if PIGs are properly formed, in scales and in combination, they can still be misused. Cox
(2008) notes that, apart from the problems of insufficient resolution, there are errors such as
assigning “higher qualitative ratings to quantitatively smaller risks”; suboptimal resource
allocation; and ambiguous inputs and outputs.
Other aspects of misuse include the confusion about what likelihood is being assessed. In the
earliest version of PIGs, it was the likelihood of the risk event. In more recent assessments, it
is the likelihood of the level of consequence arising from an event. Of course, this estimate
can be influenced by the likelihood of the events that contribute to the consequence but it is
the result of the event that is assessed rather than of the event itself.
Even if the combination is validly formed, such as by adding logarithmic scales, the lack of
precision in the scales can cause Type II errors: saying a difference does not exist when it
actually does. That is, it can lead to missing risks that should warrant attention.
What Should We Do Instead: turning PIGs into BACON
All that we are interested in is to determine whether action should be taken in response to the
evaluated level of risk and, later, to decide which of the alternative responses should be used.
PIGs should not be used to make this first decision and must never be used for the second.
What should be used?
The first decision could be as simple as just guessing whether there is a ‘strong enough’
chance of exceeding a threshold of concern. That is, a point is set on the consequence scale
that, if exceeded, should trigger action; it is then estimated whether this point will be
exceeded, given the conditions (the set of events) that could occur in the time frame of
interest.
More precisely, concern should be measured on a suitable interval scale, such as the number
of injury-days, adverse reports in the media, or dollars of financial gain. Then the ‘chance’
should be based upon the Pearson-Tukey estimate of the expected performance on this scale.
This estimate uses three points: the 5% (E5) point on the distribution of values, the 50%
(E50) point, and the 95% (E95) point. These points are combined into an estimate of the
mean by the formula (3 x E5 + 10 x E50 + 3 x E95)/16. If this mean estimate exceeds the
threshold ‘significantly’, say by about (E95-E5)/4, then the indicated action should be taken.
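
A minimal sketch of this decision rule (the injury-days figures are hypothetical):

    # Pearson-Tukey three-point estimate of the mean, as given above:
    # mean ~ (3*E5 + 10*E50 + 3*E95) / 16, with action indicated when the
    # mean exceeds the threshold by roughly (E95 - E5) / 4.
    def pearson_tukey_mean(e5, e50, e95):
        return (3 * e5 + 10 * e50 + 3 * e95) / 16

    def action_indicated(e5, e50, e95, threshold):
        margin = (e95 - e5) / 4
        return pearson_tukey_mean(e5, e50, e95) > threshold + margin

    # Hypothetical injury-days distribution: E5 = 10, E50 = 40, E95 = 120.
    print(pearson_tukey_mean(10, 40, 120))      # 49.375
    print(action_indicated(10, 40, 120, 20))    # True (49.4 > 20 + 27.5)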
The second decision involves judging the level of risk after responses have been implemented,
balanced against the costs of implementing them. The level of risk is a measure of the
‘benefits’ of the risk responses and so this decision involves a form of ‘cost-benefit analysis’.
The level can be estimated as before, using the Pearson-Tukey calculation of the mean value
on the consequence metric. However, other steps are necessary to turn this value into a dollar
value that can be included in the cost-benefit result. How to carry out this conversion is the
subject of another paper.
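
A sketch of the shape of this second decision, assuming (purely for illustration) that the consequence metric is already in dollars, since the conversion itself is deferred to another paper:

    # The 'benefit' of a response is the drop in the Pearson-Tukey mean after
    # treatment; converting a non-dollar consequence metric into dollars is
    # out of scope here, so dollar units are assumed throughout.
    def pt_mean(e5, e50, e95):
        return (3 * e5 + 10 * e50 + 3 * e95) / 16

    def net_benefit(before, after, response_cost):
        # before and after are (E5, E50, E95) triples on the consequence scale
        return pt_mean(*before) - pt_mean(*after) - response_cost

    print(net_benefit((10_000, 40_000, 120_000),   # risk level before response
                      (5_000, 15_000, 40_000),     # risk level after response
                      12_000))                     # cost of the response
    # 19562.5 -> on this hypothetical data, the response is worth taking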
References

Central Intelligence Agency (1993), Words of Estimative Probability,
https://www.cia.gov/library/center-for-the-study-of-intelligence/kentcsi/vol8no4/html/v08i4a06p_0001.htm (cited 4 Aug 15)

Cox, L. (2008), What’s wrong with risk matrices?, Risk Analysis, 28(2), 497-512

Johnson, E. (1973), Numerical encoding of qualitative expressions of uncertainty, Report AD
780 814, Army Research Institute for the Behavioral and Social Sciences, Arlington, VA

Mosteller, F. and Youtz, C. (1990), Quantifying Probabilistic Expressions, Statistical Science,
5(1), 2-34

Standards Australia (1999), AS 4360: 1999 Risk Management, Standards Australia, Sydney,
Australia

Standards Australia (2013), Handbook 436: 2013 Companion to AS/NZS ISO 31000: 2009
Risk Management – Principles and Guidelines, Standards Australia, Sydney, Australia

Talbot, J. (2013), What’s right with risk matrices?,
http://www.jakeman.com.au/media/knowledge-bank/whats-right-with-risk-matrices (cited
30 Aug 13)