Turning PIGs into BACON: Estimating consequences properly
Dr Edward Lewis
School of Engineering and Information Technology
University of NSW Canberra
www.layrib.com
© Layrib 2014

Overview
1. What is needed for responding to risk
2. What is available
3. What is wrong with it
4. What can we do better
Do not worry about keeping up. This presentation will be available from www.layrib.com

Preamble
I want to cover the latest thoughts about risk assessment, as expressed in IEC/ISO 31010:2009 – Risk management – Risk assessment techniques, but it is being revised at the moment.
The working group providing that revision (including me) met in London last week …
• Preparing working draft for 31010, with another meeting in Prague in October – trying to group techniques and line them up with tasks or questions to aid selection
• Preparing draft standard for Open Systems Dependability (how many are aware of “dependability” standards, such as Fault Trees or Root Cause Analysis?)

What is needed: Risk as control
Risk is “effect of uncertainty on objectives”
Or: possible deviation from desired performance (current or future value) that is of concern
“Risk management: coordinated activities to direct and control an organization with regard to risk”
(ISO 31000, Guide 73)
[Figure: axes Performance and Time (Now, Then); label: risk]
So emphasis should be upon control, which is:
• Set the target level of performance
• Monitor deviation from that level
• Respond if the deviation exceeds limits

What is needed: It is all about risk responses
We talk a lot about risk assessment
Not much about treatment (or response)
But isn’t that the whole point about risk management … doing something about it?

What is needed: characteristics of methods
What we need is a method that enables us to determine, quickly enough:
• If a response is needed because the deviation is of concern
• What response is possible – where to intercede
• Which response is ‘best’
So we need methods for risk assessment that are ‘close enough’ to raise a red flag.
With sufficient resolution (even resolving) power to tell the difference between
• Risk of no action vs risk of action
• Alternative responses

What is available: Risk assessment methods
We do have many risk assessment methods (IEC/ISO 31010), as shown in the Handout.
Most of them are variations on a theme, developed by different disciplines over time. We are trying to reconcile them but it is a political fight. For example:
• Beta distributions in Bayesian analysis, with links to the use of Matthews correlations of the effects of risk indices
• Fault and event trees are subsets of logic trees, useful for the analysis of bow ties
• LOPA and Accimaps are similar approaches for identifying possible responses
Many are often misused and even faulty, because they have no scientific or theoretical basis – a bit like boiling the quinine tree to get anti-malarial medicine.
So we need to see if we can improve what we’ve got in risk assessment and then go on to the design and evaluation methods … one day.

[Handout table relating the steps of risk management to risk assessment techniques; cell ratings (x, 1, 0.1) not reproduced]

Steps:
Get ready
• Plan the plan
• Establish Risk Policy
• Establish responsibilities
• Gather resources
Establish the Context
• Determine objectives, if given
• Consider circumstances from the external context that could influence setting or meeting objectives (valency: threats or opportunities)
• List desires of key stakeholders from their concerns about circumstances (valency: strengths or weaknesses/vulnerabilities)
• Consider conditions from the internal context that could influence meeting objectives or values
• Establish time frame
• List objectives and values
Establish risk criteria
• Form measures of consequence (determine outcomes, metrics, means of measurement)
• Determine how to estimate level of risk (measure likelihood of event, extent of circumstance)
• Establish acceptable levels of risk
• Determine how to aggregate risks (risk factors/profiles)
Assess risk: Anticipate need for action
Identify risk
• List sources of risk (causes of uncertainty, drivers of events)??
• Determine events (changes in conditions or circumstances)
• Determine consequences of events
Analyse risk
• Consider detail of causes, including interdependence and confidence in information about them
• Consider the effect of existing controls over the time span of interest
• Estimate anticipated risk level
Evaluate risk
• Compare level of risk with risk criteria
• Determine whether action in response is warranted
Treat Risk
• Determine one or more options for modifying risk or possible mix of options
• Assess residual risk
• Compare costs, disbenefits of options
• Prepare treatment plan, showing priority for treatments
• Monitor treatment measures to determine if risks introduced elsewhere or residual risk grows above tolerable level

Techniques:
• Brainstorming **
• Nominal Group Technique
• Interviews
• Delphi
• Check-lists
• Preliminary hazard analysis
• Hazard and operability studies
• Hazard Analysis and Critical Control Points (HACCP)
• Environmental risk assessment
• Structured What If (SWIFT)
• Scenario Analysis
• Business Impact Analysis
• Failure mode effect analysis (FMEA/FMECA)
• Fault tree analysis
• Root cause analysis
• Event tree analysis
• Cause and consequence analysis
• Cause-and-effect analysis
• Layer protection analysis
• Decision tree
• Human reliability analysis
• Bow tie analysis
• Reliability centred maintenance
• Sneak circuit analysis
• Markov analysis
• Monte Carlo simulation
• Bayesian statistics and nets
• FN curves
• Risk indices
• Consequence/probability matrix **
• Real option pricing
• Cost/benefit analysis
• Multi-criteria decision analysis **
• Multiple objectives utility technique

What is available: PIGs
We are most (too) familiar with the Risk Matrix
Or Probability x Impact Grid (PIG)
So what can we do about it?
This example is from Military Risk Management, Defence Instruction (Army) Ops 68/1 of 31 Aug 2011. It is good to see risk management introduced in accordance with the principles of ISO 31000. Pity they have not got it right, as we will see.

What is wrong with it: Problems with PIGs
Justin Talbot, What’s Right with Risk Matrices, http://www.jakeman.com.au/media/knowledge-bank/whats-right-withrisk-matrices, 30 Aug 13
1. compare only a small fraction of randomly selected pairs of hazards
2. mistakenly assign identical ratings to quantitatively different risks
3. mistakenly assign higher qualitative ratings to quantitatively smaller risks, leading to worse-than-random decisions
4. mistakenly allocate resources, as effective allocation of resources to risk treatments cannot be based on the categories provided by risk matrices
5. categorizations of severity cannot be made objectively for uncertain consequences
√
1. risk matrices are still one of the best practical tools that we have: widespread (and convenient)
2. promote robust discussion (the discussion often being more useful than the actual rating)
3. provide some consistency to prioritizing risks
4. help keep participants in workshop on track
5. focus decision makers on the highest priority risks
6. present complex risk data in a concise visual fashion
7. prioritizing the allocation of resources is not the role of the risk matrix – that role belongs to the selection of risk treatments
8. any risk assessment tool can assign identical ratings to quantitatively different risks
9. no tool can consistently, correctly and unambiguously compare more than a small fraction of randomly selected pairs of hazards
10. if a risk is in the ‘High’ or the ‘Top 10’ list it requires attention, and whether it is third or fourth on the list is not likely to be significant
11. subjective decision making will always be a part of the risk assessment process no matter what tool is used
12. risk matrices are a tool which supports risk-informed decisions, not a tool for making decisions
13. last but not least, most of the flaws listed above only exist if risk matrices are used in isolation, which is rarely the case
Oh?

My view of PIGs
Using a PIG is like giving a loaded revolver to a child
Another way of saying it is …
A fool with a tool is still a fool, and a PIG is a foolish tool
• Wrong scales
• Wrong combination
• Wrong use
Art from Clkr.com, Creative Commons

Wrong scales
Wording means different things to different people – “estimative words”.
Can lack resolution power: cannot tell one level of risk from another because the impact scale is too coarse.
• How much difference is there between ‘multiple fatalities’ and ‘fatality or permanent disability’?
• Going from 5 to 50 to 100 to 200M in large, uneven jumps means that 6M is as ‘bad’ as 49M, and 51M is as ‘bad’ as 99M and ‘much worse’ than 49M

Wrong combination
Where things really go wrong is when probability is combined with impact
Probability of impact is f(probability of event, probability of impact if event occurs)
In a PIG, we have a range of extent of consequence (same as likelihood of getting estimate) and a range of likelihood of event: that’s OK
What do these numbers mean?
1 2 3 9 16: 1 x 1, 1 x 2, 1 x 3, 1 x 9? 1 x 3^2? 1 x 16? 1 x 4^2?
3 4 8 14 21: 1 + 2, 2 + 2, 2 + 6? 2^3? 2 x 4?
I give up …
But often get P x I wrong
Often use scales that cannot be – must not be – combined mathematically
Looks like assigning numbers (and colours) to successive cells.
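The loss of resolving power described on the Wrong scales slide can be sketched in a few lines of Python. This is only an illustration: it assumes impact band edges at 5, 50, 100 and 200 ($M) as quoted on that slide, and the function name `impact_band` and the rule that a loss equal to an edge falls into the higher band are hypothetical choices, not taken from any particular PIG.

```python
from bisect import bisect_right

# Band edges in $M, as quoted on the "Wrong scales" slide.
BAND_EDGES = [5, 50, 100, 200]

def impact_band(loss_millions):
    """Return the ordinal impact band (0 to 4) for a loss in $M.

    Assumption: a loss exactly on an edge falls into the higher band.
    """
    return bisect_right(BAND_EDGES, loss_millions)

# Losses that differ by a factor of eight share a band,
# while losses that differ by $2M are split across bands.
for loss in (6, 49, 51, 99):
    print(f"${loss}M -> band {impact_band(loss)}")
```

Under these assumptions, $6M and $49M land in the same band, while $49M and $51M land in different ones.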
Is an Occasional loss of $99M a ‘worse’ level of risk than a Likely loss of $49M?

Wrong use 1/2
This example shows a consistent use of probability x impact (additive, as suitable for log scales). It shows different probability scales for different events.
But it can still be misused (not saying that Justin does so):
• Showing difference in level of risk to choose treatments/controls/response
• Adding levels of risk over criteria as a score for different treatments
PIGs are not precise enough, or in the right mathematical form, to do more than flag the need for attention. Often they are not even suitable for setting priority.

Wrong use 2/2: Enclosing PIGs in Risk Registers
The other problem with existing practice is the use of Risk Registers.
Risk Registers lead to, even enforce, a silo approach.
They suggest that there are single causes or consequences.
They suggest that there is one response to one risk.
Regard them as pens for the PIGs
From ISACA (2012), COBIT 5 Implementation

What we can do better: Link bow ties into chains
Firstly, turn PIGs free to run in herds, because drivers, events, and consequences entwine
The consequence of one event can drive another event.

Examples of the richness of risk links 1/2
Alan McLucas (2011), Failures to Learn, presentation to Risk Engineering Society Workshop, Canberra, Nov

Examples of the richness of risk links 2/2 (Consequence chains)
Lewis and Masroof work on risk management of the use of Cloud Computing by the ATO, 2012

What can we do better: Design Course of Action - Cedes
Cedes are risk-responses that change the coupling between antecedents – behaviour – consequences:
• Either to reduce ‘bad’ things
• Or make the most of ‘good’ things
Course of action = combination of cedes
Pick point where cede has most cumulative effect upon set of consequences
Others would call these cedes ‘controls’, as shown in bowties.
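One way to read "pick point where cede has most cumulative effect upon set of consequences" is as a search over a consequence chain. The sketch below is a rough illustration under stated assumptions, not the method used in the talk: the chain, the node names and the worth figures are all hypothetical, and "cumulative effect" is taken to mean the summed worth of every consequence reachable from the point where the cede is applied.

```python
from collections import deque

# Hypothetical consequence chain: each key drives the items in its list.
LINKS = {
    "staff shortage": ["incident"],
    "weak access control": ["data breach"],
    "incident": ["injury", "data breach"],
    "data breach": ["less funding"],
}

# Hypothetical worth ($M) attached to each consequence of concern.
WORTH = {"injury": 3.0, "data breach": 4.0, "less funding": 2.0}

def reachable_worth(node):
    """Sum the worth of the node and everything downstream of it."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in LINKS.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sum(WORTH.get(name, 0.0) for name in seen)

def best_cede_point(candidates):
    """Return the candidate whose interruption covers the most worth."""
    return max(candidates, key=reachable_worth)

candidates = ["weak access control", "incident", "data breach"]
for name in candidates:
    print(f"{name}: {reachable_worth(name)}")
print("Intercede at:", best_cede_point(candidates))
```

In this toy chain, interceding at the incident covers more of the consequences than interceding at the data breach, because the incident drives both the injury and the breach. A single-row risk register entry would not show that.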
What can we do better: Bayesian Analysis of Consequences 1/2
• Set up risk criteria
• Determine Willingness to Pay (WTP) for levels of desired performance (concerns)
  Set Act level of concern = extent of consequence that, if exceeded, indicates immediate need for action
  Set Watch level of concern = extent of consequence that, if exceeded, suggests that action will be needed if trajectory of effect continues, anticipating likely effect of controls and their timing
• Replace risk registers with bow ties or even consequence chains
• Analyse cascade of effects of circumstances – conditions – consequences using fuzzy Bayesian analysis of AND and OR links
• Estimate consequence, based upon set of entwined conditions, using Pearson-Tukey (sufficient to recreate most cdfs):
  Mean est = $0.185\,x_{5\%} + 0.63\,x_{50\%} + 0.185\,x_{95\%}$
  Variance est = $\left[(x_{95\%} - x_{5\%})/3.2\right]^2$
• Determine if estimate exceeds Act level by, say, one standard deviation
• Design course of action that intercedes at most effective points in chain
• Determine response with the best Risk Adjusted Price:
  RAP = cost of resources needed for course of action + WTP for remaining level of concern
[Figure: Worth against Performance; WTP (000,000) against Injury (Hospital days)]

What can we do better: Bayesian Analysis of Consequences 2/2
[Figure: consequence chain diagram. Labels: External driver; Change in internal condition with energy to have an effect; Cedes a, b, c, d; Sub-objectives/values (Need WHS policy, Avoid incident, Protect staff, Keep staff safe, Limit access, Keep data secure, Have Backup); Objectives/values (People, Money, Legal); Lose skill; Less Fund for facility; Less Fund for service; Performance measures (Injury in Hospital days; Breach in Names, 000s); Worth measures (WTP, 000,000); Act now; Watch]

What can we do better: Other ways of showing Registers
Linking these items
Generates this tree
And shows this Risk List
[Figure labels: Event with most effect; First link; Far link]

BACON
So turn PIGs in pens into
Bayesian Analysis of (cedes in chain of) CONsequences

What you can do to help
I am convening the Working Group in SA/NZS OB 007 for preparing the Handbook “Making Decisions with Risk”, and need help in gathering ideas and judgements
So send in the cards and letters about what you know and how you can help
Meanwhile, have a look at this presentation and Risk-Response: User Manual for Strategic and Systemic Thinking at www.layrib.com or chat through layrib@gmail.com
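The estimation steps on the Bayesian Analysis of Consequences slide (Pearson-Tukey estimate, comparison with the Act level, Risk Adjusted Price) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the usual Pearson-Tukey weights of 0.185, 0.63 and 0.185 on the 5%, 50% and 95% estimates and the divisor 3.2 from the slide, it reads "exceeds Act level by one standard deviation" as mean minus Act level greater than one standard deviation, and it takes the "best" RAP to be the lowest. All function names and input figures are hypothetical.

```python
import math

def pearson_tukey(x05, x50, x95):
    """Three-point estimate of mean and variance from the
    5%, 50% and 95% points of the judged distribution."""
    mean = 0.185 * x05 + 0.63 * x50 + 0.185 * x95
    variance = ((x95 - x05) / 3.2) ** 2
    return mean, variance

def exceeds_act_level(mean, variance, act_level, k=1.0):
    """Assumed reading: flag when the estimate sits more than
    k standard deviations above the Act level of concern."""
    return (mean - act_level) > k * math.sqrt(variance)

def risk_adjusted_price(cost_of_action, wtp_remaining):
    """RAP = cost of the course of action + WTP for what remains."""
    return cost_of_action + wtp_remaining

# Hypothetical judgement: injury of 1, 3 or 7 hospital days.
mean, variance = pearson_tukey(1.0, 3.0, 7.0)
print(f"mean = {mean:.3f}, sd = {math.sqrt(variance):.3f}")
print("Act now?", exceeds_act_level(mean, variance, act_level=1.0))

# Hypothetical courses of action: (cost, WTP for residual concern), $M.
options = {"A": (1.0, 3.0), "B": (2.5, 0.5)}
best = min(options, key=lambda name: risk_adjusted_price(*options[name]))
print("Lowest RAP:", best)
```

The point of the sketch is that each response is compared on a single priced scale, cost plus the residual concern, which a cell position in a PIG cannot provide.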