Risk management tools
Patrick Hudson
Tim Hudson
Hudson Global Consulting

How can we manage risk?
• We can manage risk by hoping it won’t happen
• We can manage risk by offering sacrifices to the Gods
• We can manage risk by understanding what we are doing
• The first two don’t work
• The third is what a Safety Management System does

Risk
• Risk is a complex concept
• Combination of two different components
  – RISK = Outcome x Probability of that outcome
• Outcomes: what could happen
  – Usually seen as a scenario
  – Worst case (conservative)
  – Most credible worst case
• Probability of those outcomes
  – Often measured as frequency of occurrence
  – Needs to be applied before anything has gone wrong
  – Probabilities are difficult to estimate
  – Knowing the probability may change its value

Session 16: Building World Class SMS

There is more to an SMS than lots of good intentions
[Figure: the same elements shown with no structure and with structure, organised into a safety management system. Labels: Organization Structure, TRIPOD, Road Safety Plan, Unsafe Act Audit, Alcohol & Drugs Policy, HSE Policy, Audit Plans, Hazards & Effects Mgmt., Health Risk Assess., HSE Objectives, Incident Plan, Targets, EA, Potential Matrix, Continuous Improvement. Cycle: Plan, Do, Check, Feedback, Engage.]

Safety Management System (SMS)
[Figure: Production against Protection. Better defenses converted to increased production.]

Safety Management System (SMS)
[Figure: Production against Protection. Best practice operations under SMS.]

Generic HSE Management System (Shell)
1 - Leadership and Commitment
2 - Policy and Strategic Objectives
3 - Organisation, Responsibilities, Resources and Standards
4 - Hazards & Effects Mgt (Risk Mgt)
5 - Planning & Procedures
6 - Implementation, Monitoring, Corrective Action
7 - Audit
8 - Management Review
[Figure labels: PLAN, DO, CHECK, FEEDBACK, with Corrective Action loops.]

Hazard-based approach
HEMP - Hazard and Effects Management Process
Identify - What are the hazards?
Assess - How big are those hazards?
Control - How do we control the hazards?
Recover - What if it still goes wrong?

Step 1. Identification
• First identify your hazards
  – What is going to hurt you?
  – Needs to be specific enough to manage practically
    • E.g. not just potential and kinetic energy
  – General enough to manage specifics in the same way
  – Accumulate in a list: the Hazard Register
• A range of tools and methods help here
  – Brainstorming - proactive
  – HAZID
  – Incident analyses - reactive
    • Reporting

Step 2. Assess
• How big is the risk you are taking and running?
• A wide range of tools is available
• Not an exact science, whatever anyone tells you
• Small risks can be ignored
• Large risks may not be taken
• Usually framed in terms of ALARP
  – As Low As Reasonably Practicable
  – Not intended to be as low as possible
• Risk assessment should point to what to do about the hazard in question

Step 3. Manage and control
• Primarily preventative
• Success is measured by nothing going wrong
• Prevention involves a variety of approaches
  – Use of the hierarchy of controls
  – Barriers to keep hazards in place
  – Controls to prevent them escaping
• Management is directly responsible for the provision of controls and barriers
  – Requires resourcing, procurement and continuous evaluation
• Front line personnel are responsible for their use once provided and supported
  – Requires ability to operate the controls and barriers

Step 4. Recovery
• Recovery is necessary after control over a hazardous process has been lost
• But before the worst case consequences have been reached
• Recovery controls and barriers are reactive
• The term Mitigation applies best here
• These controls are usually much more expensive than preventative controls
• Sometimes challenged because “We’ve never used that so we can get rid of it and save money”

Tools
• Risk management tools are intended to help one or more of the 4 steps
  – Usually applied continuously to improve
  – Especially on the feedback loops
• Audits
• Incident investigations
• Reporting
• Performance assessment for predictive improvement
• Identify: discover unexpected hazards
• Assess: evaluate what needs to be done
• Control: systematically list the controls to see if they are adequate to reduce the risk to acceptable levels
• Recover: identify what will reduce the consequences
• Successful risk management allows us to take the risks that enable us to get the benefits without disaster
• These can easily be mapped onto the ICAO components
  – Not just the risk management elements
  – Also all the other elements

Minimising Regret, Maximising Opportunity
             Go                  No-Go
Regret       Incident            Missed Opportunity
No Regret    Normal Operations   Safe

Risk Assessment Matrices
• A simple way of supporting the product of outcome and probability
• Not a discrete set of values, but an easy way of representing the distributions of severity of outcomes and their probabilities
• So there is no single CORRECT matrix

Risk Assessment Matrix
Consequence rating   People              Assets             Environment
0                    No injury           No damage          No effect
1                    Slight injury       Slight damage      Slight effect
2                    Minor injury        Minor damage       Minor effect
3                    Major injury        Local damage       Localised effect
4                    Single fatality     Major damage       Major effect
5                    Multiple fatality   Extensive damage   Massive effect

Increasing probability
A   Never heard of in industry
B   Incident heard of in industry
C   Incident heard of in company
D   Incident happens several times per year in company
E   Incident happens several times per year in a location

[Matrix cells are rated Low Risk, Med/low Risk, Medium Risk or High Risk.]
The colour determines the level of active risk management required

Risk Calculations
[Figure: matrix with cells numbered 0 to 14 and the positions “Now” and “After” marked. Reduced exposure: left side. Mitigation: right side.]

Risk matrix alternative
[Figure: matrix with weighted cell values (0, 2, 2, 4, 4, 5, 8, 12, 15, 28, 8, 20, 40, 100, 200 as extracted). Mitigation: right side. Reduced exposure: left side.]
The numbers are a reflection of how unacceptable the matrix cell is

What is ALARP?
ALARP = As Low As Reasonably Practicable
[Chart: Risk to stakeholders and Cost (vertical scale 0 to 120) plotted against Options 1 to 6, with the legal minimum requirements marked.]

How can we understand our controls?
• The Bowtie is an industry standard in many high-hazard activities
• Bowties cover both control and recovery
• Bowties are not primarily intended to be quantitative, but can be computed with
• Bowties visually express the extent and types of control and are easy for managers to understand
  – Is everything procedural?
  – Does one person have to do everything?

Bow-tie Concept
[Figure: bow-tie running from the HAZARD, through events and circumstances, to an undesirable event with potential for harm or damage, and on to the CONSEQUENCES (harm to people and damage to assets or environment). CONTROLS come from engineering activities, maintenance activities and operations activities.]

Bow-tie Concept for a specific event
[Figure: the same bow-tie for a specific event, with RISK CONTROLS labelled.]

A problem for aviation
• Simple models have difficulty in capturing recent major commercial aviation incidents
• Asiana 214, QF 32, AF 447, BA 38

A Diversion - Causality
• Simple accidents are simply caused
  – Linear and deterministic
• Complex accidents are more complex
• The 80-20 rule suggests simple accidents are 80%
• The remaining 20% require us to recognize complexity

Theory 1 - how accidents are caused
• Linear causes
  – A causes B causes C
• Deterministic - either it is a cause or it isn’t
• We can compute both backwards and forwards
• People are seen as the problem: human error etc.
• Probably good enough to catch 80% of the accidents we are likely to have
• Covers most private and GA operators
[Slide label: Private users]

Theory 2 - how accidents are caused
• Non-linear causes
  – Cause and consequence may be disproportionate
  – These causes are organizational, not individual
• Deterministic dynamics - either it is a cause or it isn’t
• We can compute both backwards and forwards
  – Increasingly difficult with non-linear causes
• This is the Organizational Accident Model
• Probably good enough to catch 80% of the residual accidents, 96% in total
• Probably best GA and professional operations
[Slide label: Oilfield operations]

Non-linearity
• Linearity: the size of an effect (consequence) is linearly proportional to the input
• Non-linearity is different
  – The size of an effect (bad consequences) gets bigger (or smaller after a while) as a function of the input
  – The improvement in performance gets smaller (almost always) even though the input gets bigger
• Linearity works fine to start with, but only in 80% of the cases

Linear and non-linear functions
[Figure: two plots of Effect against Cause. Linear; Non-linear: suddenly gets a lot worse.]

More non-linear functions
[Figure: two plots of Effect against Cause. Non-linear: it can’t get much worse; Non-linear: both, starts bad, tails off.]

Determinism
• A causes B
• If A happens, then B will happen next

Non-determinism
• Move from “A causes B” to “A makes B more likely”
• Causation is probabilistic
• Probabilities are distributions, not points
Conditionalize on latest aircraft generation

Types of accidents
• Theory 1
  – Simple models may cover 80% of all accidents
  – These are the simple personal accidents
• Theory 2
  – The next step gets 80% of the remainder = 96%
  – These are the complex personal accidents and some organizational accidents
• Theory 3
  – The probabilistic approach may net the next 80% = 99.2%
  – These are the complex process accidents

Theory 3 - how accidents are caused
• Non-linear causes
• Non-deterministic dynamics
  – Probabilistic rather than specific
  – Influences on outcomes by people and the organisation
• Probabilities may be distributions rather than single values
• We cannot compute both backwards and forwards
• The dominant accidents that remain are WEIRD
  – WILDLY
  – ERRATIC
  – INCIDENTS
  – RESULTING IN
  – DISASTER
• Prior to an event there may be a multitude of possible future outcomes

Unusual or WEIRD Accidents
• In commercial aviation major accidents are now extremely rare
• Simple risk assessment and analysis models often fail to capture how these accidents are caused
• We need to understand our risk space better
• The Rule of Three is an example of how to do this

The Rule of Three
• Accidents have many causes (50+)
• A number of dimensions were marginal
• Marginal conditions score as Orange
• No-Go conditions score as Red
• The Rule of 3 is: Three Oranges = Red

Aircraft Operation Dimensions
• Crew Factors: Experience, Duty time, CRM
• Aircraft: Perf. Category, Aids, Fuel, ADDs
• Weather: Cloud base, wind, density alt, icing, wind
• Airfield: Nav Aids, ATC, Dimensions, Topography
• Environment: Night/day, Traffic, en route situation
• Plan: Change, Adequacy, Pressures, Timing
• Platform: Design, Stability, Management

The Rule of Three
[Figure: Outcome (No problem, Problem, Crash) plotted against No. of Oranges (1/2 to 3 1/2), with regions labelled “Big Sky” and “We fixed it”.]

Why does the rule work?
• People use cognitive capacity to allow for increasing risk
• As the oranges increase the remaining available capacity is reduced
• At 3 oranges there is little available capacity remaining
• Any trigger can de-stabilize the system
• An accident suddenly becomes very likely

How random numbers combine
[Figure labels: Normal upper limit, Normal lower limit, Load > strength.]

The danger zone/safe zone: safe operating envelope concept
[Figure labels: Normal path through the safe field; Normal path blocked by uncommon circumstance; Known danger zone; Defined Operational Boundary; Unknown danger zone (swiss cheese defect); Enter unknown danger zone.]

Risk
• Risk is a complex concept
• Classically probability x outcome
• Safety management is about:
  – Taking risk: acceptable (ALOS) vs unacceptable
  – Running risk: getting away with it
  – Can be based on luck or on professionalism
• The granularity of the outcomes and how they can be reached is essential
• Most approaches are crude
  – Salami slicing is a way to evade regulation

Risk Space
[Figures: a risk space with high risk areas and low risk/resilient areas. Panels show single distributions A, B and C, each with a known danger zone; the combined distribution (A, B, C) with known danger zones; and the combined distribution with known danger zones plus an unexpected danger zone.]
[Figures: simple view of the combined distribution. Panels: low average risk despite danger zone; medium average risk despite danger zone; high average risk due to sufficient granularity.]

Mission Creep and Drift into Danger
• Success with risks makes people willing to accept greater risks
  – This is a consequence of risk homeostasis
• This can look like complacency, but is a natural consequence of their successes so far
• Failure to understand the finer detail of the risk space makes this drift into danger more likely

Conclusion
• Conventional risk assessment involves uncovering the potential for bad consequences
• Modern commercial aviation is very safe, so the accidents we wish to avoid may not be caught by standard techniques
• Advanced risk analysis involves increasing our understanding of the risk space we operate in
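
Worked sketch: risk matrix lookup
The Risk Assessment Matrix slides treat risk as the product of outcome and probability, read off as a coloured band. A minimal Python sketch of that idea follows. The numeric ranks for probabilities A to E and the band thresholds are assumptions made for illustration only; they are not the cell values of the matrix in the slides, and the deck itself stresses that there is no single correct matrix.

```python
# Probability columns A..E from the matrix slide, mapped to ranks 1..5.
# The ranks are an assumption made for this sketch.
PROBABILITY_RANK = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Hypothetical thresholds on the product severity * rank (range 0..25),
# checked from the highest band down.
BANDS = [(15, "High"), (9, "Medium"), (4, "Med/low"), (0, "Low")]


def risk_band(severity: int, probability: str) -> str:
    """Return the risk band for a consequence rating (0..5) and probability (A..E)."""
    if severity not in range(6):
        raise ValueError("severity must be a consequence rating from 0 to 5")
    score = severity * PROBABILITY_RANK[probability]
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    raise AssertionError("unreachable: score is never negative")


# "Now" versus "After": mitigation lowers the consequence, not the probability.
now = risk_band(4, "D")    # single fatality, several times per year in company
after = risk_band(2, "D")  # consequence mitigated to minor injury
print(now, "->", after)    # prints: High -> Med/low
```

Reduced exposure would be modelled the same way by moving the probability letter towards A while leaving the consequence rating unchanged.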
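
Worked sketch: where 80%, 96% and 99.2% come from
The Types of accidents slide applies the 80-20 rule three times: each theory is taken to catch 80% of whatever the previous theories missed. The cumulative figures follow from 1 - 0.2^n. The short Python sketch below reproduces them; the function name and the default share of 0.8 are illustrative choices, and the 80% figure itself is the deck's rule of thumb rather than a measured value.

```python
def cumulative_coverage(theories: int, share: float = 0.8) -> float:
    """Fraction of accidents covered after applying `theories` successive models,
    each catching `share` of what the previous ones missed."""
    return 1.0 - (1.0 - share) ** theories


for n in (1, 2, 3):
    print(f"Theory {n}: {cumulative_coverage(n):.1%}")
# Theory 1: 80.0%
# Theory 2: 96.0%
# Theory 3: 99.2%
```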
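
Worked sketch: scoring the Rule of Three
The Rule of Three slides score each aircraft operation dimension as marginal (Orange) or No-Go (Red), and treat three Oranges as a Red. A minimal Python sketch of that decision follows. The Green score for a dimension with no concerns, the function name and the example flight are assumptions added for illustration; how an individual dimension is judged Orange or Red is left to the assessor.

```python
from collections import Counter

# Green (no concern) is an assumed third score; the slides name Orange and Red.
GREEN, ORANGE, RED = "green", "orange", "red"


def rule_of_three(scores: dict[str, str]) -> str:
    """Return "NO-GO" if any dimension is Red or three or more are Orange."""
    counts = Counter(scores.values())
    if counts[RED] > 0 or counts[ORANGE] >= 3:
        return "NO-GO"
    return "GO"


# Hypothetical flight: no single dimension is a No-Go, but three are marginal.
flight = {
    "Crew": ORANGE,         # e.g. long duty time
    "Aircraft": GREEN,
    "Weather": ORANGE,      # e.g. low cloud base
    "Airfield": GREEN,
    "Environment": ORANGE,  # e.g. night operation
    "Plan": GREEN,
    "Platform": GREEN,
}
print(rule_of_three(flight))  # prints: NO-GO
```

Counting Oranges across dimensions, rather than assessing each dimension alone, is what lets the rule flag a combination of marginal conditions that no single check would reject.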