Safety and Reliability
ISSN: 0961-7353 (Print) 2469-4126 (Online)
Journal homepage: https://www.tandfonline.com/loi/tsar20

Apportioning Safety Integrity Levels
Pete Stanton, AEA Technology

To cite this article: Pete Stanton (2002) Apportioning Safety Integrity Levels, Safety and Reliability, 22:3, 49-56, DOI: 10.1080/09617353.2002.11690744
To link to this article: https://doi.org/10.1080/09617353.2002.11690744
Published online: 14 Mar 2016.

Abstract
The method used to determine Safety Integrity Levels (SILs) for elements of a system should ensure that the risks associated with each element are reduced to acceptable levels and that they are shown to be so. However, accurately determining the risk associated with part of a system requires detailed analysis of the design, after which it may be too late to change the design so that it meets its integrity requirements. A method is needed that will determine the SIL requirement to sufficient accuracy before the detailed design has started. This paper describes a method for determining the SILs of products at an early stage of the design.

Introduction
When designing a safety-related system, it is necessary to ensure that all subsystems, functions and interfaces are designed to an appropriate level of safety for the risks that they are associated with. To do this, DEF STAN 00-56, IEC 61508, and CENELEC EN 50128 and ENV 50129 (amongst others) all recommend that Safety Integrity Levels (SILs) are assigned to subsystems and functions as well as to the system as a whole. The standards define four levels of safety integrity from SIL 1 (the lowest) to SIL 4 (the highest), representing different levels of rigour in the development process.
In addition, some items may not have any requirement to reduce risk. Some of the standards refer to this situation as 'SIL0'.

SIL determination should be done before products are developed, in order to develop the products to the correct safety integrity, so any method used to determine SILs will suffer from the detailed design not being completed. Any derivations have to use engineering judgement and should err on the side of caution by ensuring that the SIL derived is not too low. To be of use, the method employed to derive SILs has to be simple to use and reasonably accurate.

Current standards are often vague about how to do this apportionment, so this paper proposes a new method to determine SILs. Figure 1, taken from EN 50128, shows the process that they all largely follow. It involves determining the inherent level of risk for each element of the system under investigation and determining what the acceptable level of risk is for this element. The size of the difference between the inherent risk and the acceptable risk determines the SIL requirement for the element.

[Figure 1: EN 50128 Process. Diagram labels: Frequency of Hazardous Event; Consequence of Hazardous Event; Risk Level: No Protective Features; Required Risk Reduction; System SIL; Software SIL; Safety Integrity Levels 4, 3, 2, 1, 0 (Very High, High, Medium, Low, Non-Safety Related); Equipment Under Control.]

Method
A new method for generating functional and product SILs has been derived. It follows the basic principle of EN 50128. The method is based around a Hazard Ranking Matrix in which the two numbers representing the likelihood and consequence of a particular hazard are added together to get a measure of the overall risk. This only works if the factors for each category for both likelihood and consequence are logarithmic to the same base.
For instance, in the consequence values a single fatality is around 10 times worse than a single major injury, which is in turn around 10 times worse than a single minor injury. Likelihood categories can be defined to follow a similar pattern (roughly every couple of days, monthly, yearly, etc.). This gives a table along the lines of the one shown in Table 1. When added together, these likelihood and consequence figures give a total risk value which has two properties:

• It is logarithmic: an increase of 1 in the risk value equates to a 10 times higher risk;
• It is consistent: the same risk value equates to the same amount of harm.

                         Consequence
                         Negligible  Minor  Major  Critical  Catastrophic
Likelihood      Value        1         2      3       4           5
Incredible        1          2         3      4       5           6
Improbable        2          3         4      5       6           7
Remote            3          4         5      6       7           8
Occasional        4          5         6      7       8           9
Probable          5          6         7      8       9          10

Table 1: Risk Ranking Table

Hazard Analysis
The hazards related to the system need to be identified in the usual fashion (e.g. through a Hazard Identification meeting) and then analysed using the above table. This is not necessarily easy, as the assessor has to determine the likelihood and consequence to the system of the particular identified function failing at that point only. The figures need to be based on the assumption of the function concerned being implemented without consideration of safety. Protections available from other equipment or independent functions should be included in the analysis.

Tolerable Risk
Having determined the risk ranking figure for a particular function, we now need to decide what level of risk is tolerable for this function. In general, we would expect all functions to have the same tolerable risk, i.e. we are prepared to accept a particular number of equivalent fatalities per year, regardless of how they are caused. However, we can deliberately bias the results by saying that certain accident types are less (or more!) acceptable than others.
This is effectively what is done in the UK Railway Industry, where there is a higher Value for Preventing a Fatality (VPF) for multiple fatality accidents.

The Tolerable Risk (TR) is derived from the system level numeric safety targets. It then has to be converted into an equivalent risk ranking figure. The overall risk has to be divided up between every hazard in the analysis that contributes to it, so the TR needs to be calculated relative to the level of the hazard analysis being performed. The obvious (but definitely not compulsory) approach is to divide the system TR equally amongst all the hazards. There are two possible reasons not to divide it equally:

• There may be numeric system safety targets for particular system failure modes that result in a lower tolerable risk for these modes compared with other ones.
• To meet overall safety targets, there is no actual need to apportion to failure modes equally. Some modes may get a higher than average rate, provided enough others get a lower one. As the TR only differentiates the risk to an order of magnitude, this means that to increase the TR by 1 for a particular hazard, 10 other hazards have to have it reduced by 1 to ensure that the overall rate is still at the tolerable level.

SIL Derivation

Hazard
The SIL of a hazard is calculated by taking the risk ranking figure for the function from the 5x5 grid and subtracting the risk ranking figure which equates to the tolerable risk (see 'Tolerable Risk' above) for that function. Because the risk ranking figure is logarithmic, this subtraction is equivalent to saying:

\text{Risk Reduction Factor Required} = \frac{\text{Risk Demand on the Function}}{\text{Tolerable Risk}}

For example, if a function was assessed (using Table 1) to be Critical (4) and Remote (3), then the risk ranking would be 4 + 3 = 7. If the tolerable risk had been assessed as 6, then the function would need to be 7 - 6 = 1, i.e. SIL1.

Data Signals
The SIL of a data signal is the maximum of the SILs of the hazards related to the signal.
For instance, if the hazard analysis has 2 entries for a signal with SILs of 1 and 3 then the signal needs to be SIL3 as a minimum.

Functions
The SIL of a function is the maximum of the SILs of the hazards related to that function. For instance, if a function has 4 separate entries in the hazard analysis with SILs for each hazard of 3, 0, 2, and 0 then the function would need to be SIL3 as a minimum.

Products
If the product or a product's development cannot be divided such that a product can have different SILs for different functions, then the SIL of the product is the maximum of the SILs of the functions that it performs. For instance, if an indivisible product had 5 separate entries in the hazard analysis with SILs for each hazard of 2, 1, 0, 3, 2, then the product would need to be SIL3 as a minimum. Of course, where a product's functions can be separated, then each part of the product needs to be developed to the maximum SIL that is required to perform that part's functions.

Subsystems and any other apportionment of the system would be treated in the same way.

Issues
Being new, this method is bound to raise some questions. The most important of these are discussed here.

Determining the Level of Risk Reduction
The basis for Hazard Analysis can be the same as the basis for the Preliminary Hazard Analysis, so that the SIL apportionment can be done early on in the project's lifecycle. Earlier it is stated that the estimated figures for likelihood and consequence need to be based on the assumption of the function concerned being implemented without consideration of safety. If some integrity has already been allocated then this has to be added to the SIL rating determined from the subtraction described above. This subtraction also gives the required risk reduction for the function.
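
The subtraction and maximum rules described so far can be sketched in a few lines of Python. This is a minimal illustration rather than part of the published method: the function names (`hazard_sil`, `risk_reduction_factor`, `aggregate_sil`) and the `already_allocated` parameter are assumptions introduced here, as is the choice to floor negative results at SIL0.

```python
def hazard_sil(risk_rank, tolerable_rank, already_allocated=0):
    """SIL for one hazard: risk ranking minus the tolerable-risk ranking.

    already_allocated is any integrity assumed when the risk was estimated;
    the text says it has to be added back. Flooring at 0 is an assumption.
    """
    return max(0, risk_rank - tolerable_rank + already_allocated)


def risk_reduction_factor(risk_rank, tolerable_rank):
    """Rankings are base-10 logarithmic, so a difference of n means 10**n."""
    return 10 ** max(0, risk_rank - tolerable_rank)


def aggregate_sil(hazard_sils):
    """SIL of a signal, function or indivisible product: the maximum over its hazards."""
    return max(hazard_sils, default=0)


print(hazard_sil(7, 6))                # risk ranking 7 against a tolerable 6
print(risk_reduction_factor(7, 6))     # one ranking step is a factor of 10
print(aggregate_sil([1, 3]))           # data signal example
print(aggregate_sil([3, 0, 2, 0]))     # function example
print(aggregate_sil([2, 1, 0, 3, 2]))  # indivisible product example
```

Run as written, this prints 1, 10, 3, 3 and 3, matching the worked examples in the text.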
As this work is done before detailed design, it may be decided to cover some of the function's safety requirement by a protection system, possibly outside the system (including operating procedures).

"Salami Slicing"
There is a possibility that this method may suffer from the issue of a product (or a function) having a large number of entries in the list, which could result in the SIL for the product as a whole being too low. This should not arise, as the Tolerable Risk level is itself sliced according to the level of the hazard analysis. This means that the SIL figure for a hazard is that needed to reduce that hazard to a level such that it contributes only its 'fair share' to the overall risk. So adding up over a whole product (or even the whole system) will give a total risk that is compliant with the overall safety targets.

ALARP
The results that come out of this analysis are the minimum SIL ratings required to perform a particular function with sufficient integrity to reduce the risk to a tolerable level, i.e. so that the system safety targets related to this function will be met. This does not remove the need to consider ALARP issues. It is quite feasible that (say) a product could come out of the analysis with a requirement to be SIL2, but that does not mean that designing it to SIL3 would not be an ALARP solution.

SILs > 4
Depending on the value determined for the Tolerable Risk, it can happen that this method suggests that a SIL of more than 4 is needed to reduce the risk related to a particular hazard to a tolerable level. In this case, the solution has to be both that the function relating to that hazard is designed to SIL4 and that at least one additional system to protect against the hazard is implemented. It should also be noted that designers have choices when designing SIL4 functions and products.
Coming up with 'SIL5' using this analysis could be interpreted as meaning that, if additional systems cannot be sensibly added, then the options within SIL4 for design and analysis of the design that give the most protection (e.g. software and hardware as diverse as possible) should be used.

Weighting for Multiple Fatalities
It could be argued that any function that could fail in such a way as to result in multiple fatalities should be developed to at least SIL1 and, as such, the method shown in this document is flawed as it allows for the possibility of such functions being SIL0. The counter argument is based upon total risk: the same initial level of risk (i.e. the same number of equivalent fatalities) should have the same SIL to reduce the risk to tolerable, regardless of what likelihood and consequence pair make up that risk.

One solution to this would be to use weighted equivalent fatalities. The UK railway industry currently weights its Value for Preventing a Fatality (VPF) by a factor of just under 3 for events that cause multiple fatalities ('Catastrophic' in Table 1). A similar weighting could be included in the table by considering 'Catastrophic' to mean 30 fatalities rather than 10, increasing its value to 5.5. The problem then comes in rounding this to give a SIL. One suggestion would be to round up for less than SIL2 and down for greater than SIL2.

Example
The risk ranking table used for this analysis is as in Table 1. To quantify each row and column, the assignments shown in Table 2 and Table 3 were made:

Category        Description
Catastrophic    Multiple Fatalities
Critical        Single Fatality
Major           Single Major Injury
Minor           Single Reportable (Minor) Injury
Negligible      Single Non-Reportable Injury

Table 2: Example Consequence Categories

Category        Approx. Rate
Incredible      1 in 1000 years
Improbable      1 in 100 years
Remote          1 per 10 years
Occasional      1 per year
Probable        1 per month (~10 per year)

Table 3: Example Likelihood Categories

This table has been produced specifically for this analysis. It does not (particularly in the likelihood categories) map precisely to the similar table in the Interface Hazard Analysis.

A ranking of 6 (equal to 1 equivalent fatality every hundred years) has been taken to be the tolerable risk for this level of analysis. This would need to be calculated from the top-level requirement.

Table 4 shows some example results. The results shown here would lead to the system as a whole being designated SIL4, with the interface with the braking system also being SIL4 and the interface with an incident investigator (i.e. the 'black box') being SIL1.

Interface | Failure Mode | System Effect | Consequence | Likelihood | Risk Score | Tolerable Risk | SIL
ATP System -> Brakes | Fail to apply emergency brake when needed | Train fails to stop in time; collision | 5 | 5 | 10 | 6 | 4
ATP System -> Brakes | Applies emergency brakes when not needed | Braking shock; passengers injured | 3 | 5 | 8 | 6 | 2
ATP System -> Incident Investigator | Fail to provide adequate accident / incident data | Possible future accident not prevented | 5 | 2 | 7 | 6 | 1
ATP System -> Incident Investigator | Provides incorrect accident / incident data | Possible future accident not prevented | 5 | 2 | 7 | 6 | 1

Table 4: Example Results

Conclusions
The method outlined in this paper is a potential technique for determining the Safety Integrity Level (SIL) requirements of elements of a system before the elements have been designed in detail. The Hazard Analysis work for this method is needed at the same time as preliminary hazard analysis is normally being performed on the system, so the extra work required to determine the SILs is quite small.
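
The results in Table 4 can be reproduced with a short sketch. It is illustrative only: the dictionary and function names are assumptions introduced here, the category names for each Table 4 row are inferred from the numeric scores via Table 1, and flooring the result at SIL0 is an assumption for hazards that already fall within the tolerable risk.

```python
# Category values from Table 1; each step is roughly a factor of 10.
CONSEQUENCE = {"Negligible": 1, "Minor": 2, "Major": 3, "Critical": 4, "Catastrophic": 5}
LIKELIHOOD = {"Incredible": 1, "Improbable": 2, "Remote": 3, "Occasional": 4, "Probable": 5}

TOLERABLE_RISK = 6  # tolerable ranking used in the example

# (interface, failure mode, consequence, likelihood), following Table 4.
HAZARDS = [
    ("ATP System -> Brakes", "Fail to apply emergency brake when needed",
     "Catastrophic", "Probable"),
    ("ATP System -> Brakes", "Applies emergency brakes when not needed",
     "Major", "Probable"),
    ("ATP System -> Incident Investigator", "Fail to provide adequate accident / incident data",
     "Catastrophic", "Improbable"),
    ("ATP System -> Incident Investigator", "Provides incorrect accident / incident data",
     "Catastrophic", "Improbable"),
]


def assess(hazards, tolerable=TOLERABLE_RISK):
    """Return (interface, failure mode, risk score, SIL) for each hazard."""
    results = []
    for interface, failure_mode, consequence, likelihood in hazards:
        score = CONSEQUENCE[consequence] + LIKELIHOOD[likelihood]
        sil = max(0, score - tolerable)  # floor at SIL0 is an assumption
        results.append((interface, failure_mode, score, sil))
    return results


def sil_by_interface(results):
    """Each interface takes the maximum SIL of its hazards."""
    sils = {}
    for interface, _, _, sil in results:
        sils[interface] = max(sil, sils.get(interface, 0))
    return sils


results = assess(HAZARDS)
interface_sils = sil_by_interface(results)
system_sil = max(interface_sils.values())

for interface, failure_mode, score, sil in results:
    print(f"{interface}: {failure_mode}: risk {score}, SIL{sil}")
print(interface_sils, "system SIL:", system_sil)
```

With the Table 4 inputs this gives risk scores of 10, 8, 7 and 7 and SILs of 4, 2, 1 and 1, so the brakes interface and the system as a whole come out at SIL4 and the incident investigator interface at SIL1. The sketch does not cap results above 4; those would need the additional measures discussed under 'SILs > 4'.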
References
UK Railway Engineering Safety Management, 'The Yellow Book', Issue 03, January 2000
DEF STAN 00-56, Safety Management Requirements For Defence Systems, 1996
IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 2000
CENELEC EN 50128, Railway Applications - Safety Related Software, 2001
CENELEC DD ENV 50129, Railway Applications - Safety Related Electronic Systems for Signalling, 1999

About The Author
Pete Stanton is a consultant within the Safety and Risk Management department of AEA Technology Rail. After graduating from Oxford University with a degree in Mathematics, he worked for Plessey Telecommunications on performance analysis of telephone exchanges and communication networks. He joined British Rail in 1992 as a reliability engineer for InterCity and has worked in railway safety and reliability ever since, being involved in developing safety cases for new trains and infrastructure upgrades.