IEC 61508 and Functional Safety System Selection

Issue 2.1
4 April 2006

Author: Dil Wetherill

Measurement Technology Ltd.
Power Court, Luton, Bedfordshire
ENGLAND LU1 3JJ

COPYRIGHT 2005 by Measurement Technology, Ltd. All rights reserved. No part of this publication may be copied or distributed, transmitted, transcribed, stored in a retrieval system or translated into any human or computer language in any form or by any means, electronic, mechanical, magnetic, manual or otherwise, or disclosed to third parties without the express written permission of: Measurement Technology, Ltd., Power Court, Luton, Bedfordshire, England LU1 3JJ

Functional Safety System Selection

Table of Contents

1 INTRODUCTION
2 OVERVIEW OF FUNCTIONAL SAFETY
2.1 Safety Integrity Level
2.2 Low and High Demand Modes
2.3 Constraints on Safety Integrity Level
2.3.1 Design Processes
2.3.2 Proportion of Failures that are Safe
2.3.3 Design Techniques and Measures
2.3.4 Tolerance of Hardware Faults
2.3.5 Safety Integrity Level and Architecture
2.3.6 Probabilities of Failure
2.4 Process Safety Time
2.5 Summary of Safety-related System Selection
2.5.1 System Architecture
2.5.2 Probability of Dangerous Failure
2.5.3 Speed of Response
2.6 Management Requirements
2.7 Certified Products
2.8 IEC 61508 and ANSI/ISA S84.01
3 APPLICATION EXAMPLE
3.1 Low Demand Application - Emergency Shutdown System
3.1.1 Description of application
3.1.2 MOST SafetyNet System
3.1.3 Required Input and Output types
3.1.4 Configuration and Programming
3.1.5 Probability of Dangerous Failure
3.1.6 Response Time
APPENDIX A – GLOSSARY OF TERMS AND ABBREVIATIONS
Terms and Abbreviations for IEC61508

List of Figures

Figure 1 The Relationship between EUC Risk, Tolerable Risk and Residual Risk
Figure 2 Probability of Failure on Demand with Proof Testing
Figure 3 Determining if a Safety-Related System is suitable for the application
Figure 4 Typical Emergency Shutdown Application
Figure 5 Typical Low Demand Application
Figure 6 Typical ESD System Response Times

List of Tables

Table 1 Safety Integrity Level with Architecture for Type A Subsystems
Table 2 Safety Integrity Level with Architecture for Type B Subsystems
Table 3 PFH and PFD for High and Low Demand Applications

MTL Open Systems Technology Ltd. Issue: 1.0 04 April 2006 Page 3 of 24

1 Introduction

This paper provides an introduction to IEC 61508 and describes an illustrative application example using the MOST SafetyNet System.
2 Overview of Functional Safety

Machinery, process plant and equipment may malfunction in such ways that people are put at risk of harm. The malfunctions may arise through physical faults (such as random hardware failures), through systematic faults (such as errors made in software) or from common cause failures (such as temperature extremes affecting a number of pieces of equipment).

IEC 61508 provides a framework for:

• Assessing the level of risk initially presented by the machinery, process plant and equipment and establishing if this risk is acceptable.
• Implementing a safety function that will provide a level of protection such that the risk is reduced to an acceptable level - if the initial level of risk is found to be too high.
• Providing a means by which the equipment selected to implement the safety function can be shown to provide the required protection.

The machinery, process plant and equipment is referred to as the Equipment Under Control or EUC. The system which is used to monitor inputs from the EUC (and its Operators) and which then generates outputs, causing the EUC to operate in the desired manner, is called the EUC Control System.

The risk presented by the EUC and its Control System (the EUC risk) is the starting point from which risk reduction begins. Risk reduction should initially focus on the EUC and its Control System – perhaps by re-designing the machinery, process plant or equipment. Eliminating or reducing the EUC risk itself is preferable to using protection techniques to reduce that risk.

IEC 61508 concentrates on protection using electrical, electronic or programmable electronic systems. These are referred to as E/E/PE systems.
Since they are used to reduce the EUC risk they are said to be “safety-related”.

Other methods for providing protection can be used – either “alternative technologies” (for example hydraulic systems which are alternatives to E/E/PE systems) or “external” protection (such as bunds, firewalls or drainage systems). Neither alternative technologies nor external protection are specifically covered by IEC 61508, but their use is recognised as an integral part of reducing the EUC risk to a tolerable level.

The combination of E/E/PE, alternative technology and external protection employed to reduce the EUC risk is described as “Functional Safety” – in the sense that the correct operation (or function) of the protective systems provides the required reduction in risk (i.e. the required level of safety).

An E/E/PE system will normally be made up of one or more input devices such as switches or transmitters (sensors), a programmable logic solver of some form (a logic system) and one or more output devices such as pumps or valves (final elements). In practice, the E/E/PE system initiates the safety function by setting its outputs to a safe state for the application in question.

This gives the background to the title of IEC 61508: “Functional safety of electrical/electronic/programmable electronic safety-related systems”.

Figure 1 shows the relationship between EUC risk, tolerable risk, residual risk and the necessary and actual risk reduction through the functional safety provided by E/E/PE, alternative technology and external protection systems.
[Figure: scale of increasing risk marking residual risk, tolerable risk and EUC risk, with the necessary risk reduction and the actual risk reduction shown against it]

Figure 1 The Relationship between EUC Risk, Tolerable Risk and Residual Risk

2.1 Safety Integrity Level

The level of risk reduction required varies according to the risk that is to be reduced and the tolerable risk that must be achieved. The techniques described in IEC 61508 lead to a formal determination of the reduction required for each risk under consideration.

Once the level of required risk reduction is found, it is normally expressed as a safety function within a particular band of Safety Integrity Level. An appropriate safety-related system can then be selected by choosing a system that falls in the appropriate band of Safety Integrity Level.

Products that provide the highest degrees of protection are designated SIL 4, with SIL 3, SIL 2 and SIL 1 providing respectively lower degrees of protection. The majority of safety systems are designated SIL 3 or SIL 2; SIL 4 is rarely used.

In addition to assessing the likelihood of hardware failure, Safety Integrity Level is assessed against the rigour of the design processes used to prevent systematic failures (as might occur in software) and the hardware architecture used to provide the safety function. It is not sufficient for the probability of hardware failure alone to be compatible with a particular Safety Integrity Level; it is also necessary for the manufacturer to satisfy the requirements on design process, hardware fault tolerance and safe failure fraction for the target SIL.

2.2 Low and High Demand Modes

IEC 61508 defines two fundamental modes of operation for the safety-related system – Low Demand and High or Continuous Demand.
Which of these is required depends on the frequency with which the system might be required to perform its safety function in the given application (i.e. the likely frequency of operation of the safety function defines which mode of operation is required). High demand is defined as being more than one demand per year; low demand is defined as one demand per year or less.

A typical application requiring a safety-related system operating in a high demand mode would be a guard for a machine press, where the guard protects the operator from the risk of personal injury. The safety system could be expected to operate significantly more than once per year. A safety-related system that operates in high demand mode is therefore required.

A typical application requiring a safety-related system operating in low demand mode would be a fire and gas system that would only be required to operate in the case of a fire or gas leak. This system would be expected to operate less than once per year and a safety-related system operating in low demand mode would be appropriate.

(Note: it could be considered that a fire and gas system is a “mitigation” system – i.e. one whose objective is to limit the damage caused by a failure, rather than preventing the failure – and therefore not subject to IEC 61508. Here, it is considered as a protection system, which prevents a fire or gas release from causing further harm.)

2.3 Constraints on Safety Integrity Level

A number of constraints are defined which limit the SIL that can be claimed for any safety-related system.
The constraints are:

• the design processes used by the manufacturers of the elements of the system
• the design techniques and measures used to limit the effects of failures during operation
• the tolerance of the system to hardware faults
• the proportion of faults that lead to safe failure modes
• the probability of the system failing to provide protection

Each of the above constraints is discussed in more detail in the following Sections.

2.3.1 Design Processes

In order to use a product as part of a safety-related system, the end-user or system designer must establish that the manufacturer of the product has met the requirements of IEC 61508 in the processes used to manage the specification and design of the product. This is to ensure that all relevant measures have been taken to avoid failures (i.e. to ensure that failures are not inadvertently designed into the product).

2.3.2 Proportion of Failures that are Safe

IEC 61508 defines a concept known as the safe failure fraction. This is a simple measure of the proportion of hardware failures that are either safe, or dangerous but detected, compared with the total number of possible failures (the total being made up of safe, dangerous detected and dangerous undetected failures). The proportion of undetected dangerous failures is therefore of critical importance in a safety-related system.

The level of safe failure fraction, together with hardware fault tolerance, limits the SIL that can be claimed for a particular safety-related system. For simplicity, bands of safe failure fraction are defined by the standard: <60%, 60% to <90%, 90% to <99% and ≥ 99%.
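The safe failure fraction described above can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical, and the failure rates in the example are invented to exercise the calculation rather than taken from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    """Failure rates for one subsystem, all in the same unit (e.g. per hour)."""
    safe: float                  # failures that lead to a safe state
    dangerous_detected: float    # dangerous failures revealed by diagnostics
    dangerous_undetected: float  # dangerous failures that stay hidden


def safe_failure_fraction(rates: FailureRates) -> float:
    """Return (safe + dangerous detected) / total as a fraction in [0, 1]."""
    total = rates.safe + rates.dangerous_detected + rates.dangerous_undetected
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (rates.safe + rates.dangerous_detected) / total


def sff_band(sff: float) -> str:
    """Map a safe failure fraction onto the four bands listed in the text."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("safe failure fraction must lie between 0 and 1")
    if sff < 0.60:
        return "< 60%"
    if sff < 0.90:
        return "60% to < 90%"
    if sff < 0.99:
        return "90% to < 99%"
    return ">= 99%"


# Hypothetical figures (per hour), chosen only to exercise the functions.
example = FailureRates(safe=450e-9,
                       dangerous_detected=500e-9,
                       dangerous_undetected=50e-9)
example_sff = safe_failure_fraction(example)
example_band = sff_band(example_sff)
print(f"SFF = {example_sff:.1%}, band: {example_band}")
```

With these invented rates, 950 of every 1000 failures are either safe or detected, so the subsystem falls in the 90% to < 99% band.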
IEC 61508 defines type A and type B subsystems, the difference between the two being the level of confidence in the understanding of failure modes of components, the behaviour of subsystems under fault conditions and the field data collected to provide practical confirmation of the theoretical analysis. Type A subsystems are those for which there is a higher level of confidence; for type B subsystems there is less confidence. The significant difference is that more field failure data has been collected for type A subsystems.

2.3.3 Design Techniques and Measures

IEC 61508 specifies techniques and measures that should be used in the detailed design of the product. Their purpose is to avoid failures such as software and manufacturing faults and to control failures during operation. Only by using the techniques and measures specified can manufacturers claim a particular safe failure fraction and safety integrity level.

2.3.4 Tolerance of Hardware Faults

A safety-related system is said to have a hardware fault tolerance of N when N+1 faults could cause the loss of the safety function. The level of hardware fault tolerance (0, 1 or 2) is one of the determining factors for the safety integrity level of a particular product.

Hardware fault tolerance not only determines the highest SIL that can be claimed for a product, but also determines whether or not the speed with which the product carries out its internal diagnostics needs to be considered in relation to the process safety time – see Section 2.4.
2.3.5 Safety Integrity Level and Architecture

The safety integrity levels that can be claimed for given safe failure fractions and hardware fault tolerances, subject to the restrictions on design techniques and measures, are shown in the tables below for type A and type B subsystems:

                          Hardware Fault Tolerance
Safe Failure Fraction     0          1          2
< 60%                     SIL 1      SIL 2      SIL 3
60% to < 90%              SIL 2      SIL 3      SIL 4
90% to < 99%              SIL 3      SIL 4      SIL 4
≥ 99%                     SIL 3      SIL 4      SIL 4

Table 1 Safety Integrity Level with Architecture for Type A Subsystems

                          Hardware Fault Tolerance
Safe Failure Fraction     0              1          2
< 60%                     Not allowed    SIL 1      SIL 2
60% to < 90%              SIL 1          SIL 2      SIL 3
90% to < 99%              SIL 2          SIL 3      SIL 4
≥ 99%                     SIL 3          SIL 4      SIL 4

Table 2 Safety Integrity Level with Architecture for Type B Subsystems

Note – if any subsystem of a particular safety function is type B, then the safety function must be treated as if it were type B.

2.3.6 Probabilities of Failure

The probability that a safety-related system will fail to provide the required protection can be expressed either as the probability of a dangerous failure per hour (PFH) or as an average probability of failure of protection on demand (PFDavg).

Which of these two measures is used depends on the nature of the hazard. If it is continually (or very often) present, as in high demand applications, then the probability of dangerous failure per hour is the most useful figure to use. If the hazard is infrequently present, as in low demand applications, then the probability of failure of protection on demand is most appropriate.
For safety functions that do not employ hardware fault tolerance, PFH is simply calculated as the sum of the undetected dangerous failure rates for each element of the safety function. Where hardware fault tolerance is used, the calculations are considerably more complicated, and have not been considered here.

The average probability of failure on demand, PFDavg, is calculated from the dangerous failure rate, but also depends on the proof test interval defined for the product. For simplicity, it is assumed that the failure rate is constant, such that as time passes after the last proof test, the probability of an undetected failure having occurred increases linearly. (The probability of failure is actually an exponential, but it can be taken as approximately linear for the early part of the curve.) This probability of failure is effectively reset to zero by carrying out a proof test (this assumes that the proof test is a “complete” test, which may be only approximately true). The average probability of failure can then be found. This is shown in Figure 2.

[Figure: probability of failure on demand against time; labels: PFDavg, Proof Test Interval, Time]

Figure 2 Probability of Failure on Demand with Proof Testing

IEC 61508 defines the Safety Integrity Level required for both Continuous/High Demand applications and Low Demand applications, according to the required PFH or PFDavg. PFH or PFDavg by Safety Integrity Levels for high and low demand applications are shown in Table 3 below.

Safety        Continuous/High-demand Mode of        Low-demand Mode of Operation
Integrity     Operation (prob. of dangerous         (prob. of failure on demand)
Level         failure per hour)
4             ≥ 10^-9 to < 10^-8                    ≥ 10^-5 to < 10^-4
3             ≥ 10^-8 to < 10^-7                    ≥ 10^-4 to < 10^-3
2             ≥ 10^-7 to < 10^-6                    ≥ 10^-3 to < 10^-2
1             ≥ 10^-6 to < 10^-5                    ≥ 10^-2 to < 10^-1

Table 3 PFH and PFD for High and Low Demand Applications

Note: When “probability of dangerous failure per hour” and “probability of failure (to protect) on demand” are given, these relate specifically to the probability that the safety-related system will fail to provide the necessary protection, i.e. fail in a dangerous manner. These figures give no indication as to the likely level of overall failure (i.e. the availability of the system).

2.4 Process Safety Time

IEC 61508 defines the concept of process safety time as “the period of time between a failure occurring in the EUC or the EUC control system (with the potential to give rise to a hazardous event) and the occurrence of the hazardous event if the safety function is not performed”.

It follows that implementation of the safety function must be appropriate to the process safety time of the given risk. The time to carry out the safety function is not given a specific name within IEC 61508, but will be termed response time in this document and treated as if it were defined in the standard. The response time of the safety function must be shorter than the process safety time.

In high demand applications, it is also necessary to consider the length of time to detect and respond to - and/or repair - faults revealed by internal diagnostics. The time taken to detect internal faults is known as the diagnostic test interval and the time taken to respond once a fault is detected is known as the fault reaction time.
Further, the mean time to repair the system must be taken into account for applications that will continue to operate – and therefore continue to present a risk – before the safety function can be repaired.

Note - it is not necessary to consider the diagnostic test interval and the fault reaction time when the system is tolerant to hardware faults, but they must be considered when a single hardware fault could render the system incapable of carrying out its safety function on demand.

Note – the response time of the system must include the time taken for input and output devices to respond (e.g. the time taken for a valve to close) and must make worst case assumptions for any cyclical (or non-deterministic) processes.

2.5 Summary of Safety-related System Selection

Establishing the suitability of a certified safety-related system to carry out a particular safety function can be simplified into three basic tests:

• Is the system architecture suitable?
• Is the probability of a dangerous failure low enough?
• Can the system respond sufficiently quickly?

The following sections provide a simple summary of what is required to check the suitability of a safety-related system against these basic tests.

2.5.1 System Architecture

Tables 1 and 2 in Section 2.3.5 show the maximum safety integrity level that a system can be used to provide, given the subsystem type, the hardware fault tolerance and the safe failure fraction.

2.5.2 Probability of Dangerous Failure

Table 3 in Section 2.3.6 shows PFH and PFDavg by Safety Integrity Level, for high and low demand applications.
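The three basic tests can be expressed as simple look-ups and comparisons. The Python sketch below is illustrative only: the function names are hypothetical, the tables are transcribed from Tables 1 to 3 of this paper, and the timing check follows the comparison of process safety time with diagnostic test interval plus fault reaction time used in Section 2.5.3. It is not a substitute for the assessment the standard requires.

```python
# Tables 1 and 2 (Section 2.3.5): maximum SIL by safe-failure-fraction band
# (rows) and hardware fault tolerance 0, 1, 2 (columns). 0 means "not allowed".
MAX_SIL = {
    "A": [(1, 2, 3), (2, 3, 4), (3, 4, 4), (3, 4, 4)],
    "B": [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4)],
}
SFF_BAND_LIMITS = (0.60, 0.90, 0.99)  # lower bounds of bands 2, 3 and 4

# Table 3 (Section 2.3.6): exclusive upper limit of each SIL band.
PFD_UPPER_LIMIT = {4: 1e-4, 3: 1e-3, 2: 1e-2, 1: 1e-1}  # low demand
PFH_UPPER_LIMIT = {4: 1e-8, 3: 1e-7, 2: 1e-6, 1: 1e-5}  # high demand, per hour


def max_sil_for_architecture(subsystem_type: str, sff: float, hft: int) -> int:
    """Test 1: highest SIL the architecture allows (0 = not allowed)."""
    if subsystem_type not in MAX_SIL:
        raise ValueError("subsystem_type must be 'A' or 'B'")
    if hft not in (0, 1, 2):
        raise ValueError("hardware fault tolerance must be 0, 1 or 2")
    if not 0.0 <= sff <= 1.0:
        raise ValueError("safe failure fraction must lie between 0 and 1")
    band = sum(sff >= limit for limit in SFF_BAND_LIMITS)  # row index 0..3
    return MAX_SIL[subsystem_type][band][hft]


def sil_from_probability(value: float, low_demand: bool) -> int:
    """Test 2: SIL band met by a PFDavg (low demand) or PFH (high demand).

    Returns 0 when the value is too high even for SIL 1.
    """
    if value < 0:
        raise ValueError("probability must not be negative")
    limits = PFD_UPPER_LIMIT if low_demand else PFH_UPPER_LIMIT
    for sil in (4, 3, 2, 1):
        if value < limits[sil]:
            return sil
    return 0


def timing_ok(process_safety_time: float, response_time: float,
              high_demand: bool, hft: int,
              diagnostic_test_interval: float = 0.0,
              fault_reaction_time: float = 0.0) -> bool:
    """Test 3: all times in the same unit (e.g. seconds)."""
    if response_time >= process_safety_time:
        return False
    if high_demand and hft == 0:
        # Diagnostics only matter here when a single fault defeats the function.
        return diagnostic_test_interval + fault_reaction_time < process_safety_time
    return True


# Hypothetical example: a type B, HFT 0 logic solver with an assumed SFF of 95%.
print(max_sil_for_architecture("B", 0.95, 0))         # architecture limit
print(sil_from_probability(7.5e-3, low_demand=True))  # band for this PFDavg
```

A safety function is a candidate for the target SIL only if both look-ups return at least that SIL and the timing check passes.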
2.5.3 Speed of Response

For any safety function, the process safety time must be longer than the response time. In high demand applications the process safety time must also be longer than the diagnostic test interval and the fault reaction time, if there is no hardware fault tolerance.

Figure 3 summarises the steps that must be taken to determine if the performance of a safety-related system is sufficient to achieve a particular safety function. It assumes that the system architecture is suitable for the SIL being considered and that the mean time to repair need not be considered due to the nature of the application (i.e. the safety function will be carried out in the event of a fault). Further, in high demand applications, it assumes that the diagnostic test interval is at least an order of magnitude shorter than the expected interval between demands.

[Figure: flowchart. Is Process Safety Time > Response Time? If no, the system is not suitable. If yes, what is the Demand Rate? More than once a year: is Process Safety Time > Diagnostic Test Interval + Fault Reaction Time? If no, not suitable; if yes, High Demand Mode, calculate PFH of each safety loop and check PFH < limit for target SIL. Once a year or less: Low Demand Mode, calculate PFDavg of each safety loop and check PFDavg < limit for target SIL. In either mode, yes means the system is suitable and no means it is not.]

Figure 3 Determining if a Safety-Related System is suitable for the application

2.6 Management Requirements

IEC 61508 places a number of requirements on the individuals and organisations that are involved in the design, implementation and maintenance of safety-related systems.
It does not prescribe exactly how this management should be done, but it does require that formal development processes must be specified, followed and audited; see IEC 61508-1 clause 6.

Organisations may have their functional safety management capability assessed and certified (for example under the CASS scheme), to demonstrate their competence. This can be particularly useful to end users and system integrators in demonstrating compliance to regulatory authorities.

2.7 Certified Products

One of the advantages of using products certified to IEC 61508 by a recognised body is that the certificate validates much more than just the actual product itself. The certification also confirms the suitability of:

• the design processes used by the manufacturer of the product to avoid failures
• the design techniques and measures used to control failures (or limit the effects of failures) during operation
• the methods used to define the hardware fault tolerance
• the methods used to measure the safe failure fraction
• the methods used to measure the probabilities of failure

Many more aspects are brought into the certifying process and the certificate is sufficient proof that all requirements have been met for the safety integrity level claimed for the product.

IEC 61508 does not require that certified safety products are used in safety-related systems, but if an end user or system designer elects to use non-certified products then they must take responsibility themselves for validating that all these elements have been carried out according to the standard.

2.8 IEC 61508 and ANSI/ISA S84.01

ANSI/ISA S84.01 is the process industry functional safety standard for North America and Canada, designed to be compatible with draft versions of IEC 61508. Now that IEC 61508 is published, it is expected that ANSI/ISA S84.01 will evolve further.

ANSI/ISA S84.01 uses only three safety integrity levels (SIL 1 to 3), which are defined almost identically to those in IEC 61508.
A further difference is that the ANSI/ISA standard does not cover the full safety lifecycle – from design to decommissioning – as does IEC 61508.

IEC 61508 may be used on a voluntary basis in North America and Canada.

3 Application Example

3.1 Low Demand Application - Emergency Shutdown System

3.1.1 Description of application

An ESD System is intended to shut down the process safely in the event of a failure of the system controlling the process (the EUC control system in IEC 61508 terms, commonly called the Basic Process Control System or BPCS in the process industries), or when certain critical parameters exceed pre-set limits. It is used in order to protect against injury, loss of life, damage to the plant, and environmental damage in the event of a malfunction.

The ESD System is almost always separate from the BPCS and usually has its own dedicated sensors and actuators. A typical application is shown in the diagram below.

[Figure: input devices (e.g. temperature or pressure transmitters) connected to the MOST SafetyNet System, which drives actuators (e.g. shut-off valves, dump valves etc.), with a link to the control room]

Figure 4 Typical Emergency Shutdown Application

3.1.2 MOST SafetyNet System

The MOST SafetyNet System is a SIL2 certified “logic solver”. It comprises a number of IO Modules to which field instruments are connected and a Controller that runs the safety application programme. Data relevant to a safety application is given in the table below.
General Information

Manufacturer                   MTL Instruments Ltd
Model                          MOST SafetyNet System
Logic Solver Type              Safety PLC
Configuration Architecture     1oo1
Type                           B
Certified for use up to        SIL2
Hardware Fault Tolerance       0

Failure Rate Data

Part                      Model           λDU (dangerous undetected failure rate per 10^9 hours)
Safety Controller         8851-LC-MT      100
AI Safety Module          8810-HI-TX      20
DI/DO Safety Module       8811-IO-DC      50

3.1.3 Required Input and Output types

The input and output types discussed below are those required for the safety-related functions of the ESD System. Other input and output types (for non-safety-related functions) may also be used in the system. These must not compromise the safety function.

• 4/20mA analogue inputs are used to interface to a number of measurement transmitters. Line fault monitoring is carried out by checking if the current input is either under- or over-range.
• Digital outputs are used to control valves. These will be normally energised and used with either shut-off or release valves. (Shut-off valves are kept normally open by the energised digital output. Release valves are kept normally closed.) Line fault monitoring is not required on these outputs, as they would be de-energised by any line fault.
• Digital inputs are used for monitoring volt-free contacts. If the field wiring to the switch became open-circuit, it would not be possible to detect the closure of the switch and if the line became short-circuit, it would not be possible to detect that the switch was open. Line fault monitoring, in conjunction with end-of-line resistors, is used to identify and report open and short circuit faults in field wiring.
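The under- and over-range check on the analogue inputs can be pictured with a short Python sketch. The function name and the 3.6 mA and 21.0 mA thresholds are illustrative assumptions only; they are not taken from the MOST SafetyNet documentation, and a real system would use the limits configured for its input modules.

```python
from enum import Enum


class LineStatus(Enum):
    OK = "ok"
    UNDER_RANGE = "under-range"  # e.g. open circuit or failed transmitter
    OVER_RANGE = "over-range"    # e.g. short circuit across the loop


# Assumed thresholds in mA, for illustration only.
UNDER_RANGE_MA = 3.6
OVER_RANGE_MA = 21.0


def classify_loop_current(current_ma: float) -> LineStatus:
    """Classify a 4/20mA reading as healthy or as a suspected line fault."""
    if current_ma < UNDER_RANGE_MA:
        return LineStatus.UNDER_RANGE
    if current_ma > OVER_RANGE_MA:
        return LineStatus.OVER_RANGE
    return LineStatus.OK


for reading in (0.0, 12.0, 22.5):
    print(f"{reading:5.1f} mA -> {classify_loop_current(reading).value}")
```

How a detected line fault is then handled (immediate shutdown or an alarm with a timed shutdown) is an application decision.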
3.1.4 Configuration and Programming

The safe state for an ESD system is for the normally energised outputs to be de-energised. This can be triggered either by the programmed application or automatically by the safety features built into the Safety System.

Depending on the particular ESD application requirements, and the nature of the detected fault, immediately triggering a shutdown of the process may be neither necessary nor desirable. It may be possible, for example, to report to the operator that a fault has been detected and then start a timer, so that shutdown is triggered if the fault has not been cleared when the timer expires. A fault that might be treated in this way is a line fault on an input channel.

With faults of this kind the system retains some level of safety functionality, but the consequences of not immediately shutting down the process must be carefully considered as part of the safety analysis. Further, an approach in which the process is not immediately shut down requires the mean time to repair the system to be considered directly in the analysis of probabilities of failure, and the simplified procedure shown in Figure 3 cannot be used.

3.1.5 Probability of Dangerous Failure

This Section gives a basic introduction to calculating the average probability of failure on demand (PFDavg) for a safety function incorporating the MOST SafetyNet System.
PFDavg for a particular safety function is the sum of the average probabilities of failure on demand of each element of the system, taking into account the proof test interval of each element. Figure 5 below includes a pressure transmitter as the input device, an 8810-HI-TX Analogue Input Module, a Safety Controller, an 8811-IO-DC Digital I/O Module configured as an output, and a pilot and control valve.

  Element                           λDU (per hour)    Tp (hours)   PFDavg
  Pressure Transmitter              100 x 10^-9       8760         5 x 10^-4
  8810-HI-TX Safety AI Module       20 x 10^-9        8760         1 x 10^-4
  8851-LC-MT Safety Controller      100 x 10^-9       8760         5 x 10^-4
  8811-IO-DC Safety DI/DO Module    50 x 10^-9        8760         3 x 10^-4
  Pilot & Control Valve             1400 x 10^-9      8760         6.1 x 10^-3

  λDU for all elements is the failure rate per hour. PFDavg is the average probability of dangerous failure on demand. Tp is the proof test interval; 8760 hours is 1 year.

  PFDavg = 1/2 * Tp * λDU

Figure 5 Typical Low Demand Application

PFDavg for each element is calculated according to the equation above, where λDU is the undetected dangerous failure rate per hour and Tp is the proof test interval (also in hours). Tp in this example is 8760 hours (1 year) for all components of the safety function. The value for PFDavg is half of the product of Tp and λDU – see Section 2.3.6.

The overall PFDavg for the safety function is then:

  PFDavg = 5 x 10^-4 + 1 x 10^-4 + 5 x 10^-4 + 3 x 10^-4 + 6.1 x 10^-3 = 7.5 x 10^-3

The PFDavg required for SIL2 is less than 10^-2, a limit that this example meets.
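The calculation in this Section can be reproduced with a short Python sketch. The failure rates, proof test interval and SIL2 limit are those given in Figure 5 and the text; the function and variable names are illustrative assumptions. The code sums unrounded per-element values, so its total (about 7.3 x 10^-3) is slightly lower than the 7.5 x 10^-3 obtained from the rounded values in Figure 5.

```python
# Dangerous undetected failure rates (per hour), as given in Figure 5.
LAMBDA_DU_PER_HOUR = {
    "Pressure Transmitter": 100e-9,
    "8810-HI-TX Safety AI Module": 20e-9,
    "8851-LC-MT Safety Controller": 100e-9,
    "8811-IO-DC Safety DI/DO Module": 50e-9,
    "Pilot & Control Valve": 1400e-9,
}

PROOF_TEST_INTERVAL_HOURS = 8760  # Tp: one year, the same for every element
SIL2_PFD_LIMIT = 1e-2             # PFDavg must be below this for SIL2


def pfd_avg(lambda_du: float, tp_hours: float) -> float:
    """Simplified low demand equation: PFDavg = 1/2 * Tp * lambda_DU."""
    return 0.5 * tp_hours * lambda_du


def total_pfd_avg(rates: dict, tp_hours: float) -> float:
    """PFDavg of the safety function: the sum over all of its elements."""
    return sum(pfd_avg(rate, tp_hours) for rate in rates.values())


total = total_pfd_avg(LAMBDA_DU_PER_HOUR, PROOF_TEST_INTERVAL_HOURS)

for name, rate in LAMBDA_DU_PER_HOUR.items():
    print(f"{name:32s} PFDavg = {pfd_avg(rate, PROOF_TEST_INTERVAL_HOURS):.2e}")
print(f"Total PFDavg = {total:.2e}, meets SIL2 limit: {total < SIL2_PFD_LIMIT}")
```

The per-element output makes it easy to see that the pilot and control valve dominates the total, so the proof test interval of the final element has the largest effect on the result.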
3.1.6 Response Time

The response time required of a typical ESD System, from detecting a fault or alarm to completion of an action by an output device, can vary considerably according to the nature of the process under control. A safety function with an input transmitter or switch as a sensor and a valve as a final element would normally give a response time better than 10 seconds, with the operating time of the valve the dominant factor. The response time of the MOST SafetyNet System, in the range 50 to 200ms, is significantly faster than that of a typical valve, so much shorter overall response times are possible when it is combined with faster-acting final elements. Figure 6 below shows typical response time figures for an ESD system.

  Element                           Response time
  Pressure Transmitter              50ms
  8810-HI-TX Safety AI Module       30ms
  8851-LC-MT Safety Controller      100ms
  8811-IO-DC Safety DI/DO Module    10ms
  Pilot & Control Valve             4s

Figure 6 Typical ESD System Response Times

The typical response time for the system outlined above is:

  0.05 + 0.03 + 0.1 + 0.01 + 4 = 4.19 seconds (i.e. within the 10 second process safety time)

The worst case response time for the system outlined above occurs when the input cycles of the transmitter and the AI module become as unsynchronised as possible, so that their individual contributions to the response time are doubled. It is:

  0.05*2 + 0.03*2 + 0.1 + 0.01 + 4 = 4.27 seconds (i.e. still within the process safety time)
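The two response time totals can be checked with the following Python sketch. The element times are those listed in Figure 6 (including the 4 s valve time) and the 10 second process safety time is taken from the text; the function and variable names are illustrative assumptions.

```python
# Typical response times in seconds, as listed in Figure 6.
RESPONSE_TIMES_S = {
    "Pressure Transmitter": 0.05,
    "8810-HI-TX Safety AI Module": 0.03,
    "8851-LC-MT Safety Controller": 0.10,
    "8811-IO-DC Safety DI/DO Module": 0.01,
    "Pilot & Control Valve": 4.0,
}

# Elements whose input cycles can fall out of step with each other; in the
# worst case each of these contributes twice its typical response time.
UNSYNCHRONISED = {"Pressure Transmitter", "8810-HI-TX Safety AI Module"}

PROCESS_SAFETY_TIME_S = 10.0


def typical_response_time(times: dict) -> float:
    """Sum of the typical response times of all elements in the chain."""
    return sum(times.values())


def worst_case_response_time(times: dict, doubled: set) -> float:
    """As the typical case, but doubling the elements named in `doubled`."""
    return sum(t * 2 if name in doubled else t for name, t in times.items())


typical = typical_response_time(RESPONSE_TIMES_S)
worst = worst_case_response_time(RESPONSE_TIMES_S, UNSYNCHRONISED)

print(f"Typical:    {typical:.2f} s")
print(f"Worst case: {worst:.2f} s")
print(f"Within process safety time: {worst < PROCESS_SAFETY_TIME_S}")
```

Keeping the set of unsynchronised elements separate from the timing data makes it simple to repeat the check for a different sensor or a faster-acting final element.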
Appendix A – Glossary of terms and abbreviations

Terms and Abbreviations for IEC61508

Note: where a definition of the term or abbreviation is given in IEC61508-4 “Definitions and abbreviations”, the definition from the standard is given first in quotation marks, followed by further explanation if this is necessary.

1oo1D – a system which has no hardware fault tolerance and some level of diagnostic coverage to detect faults.

1oo2D – a system which has a hardware fault tolerance of “1” and some level of diagnostic coverage to detect faults.

Average probability of failure of protection on demand (PFDavg) – the probability that a safety system will be unable to carry out its required safety function when a hazardous situation arises and a demand (in other words, a request for the safety function to act) occurs. This probability is used to determine the suitability of safety systems in low demand applications. The value of PFDavg of a particular element within the safety system is determined by its intrinsic reliability, but also by the length of time between proof tests.

Average probability of failure on demand (PFDavg) – “is the safety integrity failure measure for safety-related protection systems operating in low demand mode”.

Continuous mode – also known as high demand. A safety function for high demand or continuous mode may be required to carry out its safety function more often than once per year. The alternative is a low demand application, in which the safety function would normally be required to operate once per year, or less.

Control failures – a number of techniques are specified in the standard to control failures during the operation of the E/E/PE safety-related system.
These techniques, when combined with the techniques specified for fault avoidance in all stages of the safety life cycle, play an important part in ensuring that the E/E/PE safety-related system attains its safety integrity level.

Diagnostic test interval – “interval between on-line tests to detect faults in a safety-related system that has a specified diagnostic test coverage”. The diagnostic test interval is an important factor (when combined with the fault reaction time) in determining if a particular safety-related system (with no tolerance to hardware faults) is suitable for use in a given high demand/continuous mode application.

Electrical, electronic or programmable electronic system (E/E/PES) – “system for control, protection or monitoring based on one or more electrical/electronic/programmable electronic devices, including all elements of the system such as power supplies, sensors and other input devices, data highways and other communication paths, and actuators and other output devices”.

Equipment under control (EUC) – the equipment, plant and machinery that is the source of the risk.

EUC control system – “system which responds to input signals from the process and/or from an operator and generates output signals causing the EUC to operate in the desired manner”.

EUC risk – “risk arising from the EUC or its interaction with the EUC control system”.

External risk reduction facility – “measure to reduce or mitigate the risks which are separate and distinct from, and do not use, E/E/PE safety-related systems or other technology safety-related systems”. Examples: a drain system, a fire wall and a bund are all external risk reduction facilities.
Fault avoidance – “use of techniques and procedures which aim to avoid the introduction of faults during any phase of the safety lifecycle of the safety-related system”.

Fault reaction time – the time taken for a safety function to perform its specified action, to achieve or maintain a safe state. This should be considered along with the diagnostic test interval and the process safety time for systems that have a hardware fault tolerance of zero and which are operating in high demand mode.

Final elements – the actuators (such as valves, solenoids, solenoid valves, pumps, alarms etc.) that carry out an action to control the process or carry out the safety function.

Functional safety – “part of the overall safety relating to the EUC and the EUC control system which depends on the correct functioning of the E/E/PE safety-related systems, other technology safety-related systems and external risk reduction facilities”.

Hardware fault tolerance – IEC 61508 defines fault tolerance as “ability of a functional unit to continue to perform a required function in the presence of faults or errors”. Hardware fault tolerance is fault tolerance as it applies specifically to hardware.

Harm – “physical injury or damage to the health of people either directly or indirectly as a result of damage to property or to the environment”.

Hazard – “a potential source of harm”. The standard covers harm caused in both the short term – such as harm from an explosion – and the long term – such as harm from the release of a toxic substance.

Hazard and risk analysis – part of the development of the overall safety requirements.

Hazardous event – “a hazardous situation which results in harm”.

Hazardous situation – “circumstances in which a person is exposed to hazard(s)”.

High demand – also known as continuous mode. A safety function for high demand or continuous mode may be required to carry out its safety function more often than once per year.
The alternative is a low demand application, in which the safety function would normally be required to operate once per year, or less.

Low demand – a safety function for low demand applications may be required to carry out its safety function once per year or less. The alternative is a high demand/continuous mode application, in which the safety function would normally be required to operate more than once per year.

Other technologies – IEC 61508 is concerned with the use of electrical, electronic and programmable electronic systems to provide safety systems. “Other technologies” are neither electrical, electronic nor programmable electronic, but the standard recognises that protection based on such alternative technologies (a hydraulic system, for example) can be used in risk reduction.

Probability of dangerous failure per hour (PFH) – “is the safety integrity failure measure for safety-related protection systems operating in high demand mode”.

Process safety time – “the period of time between a failure occurring in the EUC or the EUC control system (with the potential to give rise to a hazardous event) and the occurrence of the hazardous event if the safety function is not performed”.

Programmable electronic system – “system for control, protection or monitoring based on one or more programmable electronic devices, including all elements of the system such as power supplies, sensors and other input devices, data highways and other communication paths, and actuators and other output devices”.

Proof test – “periodic test performed to detect failures in a safety-related system so that, if necessary, the system can be restored to an “as new” condition or as close as practical to this condition”.
Random hardware failures – “failure, occurring at a random time, which results from one or more of the possible degradation mechanisms in the hardware”.

Residual risk – “risk remaining after protective measures have been taken”. This risk need not be zero, but once protective measures have been taken it should be below what is considered a “tolerable risk”.

Response time – the standard does not specifically define “response time”, but for convenience in this safety manual, it is taken as if it were a defined concept. Given that condition, response time is the time taken from the input to the sensor (or input device) associated with a particular safety function being set, to the output device (final element) completing its required action. This time period includes the time taken for the E/E/PE system to carry out any software applications and communicate with the sensors and final elements.

Risk – “the combination of the probability of occurrence of harm and the severity of that harm”.

Safe failure fraction – “of a subsystem is defined as the ratio of the average rate of safe failures plus dangerous detected failures of the subsystem to the total average failure rate of the subsystem”.

Safe state – “state of the EUC when safety is achieved”.

Safety function – “function to be implemented by an E/E/PE safety-related system, other technology safety-related system or external risk reduction facilities, which is intended to achieve or maintain a safe state for the EUC, in respect of a specific hazardous event”.
Safety integrity level – “discrete level (one out of a possible four) for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety-related systems, where safety integrity level 4 has the highest level of safety integrity and safety integrity level 1 has the lowest”.

Safety life cycle – “necessary activities involved in the implementation of safety-related systems, occurring during a period of time that starts at the concept phase of a project and finishes when all of the E/E/PE safety-related systems, other technology safety-related systems and external risk reduction facilities are no longer available for use”.

Safety-related systems – “designated system that both implements the required safety functions necessary to achieve or maintain a safe state for the EUC; and is intended to achieve, on its own or with other E/E/PE safety-related systems, other technology safety-related systems or external risk reduction facilities, the necessary safety integrity for the required safety functions”.

Safety requirements specification – “specification containing all the requirements of the safety functions that have to be performed by the safety-related systems”. This should include the action the safety function is required to perform and also the safety integrity requirements of the safety function.

Sensors – input devices to the safety function.

SIL – see safety integrity level.

Systematic failure – “failure related in a deterministic way to a certain cause, which can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, documentation or other relevant factors”.
Tolerable risk – “risk which is accepted in a given context based on the current values of society”.

Type A system – a subsystem will be regarded as type A if, for the components used to achieve the safety function, all of the following requirements are satisfied:
(a) the failure modes of all the constituent components are well defined
(b) the behaviour of the subsystem under fault conditions can be completely determined
(c) there is sufficient dependable failure data from field experience to show that the claimed rates of failure for detected and undetected dangerous failures are met.

Type B system – a subsystem will be regarded as type B if, for the components used to achieve the safety function, one or more of the following applies:
(a) the failure mode of at least one constituent component is not well defined
(b) the behaviour of the subsystem under fault conditions cannot be completely determined
(c) there is insufficient dependable failure data from field experience to support the claims for rates of failure for detected and undetected dangerous failures.