Versatile Modular Redundancy (VMR) Architecture for a Safety Instrumented System (SIS)
White paper - May 6, 2020
www.yokogawa.com

Table of Contents
Part 1 Introduction: Architecting Safety Control
Part 2 Versatile Modular Redundancy
Part 3 SIL 3 Single Module
Part 4 Versatile Modular Redundant Options for Higher Availability
Part 5 Diagnostic Capability and Fault Behavior
Part 6 Integration with DCS
Part 7 Safety Communication
Part 8 Integration with Field Devices
Part 9 Summary

KEY TAKEAWAYS
✓ While an SIS is much like a control system, it is intended for a different purpose that requires it to meet additional design requirements.
✓ An SIS must achieve high safety integrity and high system availability.
✓ The Versatile Modular Redundancy (VMR) architecture provides for SIL 3 within a simplex structure by building redundancy for safety integrity into a simplex module.
✓ The VMR concept further applies redundant modules for even higher availability purposes when needed. The “pair and spare” concept takes advantage of a pair of channels inside each simplex module and matches each module with a redundant partner.
✓ By providing for redundancy options on a per module basis, the VMR concept allows maximum possible flexibility to enable the most cost-effective safety system design.
✓ All hazardous process plants require control coupled with some element of an independent safety layer; the SIS is able to interface with control systems. This allows designers to decide on the level of SIS integration with, or segregation from, the control system.

Part 1 Introduction: Architecting Safety Control
A safety instrumented system (SIS) essentially comprises sensors, a logic solver, and final elements that perform a Safety Instrumented Function (SIF).
An SIF is designed to protect against specific hazards such as high pressure or high temperature in a process. The SIS is much like a control system in that it has sensors, controllers, and actuators. However, it is intended for a very different purpose, and additional design requirements must be met to fit the context of the application.

One of these key design attributes is redundancy. A distributed control system (DCS) commonly uses redundancy, mainly to increase availability by introducing a level of fault tolerance. In an SIS, redundancy is required not only to increase availability but also to maintain safety integrity in light of multiple failure categories.

There are many ways to arrange the redundancy of SIS components when building a system. Some redundancy arrangements have been designed to maximize the probability of successful operation (reliability or availability). Other arrangements exist to protect against specific failure modes (fail-safe, fail-danger) [1]. Achieving high safety integrity and system availability at the same time is a dichotomy that has been a challenge to designers.

[1] Control Systems Safety Evaluation and Reliability, W. M. Goble, 2nd Edition, ISA, ISBN 1-55617-636-8.

Including extra channels can improve safety by allowing the system to make signal comparisons. It might seem that increasing the number of channels beyond two, to three or more, would make things better. In reality, adding further redundancy increases complexity and leads to a higher false trip rate. For pure safety purposes, a dual redundant configuration voting “1 out of 2” (1oo2) can be shown to be the most effective of the standard architectures. For availability purposes, however, the most effective architecture is a dual redundant system voting “2 out of 2” (2oo2).
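The trade-off between 1oo2 and 2oo2 voting can be illustrated with a little probability arithmetic. The sketch below is not taken from the paper: it assumes two identical, statistically independent channels with no common-cause failures and no diagnostics, and the per-channel probabilities and all names (`VotingResult`, `vote_1oo2`, `vote_2oo2`) are made up for illustration.

```python
from typing import NamedTuple


class VotingResult(NamedTuple):
    missed_demand: float  # probability the pair fails to trip on a real demand
    false_trip: float     # probability the pair trips with no real demand


def vote_1oo2(p_dangerous: float, p_spurious: float) -> VotingResult:
    """1oo2: one channel is enough to trip.

    A demand is missed only if BOTH channels fail dangerously;
    a false trip occurs if EITHER channel trips spuriously.
    """
    return VotingResult(
        missed_demand=p_dangerous ** 2,
        false_trip=1 - (1 - p_spurious) ** 2,
    )


def vote_2oo2(p_dangerous: float, p_spurious: float) -> VotingResult:
    """2oo2: both channels must agree to trip.

    A demand is missed if EITHER channel fails dangerously;
    a false trip needs BOTH channels to trip spuriously.
    """
    return VotingResult(
        missed_demand=1 - (1 - p_dangerous) ** 2,
        false_trip=p_spurious ** 2,
    )


# Illustrative per-channel probabilities (assumptions, not vendor data).
P_DANGEROUS = 0.01
P_SPURIOUS = 0.02

result_1oo2 = vote_1oo2(P_DANGEROUS, P_SPURIOUS)
result_2oo2 = vote_2oo2(P_DANGEROUS, P_SPURIOUS)

print(f"1oo2: missed demand {result_1oo2.missed_demand:.4g}, false trip {result_1oo2.false_trip:.4g}")
print(f"2oo2: missed demand {result_2oo2.missed_demand:.4g}, false trip {result_2oo2.false_trip:.4g}")
```

Under these simplifying assumptions, 1oo2 misses a demand far less often than 2oo2 but trips falsely far more often, and 2oo2 shows the reverse pattern.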
The challenge is to create a system that acts as 2oo2 for availability and 1oo2 for safety.

Part 2 Versatile Modular Redundancy
Rising to the challenge is the Versatile Modular Redundancy (VMR) architecture for an SIS. Each simplex system uses two internal channels with comparison; the comparison diagnostics from the two channels yield an extremely high level of safety integrity. With additional diagnostic measures other than comparison, simplex systems using this approach reach SIL 3.

VMR is a concept that applies redundancy for availability purposes where it is needed. An SIS with VMR uses a “Pair and Spare” philosophy. This concept takes advantage of the pair of 1oo2 channels inside each simplex module and matches each module with a redundant partner. The redundant set of modules provides high availability and can tolerate a failure in each individual redundant set.

Part 3 SIL 3 Single Module
No automatic protection system can have perfect integrity, but it is clear that higher-risk hazards must have higher-integrity designs. According to international standards, this integrity is defined by “Safety Integrity Levels” (SILs). Four levels are defined, from SIL 1 to SIL 4, with SIL 4 indicating the highest integrity and SIL 1 the lowest. A SIL-rated SIS is designed to act independently of the regular control system to bring the process to a safe state, to mitigate the consequences of a hazard, or to provide independent interlocking.

Many commercial SISs require the addition of extra controllers, input/output modules, or other cards to achieve increased safety integrity, but with the VMR architecture, additional modules are only used when increased fault tolerance is needed for higher availability.
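To make the SIL scale concrete, the sketch below maps an average probability of failure on demand (PFDavg) to a SIL and to a risk reduction factor (RRF = 1 / PFDavg). The bands are the low-demand-mode ranges from IEC 61508; the function names are hypothetical, and the two sample values are the PFDavg figures quoted in Table 2 of this paper. Note that these describe a complete safety function (sensor, logic solver, final element), not the logic solver alone.

```python
from typing import Optional

# Low-demand-mode PFDavg bands per IEC 61508:
# (SIL, lower bound inclusive, upper bound exclusive)
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfdavg(pfdavg: float) -> Optional[int]:
    """Return the SIL whose band contains pfdavg, or None if outside all bands."""
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfdavg < upper:
            return sil
    return None


def risk_reduction_factor(pfdavg: float) -> float:
    """RRF is the reciprocal of PFDavg."""
    return 1.0 / pfdavg


# PFDavg values as quoted in Table 2 of this paper.
for label, pfdavg in [("integrated", 8.98e-3), ("ad hoc", 2.68e-2)]:
    print(label, "SIL", sil_for_pfdavg(pfdavg), "RRF", round(risk_reduction_factor(pfdavg)))
```

A lookup like this is only the final step of a verification calculation; it says nothing about architectural constraints or systematic capability, which the standards also require.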
Meeting the requirements of SIL 3 with a simplex logic solver structure, the VMR concept allows the designer to apply the right level of redundancy purely for availability purposes. The end user benefits from having a system that is more flexible and more efficient, with a smaller footprint than equivalent systems.

For programmable logic solvers utilized in the process industry, the highest SIL achievable is SIL 3. Reaching SIL 3 with a simplex logic solver structure is not usually possible without employing 1oo2 redundancy. Many systems rely on the duplication of controller, input, and output modules to achieve the SIL 3 rating. However, an SIS using the VMR architecture provides for SIL 3 within a simplex structure by building the 1oo2 redundancy for safety integrity into the simplex module. The major benefit is that logic solver safety integrity is a given, even for the most stringent process applications, so designers can concentrate on other critical aspects of the safety instrumented system design.

To fulfil the SIL 3 requirement, each SIS module has fully redundant internal circuits, comparison mechanisms between the two channels, and other diagnostic measures. The comparison circuits are carefully designed as bi-directional to avoid a single point of failure that could cause a dangerous undetected failure in the module. The diagram of a simplex structured system is shown below.

Figure 1 – Inside a simplex structured system
[Figure 1 diagram: an Input Module, a CPU Module, and an Output Module, each containing two circuits (Input Circuit, CPU Circuit, or Output Circuit respectively) linked by a Compare & Diagnostic block.]

Part 4 Versatile Modular Redundant Options for Higher Availability
Unlike many SISs on the market, those using the VMR concept provide redundancy options on a per-module basis. This allows for maximum possible flexibility. Users can select simplex or dual controllers and then add any mix of redundant or non-redundant input and output modules to the main rack, with any mix of discrete or analog channels required. These module redundancy options rest on the redundant communications provided by default throughout the system on the control network and remote I/O bus.

The redundant units operate in a modified active-standby configuration. Redundant control of each pair of units is performed individually and independently of other pairs. Because control is independent, information about which unit is active in one pair is not needed by the other pairs. This is based on the field-proven pair and spare technology. Figure 2 shows the block diagram of a fully redundant system.

Figure 2 – Diagram of a full redundant system
[Figure 2 diagram: two copies of the module chain from Figure 1 (Input Module, CPU Module, Output Module, each with two circuits and a Compare & Diagnostic block), with Redundant Control blocks linking each module to its partner.]

Both of the redundant input modules read a field value while validating the integrity of this value. Both CPU modules read the input
value from the active input module and calculate an output value while exchanging the value for validation. Both output modules are ready for output, but only the active unit provides a real output to the field. These activities are carried out while each module and each pair executes self-diagnosis. When a fault is detected, the active unit gives up control and the standby unit energizes. During this time, the standby CPU module is updated with key variables from the active unit. If a failure occurs in an active module and a standby module takes control, the other pairs stay healthy and are not affected.

This highly flexible approach means that optimum system designs can be achieved. With safety integrity to SIL 3 assured, the designer can match the fault tolerance requirements to the application. For example, when such requirements are fulfilled by 2oo3 sensors connected to three input modules, redundant options for each input module are not required. Because there is no wasted redundancy, the VMR architecture proves to be a very cost-effective approach in multiple application scenarios.

Part 5 Diagnostic Capability and Fault Behavior
One primary difference between a control system and an SIS is the diagnostic ability and the reaction to system faults detected by those diagnostics. While control systems have very good diagnostic capability, such equipment will not pass rigorous SIL 3 diagnostic requirements because it is designed only for normal process operations. A SIL 3 rated SIS will have diagnostics that can detect over 95% of potentially dangerous failures.

Where a fault is detected in a safety system, a predefined reaction that will both highlight and solve the problem is critical. This approach can be broken down as follows:
Alarm: Provide an immediate indication that a fault exists.
Locate: Provide information about the exact location of the fault.
Classify: Provide enough information on how the fault can be repaired.
Reset: Provide detail on how to bring the system back to the fully healthy state.

Upon suffering a system fault, many systems continue to provide protection, but often with reduced integrity. With VMR, there is no degradation of safety following a single failure. This is due to the inherent 1oo2 design of the simplex safety controller and I/O modules. In a fully redundant configuration, each redundant module pair is independent, so there is no possibility for a single failure to bring the entire system to a halt. This differs from other dual system designs, in which an entire system leg can shut down when a single failure occurs in any of its associated modules. Figure 3 illustrates this for a failure in the input part of the system, but the behavior is the same for a failure in the logic solver part or the output part.

Figure 3 – Response to a single input module fault
[Figure 3 diagram: two panels, each showing a sensor, input (IP), logic solver (L), and output (OP) modules, and an actuator. Traditional dual architecture, with Leg 1 and Leg 2 (Input Fault, degraded to SIL?). VMR dual architecture (Input Fault, remains SIL 3).]

To see the effect of different types of safe and dangerous failure, Figure 4 shows how an SIS using the VMR architecture reacts in various fault scenarios.

Figure 4 – Safe vs. dangerous failure behavior
[Figure 4 diagram: panels showing sensors, IP, L, and OP modules, and actuators, with the following captions.]
System healthy: the upper side is the control master.
If there is a single safe failure, the system remains running, without SIL degradation.
Subsequent safe failures can occur, provided they are not of the same module type as the first failure. In this case, the system continues to run and there is no SIL degradation.

Any two failures of similar module types, or any diagnosed dangerous failure, will cause either fault annunciation or an automatic response to the fail-safe condition (often called shutdown). Figure 5 shows only one example in which the function of the safety loop would be lost. However, shutdown is not always the end result, because the logic solver can use predefined values when it loses healthy input values. Users can select this behavior to avoid an unwanted shutdown.

Figure 5 – Example of a failure of both input modules
[Figure 5 diagram: sensor, IP, L, and OP modules, and an actuator, with both input modules failed.]

Part 6 Integration with DCS
All hazardous process plants require control, coupled with some element of an independent safety layer. When designers determine that an independent SIS should be programmable, further consideration is needed about the level of interaction between the control systems and the SISs. In the past, so much attention has been focused on independence that factors affecting operations and maintenance have been forgotten.

A key element for successful operations and maintenance is the level of integration between the SIS(s) and the Distributed Control System (DCS). A tight coupling of these systems can provide large advantages to the end user in terms of ease of use, cost, and plant safety. While this might sound counter-intuitive, the two systems share many common requirements; the key is to provide independence for safety integrity but allow interference-free commonality at all other levels of the DCS and SIS. This commonality of design requirements is summarized in Table 1.

Common DCS / SIS requirements                    Benefits of tight integration
Alarm handling                                   Increased safety in alarm flood situations, ease of use, lower CAPEX and OPEX
Interfaces to 3rd party equipment                Lower CAPEX
Operator interface (system status)               Ease of use, lower CAPEX
Sequence of events                               Increased data accuracy following shutdowns, lower CAPEX and OPEX
Startup and maintenance bypasses or overrides    Safety, ease of use
System training                                  Ease of use, lower OPEX

CAPEX: Capital Expenditure. OPEX: Operations Expenditure.
Table 1 – Common requirements and benefits of tight integration

Ideally, the SIS interfaces easily and simply with the DCS, without the requirement for gateways or other types of interfaces that are commonplace in some SISs. The DCS operator station could support monitoring capability for the SIS as a standard function. Moreover, the SIS would not need a separate, dedicated network for safety communication because it achieves SIL 3 safety communication on the control network even when integrated with the DCS (see next section). The layout and network design are identical for both DCS and SIS. This level of tight integration provides all of the benefits listed in Table 1.

To give additional assurance that safety cannot be compromised, the SIS could use a separate engineering and maintenance interface that requires its own levels of security access. Figure 6 shows an example of how DCS and SIS nodes can be integrated into a common networked architecture that maintains safety integrity at the same time as assuring effective operations.

Figure 6 – Example of SIS and DCS Integration (please see text for definitions)

In Figure 6, the following definitions apply:
• ENG is an engineering workstation for the DCS.
• SENG is an engineering workstation for the SIS.
• HIS, “Human Interface Station,” is an operator workstation.
• FCS, “Field Control Station,” is a DCS controller.
• SCS, “Safety Control Station,” is a safety instrumented system (SIS).

[Figure 6 diagram labels: Ethernet, ENG, HIS, SENG, Control Network, Optical Repeater, FCS, SCS.]

Part 7 Safety Communication
Modern process plants often require SISs for different process units and facilities. When this is the case, there is generally a requirement to be able to send signals from one safety controller to another. This can be for pure information purposes, but often there is a need to provide trip signals between different units or facilities. This can be done by physical point-to-point wiring, but this is very costly when many signals are required or when working over long distances. A far more cost-effective approach is to use a digital communications bus. In this case, the data that is passed from one controller to another is safety-related.

Figure 7 – Example communications between process units
[Figure 7 diagram: Process Unit 1 and Process Unit 2, each with a DCS Controller and an SIS on the Control Network, exchanging safety-related trip and alarm signals.]

To communicate safety-related information between SIS nodes, there must be a guarantee that data is not corrupted when transmitted and received. VMR provides special protocol checking mechanisms within each safety node in a multiple-node networked system. The checking mechanism guarantees that a corrupted message cannot be accepted as valid, and this assurance is certified to SIL 3. Safety and non-safety data can co-exist on the same physical media, and there is no chance of interference or corruption.

Information on SIS signals and the entire system status is available to all operator stations that are connected to the control network. With up to 64 connectable nodes per domain, there is virtually no limit to the types of applications that can be handled by the combination of the DCS and SIS. Operating at a network speed of 1 Gbit/s, the redundant control network in a contemporary DCS provides a backbone for both safety and non-safety communications.
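The paper does not describe how the protocol checking mechanism works internally. As a rough illustration of the general idea only, the sketch below shows generic measures often used when safety data shares a network with other traffic: a sender identifier, a sequence number, a length field, and a checksum over the whole message. The frame layout, the use of CRC-32, and every name here are assumptions made for this example; they do not represent the vendor's protocol, and a certified safety protocol would add further measures such as time supervision.

```python
import struct
import zlib

# Hypothetical header: sender id (uint16), sequence number (uint16), payload length (uint32)
HEADER = struct.Struct(">HHI")
CRC = struct.Struct(">I")


def pack_message(sender_id: int, sequence: int, payload: bytes) -> bytes:
    """Build a frame: header + payload + CRC-32 over header and payload."""
    body = HEADER.pack(sender_id, sequence, len(payload)) + payload
    return body + CRC.pack(zlib.crc32(body))


def unpack_message(frame: bytes, expected_sender: int, expected_sequence: int) -> bytes:
    """Return the payload, or raise ValueError if any check fails."""
    if len(frame) < HEADER.size + CRC.size:
        raise ValueError("frame too short")
    body = frame[:-CRC.size]
    (received_crc,) = CRC.unpack(frame[-CRC.size:])
    if zlib.crc32(body) != received_crc:
        raise ValueError("checksum mismatch: data corrupted")
    sender_id, sequence, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if length != len(payload):
        raise ValueError("length mismatch")
    if sender_id != expected_sender:
        raise ValueError("unexpected sender")
    if sequence != expected_sequence:
        raise ValueError("unexpected sequence number: lost, repeated, or reordered message")
    return payload


frame = pack_message(sender_id=7, sequence=1, payload=b"TRIP")
print(unpack_message(frame, expected_sender=7, expected_sequence=1))
```

Because all checks run in the end nodes, a scheme of this kind does not depend on the cable or network equipment in between, which is why safety and non-safety traffic can share the same media.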
The network technology assures transmission repeatability and timing determinism. The checking mechanism is completely independent of the physical communications media being employed; the safety-related transmission can be via standard coaxial cable or optical fiber.

Part 8 Integration with Field Devices
When an engineer designs each safety instrumented function (SIF), a verification calculation is made to check that the design meets the required safety integrity level (SIL). The calculation considers the instruments chosen and their failure rates, the proof test capabilities, the proof test intervals, the redundancy, and other variables. The verification calculation is done for the sensors, the SIS logic solver, and the final elements. In a study done by Exida of over 8000 SIF designs [2], the logic solver accounted for less than 10% of the average probability of failure on demand (PFDavg). Figure 8 shows the distribution.

Figure 8 – PFDavg distribution from over 8000 SIF designs
[Figure 8 chart values: Sensor(s) 26.05%, Logic Solver 8.52%, Final Element(s) 65.43%.]

[2] exida Market Report – A Statistical Study of over 8000 SIF designs, exida, 2005.

If the PFDavg of the field devices could be improved, significant cost savings in either capital expense or operating expense (via reduced proof testing) could be obtained. The SIS provides much of the functionality needed to properly integrate field devices. For example, an IEC 61508 certified differential pressure transmitter will indicate an internal problem such as a plugged impulse line by sending a 3.8 mA signal. The SIS recognizes this signal as a diagnostic and sends an alarm to the DCS, alerting the operators to fix the problem. No shutdown is needed. To achieve this functionality in a non-integrated system, careful application-level programming involving analog filtering and comparison is needed.
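The diagnostic handling described above can be sketched as a simple classification of the 4-20 mA loop current. The 3.8 mA diagnostic level comes from the text; the 20.5 mA upper limit, the status names, and the function names are assumptions for illustration. Real transmitters signal faults at device-specific current levels (many follow the NAMUR NE 43 recommendation), so the thresholds are left as parameters.

```python
from enum import Enum


class LoopStatus(Enum):
    DEVICE_DIAGNOSTIC = "device diagnostic: alarm to DCS, no trip"
    MEASUREMENT = "valid measurement"
    OVER_RANGE = "over range"


def classify_loop_current(
    current_ma: float,
    diagnostic_ma: float = 3.8,    # diagnostic level quoted in the text
    high_limit_ma: float = 20.5,   # assumed upper limit, not from the paper
) -> LoopStatus:
    """Classify a 4-20 mA loop current as diagnostic, measurement, or over range."""
    if current_ma <= diagnostic_ma:
        return LoopStatus.DEVICE_DIAGNOSTIC
    if current_ma > high_limit_ma:
        return LoopStatus.OVER_RANGE
    return LoopStatus.MEASUREMENT


def to_percent_of_span(current_ma: float) -> float:
    """Linear scaling: 4 mA is 0 % of span and 20 mA is 100 %."""
    return (current_ma - 4.0) / 16.0 * 100.0


for sample_ma in (3.8, 12.0, 21.0):
    status = classify_loop_current(sample_ma)
    print(f"{sample_ma} mA -> {status.value}")
```

The point of the sketch is the separation of concerns: a diagnostic current raises a maintenance alarm rather than being treated as a low process value that would trip the function.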
The SIS will interface to a HART digital positioner for partial valve stroke testing. This functionality can also reduce SIF PFDavg and reduce operating cost.

Exida has performed a comparative study of two SIS designs. There are significant differences in PFDavg, a measure of safety integrity, and in MTTFS, the mean time to failure spurious (false trip). An “ad hoc” solution uses conventional design techniques including three transmitters, a SIL 3 logic solver, and a remote actuated valve. The integrated solution utilizes a single certified pressure transmitter, a SIL 3 certified SIS logic solver, and a single remote actuated valve with the HART digital positioner configured for automatic partial valve stroke testing every week (168 hours). The results show the significant differences. When one considers the lower capital cost of the integrated solution, the differences are even more significant. When life cycle costs are considered (the ad hoc solution needs more frequent proof testing), the integrated solution compares even more favorably.

                              Integrated solution                  Ad-hoc solution
Description                   Single pressure transmitter,         2oo3 pressure transmitters,
                              SIL 3 logic solver,                  SIL 3 logic solver,
                              single remote actuated ball valve    single remote actuated ball valve
                              with HART positioner                 with 3-way solenoid
MTTFS                         39 years                             20 years
PFDavg / RRF                  8.98E-3 / 111                        2.68E-2 / 37
Proof test interval (valve)   5 years                              6 months
Capital cost                  Lower                                Higher
Lifecycle cost                Low                                  Very High

RRF: risk reduction factor (1 / PFDavg).
Table 2 – Exida study of two SIS designs, Integrated vs. Ad-hoc

Part 9 Summary
As the last line of defense, the safety instrumented system (SIS) must act reliably, exactly when it needs to. Unlike a control system, the SIS must satisfy a number of industry standards to meet this requirement.
The SIS must achieve high safety integrity and high system availability. An SIS using Versatile Modular Redundancy (VMR) architecture maintains unlimited SIL 3 operation even under multiple failure scenarios. This reliability by design allows the SIS to continue operating long after safety systems using other architectures would have failed.

© 2020 Yokogawa Corporation of America