IMPLEMENTING PROTECTIVE FUNCTIONS IN BPCS AND COMBINED SYSTEMS

Edward M. Marszal
Christopher P. Weil
Kenexis Consulting Corporation
2929 Kenny Road Suite 225
Columbus, OH 43221
(614) 451-7031
edward.marszal@kenexis.com
christopher.weil@kenexis.com

ABSTRACT

Since the release of standards defining the proper implementation of safety instrumented systems, there has been a great deal of misunderstanding about the actual requirements for separating the Basic Process Control System (BPCS) from the Safety Instrumented System (SIS). This uncertainty has been amplified by proponents on both sides of the issue. One group believes that if the BPCS is designed in accordance with IEC 61508 to the appropriate Safety Integrity Level (SIL), a combined BPCS and SIS is acceptable, and advocates that this integrated system be considered a “good engineering practice”. The other group feels that absolutely no safety functionality should be performed in the BPCS and that an absolute separation of functionality between the BPCS and SIS should be observed. As with any standard that requires interpretation, each camp is partly right and partly wrong.

There are situations where employing safety functionality in the BPCS is an appropriate decision. This paper will examine some common examples of these situations and provide nomenclature for future shorthand descriptions of this functionality, including:

• Courtesy Action
• Mimic Action
• BPCS-Only Protective Function
• Pre-Emptive Strike
• Additional (Non-Safety Critical) Inputs

The benefits of combined systems have been extensively promoted. Combined systems are typically composed of an IEC 61508 compliant SIS logic solver that also serves as the BPCS. If a proper risk analysis is performed on these systems, it is very difficult to justify their use. The integrity requirements for combined systems are extremely high because there are failure modes that will simultaneously generate a hazard and disable multiple protection layers.
This paper will demonstrate that for most process plants with typical likelihoods, consequences, and risk acceptance criteria, a combined system would require a SIL 5 to SIL 6 rating, which is neither currently available nor defined by the standards. This paper will also demonstrate the high level of detail and effort required to justify a combined system, which is significantly greater than if separate systems were used.

INTRODUCTION

The Basic Process Control System (BPCS) is responsible for normal operation of the plant and in many instances serves as the first layer of protection against unsafe conditions. Normally, if the BPCS fails to maintain control, alarms notify operations that human intervention is needed to reestablish control within the specified limits. If the operator is unsuccessful, then other layers of protection (e.g., pressure safety valves, inherently safe process design, or a Safety Instrumented System) need to be in place to bring the process to a safe state and mitigate any hazards.

LAYERS OF PROTECTION

For this hierarchy to be effective, it is critical that each layer of protection be independent, or separate. This means that multiple layers (e.g., BPCS and SIS) must not contain common components whose single failure would disable more than one protection layer. In the case of SIS and BPCS, the traditional design practice of separation prevents the SIS layer from becoming disabled when the BPCS layer experiences a problem. Consider the following accident case history, where failure of a single component shared by the BPCS and the SIS created a situation in which shutdown was required and simultaneously prevented the safety action from being taken.

In the last five years a US refinery experienced the devastating effects of placing a demand on a safety function while simultaneously inhibiting that function. The scenario occurred as follows:

1. The insulation bag around flow transmitter FT-101 becomes displaced and fails to provide proper insulation
2. Flow transmitter FT-101 taps freeze, also freezing the process variable
3. FIC-101 set point is lowered
4. FIC-101 closes FV-101 in an attempt to lower the process variable
5. FT-101 and FSLL-101 fail to sense the low flow condition because the process variable is frozen in place (literally), and in turn fail to close fuel gas valve XV-102
6. Heater-101 pass tubes overheat and rupture, causing a large fire and total destruction of the heater

The need to eliminate single failures that can disable multiple protection layers has led to many discussions about separation, and it has been a leading factor in the separation of the SIS from the BPCS. Responsible designers and governing bodies have written standards that enforce this separation.

OVERLAP OF BPCS AND SIS

A separate SIS minimizes the risk of common cause failures and is the traditional industry standard. Though separation allows for safer operations, it is not without its shortcomings. The SIS has ownership of the necessary and sufficient equipment to execute the required safety instrumented functions and bring the process to a safe state. In many cases the SIS will also have ownership of equipment that is “additional” to the necessary and sufficient equipment. An example is shown below:

High temperature in Reactor-102 (indicating a potential runaway reaction) initiates reactant shutdown. Closing the double block valves is the necessary and sufficient action that must function to bring the reactor to a safe state when a temperature limit is exceeded. Closing LV-102 is an “additional” action that has no safety consequences but is good engineering practice and makes startup easier.

The BPCS has ownership of many functions that industry experts would agree should be executed in addition to the safety instrumented functions.
Advances in technology that allow easy data mapping and networking between the SIS and BPCS have made these “additional” functions or actions easy to implement.

Courtesy Action

Action taken by the BPCS, on command from the SIS, on a final element that is controlled by the BPCS and is in the same service as an SIS final element. These actions focus on equipment associated with the Safety Instrumented Function (SIF) controlled by the SIS, but are non-safety critical.

Consider the following example. The risk analysis for this example determined that LT-102, SIS, and XV-102 were the necessary and sufficient equipment to mitigate the potential hazard of a low level in V-101 (i.e., they provided 100% of the required risk reduction). In this case, a “Courtesy Action” can be accomplished by the BPCS because the final element, flow control valve LV-101, is in the same service as shutoff valve XV-102. When LT-102 senses a low level in V-101, the SIS commands XV-102 to close. The SIS communicates activation of the shutdown to the BPCS, which then places LIC-101 in manual with 0% output, closing LV-101. In this case, the BPCS cannot be considered an independent layer of protection because failure of the FIC-101 loop could be the initiating event that causes the low level.

Mimic Action

Action taken by the BPCS to mimic SIS action. The BPCS has independent sensors and final elements in the same service as the SIS. In some cases, mimic actions may be considered independent protection layers.

In many cases, the BPCS contains sensors and final elements that are in identical service to equipment used by the SIS. Creating a protective function that employs these BPCS inputs and outputs is a simple matter of programming. This technique maximizes the effectiveness of existing assets and in some cases can be used as an independent protection layer to “buy down” the integrity requirements of the associated safety instrumented function.
In the following example it was determined that PT-102, SIS, and XV-102 were the necessary and sufficient equipment required to mitigate the potential hazard of a low pressure in this service. A “Mimic” can also be accomplished by the BPCS because the sensor PT-101 is in the same service as PT-102, and the final element, temperature control valve TV-103, is in the same service as shutoff valve XV-102. When PT-102 senses low pressure in this service, the SIS commands XV-102 to close. The BPCS is programmed to mimic the SIS: when PT-101 senses low pressure in this service, PSLL-101 commands TV-103 to close. This action can also be done simply by having a high signal from PT-101 cause TIC-103 to be placed in manual with 0% output, which performs the same action with no additional equipment. In this scenario the BPCS mimic might be considered a layer of protection, since failure of either PT-101 or the TIC-103 loop would not result in a demand being placed on the system.

BPCS-Only Protective Function

Action taken by the BPCS that has been determined to require a risk reduction of less than SIL 1.

Consider a tank overfill protection function for aqueous ammonia. Risk analysis for the following example demonstrated that no SIS is required to reduce the risk of overfill to a tolerable level (i.e., SIL selection results in SIL 0). Since the SIL selection resulted in SIL 0, the user has the flexibility to implement the function in whatever hardware platform is convenient (or even not to implement the function at all). Most BPCS-Only protective functions are employed more for the convenience of operations than for safety reasons.

Pre-Emptive Strike

Action taken by the BPCS to place the process into a safe state before the SIS takes action. This action may be considered in the SIL selection analysis if failure of the BPCS is not the initiating event that causes the hazard.
In the following ethylene oxide reactor example, TT-101A-F, SIS, and closing XV-101 were the necessary and sufficient equipment to mitigate the potential hazard of a high temperature in Reactor-101. A “Pre-Emptive Strike” can be accomplished by the BPCS when any 5 of the 50 temperatures in the reactor are greater than set point. The BPCS signals the SIS to take the same safety actions as for a high temperature in the reactor. In this example, no BPCS layer of protection can be counted, since failure of the TIC-102 loop could be the initiating event that causes the high temperature and the final elements that are employed are not separate from the SIS.

Additional (Non-Safety Critical) Inputs

Inputs to the BPCS initiate shutdown actions that are non-safety critical.

In some cases, input measurements or process status information contained in the BPCS may indicate that the process should be shut down, not for safety reasons but for convenience, for example at the end of a batch. While the BPCS contains the measurements and status information, the valves that need to be moved are often “owned” by the SIS.

Consider the following batch operation. A safety function that detects high temperature stops feed to the reactor. Upon completion of the batch, feed must be stopped before the reaction products are removed from the vessel. At this phase in the reaction, closure of the feed valves is not a safety critical action. Since the SIS already contains two valves that are capable of stopping the feed flow, a decision is made to use the SIS valves, by way of a DCS command, to close the feed at the end of a batch. This course of action was selected instead of supplying another separate and dedicated valve for the BPCS.

[Figure: batch reactor R-102 with feed valves and SIS. Labels in the figure: FEED, BATCH COMPLETE, XY-101A, XY 10AB (as printed), UY-123, XV-101A (FC), XV-101B (FC), TT-101A, TT-101B, SIS, COOLANT IN, COOLANT OUT.]

COMBINED SYSTEMS

Industry guidance for separation is abundant, but not always clear.
One can argue that the most recent standards allow the use of a combined SIS/BPCS system if the logic solver used meets the higher requirements of the SIS. While this may seem appropriate, it should not be undertaken without completely understanding the requirements of the standard and the dramatically increased degree of analysis. Guidelines for Safe Automation of Chemical Processes defines separation as shown below.

Separation – The physical and functional isolation of all hardware and software elements. Physical separation is defined as the requirement that the basic process control function (regulatory control – BPCS) and the safety interlock function [1] (SIS) be performed in different logic solvers. Functional separation is achieved through the elimination of common-mode failures in execution of the BPCS and SIS functions. This may require the separation of BPCS and SIS sensors, final elements, I/O components, and logic solvers, the software operating systems, and the application programs. Some communications may be allowed between separate components as long as no common mode failures can occur.

[1] The term safety interlock function was replaced with the term safety instrumented function after the release of the ISA 84.01, IEC 61508 and IEC 61511 standards.

A number of resources that have published guidance on safety instrumented system design recommend separating the SIS from the basic process control system. The Center for Chemical Process Safety of the American Institute of Chemical Engineers makes the following recommendation.

SIS separation from the BPCS is required to ensure that safety and environmental aspects of the SIS are consistent with user, manufacturer, local, national, and international standards and guidelines because separation will:

• Minimize the effects of human error on the SIS from normal BPCS activities.
• Protect safety system software from unintentional changes, by isolating the PES-based SIS from process control induced programming changes (i.e., in the BPCS)
• Provide access security
• Ensure that SISs are maintained safely and correctly
• Facilitate stand-alone testing and maintenance of the SIS and BPCS
• Ensure security and integrity, allowing the PES (programmable electronic system)-based SIS to achieve a level of security and integrity equal to or better than a direct-wired SIS
• Minimize common mode faults (both hardware and software).

Separation issues should be considered at the early stage of control system conceptual design. The SIS should have separate identification, documentation, programming and maintenance.

In addition to the guidelines and best practices, industry standards have also developed requirements for separation of basic process control and safety. In the process industries, SIS design is typically performed in accordance with ISA 84.01-2004 (IEC 61511-2002). This standard presents the following requirements.

11.2.2 Where the SIS is to implement both safety and non-safety instrumented function(s) then all the hardware and software shall be treated as safety instrumented function(s) to the highest SIL required by the SIS. If it can be shown that there is adequate independence between the safety instrumented function(s) and non-safety instrumented function(s) (i.e., that the failure of any non-SIF does not cause a dangerous failure in the SIF) then this requirement does not have to be satisfied.

NOTE 1 Wherever practicable, the safety instrumented functions should be separated from the non-safety instrumented functions.
NOTE 2 Adequate independence means that neither the failure of any non-safety functions nor the programming access to the non-safety software functions is capable of causing a dangerous failure of the safety instrumented function.

11.2.3 Where the SIS is to implement safety instrumented function(s) of different safety integrity levels then the shared or common hardware and software shall be treated as belonging to the highest safety integrity level unless it can be shown that the safety instrumented functions of lower safety integrity level cannot negatively affect the safety instrumented functions of higher safety integrity levels.

11.2.4 If it is intended not to qualify the basic process control system to this standard, then the basic process control system shall be designed to be separate and independent to the extent that the functional integrity of the Safety Instrumented System is not compromised.

NOTE 1 Operating information may be exchanged but should not compromise the functional safety of the SIS.

NOTE 2 Devices of the SIS may also be used for functions of the basic process control system if it can be shown that a failure of the basic process control system does not compromise the safety instrumented functions of the safety instrumented system.

11.2.10 A device used to perform part of a safety instrumented function shall not be used for basic process control purposes, where a failure of that device results in a failure of the basic process control function which causes a demand for the safety instrumented function, unless an analysis has been carried out to confirm that the overall risk is acceptable.

NOTE When a part of the SIS is also used for control purposes and a dangerous failure of the common equipment would cause a demand for the function performed by the SIS, then a new risk is introduced.
The additional risk is dependent on the dangerous failure rate of the shared component because if the shared component fails a demand will be created immediately that the SIS may not be capable of responding to. For that reason additional analysis will be necessary in these cases to ensure that the dangerous failure rate of the shared equipment is sufficiently low. Sensors and valves are examples of where sharing of equipment with the BPCS is often considered.

The statements shown above, and specifically those drawn from the relevant standards, show a strong preference for separation between SIS and BPCS functions, but can be construed to allow combination of the two types of systems in certain narrow circumstances after a large amount of rigorous, detailed, and expensive analysis. The ISA 84.01-2004 (IEC 61511-2002) clauses listed above provide clear requirements that must be met when combining BPCS and SIS functionality in the same device. To justify the use of a combined BPCS/SIS system, the following must be demonstrated.

1. The hardware and software of each individual SIF and of the individual BPCS functions are separated to the greatest degree possible. All shared code or equipment is then designed to the highest SIL that the respective code or equipment is required to meet.
2. Adequate independence exists between the safety and non-safety application software code, so that the non-safety BPCS application code does not need to be designed and continually tested as though it were a safety function of the highest SIL contained in the logic solver (which is expected to be SIL 3).
3. The basic process control system is qualified as a safety instrumented system to the highest SIL contained in the logic solver.
4. For every SIF, there is no single device, other than the CPU of the logic solver, whose failure results in a failure of the basic process control function that causes a demand for the safety instrumented function. This requires that BPCS and SIS functions do not share field equipment (including I/O cards) and that BPCS and SIS software code is functionally separated.
5. Equipment that is shared by the BPCS and SIS (i.e., the logic solver CPU) is designed to the highest SIL contained in the system (expected to be SIL 3), and the risk posed by failure of the CPU, which simultaneously places a demand on the SIF and prevents it from responding to that demand, is tolerably low.

An analysis of the ramifications of a combined BPCS and SIS is not a trivial task. It requires developing a list of all scenarios under which the combination of SIS and BPCS could result in a single failure that simultaneously creates a demand on the SIS and prevents it from taking action. The authors determined that the requirements listed above, and therefore compliance with the relevant sections of the applicable standards, could be demonstrated by completing the following steps.

1. Prepare a list of safety instrumented functions (SIF List) that are to be implemented by the combined control system. The SIF List contains a description of the action taken by the system along with all inputs and outputs of the function that are safety relevant.
2. Analyze each safety instrumented function. Each SIF is reviewed to determine all of the initiating events that can place a demand on it. When BPCS function failures can place a demand on the SIF, the field equipment (sensors and final elements) is reviewed to ensure that the BPCS field equipment is completely separate from the SIF field equipment (including I/O cards). If any commonality of equipment is identified, recommendations should be prepared and implemented to separate the functions.
3. Determine, for each SIF, the extent of “shared equipment”, which is only expected to include the logic solver CPU, and quantitatively verify that the risk posed by failure of the “shared equipment” is tolerable. This is done by preparing a fault tree analysis that quantitatively estimates the frequency at which random hardware failures of the logic solver CPU will result in an accident. This frequency is then compared against the site’s tolerable risk guidelines.
4. Review the impact of systematic programming failures, and identify methods to prevent them by functionally separating the logic solver application programming.
5. Ensure that the logic solver CPU is designed to the highest SIL of any functionality that is to be implemented by the logic solver (expected to be SIL 3).
6. Ensure that the overall risk of scenarios where the BPCS/SIS failure will directly lead to a consequence is sufficiently low.

While completion of steps 1 through 6 is possible, there are a number of common situations where the result of the study will show that separation is absolutely essential. In addition, in the experience of the authors, the cost of performing such a study can be prohibitive, in some cases greater than the hardware cost. As a result, use of a combined system typically cannot be justified on a financial basis (because of the additional cost of the study), let alone on a safety basis.

Example of Analysis of Combined System Common Failure

Consider a situation where a combined system is being used to control the temperature in a reaction vessel where the reaction products are prone to a runaway decomposition, which will cause an explosion with significant consequences.
As the reaction progresses, the basic process control system slowly increases the amount of cooling water in order to maintain the reactor temperature at its set point. The reactor is also equipped with a SIF that will detect a high temperature in the reactor and stop feed. This situation is shown in the figure below.

[Figure: reactor R-102 under a combined BPCS/SIS. Labels in the figure: FEED, XY-101A, XY 10AB (as printed), XV-101A (FC), XV-101B (FC), TT-101A, TT-101B, TT-102, TIC-102, SIS, COOLANT IN, COOLANT OUT.]

Traditional layer of protection analysis style SIL selection might yield a requirement of SIL 3 or SIL 2, depending on the effectiveness of pressure relief and other independent protection layers. But in order to justify use of a combined system, an additional analysis is required to ensure that failure of common components will not result in intolerable risk levels. If the tolerable risk criterion on which the facility’s SIS design is based were an individual risk of fatality of 1 x 10^-5 per year (which is conservative, but not out of line with industry), and the consequence of this event were 10 or more fatalities, then the tolerable frequency of this accident would be 1 x 10^-6 per year, or about 1 x 10^-10 per hour.

Analysis of the failure modes of the combined components, specifically the CPU for this system, will demonstrate that a failure that causes the CPU outputs to be ‘frozen’ in position will not only place a demand on the SIS, because the BPCS will not be able to regulate the temperature rise, but will also prevent the SIS from taking action. If a combined system is to be justified, this failure mode will have to occur at a tolerably low frequency. As described earlier, this frequency is 1 x 10^-10 per hour. If we further assume that a SIL 3 certified logic solver was selected, we can assume that the failure rate achieved by that logic solver will fall into the SIL 3 range.
Unfortunately, a failure rate performance of 1 x 10^-10 per hour is so stringent that it falls outside the defined SIL ranges (the most stringent defined band, SIL 4, ends at 1 x 10^-9 per hour for high demand or continuous mode operation), and would be equivalent to a SIL 5 or SIL 6, if those levels of performance were defined. In this scenario, use of a combined system cannot be justified because the risk posed by the process, considering common equipment failures, is too high.
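
The arithmetic in this example can be checked with a short script. The following is only a minimal sketch under stated assumptions, not the authors' calculation tool: the function names are hypothetical, the division of individual risk by the number of fatalities is inferred from the figures quoted above, the band limits are the IEC 61508 target failure measures for high demand or continuous mode, and the "extrapolated level" simply continues the one-decade-per-SIL pattern beyond SIL 4, which no standard defines.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours

# IEC 61508 target failure measures, high demand / continuous mode,
# in dangerous failures per hour: (inclusive lower bound, exclusive upper bound).
DEFINED_SIL_BANDS = {
    1: (1e-6, 1e-5),
    2: (1e-7, 1e-6),
    3: (1e-8, 1e-7),
    4: (1e-9, 1e-8),
}


def tolerable_event_frequency_per_year(individual_risk_per_year, fatalities):
    """Tolerable frequency of the accident (assumption: risk criterion
    divided by the number of fatalities, as the example's numbers imply)."""
    return individual_risk_per_year / fatalities


def per_year_to_per_hour(frequency_per_year):
    """Convert a frequency from events per year to events per hour."""
    return frequency_per_year / HOURS_PER_YEAR


def extrapolated_level(rate_per_hour):
    """Hypothetical level n such that 10^-(n+5) <= rate < 10^-(n+4).

    For n = 1..4 this matches DEFINED_SIL_BANDS; larger n is an
    extrapolation used only to illustrate the paper's 'SIL 5 or SIL 6'.
    """
    return math.ceil(-math.log10(rate_per_hour)) - 5


def is_defined_sil(level):
    """True only for the levels that IEC 61508 actually defines."""
    return level in DEFINED_SIL_BANDS


if __name__ == "__main__":
    per_year = tolerable_event_frequency_per_year(1e-5, 10)
    per_hour = per_year_to_per_hour(per_year)
    level = extrapolated_level(per_hour)
    print(f"Tolerable frequency: {per_year:.1e} per year, {per_hour:.2e} per hour")
    print(f"Extrapolated level: {level}, defined by the standard: {is_defined_sil(level)}")
```

With the example's inputs, the required rate for the frozen-output CPU failure lands one decade below the SIL 4 band, which is why a SIL 3 certified logic solver cannot carry the shared-failure scenario on its own.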