Digital Instrumentation and Control Issues in Nuclear Reactor Safety

George E. Apostolakis
Massachusetts Institute of Technology
apostola@mit.edu

Presented at the National Center for Digital Government, Harvard University, March 1, 2004
MIT Department of Nuclear Engineering

[Figure: Nuclear Power Plant]

Major Protection Functions of I&C Systems
• Reactor Trip: I&C systems initiate rapid neutron poison insertion to shut down the chain reaction.
• Engineered Safety Features Actuation: I&C systems initiate and control safety equipment that removes heat from the core or otherwise assists in maintaining the three physical barriers to radioactive release (cladding, reactor coolant pressure boundary, and containment).

Background
• Analog electro-mechanical systems in existing nuclear power plants are aging and are becoming obsolete.
• Software-based, digital electronic systems for instrumentation and control are part of the design of advanced nuclear power plants.
• They are being introduced more slowly into existing plants.
• Digital I&C systems offer many advantages:
– Maintain better calibration
– Have improved capabilities, e.g., fault tolerance, self-testing, signal validation, and process system diagnostics
– Have higher data handling capacity

Key Technical Issues
• Potential new failure modes may be introduced.
• Neither controlling the software development process nor verifying the end product is fully satisfactory in assuring adequate quality.
• There is the possibility of introducing new common-mode failures.
• There is no methodology for assessing the impact of digital I&C on human performance.
• There are no methods for evaluating and accepting commercial off-the-shelf digital I&C systems.
[National Research Council, Digital Instrumentation and Control Systems in Nuclear Power Plants, 1997]

Defense in Depth
“Defense-in-Depth is an element of the Nuclear Regulatory Commission’s safety philosophy that employs successive compensatory measures to prevent accidents or mitigate damage if a malfunction, accident, or naturally caused event occurs at a nuclear facility.” [Commission’s White Paper, 1999]
The Regulatory Guide 1.174 description includes:
• A reasonable balance is preserved among prevention of core damage, prevention of containment failure, and consequence mitigation.
• Over-reliance on programmatic activities to compensate for weaknesses in plant design is avoided.

Assuring High Reliability: General Design Criterion 35 (10 CFR 50, Appendix A)
• A system to provide abundant emergency core cooling shall be provided. The system safety function shall be to transfer heat from the reactor core following any loss of reactor coolant at a rate such that (1) fuel and clad damage that could interfere with continued effective core cooling is prevented and (2) clad metal-water reaction is limited to negligible amounts.
• Suitable redundancy in components and features, and suitable interconnections, leak detection, isolation, and containment capabilities shall be provided to assure that for onsite electric power system operation (assuming offsite power is not available) and for offsite electric power system operation (assuming onsite power is not available) the system safety function can be accomplished, assuming a single failure.

Current Requirements
• All criteria of 10 CFR 50 apply to safety-related digital I&C.
• Additional requirements include:
– Electromagnetic compatibility
– A well-structured and well-executed software engineering development and system integration process.
– Defense against common-mode failures (CMFs); the design must comply with NRC’s position on defense-in-depth and diversity.
– Software design errors are credible CMFs that must be specifically included in the evaluation.
– If a postulated CMF could disable a safety function, then a diverse means shall be required to perform this function.
• “The staff does not endorse the concept of quantitative reliability goals as a sole means of meeting the NRC’s regulations for the reliability of digital computers used in safety systems.”
[Standard Review Plan, Report NUREG-0800, Section 7.1, Nuclear Regulatory Commission]

Use of Emerging Software Methods
• Using such methods “will require careful consideration by the reviewer.”
• Formal methods use logic to prove formally that the specifications are complete and internally consistent. The staff neither requires formal methods nor allows them to replace compliance with the acceptance criteria.
• Expert systems, neural networks, fuzzy systems, and genetic algorithms “are not sufficiently mature at this time to support the definition of processes for evaluating conformance with the acceptance criteria…”

Quantitative Health Objectives for Nuclear Power Plants (NRC)
• The individual early fatality risk in the region between the site boundary and 1 mile beyond this boundary will be less than 5×10⁻⁷ per year (one thousandth of the risk due to all other causes).
• The individual latent cancer fatality risk in the region between the site boundary and 10 miles beyond this boundary will be less than 2×10⁻⁶ per year (one thousandth of the risk due to all other causes).

PRA Policy Statement (NRC, 1995)
• The use of Probabilistic Risk Assessment (PRA) should be increased to the extent supported by the state of the art and data and in a manner that complements the defense-in-depth philosophy.
• PRA should be used to reduce unnecessary conservatisms associated with current regulatory requirements.

Risk-Informed Regulation
Insights derived from PRAs are used in combination with deterministic system analysis to focus licensee and regulatory attention on issues commensurate with their importance to safety.

PRA Objective
To support risk management by:
• Quantifying the frequencies of undesirable states.
• Identifying accident scenarios.
• Ranking these scenarios according to their probabilities of occurrence.
• Ranking systems, structures, and components according to their contribution to various risk metrics.

NPP End States
• Various states of degradation of the reactor core.
• Release of radioactivity from the containment.
• Individual risk.
• Numbers of early and latent deaths.
• Number of injuries.
• Land contamination.

[Figure: Summary of Dominant Sequences]

Risk-Informed Changes in the Licensing Basis (Regulatory Guide 1.174)
Integrated Decision Making:
• Comply with Regulations
• Maintain Defense-in-Depth Philosophy
• Maintain Safety Margins
• Risk Decrease, Neutral, or Small Increase
• Monitor Performance

Software Issues in PRA
• There is a debate concerning the application of PRA techniques to systems involving software.
• The issue is whether software failures can be modeled probabilistically:
– “Black-box” software reliability models are generally based on analogy to hardware reliability.
– “Context-based” approaches view software as an integral part of the system.
• Is the concept of software reliability meaningful?
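
The ranking steps listed under "PRA Objective" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scenario names, the frequencies and probabilities, the treatment of failures as independent, and the choice of a Fussell-Vesely-style share (the fraction of total frequency coming from scenarios that involve a component) as the risk metric. None of these values or names come from the slides.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Scenario:
    """One accident scenario: an initiating event followed by component failures."""
    name: str
    initiator_per_year: float  # hypothetical initiating-event frequency
    failures: dict             # component -> hypothetical conditional failure probability

    def frequency(self) -> float:
        # Scenario frequency = initiator frequency x product of failure probabilities
        # (assumes the failures are independent, which is an illustrative simplification).
        return self.initiator_per_year * prod(self.failures.values())


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
SCENARIOS = [
    Scenario("LOCA with ECCS failure", 1e-3, {"ECCS": 1e-3}),
    Scenario("Loss of offsite power", 5e-2,
             {"diesel generators": 1e-3, "batteries": 1e-2}),
    Scenario("Transient without trip", 1.0,
             {"reactor trip I&C": 1e-5, "ECCS": 1e-2}),
]


def rank_scenarios(scenarios):
    """Return scenarios ordered from most to least frequent."""
    return sorted(scenarios, key=lambda s: s.frequency(), reverse=True)


def component_importance(scenarios):
    """Share of total frequency from scenarios involving each component, largest first."""
    total = sum(s.frequency() for s in scenarios)
    shares = {}
    for s in scenarios:
        for component in s.failures:
            shares[component] = shares.get(component, 0.0) + s.frequency() / total
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for s in rank_scenarios(SCENARIOS):
        print(f"{s.name}: {s.frequency():.2e} per year")
    for component, share in component_importance(SCENARIOS).items():
        print(f"{component}: {share:.1%} of total frequency")
```

A full PRA derives scenarios from event trees and fault trees and accounts for dependencies; the sketch only shows how scenario frequencies feed the two rankings.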
Software Reliability
• IEEE (1990) definition of software reliability:
“Software reliability is the probability that the software will not cause the failure of a product or of a mission for a specified time under specified conditions; this probability is a function of the inputs to and use of the product, as well as a function of the existence of faults in the software; the inputs to the product will determine whether an existing fault is encountered or not.”

Key Observations Regarding the Definition of Software Reliability
• The first part of the software reliability definition is similar to that of hardware reliability and in principle enables one to assess the reliability of systems composed of hardware and software components.
• The second part explicitly makes software reliability a function of the inputs, i.e., of system or environment characteristics that are external to the software itself.
– The potential variability of external inputs outside the software design boundaries is a much more frequent cause of failure than its conceptual equivalent (i.e., deviation from specification boundaries) is for hardware.
• Because software, unlike hardware, does not deteriorate with time, the passing of time is not in itself relevant to the probability that a software function may fail, whereas the occurrence or not of certain external input conditions (e.g., a condition requiring the activation of a specific execution path) within a given number of software execution cycles is.

Black-Box Failure Rate Formulations
• These formulations consider the software associated with a system or subsystem as one “black box,” which is characterized by one overall failure rate, regardless of which sub-function(s) the software may be executing.
• Example: The detected error process for a given software module is modeled as a non-homogeneous Poisson process with an exponentially decaying rate given, in each successive time interval i, by:
    λ(i) = α exp(−βi)

Context-Based Approach
• Despite the recognition that software behavior is deterministic, it is overly simplistic to say that the software is either correct or incorrect.
– “Correctness” is context-dependent; the software is correct for some situations, but not for others.
• The key is to identify the situations for which the software is incorrect, and then estimate the probability of being in one of those situations.
– The focus has switched; instead of finding the probability of software failure, we are looking for the situations (i.e., the context) in which the software is likely to fail.
• Example: An aircraft was damaged when the test pilot commanded the computer to raise the landing gear while the plane was standing on the runway.

Quantifying the Software’s Contribution to Risk and Safety
• The problem has been changed to a more complicated, but more rational, form.
– We are no longer looking for a value for the probability of software failure (removing the need for complex statistical models for software).
– Instead, we are looking for the probabilities of finding certain system parameters (input and environment) in states that will lead to system failure through inappropriate software action.
– Identifying these states is a major task.

Current NRC Research Activities
• Digital System Performance and Reliability: Survey of the modeling methods for digital systems.
• Wireless: Anticipatory research on the potential challenges and regulatory issues of using wireless communication technologies in nuclear power plants.
• Digital Systems Risk: Investigation of digital I&C system analysis methods.
• Characterize Electromagnetic Conditions at Nuclear Power Plants.
• Security Tool Vulnerability Case Study: Adequacy of commercial-off-the-shelf tools for preventing cyber attack.
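
The contrast drawn earlier between black-box failure-rate formulations and the context-based approach can be made concrete with a short Python sketch. The first part evaluates the exponentially decaying rate λ(i) = α exp(−βi) from the black-box example. The second part enumerates the states of a few input and environment parameters and sums the probability of those states in which a deterministic software action leads to system failure, loosely modeled on the landing-gear example. The parameter names, the probabilities, the independence of the parameters, and the flawed control logic are all hypothetical and chosen only for illustration.

```python
import math
from itertools import product


# --- Black-box view: one overall, time-indexed rate for the whole module ---

def nhpp_rate(i, alpha, beta):
    """Detected-error rate in interval i: lambda(i) = alpha * exp(-beta * i)."""
    return alpha * math.exp(-beta * i)


def expected_errors(n_intervals, alpha, beta):
    """Expected errors detected over intervals 1..n, assuming unit-length
    intervals with a constant rate inside each interval."""
    return sum(nhpp_rate(i, alpha, beta) for i in range(1, n_intervals + 1))


# --- Context-based view: probability of being in a failure-inducing state ---

# Hypothetical discrete parameters with assumed (and assumed independent) probabilities.
PARAMETERS = {
    "on_ground": {True: 0.3, False: 0.7},
    "gear_up_commanded": {True: 0.01, False: 0.99},
}


def software_action(state):
    """Deterministic, deliberately flawed logic: obeys the command without
    checking whether the aircraft is on the ground."""
    return "raise_gear" if state["gear_up_commanded"] else "hold"


def leads_to_failure(state, action):
    """System-level failure criterion: gear raised while standing on the runway."""
    return state["on_ground"] and action == "raise_gear"


def failure_contexts(parameters):
    """Return (state, probability) pairs in which the software action fails the system."""
    names = list(parameters)
    contexts = []
    for values in product(*(parameters[name] for name in names)):
        state = dict(zip(names, values))
        probability = math.prod(parameters[n][v] for n, v in state.items())
        if leads_to_failure(state, software_action(state)):
            contexts.append((state, probability))
    return contexts


if __name__ == "__main__":
    print("black-box rate in interval 3:", nhpp_rate(3, alpha=2.0, beta=0.5))
    contexts = failure_contexts(PARAMETERS)
    print("failure contexts:", contexts)
    print("probability of a failure context:", sum(p for _, p in contexts))
```

In the second part no probability is assigned to the software itself; the only uncertain quantities are the input and environment states, which is the shift of focus described under "Context-Based Approach". For a real system the state space is far larger, so identifying the failure-inducing states is the hard part.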