Chapter 6 Fault Tree Analysis

6.1 Introduction

(1) What is fault tree analysis?
- Fault tree analysis (FTA) is a top-down approach to failure analysis. It starts with a potential undesirable event (accident), called the TOP event, and then determines all the ways in which it can happen.
- The analysis proceeds by determining how the TOP event can be caused by individual or combined lower-level failures or events.
- The causes of the TOP event are “connected” through logic gates.
- In this book we only consider AND-gates and OR-gates.
- FTA is the most commonly used technique for causal analysis in risk and reliability studies.

(2) History
- FTA was first used by Bell Telephone Laboratories in connection with the safety analysis of the Minuteman missile launch control system in 1962.
- The technique was improved by the Boeing Company.
- It was extensively used and extended during the Reactor Safety Study (WASH-1400).

(3) FTA main steps
- Definition of the system, the TOP event (the potential accident), and the boundary conditions
- Construction of the fault tree
- Identification of the minimal cut sets
- Qualitative analysis of the fault tree
- Quantitative analysis of the fault tree
- Reporting of results

(4) Preparation for FTA
- The starting point of an FTA is often an existing FMECA and a system block diagram.
- The FMECA is an essential first step in understanding the system.
- The design, operation, and environment of the system must be evaluated.
- The cause-and-effect relationships leading to the TOP event must be identified and understood.

(5) Boundary conditions
- The physical boundaries of the system (Which parts of the system are included in the analysis, and which parts are not?)
- The initial conditions (What is the operational state of the system when the TOP event occurs?)
- Boundary conditions with respect to external stresses (Which external stresses should be included in the analysis: war, sabotage, earthquake, lightning, etc.?)
- The level of resolution (How detailed should the analysis be?)
6.2 Fault tree construction

- Define the TOP event in a clear and unambiguous way. The definition should always answer:
    What, e.g., “Fire”
    Where, e.g., “in the process oxidation reactor”
    When, e.g., “during normal operation”
- What are the immediate, necessary, and sufficient events and conditions causing the TOP event?
- Connect these causes via an AND-gate or an OR-gate.
- Proceed in this way to an appropriate level (= basic events).
- Appropriate level:
    Independent basic events
    Events for which we have failure data

[Figure: example fault tree. The TOP event (accident) is developed through OR- and AND-gates into Intermediate Event A, Intermediate Event B, Basic Events 1 to 4, and Undeveloped Event 1.]

Scenarios producing the TOP event:
- Basic Event 3
- Basic Event 4
- Undeveloped Event 1
- Basic Event 1, Basic Event 2

(1) Fault tree symbols
- Top event and intermediate events
- Basic events
- Undeveloped events
- Transfer symbols
- AND gates
- OR gates
- Inhibit gates

[Figure: gate symbols, labelled with the output fault (effect), the input fault (cause), and the condition input.]

[Figure: example fault tree for “Vessel loses propulsion”. Labels in the figure: Vessel loses propulsion (A); Basic failure of the propeller (1); Engine stops (B); Engine fails to operate; Basic failure of the engine (stops) (2); Fuel supply to engine is contaminated (C); Contaminated fuel in bunker tanks (3); On-board fuel cleanup system fails (4).]

[Figure: Example: Redundant fire pumps.]

6.3 Qualitative assessment

(1) Cut sets
- A cut set in a fault tree is a set of basic events whose (simultaneous) occurrence ensures that the TOP event occurs.
- A cut set is said to be minimal if the set cannot be reduced without losing its status as a cut set.
- The TOP event will therefore occur if all the basic events in a minimal cut set occur at the same time.

Qualitative assessment by investigating the minimal cut sets:
- Order of the cut sets
- Ranking based on the type of basic events involved:
    1. Human error (most critical)
    2. Failure of active equipment
    3. Failure of passive equipment
- Also look for “large” cut sets with dependent items.

Quantitative assessment

(1) Notation
- $Q_0(t)$ = Pr(The TOP event occurs at time t)
- $q_i(t)$ = Pr(Basic event i occurs at time t)
- $Q_j(t)$ = Pr(Minimal cut set j fails at time t)

Let $E_i(t)$ denote that basic event i occurs at time t. $E_i(t)$ may, for example, be that component i is in a failed state at time t. Note that $E_i(t)$ does not mean that component i fails exactly at time t, but that component i is in a failed state at time t.

A minimal cut set is said to fail when all its basic events occur (are present) at the same time.

(2) Single AND-gate
The TOP event occurs only if all input events occur. For n independent basic events: $Q_0(t) = \prod_{i=1}^{n} q_i(t)$.

(3) Single OR-gate
The TOP event occurs if at least one input event occurs. For n independent basic events: $Q_0(t) = 1 - \prod_{i=1}^{n} (1 - q_i(t))$.

(4) Cut set assessment
Minimal cut set j fails only if all its basic events occur. For independent basic events, $Q_j(t)$ is the product of $q_i(t)$ over the basic events i in cut set j.

(5) TOP event probability
The TOP event occurs if at least one of the k minimal cut sets fails. Because the same basic event may appear in several cut sets, the OR-gate formula applied to the cut sets gives an upper bound: $Q_0(t) \le 1 - \prod_{j=1}^{k} (1 - Q_j(t))$.

6.4 Input Data

Types of events:
- Non-repairable unit
- Repairable unit (repaired when failure occurs)
- Periodically tested unit (hidden failures)
- Frequency of events
- On demand probability

Basic event probability: $q_i(t)$ = Pr(Basic event i occurs at time t)

(1) Non-repairable unit
Unit i is not repaired when a failure occurs.
Input data: failure rate $\lambda_i$
Basic event probability: $q_i(t) = 1 - e^{-\lambda_i t} \approx \lambda_i t$ (the approximation holds when $\lambda_i t$ is small)

(2) Repairable unit
Unit i is repaired when a failure occurs. The unit is assumed to be “as good as new” after a repair.
Input data: failure rate $\lambda_i$; mean time to repair, $\mathrm{MTTR}_i$
Basic event probability: $q_i(t) \approx \lambda_i \cdot \mathrm{MTTR}_i$

(3) Periodic testing
Unit i is tested periodically with test interval $\tau$. A failure may occur at any time in the test interval, but the failure is only detected in a test or if a demand for the unit occurs. After a test/repair, the unit is assumed to be “as good as new”. This is a typical situation for many safety-critical units, such as sensors and safety valves.
Input data: failure rate $\lambda_i$; test interval $\tau$
Basic event probability: the usual approximation is $q_i(t) \approx \lambda_i \tau / 2$

(5) Frequency
Event i occurs now and then, with no specific duration.
Input data: frequency $f_i$
If the event has a duration, use input similar to that for a repairable unit.
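The basic event probabilities for non-repairable, repairable, and periodically tested units can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical input values are hypothetical, the periodic-test function assumes the common approximation λτ/2 (valid when λτ is small), and the two gate functions assume independent basic events.

```python
import math


def q_non_repairable(failure_rate, t):
    """Probability that a non-repairable unit is in a failed state at time t."""
    return 1.0 - math.exp(-failure_rate * t)


def q_repairable(failure_rate, mttr):
    """Approximate unavailability of a repairable unit: lambda * MTTR."""
    return failure_rate * mttr


def q_periodic_test(failure_rate, test_interval):
    """Approximate average unavailability of a periodically tested unit
    (assumed formula: lambda * tau / 2)."""
    return failure_rate * test_interval / 2.0


def and_gate(probs):
    """All inputs must occur (independent events assumed)."""
    return math.prod(probs)


def or_gate(probs):
    """At least one input occurs (independent events assumed)."""
    return 1.0 - math.prod(1.0 - q for q in probs)


# Hypothetical input data: rates per hour, times in hours.
lam = 1e-4
q1 = q_non_repairable(lam, t=1000.0)
q2 = q_repairable(lam, mttr=8.0)
q3 = q_periodic_test(lam, test_interval=720.0)

print(f"non-repairable: {q1:.4f} (approximation lambda*t = {lam * 1000.0:.4f})")
print(f"repairable:     {q2:.4f}")
print(f"periodic test:  {q3:.4f}")
print(f"AND of q2, q3:  {and_gate([q2, q3]):.2e}")
print(f"OR of q2, q3:   {or_gate([q2, q3]):.4f}")
```

Comparing the first printed value with λt shows how close the linear approximation is for these inputs.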
(6) On demand probability
Unit i is not active during normal operation, but may be subject to one or more demands.
Input data: Pr(Unit i fails upon request)
This is often used to model operator errors.

(7) Cut set evaluation
Ranking of minimal cut sets:
- Cut set unavailability: the probability that a specific cut set is in a failed state at time t
- Cut set importance: the conditional probability that a cut set is failed at time t, given that the system is failed at time t

6.5 Procedure for Fault Tree Analysis

1.0 Define the system of interest
2.0 Define the TOP event for the analysis
3.0 Define the treetop structure
4.0 Explore each branch in successive levels of detail
5.0 Solve the fault tree for the combinations of events contributing to the TOP event
6.0 Identify important dependent failure potentials and adjust the model appropriately
7.0 Perform quantitative analysis (if necessary)
8.0 Use the results in decision making

6.5.1 Define the system of interest
- Intended functions
- Physical boundaries
- Analytical boundaries
- Initial conditions

[Figure: example system with Fuse #1, Fuse #2, Hydraulic pump #1, Hydraulic pump #2, Power supply #1, Power supply #2, Relay, Switch, and Crew member.]

Example system definition
- Function of interest: provide hydraulic pressure to operate the vessel’s steering system
- Boundaries:
    Physical: Power supply #1, Power supply #2
    Analytical: ignore wiring faults and failures
- Initial conditions: relay closed, switch closed, pumps on

6.5.2 Define the TOP event for the analysis
The TOP event must be a specific type of problem with the system.

TOP event                 Assessment
“Won’t start”             Poorly defined (no subject)
“Motor”                   Poorly defined (no functional failure or condition)
“Motor fails”             Poorly defined (functional failure not specific enough)
“Motor fails to start”    Well defined

6.5.3 Define the treetop structure
- Logic structure
- Most direct contributors

6.5.4 Explore each branch in successive levels of detail
Extend the analysis of each intermediate event to the next level, as if it were a TOP event.

[Figure: fault tree for the example system. Letters denote gates and numbers denote basic events; the gate types and event number 10 follow the Boolean equation for this tree in Section 6.5.5.]

A  Both pumps are off (OR)
    B  Both pumps failed (AND)
        1  Pump #1 failed
        2  Pump #2 failed
    C  No current to the pumps (OR)
        3  Pumps improperly wired
        D  No continuity in high voltage circuit (OR)
            10  Power supply #1 fails off
            E  Relay opened (OR)
                4  Relay failed open
                G  No current to the relay (OR)
                    7  Power supply #2 fails off
                    H  No continuity in low voltage circuit (OR)
                        8  Switch failed open
                        9  Operator opened the switch
            F  Fuse failed open (OR)
                5  Fuse #1 failed open
                6  Fuse #2 failed open

6.5.5 Solve the fault tree for the combinations of events contributing to the TOP event

[Figure: example fault tree with TOP Event, Intermediate Events A and B, Basic Events 1 to 4, and Undeveloped Event 1.]

Scenarios (cut sets) producing the TOP event:
- Basic Event 1, Basic Event 2
- Basic Event 3
- Undeveloped Event 1
- Basic Event 4

Procedure

5.1 Name all gates and basic/undeveloped events.
    Case 1: TOP = A, where A is an AND-gate with inputs B and C; B is an OR-gate with inputs 1 and 2; C is an OR-gate with inputs 3 and 4.
    Case 2: TOP = A, where A is an OR-gate with inputs B and C; B is an AND-gate with inputs 1 and 2; C is an AND-gate with inputs 3 and 4.

5.2 Beginning with the TOP event, expand each gate into its inputs as follows:
    “OR” gates: replace the gate with the sum of its inputs (+)
    “AND” gates: replace the gate with the product of its inputs (·)
    Case 1: TOP = A = B · C
    Case 2: TOP = A = B + C

5.3 Continue the expansion until only basic events remain in the equation (i.e., all intermediate event gates have been replaced).
    Case 1: TOP = A = B · C = (1 + 2) · (3 + 4)
    Case 2: TOP = A = B + C = (1 · 2) + (3 · 4)

5.4 Simplify the equation (eliminate any parentheses) by using the distributive law.
    Case 1: TOP = A = B · C = (1 + 2) · (3 + 4) = 1 · 3 + 1 · 4 + 2 · 3 + 2 · 4
    Case 2: TOP = A = B + C = (1 · 2) + (3 · 4) = 1 · 2 + 3 · 4
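Steps 5.2 to 5.4 amount to expanding a Boolean expression into a sum of products. A minimal Python sketch, assuming each expression is stored as a set of product terms and each product term as a frozenset of event numbers; the helper names basic, or_gate, and and_gate are illustrative and not part of the procedure itself.

```python
from itertools import product


def basic(event):
    """A basic event is a single product term with one factor."""
    return {frozenset([event])}


def or_gate(*inputs):
    """OR: the sum of the inputs, i.e. the union of their product terms."""
    terms = set()
    for expression in inputs:
        terms |= expression
    return terms


def and_gate(*inputs):
    """AND: the product of the inputs, multiplied out term by term
    (distributive law). Merging factors into a set also drops repeats."""
    terms = set()
    for combination in product(*inputs):
        terms.add(frozenset().union(*combination))
    return terms


# Case 1: TOP = A = B . C, with B = 1 + 2 and C = 3 + 4
case1 = and_gate(or_gate(basic(1), basic(2)), or_gate(basic(3), basic(4)))

# Case 2: TOP = A = B + C, with B = 1 . 2 and C = 3 . 4
case2 = or_gate(and_gate(basic(1), basic(2)), and_gate(basic(3), basic(4)))

for name, terms in (("Case 1", case1), ("Case 2", case2)):
    print(name, sorted(sorted(term) for term in terms))
```

Case 1 yields the four two-event terms 1·3, 1·4, 2·3, and 2·4, and Case 2 yields 1·2 and 3·4, matching the hand expansion.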
5.5 Simplify the equation by:
    eliminating repeated basic events in cut sets
    eliminating supersets (i.e., cut sets that contain other complete cut sets)

[Figure: fault tree with gates A to G and basic events 1 to 5, several of which appear more than once.]

TOP = A + 1 + B
    = (C + D) + 1 + E · 4
    = 1 · 2 + F · G + 1 + 2 · 3 · 4 · 5 · 4
    = 1 · 2 + 2 · 3 · 3 · 4 + 1 + 2 · 3 · 4 · 5 · 4
    = 1 · 2 + 2 · 3 · 4 + 1 + 2 · 3 · 4 · 5      (repeated events eliminated)
    = 1 + 2 · 3 · 4                              (supersets eliminated)

Minimal cut sets: {1} and {2, 3, 4}.

For the example system (gates A to H, basic events 1 to 10):

TOP = A
    = B + C
    = (1 · 2) + D + 3
    = 1 · 2 + (10 + E + F) + 3
    = 1 · 2 + 10 + (G + 4) + (5 + 6) + 3
    = 1 · 2 + 10 + (7 + H) + 4 + 5 + 6 + 3
    = 1 · 2 + 10 + 7 + (8 + 9) + 4 + 5 + 6 + 3

Minimal cut sets
There are no repeated events or supersets, so the minimal cut sets are:
- 1-event cut sets: 3, 4, 5, 6, 7, 8, 9, 10
- 2-event cut set: 1,2

6.5.6 Identify important dependent failure potentials and adjust the model appropriately

[Figure: the TOP event is an OR-gate over two inputs. The first, “Independent failure of A and B”, is an AND-gate over “A fails” and “B fails” and has low likelihood. The second, “A and B fail due to a CCF event” (AB), has high likelihood.]

Common Cause Failures
Causes of dependent failures in systems with redundancy:
- Engineering
    Design: functional deficiencies, realization faults
    Construction: manufacture, installation and commissioning
- Operation
    Procedural: maintenance and test, operation
    Environmental: normal extremes, energetic events

[Figure: for the example system, “Both pumps failed” becomes an OR-gate over “Both pumps have failed independently” (an AND-gate over Pump #1 fails (1) and Pump #2 fails (2)) and “CCF of both pumps” (CCF1,2). The two new intermediate events are labelled B1 and B2 in the figure.]

Cut sets: 3, 4, 5, 6, 7, 8, 9, 10, CCF1,2 (new cut set), and 1,2.

6.5.7 Perform quantitative analysis (if necessary)
- Characterization of failure mode frequency
- Characterization of failure mode severity
- Characterization of failure mode risks

Event      Rate of occurrence (λ)    Avg. downtime (τ)
1          0.1/y                     1 hr
2          0.1/y                     1 hr
3          1/y
4          0.01/y
5          0.001/y
6          0.001/y
7          1/y
8          0.1/y
9          1/y
10         0.001/y
CCF1,2     0.01/y

Cut set    Calculation process        Cut set rate of occurrence
3          λ3                         1/y
4          λ4                         0.01/y
5          λ5                         0.001/y
6          λ6                         0.001/y
7          λ7                         1/y
8          λ8                         0.1/y
9          λ9                         1/y
10         λ10                        0.001/y
CCF1,2     λ1*CCF1,2                  0.01/y
1,2        λ1(λ2τ2) + λ2(λ1τ1)        2.3 × 10^-6/y
TOP event                             3.1/y

For cut set 1,2:
(0.1/y)(0.1/y × [1 hr × 1 y/8,760 hr]) + (0.1/y)(0.1/y × [1 hr × 1 y/8,760 hr]) = 2.3 × 10^-6/y

6.5.8 Use the results in decision making
- Judge acceptability
- Identify improvement opportunities
- Make recommendations for improvements
- Justify allocation of resources for improvements

The 5 Whys
[Figure: an event/condition is traced through successive “Why?” questions to subevents/conditions at each level, ending at a root cause.]

Creating a Simplified Fault Tree for Root Cause Analysis
1. Define event of interest
2. Define next level of tree
3. Develop questions to examine credibility of branches
4. Gather data to answer questions
5. Use data to determine credibility of branches
6. Is branch credible? If no, go to step 8. If yes, go to step 7.
7. Is branch sufficiently developed? If no, continue developing the branch. If yes, go to step 9.
8. Stop branch development
9. Is model sufficiently developed? If no, continue developing the model. If yes, go to step 10.
10. Identify causal factors

6.6 Conclusions
- FTA identifies all the possible causes of a specified undesired event (TOP event).
- FTA is a structured top-down deductive analysis.
- FTA leads to improved understanding of system characteristics. Design flaws and insufficient operational and maintenance procedures may be revealed and corrected during the fault tree construction.
- FTA is not (fully) suitable for modelling dynamic scenarios.
- FTA is binary (fail–success) and may therefore fail to address some problems.
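As a closing illustration, the superset elimination of step 5.5 and the rate calculation of Section 6.5.7 can be sketched in Python. The rates and downtimes are the ones given for the example system; the function and variable names are hypothetical, and the two-event formula is the one from the cut set table, λ1(λ2τ2) + λ2(λ1τ1). This is a sketch of the arithmetic, not a general quantification tool.

```python
HOURS_PER_YEAR = 8760.0


def minimize(terms):
    """Collapse repeated events in each term, then drop every term that is
    a proper superset of another term."""
    unique = {frozenset(term) for term in terms}
    return {t for t in unique if not any(other < t for other in unique)}


# Step 5.5 example: 1.2 + 2.3.3.4 + 1 + 2.3.4.5.4
raw_terms = [(1, 2), (2, 3, 3, 4), (1,), (2, 3, 4, 5, 4)]
minimal = minimize(raw_terms)
print(sorted(sorted(term) for term in minimal))

# Section 6.5.7 input data: rates per year, downtimes in hours.
rate = {1: 0.1, 2: 0.1, 3: 1.0, 4: 0.01, 5: 0.001, 6: 0.001,
        7: 1.0, 8: 0.1, 9: 1.0, 10: 0.001, "CCF1,2": 0.01}
downtime_hr = {1: 1.0, 2: 1.0}
single_event_cut_sets = [3, 4, 5, 6, 7, 8, 9, 10, "CCF1,2"]


def pair_rate(i, j):
    """Rate of a two-event cut set: one unit fails while the other is down."""
    tau_i = downtime_hr[i] / HOURS_PER_YEAR  # downtime converted to years
    tau_j = downtime_hr[j] / HOURS_PER_YEAR
    return rate[i] * (rate[j] * tau_j) + rate[j] * (rate[i] * tau_i)


rate_12 = pair_rate(1, 2)
top_rate = sum(rate[event] for event in single_event_cut_sets) + rate_12

print(f"cut set 1,2: {rate_12:.1e}/y")
print(f"TOP event:   {top_rate:.1f}/y")
```

The single-event cut sets dominate the TOP event rate here; the redundant pump pair contributes several orders of magnitude less, which is why the common cause failure term matters more than the independent double failure.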