Dynamic Fault Tree analysis using Input/Output Interactive Markov Chains
Hichem Boudali (1), Pepijn Crouzen (2), and Mariëlle Stoelinga (1)
(1) Formal Methods and Tools group, CS, University of Twente, NL
(2) Dependable Systems and Software group, CS, Saarland University, Germany
May 9, 2008, IPA Lentedagen, Rhenen

Introduction: Dependability
- Dependability: the trustworthiness of a computer system such that reliance can justifiably be placed upon the service it delivers.
- Reliability: the probability that a computer system does not fail within a given time bound.

Introduction: Formal dependability
- Continuous-time Markov chains (CTMC): states and Markovian transitions.
- The probability of traversing a λ-transition within t time units is $1 - e^{-\lambda t}$.
- Tools: reachability analysis (among others).
[Figure: a small CTMC with transitions labelled λ and μ]

Introduction: CTMC characteristics
- CTMCs describe probability distributions (phase-type distributions).
- Phase-type distributions can approximate any probability distribution on the non-negative reals arbitrarily closely.
- Goal: find a CTMC that describes the probability of system failure within t time units (i.e. the unreliability of the system).
- Problem: it is difficult to find the CTMC that models a large system.

Introduction: Engineering dependability
Fault Trees (1960s)
- Graphical and easy to use.
- Syntax: basic events and gates (AND, OR).
- Semantics: a logical formula.
- Problem: not expressive enough.
[Figure: fault tree whose top event "Workstation fails" is an OR gate over "CPU fails" and an AND gate over "Mem1 fails" and "Mem2 fails"]

Introduction: Engineering dependability
Dynamic Fault Trees (1992)
- Extension of classic fault trees.
- Additions: use of spares, dependencies, order-based failure.
- Tools: convert to a CTMC.
[Figure: example DFT with top event "System failure" (OR gate) containing a SPARE gate over P1 and P2]

But… DFT drawbacks
- Scalability.
- Ambiguous syntax and semantics.
- Lack of modularity: dynamic modules cannot be reused.
- Restrictions on spares and dependencies.
- The existing analysis technique is hard to extend or modify.

Outline
- Case study: FTPP system
- DFT approach
- Formalizing DFTs
- DFT semantics in I/O-IMCs
- Deep compositionality
- Extending the DFT formalism
- Conclusion
- Future work

Case study: FTPP
[Figure: FTPP architecture: network elements NE1 to NE4, each connecting four processors labelled A, B, C, D]
- 16 processors divided into 4 groups.
- 4 network elements connect the processors.
- Per group, 2 processors must be operational.
- Different configurations are possible.
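As a minimal sketch of the static fault-tree semantics ("a logical formula") combined with exponentially distributed basic events, the Python below evaluates the workstation tree from the fault-tree introduction, read here as CPU OR (Mem1 AND Mem2), and computes its unreliability at time t by enumerating all basic-event states. The failure rates in the hypothetical `RATES` table are assumptions (the slides give none for this tree), and basic events are assumed independent. The brute-force enumeration is exponential in the number of basic events; this is the blow-up that binary decision diagrams avoid for static trees.

```python
import math
from itertools import product

# Hypothetical failure rates in failures per hour; the slides give none for this tree.
RATES = {"CPU": 0.01, "Mem1": 0.05, "Mem2": 0.05}


def workstation_fails(failed):
    """Static fault tree as a logical formula: CPU OR (Mem1 AND Mem2)."""
    return failed["CPU"] or (failed["Mem1"] and failed["Mem2"])


def p_fail(rate, t):
    """Probability that an exponentially distributed basic event occurs within t."""
    return 1.0 - math.exp(-rate * t)


def unreliability(t):
    """Sum the probabilities of all basic-event combinations that fail the top event."""
    names = list(RATES)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        failed = dict(zip(names, bits))
        if workstation_fails(failed):
            prob = 1.0
            for name, is_failed in failed.items():
                p = p_fail(RATES[name], t)
                prob *= p if is_failed else 1.0 - p  # independence assumption
            total += prob
    return total


if __name__ == "__main__":
    t = 10.0
    q = {name: p_fail(rate, t) for name, rate in RATES.items()}
    closed_form = 1.0 - (1.0 - q["CPU"]) * (1.0 - q["Mem1"] * q["Mem2"])
    print(f"unreliability({t}) = {unreliability(t):.6f} (closed form {closed_form:.6f})")
```

For this small tree the enumeration agrees with the closed form $1 - (1 - q_{CPU})(1 - q_{Mem1} q_{Mem2})$, where each $q = 1 - e^{-\lambda t}$.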
Case study: FTPP (continued)
[Figure: an alternative FTPP configuration with processors A, B, C and spare processors S distributed over NE1 to NE4]
- Dynamic redundancy management is possible.
- How reliable is each configuration?

FTPP DFT
[Figure: the FTPP DFT. The top event "System Failure" is an OR gate over "Group 1 Failure" to "Group 4 Failure"; each group failure is a 2/3 voting gate over processors A, B, C with spares S; FDEP (functional dependency) gates connect each network element NE1 to NE4 to the processors it serves.]

Existing DFT analysis [Dugan et al. 1992]
- For static fault trees, binary decision diagrams can be used.
- Otherwise: convert the DFT into a CTMC and analyze the CTMC using standard solution techniques.
Example: an AND gate over basic events A (failure rate 0.2 f/h) and B (failure rate 0.4 f/h).
[Figure: CTMC with starting state "A operational, B operational"; rate 0.2 leads to "A failed, B operational", rate 0.4 to "A operational, B failed", and from these, rates 0.4 and 0.2 respectively lead to "A failed, B failed"]
- $\Pr(A \text{ fails within } T \text{ hours}) = 1 - e^{-0.2T}$; A's mean time to failure is $1/0.2 = 5$ hours.
- Unreliability = Pr[reaching the state "A failed, B failed" within time T].
- But… state-space explosion: the CTMC grows exponentially, which makes the FTPP difficult to analyze.

FTPP results
Analysis method   Max. number of states   Max. number of transitions   Unreliability (T = 10)
Standard          32757                   426826                       $2.55479 \cdot 10^{-8}$
Compositional     1325                    14153                        $2.55479 \cdot 10^{-8}$

What's behind it?
- Model local behavior: we need compositional Markov chains.
- A combination of CTMC (Markovian transitions), LTS (interactive transitions), and I/O automata features (action signature):
  ? input actions
  ! output actions
  ; internal actions
[Figure: a basic event modelled as a Markovian transition with rate λ followed by the output action failed!]
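The AND-gate CTMC from "Existing DFT analysis" can be solved numerically. The sketch below is a minimal illustration, not the tool chain used in this work: it computes the transient state distribution by uniformization (a standard CTMC technique, swapped in here for illustration) and reads the unreliability off as the probability of being in "A failed, B failed" at time T. The names `TRANSITIONS` and `transient` are hypothetical. Because A and B fail independently, the result can be checked against $(1 - e^{-0.2T})(1 - e^{-0.4T})$. The plain Poisson weights used here underflow for very large rate-times-time products, where a Fox-Glynn style computation would be needed.

```python
import math

# AND gate over basic events A (0.2 f/h) and B (0.4 f/h), as in the example.
LAM_A, LAM_B = 0.2, 0.4

# States: 0 = both operational, 1 = A failed, 2 = B failed, 3 = both failed (absorbing).
# Each entry lists the Markovian transitions (target, rate) leaving that state.
TRANSITIONS = {
    0: [(1, LAM_A), (2, LAM_B)],
    1: [(3, LAM_B)],
    2: [(3, LAM_A)],
    3: [],
}


def transient(t, start=0, eps=1e-12, max_steps=10_000):
    """State distribution at time t, computed by uniformization."""
    n = len(TRANSITIONS)
    exit_rate = {s: sum(rate for _, rate in out) for s, out in TRANSITIONS.items()}
    unif = max(exit_rate.values())  # uniformization rate, at least every exit rate
    pi = [0.0] * n
    pi[start] = 1.0
    result = [0.0] * n
    weight = math.exp(-unif * t)  # Poisson probability of 0 jumps in [0, t]
    covered, k = 0.0, 0
    while covered < 1.0 - eps and k < max_steps:
        for s in range(n):
            result[s] += weight * pi[s]
        covered += weight
        # One step of the uniformized DTMC P = I + Q / unif.
        nxt = [0.0] * n
        for s in range(n):
            nxt[s] += pi[s] * (1.0 - exit_rate[s] / unif)
            for target, rate in TRANSITIONS[s]:
                nxt[target] += pi[s] * rate / unif
        pi = nxt
        k += 1
        weight *= unif * t / k  # Poisson probability of k jumps
    return result


if __name__ == "__main__":
    T = 10.0
    unreliability = transient(T)[3]
    exact = (1 - math.exp(-LAM_A * T)) * (1 - math.exp(-LAM_B * T))
    print(f"Pr[both failed by T={T}] = {unreliability:.9f} (closed form {exact:.9f})")
```

This four-state chain is tiny, but the same construction applied to the whole FTPP DFT produces the 32757-state chain reported for the standard method.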
Input/Output Interactive Markov Chains (I/O-IMC)
Properties of IMCs:
- Combine stochastic behavior and interactive behavior orthogonally.
- CSP-style synchronization and interleaving semantics.
- Maximal progress for internal transitions.
Properties of I/O-IMCs:
- Unique outputs.
- Input enabledness: outputs cannot be blocked!
- Maximal progress for output transitions.
[Figure: a state with an internal transition τ and a Markovian transition λ]

DFT semantics
DFT gate to I/O-IMC:
[Figure: the AND gate C with inputs A and B and its I/O-IMC: after receiving f(A)? and f(B)?, in either order, it emits f(C)!]

What is deep compositionality?
- The semantics of a DFT arises naturally as the composition of the semantics of its building blocks.
[Figure: the model of "Group 1 Failure" (a 2/3 voting gate over A, B, C with spare S) composed with the models of its inputs, including the failure signals f(NE1) … f(NE4) of the network elements]
- But: this may lead to huge models.

Why use deep compositionality?
- Formally defined semantics.
- Many useful techniques:
  - Combining models: composition
  - Abstracting models: hiding
  - Minimizing models: bisimulation
  - Reusing models: renaming
- Combats state-space explosion.
- Well supported by the CADP toolset (VASY/INRIA).

Compositional aggregation
- Translation of the DFT building blocks into I/O-IMCs.
- Composition + abstraction.
- Aggregation (minimization).
- Repeat composition, abstraction, and aggregation to obtain the aggregated system CTMC (CTMDP).
- Analysis. Result: the system failure probability.

Compositional aggregation: example
- AND gate C: inputs f(A)? and f(B)?, output f(C)!.
- Basic event A: failure rate 0.2 f/h, output f(A)!.
- Basic event B: failure rate 0.4 f/h, output f(B)!.
[Figure: the I/O-IMCs of C, of A (rate 0.2, then f(A)!), and of B (rate 0.4, then f(B)!)]

Compositional aggregation: parallel composition
- C: inputs f(A)? and f(B)?; outputs f(C)!.
- A: inputs none; outputs f(A)!.
- C and A synchronize on f(A).
[Figure: the product I/O-IMC C||A with states 1||1, 1||2, 2||3, 3||1, 3||2, 4||3, 5||3 and transitions labelled 0.2, f(A)!, f(B)?, f(C)!]
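To make the parallel-composition step concrete, here is a small Python sketch that composes the C and A models from the example. It uses a simplified, hypothetical representation (the `IOIMC` class and the helpers `explicit`, `enabled`, and `compose` are illustrative names, not an existing API). It synchronizes components on shared actions, interleaves everything else, and treats an input with no drawn transition as an implicit self-loop, which is one way of honoring input enabledness. It does not check the unique-outputs condition and does not apply maximal progress.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IOIMC:
    initial: str
    inputs: frozenset
    outputs: frozenset
    interactive: list  # (source, action, target)
    markovian: list    # (source, rate, target)


def explicit(m, state, action):
    """Targets of the drawn `action` transitions leaving `state`."""
    return [t for s, a, t in m.interactive if s == state and a == action]


def enabled(m, state, action):
    """Like explicit(), but an undrawn input is an implicit self-loop."""
    targets = explicit(m, state, action)
    if not targets and action in m.inputs:
        return [state]
    return targets


def compose(p, q):
    """Parallel composition: synchronize on shared actions, interleave the rest."""
    shared = (p.inputs | p.outputs) & (q.inputs | q.outputs)
    outputs = p.outputs | q.outputs
    inputs = (p.inputs | q.inputs) - outputs
    actions = {a for _, a, _ in p.interactive + q.interactive}
    start = (p.initial, q.initial)
    seen, queue = {start}, deque([start])
    interactive, markovian = [], []

    def add(src, label, dst, kind):
        kind.append((f"{src[0]}||{src[1]}", label, f"{dst[0]}||{dst[1]}"))
        if dst not in seen:
            seen.add(dst)
            queue.append(dst)

    while queue:
        s, u = queue.popleft()
        # Markovian transitions of either component interleave.
        for src, rate, dst in p.markovian:
            if src == s:
                add((s, u), rate, (dst, u), markovian)
        for src, rate, dst in q.markovian:
            if src == u:
                add((s, u), rate, (s, dst), markovian)
        for a in actions:
            if a in shared:
                # Synchronize; at least one side must have a drawn transition.
                if explicit(p, s, a) or explicit(q, u, a):
                    for t1 in enabled(p, s, a):
                        for t2 in enabled(q, u, a):
                            add((s, u), a, (t1, t2), interactive)
            else:
                for t1 in explicit(p, s, a):
                    add((s, u), a, (t1, u), interactive)
                for t2 in explicit(q, u, a):
                    add((s, u), a, (s, t2), interactive)
    composite = IOIMC(f"{start[0]}||{start[1]}", inputs, outputs, interactive, markovian)
    return composite, {f"{x}||{y}" for x, y in seen}


# AND gate C and basic event A from the example (state numbering follows the figure).
C = IOIMC("1", frozenset({"f(A)", "f(B)"}), frozenset({"f(C)"}),
          [("1", "f(A)", "2"), ("1", "f(B)", "3"), ("2", "f(B)", "4"),
           ("3", "f(A)", "4"), ("4", "f(C)", "5")], [])
A = IOIMC("1", frozenset(), frozenset({"f(A)"}), [("2", "f(A)", "3")], [("1", 0.2, "2")])
CA, STATES = compose(C, A)

if __name__ == "__main__":
    print(sorted(STATES))
    print("inputs:", sorted(CA.inputs), "outputs:", sorted(CA.outputs))
```

The reachable product states are exactly the seven shown in the C||A figure, and the signature matches the slide: f(B) remains an input, while f(A) and f(C) are outputs.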
Compositional aggregation: abstraction (hiding)
[Figure: C||A in which the output f(A)! has become the internal action f(A);]
- Abstraction (hiding) makes a signal internal.

Compositional aggregation: aggregation (weak bisimulation)
- Aggregation: finding a smaller model that is behaviorally equivalent to the original.
- Weak bisimulation disregards internal steps.
[Figure: the hidden C||A, with internal f(A); steps]

Compositional aggregation: example (continued)
[Figure: the aggregated C||A (five states, with transitions 0.2, f(B)? and f(C)!) is composed with B (rate 0.4, then f(B)!) into C||A||B]

Compositional aggregation: example (continued)
[Figure: the aggregated C||A||B: the rates 0.2 and 0.4 in either order, followed by the output f(C)!]

DFT extensions
- Extensions (DSN07): inhibition, repair policies, complex spares, complex dependencies, …
- Free! Adding extensions in the compositional framework is easy:
  - Modify the translation of the DFT building blocks.
  - The compositional aggregation algorithm is unaltered.

Extension: repair
[Figure: basic event A fails with rate λ (output f(A)!) and is repaired with rate µ (output r(A)!); the AND gate C reacts to f(A)?, f(B)?, r(A)?, r(B)? and emits f(C)! and r(C)!]

Conclusion: how we tackled the drawbacks
- State-space explosion: compositional aggregation.
- Ambiguous syntax and semantics: DAG syntax and a formal translation to I/O-IMC.
- Lack of modularity (dynamic modules cannot be reused): renaming!
- Restrictions on spares and dependencies: lifted!
- The existing analysis technique is hard to extend and/or modify: extensions at the lowest level.

Future work
- Fully automated tool (CORAL).
- More aggressive state reduction; recent work: a specialized acyclic algorithm.
- Apply deep compositionality to more advanced engineering formalisms (see Boudali et al., DSN08).
- Extend the DFT formalism: repair, failure modes, non-exponential failure distributions, sophisticated dependencies.
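The hiding and aggregation steps of the compositional-aggregation example can be illustrated on the hidden C||A. The sketch below hard-codes that model in a hypothetical encoding, with f(A) already internal (written `TAU`) and with the f(B)? self-loops implied by input enabledness made explicit. It then runs a naive signature-based partition refinement for weak bisimulation: visible steps may be preceded and followed by internal ones, and Markovian rates are compared only for stable states, following maximal progress. This is a simplified illustration under those assumptions, not the CADP minimization used in this work. On this model it merges the seven states into five classes ({1||2, 2||3} and {3||2, 4||3} collapse), which matches the five-state aggregated C||A.

```python
from collections import defaultdict

TAU = "tau"  # the hidden action f(A); now internal

# Hidden C||A from the example (hypothetical encoding). The f(B) self-loops
# are the input transitions implied by input enabledness, made explicit.
INTERACTIVE = [
    ("1||1", "f(B)", "3||1"),
    ("1||2", TAU, "2||3"), ("1||2", "f(B)", "3||2"),
    ("3||1", "f(B)", "3||1"),
    ("2||3", "f(B)", "4||3"),
    ("3||2", TAU, "4||3"), ("3||2", "f(B)", "3||2"),
    ("4||3", "f(C)", "5||3"), ("4||3", "f(B)", "4||3"),
    ("5||3", "f(B)", "5||3"),
]
MARKOVIAN = [("1||1", 0.2, "1||2"), ("3||1", 0.2, "3||2")]
STATES = sorted({x for s, _, t in INTERACTIVE + MARKOVIAN for x in (s, t)})


def tau_closure(state):
    """All states reachable from `state` by zero or more internal steps."""
    seen, stack = {state}, [state]
    while stack:
        current = stack.pop()
        for src, act, dst in INTERACTIVE:
            if src == current and act == TAU and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def is_stable(state):
    """A state without internal steps; only stable states let time pass."""
    return not any(src == state and act == TAU for src, act, _ in INTERACTIVE)


def weak_steps(state):
    """Pairs (a, t) such that state =tau*=> -a-> =tau*=> t for a visible a."""
    steps = set()
    for mid in tau_closure(state):
        for src, act, dst in INTERACTIVE:
            if src == mid and act != TAU:
                steps.update((act, end) for end in tau_closure(dst))
    return steps


def signature(state, block_of):
    """Weak behavior of `state` expressed in terms of the current blocks."""
    visible = frozenset((a, block_of[t]) for a, t in weak_steps(state))
    internal = frozenset(block_of[t] for t in tau_closure(state))
    rates = set()
    for mid in tau_closure(state):
        if is_stable(mid):  # maximal progress: unstable states never delay
            cumulative = defaultdict(float)
            for src, rate, dst in MARKOVIAN:
                if src == mid:
                    cumulative[block_of[dst]] += rate
            rates.add(frozenset((b, round(r, 12)) for b, r in cumulative.items()))
    return visible, internal, frozenset(rates)


def aggregate():
    """Refine the partition until no block splits; returns state -> block id."""
    block_of = {s: 0 for s in STATES}
    while True:
        ids, new_block = {}, {}
        for s in STATES:
            key = (block_of[s], signature(s, block_of))
            new_block[s] = ids.setdefault(key, len(ids))
        if len(ids) == len(set(block_of.values())):
            return new_block
        block_of = new_block


if __name__ == "__main__":
    groups = defaultdict(list)
    for state, block in aggregate().items():
        groups[block].append(state)
    for members in groups.values():
        print(sorted(members))
```

Because each refinement key includes the previous block, blocks only ever split, so the loop stops as soon as a round produces no new block.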