Parametric Fault Trees with Dynamic Gates and Repair Boxes
Andrea Bobbio, Università del Piemonte Orientale, Alessandria
Daniele Codetta R., Università del Piemonte Orientale, Alessandria
Key Words: Parametric fault tree, modularization, dynamic gate, repair box, Colored Petri net.
SUMMARY & CONCLUSIONS
A new approach is proposed to include s-dependencies in
Fault Tree (FT) models. With respect to previous techniques,
the approach presented in this paper is based on two
distinctive features.
First, we adopt a parameterization
technique, referred to as Parametric FT (PFT), to fold equal
subtrees (or basic events) and so obtain a more compact
FT representation. It is shown that parameterization can be
conveniently adopted for dynamic gates as well. Second, a PFT
can be modularized and each module translated into a High
Level Colored Petri net in the form of a Stochastic Well-formed
Net (SWN). An SWN generates a lumped Markov chain,
and the saving in the dimension of the state space can be very
substantial with respect to standard (non colored) Petri nets.
Translation of PFT modules into SWN has proved to be very
flexible, and various kinds of new dependencies can be easily
accommodated. In order to exploit this flexibility a new
primitive, called repair box, is introduced. A repair box,
attached to an event, starts the repair of
all the components that are failed when the event occurs. In
contrast to all the previous FT based models, the addition of
repair boxes enables our approach to model cyclic behaviors.
We refer to the proposed approach as Dynamic Repairable
PFT (DRPFT). A tool supporting DRPFT is briefly described
and the tool is validated by analyzing a benchmark proposed
recently in the literature for quantitative comparison (Ref. 12).
1. INTRODUCTION
Traditional Fault-trees (FT) have gained a widespread
acceptance for the dependability and safety analysis of
complex and critical systems, since they are simple to
manipulate and are supported by powerful software tools for
the qualitative and quantitative analysis. However, traditional
FT suffer from the main limitation that basic components must
be assumed to be s-independent. S-dependence in the failure
process arises when the failure behavior of a component
depends on the state of the system. This kind of s-dependence
has been recently tackled by many authors (Refs. 1, 2, 3, 4, 5).
In the Dynamic FT (DFT) approach (Refs. 1, 3), the FT is
decomposed in independent modules and each module is
analyzed by generating its state space and solving the
underlying continuous-time Markov chain (CTMC), or by solving the local
dependencies by means of numerical techniques (Ref. 5). All
the above FT models are acyclic models and no previous
technique has addressed the problem of including in a FT
actions, taken after a fault, that restore the system to a
previous condition (repair, recovery, roll-back, rejuvenation),
converting an acyclic model into a cyclic one.
In order to alleviate the state space largeness problem, a more
compact FT representation has been proposed in Refs. 6, 7. This
compact representation (referred to as Parametric FT (PFT)) is
based on the observation that often, due to redundancies, the
system to be modeled contains similar replicated units or
subtrees. Similar subtrees may be folded and parameterized,
so that only one representative is explicitly included in the
model.
PFT can still be modularized starting from an
algorithm presented in Ref. 8, and each module can be
automatically converted into a high level Colored Petri Net
formalism called SWN (Ref. 9). SWNs have the property
that they generate symbolic states (markings) that may be
viewed as a high level description of sets of actual markings.
The definition of symbolic markings allows us to exploit
symmetry properties in the model and to generate the
underlying Markov chain in lumped form. The degree of
saving in the state space generation depends on the
redundancies present in the system and can be very substantial
(Ref. 6).
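The effect of lumping can be illustrated on a deliberately small case. The sketch below is not the SWN symbolic marking algorithm of Ref. 9; it only assumes n identical two-state components (a hypothetical example of ours) and groups the explicit configurations that differ by a permutation of the components.

```python
from itertools import product


def explicit_states(n):
    """Every up (0) / down (1) configuration of n distinguishable components."""
    return list(product((0, 1), repeat=n))


def lumped_states(n):
    """Group configurations that differ only by a permutation of components.

    For identical components only the number of failed units matters,
    so each lumped state is identified by that count.
    """
    groups = {}
    for state in explicit_states(n):
        groups.setdefault(sum(state), []).append(state)
    return groups


# Explicit versus lumped state counts for a few sizes.
for n in (2, 4, 8):
    print(n, len(explicit_states(n)), len(lumped_states(n)))
```

Under these assumptions the explicit state space grows as 2^n while the lumped one grows as n + 1, which is the kind of symmetry that the symbolic markings exploit.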
The aim of the present paper is to present an extended
version of PFT that we call DRPFT (Dynamic Repairable
Parametric FT).
DRPFT implements dynamic gates in
compact parametric form. Moreover, DRPFT is extended to
include dependencies arising from the repair process, by
adding a new primitive called repair box.
The solution
procedure for a DRPFT is presented and a software tool
developed for the analysis is briefly described. Finally, the
DRPFT is validated through a benchmark example taken from
Ref. 12.
The quantitative results obtained from DRPFT
coincide with those published in Ref. 12, but the example
emphasizes that the dimension of the state space that is
achieved using the DRPFT approach is more than two orders
of magnitude lower than the one obtained by previous
non-parametric techniques.
2. ACRONYMS
FT       fault tree
P(D)FT   parametric (dynamic) fault tree
DRPFT    dynamic repairable parametric fault tree
FDEP     functional dependency gate
PAND     priority AND gate
SEQ      sequence enforcing gate
WSP      warm spare gate
SWN      stochastic well-formed net
3. DYNAMIC REPAIRABLE PARAMETRIC FT
PFT have been extensively discussed in Refs. 6 and 7. By
means of the introduction of a new event, called replicator
event (and drawn as a dashed rectangle) similar subtrees (or
basic events) can be folded and parameterized, by defining in
the replicator event a parameter with its range of variation.
Replicator events are, thus, a compact construct to generate as
many identical subtrees as the cardinality of the declared
parameter.
PFT with AND, OR, and K:N gates can be
automatically converted (Ref. 6) into a Colored Petri net in
the form of an SWN (Ref. 9). Notice that a PFT with no replicator
events becomes a standard FT, and its automatic translation into
an SWN yields a standard (non colored) PN.
In the
following, we extend the PFT formalism to include the
dynamic gates proposed in Refs. 1 and 3. In particular, we
consider the dynamic gates FDEP (functional dependency
gate), PAND (priority and gate), SEQ (sequence enforcing
gate) and WSP (warm spare gate).
We show that the
dynamic gates can be parameterized (when the proper
conditions arise) and translated into a SWN. Finally, we
introduce a new primitive, called repair box. A repair box is
assigned a constant repair rate, and can be connected to any
event in the PFT, with the meaning of indicating the repair
(with the assigned repair rate) of all the basic components that
are failed when the event occurs.
The PFT formalism,
augmented with dynamic gates and repair boxes, is referred to
as Dynamic Repairable PFT (DRPFT). The analysis of a
DRPFT follows a classical hierarchical scheme (Refs. 2,10).
The DRPFT structure is first modularized, i.e. partitioned into
s-independent subtrees, called modules. Each module is
converted into a Petri net in the form of an SWN and analyzed in
isolation by resorting to the underlying lumped CTMC. The
module failure probability, computed from the resulting
CTMC, is cast back into the original DRPFT, by replacing the
whole module with a single basic event. All the above steps
are automated in a software tool and hidden from the modeler.
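The substitution step of this hierarchical scheme can be sketched as follows. This is a minimal illustration, not the code of the tool: the tuple encoding of the tree, the names M_dyn, A and B, and the probability values are hypothetical, and the evaluation is only valid when no basic event is shared between branches.

```python
from math import prod


def at_least_k(probs, k):
    """Probability that at least k of the independent events occur."""
    dist = [1.0]  # dist[j] = P(exactly j events among those seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)
            new[j + 1] += q * p
        dist = new
    return sum(dist[k:])


def evaluate(node, basic):
    """Failure probability of a static tree without shared basic events.

    A leaf is the name of a basic event; a gate is ("AND", children),
    ("OR", children) or ("KOFN", k, children).
    """
    if isinstance(node, str):
        return basic[node]
    kind, children = node[0], node[-1]
    probs = [evaluate(child, basic) for child in children]
    if kind == "AND":
        return prod(probs)
    if kind == "OR":
        return 1.0 - prod(1.0 - p for p in probs)
    if kind == "KOFN":
        return at_least_k(probs, node[1])
    raise ValueError(f"unknown gate type: {kind}")


# "M_dyn" stands for a dynamic module already solved through its CTMC;
# its top event probability is plugged in as an ordinary basic event.
tree = ("OR", ["M_dyn", ("AND", ["A", "B"])])
basic = {"M_dyn": 0.01, "A": 0.1, "B": 0.2}
print(evaluate(tree, basic))
```

Once every dynamic module has been replaced in this way, the remaining tree contains only static gates and a combinatorial evaluation of this kind is sufficient.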
4. DYNAMIC GATES IN DRPFT AND THEIR TRANSLATION IN SWN
When suitable symmetry conditions arise, dynamic gates
can be represented in compact parametric form, and then
automatically translated into a SWN. The way in which this
procedure is implemented in DRPFT is illustrated in the
following paragraphs. The graphical symbols adopted for the
dynamic gates are those introduced in Ref. 3.
4.1 FDEP
An FDEP gate is characterized by a trigger event and a set of
dependent events. Dependent events may fail on their own or
as an effect of the trigger event failure. In Fig. 1a, T is the
trigger event while A and B are the dependent events. Since
the FDEP in Fig. 1a has no replicator events, its translation is
in the form of the standard PN of Fig. 1b. Failure of component
A is represented by a token in place A, which can be produced
by the firing of transition A_f (failure of A on its own) or by
the firing of transition fdep_2 (failure of the trigger event T).
Fig. 1a: FDEP gate
Fig. 1b: Petri net representation of FDEP
If the dependent components are identical, they can be folded
and parameterized as in Fig. 2a. T is the trigger event while
D(i) is a replicator event providing the parametric
representation of the set of dependent components. If D(i) has
cardinality equal to 2, the DRPFT of Fig. 2a coincides with the
DFT of Fig. 1a. However, the parameter i can have any
cardinality and, hence, can represent an FDEP gate with any
number of identical dependent components. Fig. 2b shows the
corresponding SWN. The failure of one of the dependent
components is represented by a colored token in place D and
may be caused either by the firing of transition D_f (failure of
one of the D(i)’s) or by the firing of transition fdep_2 (failure
of T). Notice again that, in contrast to the FT of Fig. 1a, the
complexity of the DRPFT structure of Fig. 2a does not depend
on the number of dependent components, which only influences
the cardinality of the set D(i).
Fig. 2a: PFT-FDEP gate
Fig. 2b: SWN representation of PFT-FDEP
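For a quick numerical reading of the FDEP semantics, the following sketch gives closed-form probabilities under assumptions of ours: the trigger and the dependent components fail with exponential times, with hypothetical rates lam_t and lam_d, and no repair takes place. It is not the SWN translation itself.

```python
from math import exp


def p_dependent_failed(lam_d, lam_t, t):
    """P(one dependent component is failed at time t).

    It fails either on its own (rate lam_d) or because the trigger
    fails (rate lam_t), whichever happens first.
    """
    return 1.0 - exp(-(lam_d + lam_t) * t)


def p_all_dependents_failed(n, lam_d, lam_t, t):
    """P(all n identical dependent components are failed at time t).

    Either the trigger has failed, which fails all of them at once,
    or the trigger is still up and each one has failed on its own.
    """
    p_trigger = 1.0 - exp(-lam_t * t)
    p_own = 1.0 - exp(-lam_d * t)
    return p_trigger + (1.0 - p_trigger) * p_own ** n


print(p_all_dependents_failed(2, 1.0e-4, 1.0e-5, 1000.0))
```

The number of dependent components enters only as the exponent n, which mirrors the remark that it affects the cardinality of D(i) and not the structure of the model.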
4.2 PAND
A PAND gate fails if all of its inputs fail in a specified order
(from left to right). Consider the gate in Fig. 3a: its input
events are A and B, and they have to fail in this order. Since
the PAND in Fig. 3a has no replicator events, its translation is
in the form of the standard PN of Fig. 3b. Transition pand_2
fires if B fails while A is still working; this transition puts a
token in place Oper to indicate that the order has not been
respected and the gate failure did not occur. Otherwise, if A
fails before B, transition pand_1 fires, putting a token in the
failure place PAND_fail. A PFT construct can be envisaged if
the PAND gate has more than two identical ordered input events.
Fig. 3a: PAND gate
Fig. 3b: Petri net representation of PAND
4.3 SEQ
A SEQ gate forces its inputs to occur in a specified order (we
assume from left to right). The translation of the SEQ gate of
Fig. 4a into a Petri net is shown in Fig. 4b, where the transition
B_f, representing the failure of B, is enabled and fires in the
presence of a token in place A (A is failed). In a similar way,
the failure of C is enabled by the failure of B.
Fig. 4a: SEQ gate
Fig. 4b: Petri net representation of SEQ
4.4 WSP
A WSP gate is characterized by a main component, and a
set of ordered spare components. When the main component
fails it is replaced by the first component available in the spare
list. A spare may be in one of the following states:
- dormant or stand-by (it is not working, but ready to
replace the main component);
- working (it is working in place of the main
component which is failed);
- failed.
The failure rate of a spare component when in a working
condition is λ. Denoting by α (0 ≤ α ≤ 1) the dormancy factor,
the failure rate of the spare in the dormant condition is αλ.
Notice that α = 0 models a cold stand-by, while α = 1 models
the hot s-independent case (the WSP behaves as an AND gate).
The WSP gate fails when the main component fails and there
are no available spares. Assuming that there are m identical
spares, we can model the spares by means of the replicator
node SP(i), in which the parameter i of cardinality m is defined
(see Fig. 5a). Hence, in the DRPFT representation (Fig. 5a)
the WSP gate has two inputs:
- a basic event P representing the failure of the main
component;
- a replicator basic event SP(i) that is the parametric
representation of the set of spares; the cardinality m of this set
is equal to the number of spares.
Fig. 5a: PFT-WSP gate
The translation of the WSP gate of Fig. 5a is given in the
SWN of Fig. 5b. Place SP_na contains the coloured tokens of
the spares which are not available because they are failed or
already working; SP_curr contains the token relative to the
spare which is currently replacing the main component.
Transition SP_fail models the fault of a spare when in dormant
condition, putting the relative token in SP_na.
When the main component P fails (token in place P_dn),
transition P_spare fires, putting the token relative to the spare
to be used in SP_curr and SP_na. If the spare later fails
(firing of transition SP_fail) and place SP_na contains a number
of tokens equal to the number of spares (there are no more
available spares at the moment), transition P_fail fires,
modeling the general failure of the gate; otherwise another spare
starts working by means of the P_spare transition.
Fig. 5b: SWN representation of a WSP
The SWN of Fig. 5b is actually more general than the
corresponding WSP gate of Fig. 5a. Indeed, by assigning a color
class to place P_dn we can model a situation in which there
are n main components with m shared (or non-shared) spares.
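To make the WSP behavior concrete, the sketch below builds a lumped CTMC for one main component with m identical spares and solves it by uniformization. It is an illustration under assumptions of ours and not the SWN of Fig. 5b: the state encoding, the function names and the separate main-component rate lam_p are hypothetical, and because the spares are identical only the number of dormant spares is tracked.

```python
from math import exp


def wsp_rates(m, lam_p, lam, alpha):
    """Transition rates of a lumped CTMC for a WSP gate with m identical spares.

    A state is (active unit, number of dormant spares still available);
    "FAIL" is the absorbing gate-failure state.
    """
    rates = {}
    for active in ("main", "spare"):
        # With a spare already working, at most m - 1 spares are dormant.
        top = m if active == "main" else m - 1
        for d in range(top, -1, -1):
            out = {}
            if d > 0:
                # One of the d dormant spares fails at the reduced rate.
                out[(active, d - 1)] = d * alpha * lam
            # The active unit fails; the next spare takes over, if any.
            target = ("spare", d - 1) if d > 0 else "FAIL"
            active_rate = lam_p if active == "main" else lam
            out[target] = out.get(target, 0.0) + active_rate
            rates[(active, d)] = out
    return rates


def transient(rates, start, t, eps=1e-12):
    """State probabilities at time t by uniformization (moderate q*t only)."""
    states = list(rates) + ["FAIL"]
    q = max(sum(out.values()) for out in rates.values())
    if q * t > 700.0:
        raise ValueError("q*t too large for this simple sketch")
    pi = {s: 0.0 for s in states}
    pi[start] = 1.0
    weight = exp(-q * t)  # Poisson probability of zero jumps
    acc = weight
    result = {s: weight * p for s, p in pi.items()}
    k = 0
    while 1.0 - acc > eps and k < 100000:
        k += 1
        nxt = {s: 0.0 for s in states}
        nxt["FAIL"] = pi["FAIL"]  # the absorbing state keeps its mass
        for s, out in rates.items():
            nxt[s] += pi[s] * (1.0 - sum(out.values()) / q)
            for tgt, r in out.items():
                nxt[tgt] += pi[s] * r / q
        pi = nxt
        weight *= q * t / k
        acc += weight
        for s in states:
            result[s] += weight * pi[s]
    return result


def wsp_unreliability(m, lam_p, lam, alpha, t):
    """Probability that the WSP gate has failed by time t."""
    return transient(wsp_rates(m, lam_p, lam, alpha), ("main", m), t)["FAIL"]


print(wsp_unreliability(2, 1.0e-3, 1.0e-3, 0.1, 1000.0))
```

With one spare and equal rates, the two limiting cases can be checked by hand: α = 0 gives the two-stage Erlang distribution of a cold stand-by, and α = 1 gives the product form of an AND gate.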
5. MODULES DETECTION AND CLASSIFICATION
A module is a subtree that is s-independent from the rest
of the FT. In a DRPFT, a subtree is a module when it has no
nodes in common with other modules, does not descend from
a dynamic gate or does not contain a repair box. However,
the parameterization of the FT hinders the search for shared
basic events, since additional conditions on the parameter
definition and propagation need to be satisfied (Ref. 7). The
example in Fig. 6 clarifies this point.
Fig. 6: DPFT structural modules
Fig. 7: reduced PFT
A module may be classified as static or dynamic. Static
modules contain common basic events (possibly in
parameterized form) and can be analyzed by means of suitable
combinatorial techniques.
Dynamic modules contain
dynamic gates or repair boxes and require a state-space
analysis which, in the DRPFT methodology, is obtained by
translating the dynamic module into a SWN.
Dynamic
modules are analyzed in isolation, and replaced in the original
FT by a single basic event to which the module Top event
probability is assigned.
Tab. 1: modules classification
Column headers: Structural Module; Shared nodes (STEP 1); Shared Param. nodes (STEP 2); Dyn. Gate Descendant (STEP 3); Mod. Type; Min.
Structural modules: SYS1, SUB(i), SYS2, D_F, SYS3, Q_F(i)
Cell values: no yes no no no no no no no yes no no yes yes no yes no no no no no no yes no yes no yes yes
Mod. Type values: stat. dyn. dyn. dyn.
The module detection algorithm
proceeds in three steps. In the first step, a structural analysis
of the FT is performed, neglecting the specific nature of the
gates. In this step, applying the linear algorithm described in
Ref. 8, the subtrees with no shared nodes are identified and
these are called structural modules. Structural modules are
passed through steps 2 and 3.
In step 2, appropriate
conditions on the parameters defined in the replicator events
are checked in order to verify the presence of parameterized
common events. In step 3, it is checked whether the structural
module does not descend from a dynamic gate and does not
contain repair boxes.
A dynamic module is minimal if it
does not contain modules of any nature; minimal dynamic
modules are those to be detached, analyzed apart and replaced
in the original DRPFT. Consider the DRPFT example in
Fig. 6. The first step (algorithm in Ref. 8) locates the
structural modules, which are encircled by dotted lines in Fig. 6.
Then, each module is passed through the subsequent two
steps. The module SUB(i) has shared parameterized common
events. Indeed, the parameter declared in the replicator event
SUB(i) differs from the parameter declared in the replicator
event B(j). Hence, each replica generated by SUB(i) shares
all the replicas generated by B(j). The minimal (static)
module is, therefore, SYS1. Structural module D_F descends
from a dynamic PAND gate, and the minimal (dynamic)
module is, therefore, SYS2. Structural module Q_F contains a
dynamic gate and turns out to be a dynamic module. The result
of the modularization procedure is reported in Table 1. After
each dynamic module has been replaced, the reduced FT
structure is shown in Fig. 7, and can be solved by any
traditional technique.
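The first step (structural modules) can be sketched with a naive check. The code below is not the linear-time algorithm of Ref. 8; it is a simpler quadratic test written for illustration, and the dictionary encoding of the tree and the node names are hypothetical. A gate is reported as a structural module when no node below it has a parent outside its own subtree.

```python
def descendants(tree, node):
    """All nodes reachable below `node` (gates and basic events)."""
    seen = set()
    stack = list(tree.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tree.get(current, []))
    return seen


def structural_modules(tree):
    """Gates whose subtree shares no node with the rest of the tree.

    `tree` maps each gate name to the list of its input names; names
    that never appear as keys are basic events.
    """
    parents = {}
    for gate, children in tree.items():
        for child in children:
            parents.setdefault(child, set()).add(gate)
    modules = []
    for gate in tree:
        below = descendants(tree, gate)
        inside = below | {gate}
        # Every node below the gate must be referenced only from inside.
        if all(parents[node] <= inside for node in below):
            modules.append(gate)
    return modules


# G1 and G2 share the basic event B, so only TOP is a structural module.
shared = {"TOP": ["G1", "G2"], "G1": ["A", "B"], "G2": ["B", "C"]}
print(structural_modules(shared))
```

Steps 2 and 3 (parameter conditions on replicator events, dynamic gates and repair boxes) are not covered by this sketch.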
6. DRPFT TOOL OVERVIEW
The tool supporting the DRPFT formalism is DrawNet
(Ref. 11): DrawNet has a flexible graphical interface (that can
be adapted to any graph-like model) and saves the graphical
structure into an XML file. The XML file is passed to the
DRPFTproc block that detects the modules of the FT and their
(static or dynamic) nature. For each minimal dynamic
module an XML file is generated and passed to the translator
block from DRPFT to SWN. Then, a transient analysis of the
SWN representing the dynamic module is performed,
computing the module top event probability at a mission time
specified by the user. The result is passed back to DRPFTproc
and the dynamic module is replaced in the original DRPFT by
a basic event whose failure probability is constant and equal to
the result of the transient analysis. This procedure is iterated
until all the dynamic modules have been analyzed and
replaced; finally, the resulting (non dynamic) PFT is analyzed
by any traditional technique for FT. Fig. 8 sketches the flow
chart of the tool.
Fig. 8: tool overview
7. BENCHMARK ANALYSIS
In order to verify the correctness of the described
procedure and to test the quantitative results provided by the
tool described in the previous section, we have applied the
DRPFT approach to a benchmark that has been specifically
proposed in Ref. 12 for quantitative comparison (Fig. 9).
The benchmark is composed of an OR gate whose input
events are 8 WSPs that share 2 spares (S1 and S2). Since in
Ref. 12 all the components are assumed to be identical, we can
fold them using the parameterization technique described in
Section 4.4. The DRPFT version of the same example is
reported in Fig. 9b. The replicator event Q_F(i), of
cardinality 8, generates 8 identical subtrees that model
the main components. The replicator event S(j), of
cardinality 2, generates the two spares shared by the 8
main components. It should be remarked that the FT in
Fig. 9a (and 9b) forms a single dynamic module and
must be analyzed as a whole resorting to its state space
representation. The advantage of using the compact DRPFT
representation of Fig. 9b comes from the fact that the analysis
is based on the translation into an SWN that exploits the high
level of symmetry of the example by directly generating the
CTMC in a lumped form. The lumped CTMC generated by the
DRPFT tool contains 35 states. The number of CTMC states
generated by the tool Galileo in the original benchmark is not
known from Ref. 12, but can be estimated by unfolding the
SWN. In this way, the estimated number of states for the
CTMC generated by Fig. 9a is 5898.
Fig. 9a: benchmark (Ref. 12)
Fig. 9b: DRPFT version of the benchmark
The saving using the DRPFT approach is more than two
orders of magnitude with respect to a pure CTMC analysis.
The results for the transient analysis at different mission
times obtained from the DRPFT tool are compared in Table 2
with those reported in Ref. 12 and obtained from the Galileo
tool, and turn out to agree to the precision reported there.
Tab. 2: comparison of results
λ         α     t       DRPFT Unreliability   Galileo Unreliability
1.0E-06   0.1   8766    5.65660E-05           5.66E-05
1.0E-06   0.1   43830   5.72744E-03           5.73E-03
1.0E-06   0.1   87660   3.53699E-02           3.54E-02
8. REPAIR BOXES
In order to model the repair of failed components, we have
introduced in the DRPFT formalism a new primitive called
repair box (Ref. 11). A repair box may be connected to any
event with the following meaning: when the event occurs, the
repair box becomes enabled and starts repairing all the
components that are failed in the subtree whose root is the
event under consideration. Every repair box has a repair rate
(μ) to represent the exponentially distributed time necessary to
complete the repair.
The use and the effect of this new construct are illustrated
by means of the following example, in which the WSP gate of
Fig. 5a is taken as a base model. A single repair box is added
in Fig. 10 and two repair boxes are added in Fig. 11. In Fig.
10a, a repair box is attached to the main component of the
WSP. The effect of this repair box is to model the repair of
the main component only while a spare is replacing it; when
the repair ends, the spare is returned to a dormant condition
while the main component is put back in operation. Failure
of the system occurs when the main component is under repair
and there are no more available spares. With respect to Fig.
5b, the SWN of Fig. 10b, resulting from the translation of the
DRPFT of Fig. 10a, contains the new transition named
P_repair. When P_repair fires, the main component returns to
the working state (P_dn becomes empty) and the spare currently
replacing the main component returns to the stand-by state (its
token is removed from SP_curr) and can be used again if
necessary (its token is removed from SP_na too).
Fig. 10a: repair box connected to the main component
Fig. 10b: main component repair SWN
In Fig. 11a, a repair box is also attached to the (replicator)
event modeling the spares. The effect of this second repair
box is to model the repair of the failed spares as well. When a
spare fails (either in dormant or operating condition) a repair
action is started and the spare under repair is replaced by the
first available spare in the list. The resulting SWN is shown
in Fig. 11b. We have added place SP_dn, containing the tokens
relative to failed spares, and the spare repair transition
SP_repair, whose firing removes the token (of the same color)
from SP_na and SP_dn in order to return the spare to its
available state. The failure condition of this system (Top
Event) is reached when the main component and all the spares
are in a failed condition at the same time.
Fig. 11a: repair boxes connected to main and spare components
Fig. 11b: SWN translated from Fig. 11a
The Top Event unreliability has been computed using the
DRPFT tool for different mission times and for the three cases
(no repair - Fig. 5, one repair box - Fig. 10 and two repair
boxes - Fig. 11). The results are summarized and compared
in Table 3, where the assumed values for the failure rate λ,
the dormancy factor α and the repair rate μ (common to the
two repair boxes) are also reported.
Tab. 3: results with repair boxes (columns: Time (h); Fig. 5 TE unreliability; Fig. 10 TE unreliability; Fig. 11 TE unreliability)
Looking at the results of Table 3, the effect of the repair
boxes is to reduce the probability of reaching the system
failure state, as expected.
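The qualitative effect of a repair box on the main component can be reproduced with a small lumped CTMC. The sketch below is an illustration under assumptions of ours, not the SWN of Fig. 10b: the state encoding, the function names, the separate main-component rate lam_p and all numerical values are hypothetical, and the Kolmogorov equations are integrated with a fixed-step fourth-order Runge-Kutta scheme whose step must remain small with respect to the largest rate.

```python
def add_rate(out, target, rate):
    """Accumulate a transition rate towards `target`."""
    out[target] = out.get(target, 0.0) + rate


def build_rates(m, lam_p, lam, alpha, mu):
    """Lumped CTMC of a WSP gate with a repair box on the main component.

    ("main", d): the main unit works and d spares are dormant.
    ("spare", d): the main unit is under repair, one spare works and
    d further spares are dormant. "FAIL" is absorbing.
    """
    rates = {}
    for d in range(m + 1):
        out = rates.setdefault(("main", d), {})
        add_rate(out, ("spare", d - 1) if d > 0 else "FAIL", lam_p)
        if d > 0:
            add_rate(out, ("main", d - 1), d * alpha * lam)
    for d in range(m):
        out = rates.setdefault(("spare", d), {})
        add_rate(out, ("spare", d - 1) if d > 0 else "FAIL", lam)
        if d > 0:
            add_rate(out, ("spare", d - 1), d * alpha * lam)
        # Repair ends: the main unit resumes, the spare goes back to dormant.
        add_rate(out, ("main", d + 1), mu)
    return rates


def derivative(p, rates):
    """Right-hand side of the Kolmogorov forward equations."""
    dp = {s: 0.0 for s in p}
    for s, out in rates.items():
        for target, rate in out.items():
            flow = p[s] * rate
            dp[s] -= flow
            dp[target] += flow
    return dp


def unreliability(m, lam_p, lam, alpha, mu, t, steps=2000):
    """P(the gate has failed by time t), by fixed-step RK4 integration."""
    rates = build_rates(m, lam_p, lam, alpha, mu)
    p = {s: 0.0 for s in rates}
    p["FAIL"] = 0.0
    p[("main", m)] = 1.0
    h = t / steps
    for _ in range(steps):
        k1 = derivative(p, rates)
        k2 = derivative({s: p[s] + 0.5 * h * k1[s] for s in p}, rates)
        k3 = derivative({s: p[s] + 0.5 * h * k2[s] for s in p}, rates)
        k4 = derivative({s: p[s] + h * k3[s] for s in p}, rates)
        p = {s: p[s] + h / 6.0 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in p}
    return p["FAIL"]


no_repair = unreliability(2, 1.0, 1.0, 0.1, 0.0, 1.0)
with_repair = unreliability(2, 1.0, 1.0, 0.1, 5.0, 1.0)
print(no_repair, with_repair)
```

Setting mu to zero recovers the gate without repair, so the two calls compare the same model with and without the repair box; the values used here are not those of Table 3.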
ACKNOWLEDGMENTS
The work documented in this paper has been partially
supported by MIUR under Grant FIRB-Perf- RBNE019N8N.
REFERENCES
1. J. B. Dugan, S. J. Bavuso, M. A. Boyd, “Dynamic
Fault-Tree Models for Fault-Tolerant Computer
Systems”, IEEE Transactions on Reliability, vol 41,
1992, pp 363-377
2. A. Anand, A. K. Somani, “Hierarchical Analysis of
Fault Trees with Dependencies, Using
Decomposition”, Proc Annual Reliability and
Maintainability Symposium, 1998, pp 69-75
3. R. Manian, D. W. Coppit, K. J. Sullivan, J. B. Dugan,
“Bridging the Gap Between Systems and Dynamic
Fault Tree Models”, Proceedings Annual Reliability
and Maintainability Symposium, 1999, pp 105-111
4. A. Bobbio, L. Portinale, M. Minichino, E.
Ciancamerla, “Improving the Analysis of Dependable
Systems by Mapping Fault Trees into Bayesian
Networks”, Reliability Engineering and System
Safety, vol 71, 2001, pp 249-260
5. S. Amari, G. Dill, E. Howald, “A new
approach to solve dynamic fault-trees”, Proceedings
IEEE Annual Reliability and Maintainability
Symposium, 2003
6. A. Bobbio, G. Franceschinis, R. Gaeta, L. Portinale,
“Parametric Fault-Tree for the Dependability
Analysis of Redundant Systems and its High Level
Petri Net Semantics”, IEEE Transactions on Software
Engineering, vol 29, 2003, pp 270-287
7. A. Bobbio, G. Franceschinis, R. Gaeta, L. Portinale,
“Dependability Assessment of an Industrial
Programmable Logic Controller via Parametric
Fault-Tree and High Level PN”, Proc 9th International
Workshop on Petri Nets and Performance Models,
2001, pp 29-38
8. Y. Dutuit, A. Rauzy, “A Linear-Time Algorithm to
Find Modules of Fault Trees”, IEEE Transactions on
Reliability, vol 45, 1996, pp 422-425
9. G. Chiola, C. Dutheillet, G. Franceschinis, S.
Haddad, “Stochastic Well-Formed Colored Nets and
Symmetric Modeling Applications”, IEEE
Transactions on Computers, vol 42, 1993, pp 1343-1360
10. J. B. Dugan, K. J. Sullivan, D. Coppit, “Developing a
Low-Cost High-Quality Software Tool for Dynamic
Fault-Tree Analysis”, IEEE Transactions on
Reliability, vol 49, 2000, pp 49-59
11. V. Vittorini, G. Franceschinis, M. Gribaudo, M.
Iacono, N. Mazzocca, “DrawNet: Model Objects to
Support Performance Analysis and Simulation of
Complex Systems”, 12th Int Conf Modelling Tools
and Techniques for Computer and Communication
System Performance Evaluation, Springer Verlag LNCS,
Vol 2324, 2002, pp 233-238
12. H. Zhu, S. Zhou, J. B. Dugan, K. J. Sullivan, “A
Benchmark for Quantitative Fault Tree Reliability
Analysis”, Proceedings Annual Reliability and
Maintainability Symposium, 2001, pp 86-93
BIOGRAPHIES
Andrea Bobbio
Dipartimento di Informatica
Università del Piemonte Orientale
Spalto Marengo, 33
15100 Alessandria, ITALY
e-mail: bobbio@unipmn.it
Andrea Bobbio graduated in Nuclear Engineering from
Politecnico di Torino. Presently, he is a full professor at the
Department of Computer Science of the Università del
Piemonte Orientale, Alessandria, Italy. His activity is mainly
focused on the modeling and analysis of the performance and
reliability of stochastic systems, with particular emphasis on
Markovian and non-Markovian models and stochastic Petri
nets. Bobbio has spent various research periods at the
Department of Computer Science of Duke University
(Durham NC, USA), at the Technical University of Budapest
and at the Department of Computer Science and Engineering
at the Indian Institute of Technology in Kanpur (India). He
has been principal investigator and leader of research groups
in various research projects with public and private
institutions. He is a Senior Member of IEEE and the author
of several papers in international journals as well as
communications to international conferences.
Daniele Codetta R.
Dipartimento di Informatica
Università del Piemonte Orientale
Spalto Marengo, 33
15100 Alessandria, ITALY
e-mail: raiteri@unipmn.it
Daniele Codetta Raiteri got his degree in Computer Science in
July 2002 at Università del Piemonte Orientale (Italy) and he
is, presently, a Ph. D. student in Computer Science at
Università di Torino (Italy). His activity concerns stochastic
models for reliability analysis, more specifically fault trees
and their evolutions and analysis.