Dependability Analysis in Redundant Communication Networks using Reliability Importance Almir Guimarães

advertisement
2011 International Conference on Information and Network Technology
IPCSIT vol.4 (2011) © (2011) IACSIT Press, Singapore
Dependability Analysis in Redundant Communication Networks using
Reliability Importance
Almir Guimarães1, Paulo Maciel 2, Rubens Matos Jr2. and Kádna Camboim2
1
Campus Arapiraca, Federal University of Alagoas
Arapiraca, AL, Brazil.
Email: almir@arapiraca.ufal.br
2
Center of Informatics, Federal University of Pernambuco
Recife, PE, Brazil.
Email: apg2, prmm, rsmj, kmac@cin.ufpe.br
Abstract. In this paper, we investigate the dependability modeling in redundant communication networks.
We use Stochastic Petri Net as an enabling modeling approach for analytical evaluation of complex scenarios.
We apply our proposed modeling approach in a case study to evaluate the availability of enterprise networks
in different architectures. These architectures consider different redundancy mechanisms. A combination of
dependability models and reliability importance are used to analyze the system availability according to the
most important components. We also show the variation of system availability in accordance with
Mean Time To Failure in different architectures.
Keywords: Dependability; Availability; Reliability Importance; Stochastic Petri Net; Communication
Networks;
1. Introduction
In an ideal world, a communication network would be working perfectly at any time. In the real world,
this is not the case. Random failures affect the network and may cause service outages. Failures may be due
to physical causes, like fiber cuts, power outages, fires and earthquakes. Other types of failures may be
software failures or failures resulting from unintentional human errors. The possibility of avoiding failures,
which may put the business at risk, is an interesting track to be followed by organizations. The design,
deployment and management of communication networks infrastructure ought to meet such requirements.
In the last few years, much has been done to deal with issues relating to dependability of communication
networks. Researchers have used different approaches to deal with these problems.
[1] presents a systematic approach for quantifying the reliability of enterprise VoIP networks. [1]
provides an enhanced method and procedure of reliability calculation, using a network matrix representation
and RBD (Reliability Block Diagram). [5] discusses algorithmic methods to obtain network availability
values in a given topology and presents two tools for computation of network availability in large and
complex networks. [3] presents a new classification of dependability and security models for systems and
networks. And then it presents several individual model types such as availability, confidentiality, integrity,
performance, reliability, survivability, safety and maintainability. The dependability/security models can be
represented as combinatorial models, state-space models, and hierarchical models.
In this paper we focus on dependability aspects of redundant communication networks. Three
architectures and their redundancy mechanisms are evaluated in different scenarios using dependability
Stochastic Petri Net (SPN) [2] models. These architectures were constructed to evaluate the system
availability according to different parameters and network equipments. The model parameters used were
12
obtained from measurements in a real system. We also used values provided by manufactures of network
equipments.
The rest of the paper is organized as follows: Section 2 describes the architectures used in this work.
Section 3 describes the proposed dependability models. Section 4 presents the combination of dependability
models analysis and reliability importance values for a range of different architectures. Finally, Section 5
discusses the results of this study and introduces further ideas for future research.
2. Platform Description
Our work is based on three architectures which are shown below to obtain dependability results.
2.1. First and Second Architectures
In the first architecture, the testbed is composed of two machines, a switch and two routers (they are
named input router, R0, and output router, R2) that are connected by a single link (L0 - see Figure 1). If a
component fails, the system goes down. In the second architecture, the testbed is composed of two machines,
a switch and two routers that are connected by redundant links (L0 and L1 - see Figure 1). They are named
input router (R0) and output router (R2). When the primary link (L0) fails, the spare link (L1) assumes the
role of the primary one. After restoration, the system returns to the initial condition.
Fig. 1: First and Second Architectures.
2.2. Third Architecture
In this architecture, the testbed is composed of two machines, a switch and three routers (see Figure 2).
They are named input routers (R0 and R1) and output router (R2). When one of the primary components (R0
or L0) fails, the spare components (R1 and L1) assume the role of the primary components. This switchover
process takes a time period that represents the spare components starting operation. After restoration, the
system returns to the initial condition.
Fig. 2: Third Architecture.
3. Proposed Dependability Models
In this section we present three dependability SPN models (Figures 3, 4 and 5). The dependability
models can be evaluated using TimeNet (http://www.tu-ilmenau.de/fakia/8086.html) tool.
3.1. Dependability Model – First Architecture
Fig. 3: Dependability Model - First Architecture.
13
The model considers a system that has no redundancy. The model includes six places, which are R0_ON,
R0_OFF, with corresponding places of L0 and R2 components. Places R0_ON and R0_OFF, along with
their corresponding pairs, model component’s activity and inactivity states, respectively. These components
have two parameters, namely Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR), which
represent delays associated to corresponding exponential transitions X_MTTF and X_MTTR. In this section,
label ”X” must be instantiated according to the component name (see Figure 3). This model can compute the
availability of the system through SAFA variable (System_Availability_First_Architecture - SAFA =
P{(#R0_ON>0) AND (#L0_ON>0) AND (#R2 _ON>0)}). This expression represents the probability that
the system is up.
3.2. Dependability Model – Second Architecture
The model considers a system that has redundancy at the link level based on the so-called warm-standby
redundancy approach. The model includes ten places, which are R0_ON, R0_OFF, with corresponding
places of L0, ASL0 (Active_Spare_L0), SL0 (Spare_L0) and R2 components. SL0 and ASL0 represent the
spare component of L0 in following situations: SL0 represents the spare component of L0 in active and not
operational state. ASL0 represents the spare component of L0 in active and operational state. Places R0_ON
and R0_OFF, along with their corresponding pairs, model component’s activity and inactivity states,
respectively. These components have two parameters, namely MTTF and MTTR, which represent delays
associated to exponential transitions X_MTTF and X_MTTR (see Figure 4). The spare component, in active
and not operational state, has a value of delay (SL0_MTTF) 50% higher than exponential transition
ASL0_MTTF. Exponential transition ACTSP represents the spare component starting operation. This time
period (delay), in hours, is named Mean Time to Activate (MTA). As the primary component fails, the
transition ACTSP is fired. This transition firing represents the spare component taking over the failed one. In
turn, immediate transition DCTSP represents the return to normal operation.
Fig. 4: Dependability Model - Second Architecture.
This model can compute the availability of the system through SASA variable
(System_Availability_Second_Architecture - SASA = P{(#R0_ON=1) AND (#L0_ON=1 OR #ASL0_ON=1)
AND (#R2_ON=1)}). This expression represents the probability that the system is up.
3.3. Dependability Model – Third Architecture
The model includes ten relevant places, which are R0_ON, R0_OFF, with corresponding places of R1,
R2, L0 and L1 components. Places R0_ON and R0_OFF, along with their corresponding pairs, model
component’s activity and inactivity states, respectively. These components have two parameters, namely
MTTF and MTTR, which represent delays associated to corresponding exponential transitions X_MTTF and
X_MTTR (see Figure 5). Exponential transitions ACTSPRT and ACTSPLK represent the spare components
starting operation. This time period, in hours, is named MTA. As the primary components fail, the transitions
ACTSPRT or ACTSPLK are fired. A firing in one of these transitions represent the spare component taking
over the failed one. In turn, immediate transition DCTSP represents the return to normal operation. This
model
can
compute
the
availability
of
the
system
through
SATA
variable
(System_Availability_Third_Architecture - SATA = P{((#R0_ON=1 AND #L0_ON=1) OR (#R1_ON=1
AND #L1_ON=1)) AND (#R2_ON=1)}). This expression represents the probability that the system is up.
14
Fig. 5: Dependability Model - Third Architecture.
4. Case Study
In this section, we consider the architectures in Section 2. Our goal is to compute the network
dependability using the proposed SPN models (see Section 3). The network equipment MTTF used in this
work is: Component 1, 131,000 hours. The value of MTTR of twelve hours and a WAN link MTTF of 1,188
hours are used. For MTA, a value of 0.0027 hours for the second architecture and a value of 0.0416 for the
third architecture are used. These values shall be considered as the base case throughout this section, unless
another value is specified in each scenario. Since the router failures occur much less frequently than WAN
link failures, the link MTTF is an important point for the whole system availability. Figure 6 shows the
system availability as a function of the link MTTF, in hours, in different architectures (see Section 2). First
architecture shows the lowest values of system availability. However, as the link MTTF grows, the effects of
redundancy will be less visible.
Av ailabi lity
1 .0 0 0
0 .9 9 5
0 .9 9 0
0 .9 8 5
1 s t A r c h i t e c t u r e
2 n d A r c h i t e c t u r e
3 r d A r c h i t e c t u r e
0 .9 8 0
0 .9 7 5
400
500
600
700
M
T
800
T F
900
( H )
1000
1100
1200
Fig. 6: System availability according to link MTTF for different architectures.
(a) 1st architecture
(b) 2nd architecture
(c) 3rd architecture
Fig. 7: RBD Models of each Architecture.
1 .0 0
Avai l abi l i t y
0 .9 8
0 .9 6
0 .9 4
0 .9 2
0 .9 0
0 .8 8
0
2 00
4 00
6 00
8 00
M T T F ( L0 )
100 0
120 0
Fig. 8: System availability in accordance with MTTF (L0) - First Architecture.
15
System reliability is directly related to MTTF parameter, and the system availability is directly related to
MTTF and MTTR parameters. Since MTTF >> MTTR, then we can also evaluate the system availability in
terms of the components that have the highest Reliability Importance [6] values. In this work, Reliability
Importance is calculated using Astro software package (http://www.cin.ufpe.br/~bs/DesdacToolDownload)
considering the system stationary state 1 in each architecture. Figure 7 shows the RBD models used to
calculate the Reliability Importance values. Figures 7(a), 7(b) and 7(c) show the RBD model of the first,
second and third architecture, respectively. Table 1 shows the Reliability Importance values. In the
considered time, all the architectures were in stationary state. Link L0, in the first architecture, and links
L0/L1 in second and third architectures, assume the greatest Reliability Importance values in system
stationary state. Any change in these components will have a major impact on system availability. Using the
dependability SPN models (see Section 3), we show the system availability variation in accordance with the
primary components parameters. The importance of links L0 and L1 is confirmed in Figures 8 (Link L0 –
First architecture) and 9 (Links L0/L1 – Architecture 2, scenario 1, L0 variation, scenario 2, L1 variation,
and architecture 3, scenario 3, L0 variation, scenario 4, L1 variation). In turn, Figure 10 shows that routers
have a smaller impact on system availability.
5. Conclusions
In this paper, we propose SPN models to evaluate several dependability aspects of communication
networks in different architectures. A combination of dependability models and reliability importance are
used to analyze the system availability according to the most important components. We also show the
variation of system availability in accordance with MTTF. For future work, we plan to extend these models
to include software failures and different repair policies.
Av ailability
0 .9 9 9 8
0 .9 9 9 7
0 .9 9 9 6
S
S
S
S
0 .9 9 9 5
0 .9 9 9 4
500
600
700
800
M T
T
900
1000
1100
F ( L0 / L1 )
c
c
c
c
e
e
e
e
n
n
n
n
1200
a
a
a
a
r
r
r
r
i
i
i
i
o
o
o
o
1
2
3
4
1300
1400
Fig. 9: System availability in accordance with L0/L1 MTTF – Second and third architectures.
Availability
1 .0 0 0
0 .9 9 8
0 .9 9 6
S c e n a r io 1
S c e n a r io 2
S c e n a r io 3
0 .9 9 4
0 .9 9 2
0 .9 9 0
13 0 0 0 0
1 4 0 00 0
1 5 0 00 0
1 6 0 00 0
M T T F ( R0 / R0 / R2 )
1 7 0 00 0
1 8 0 00 0
Fig. 10: System availability in accordance with router MTTF – First, second and third architectures.
Table 1: Reliability Importance Values
Architecture
1
2
3
IiB(L0)
1.0
1.0
1.0
IiB(L1)
-1.0
1.0
IiB(R0)
0.00126
0.00253
0.00126
IiB(R1)
0.00126
6. References
1
State in which the order of the Reliability Importance values will not change
16
IiB(R2)
0.00126
0.00253
0.00253
[1]
C.H.K. Chu, H. Pant, S.H. Richman and P. Wu :Enterprise VoIP Reliability”, Proc. 12th International
Telecommunications Network Strategy and Planning Symposium, NETWORKS 2006, pages 1 - 6, 2007.
[2] G. Bolch, S. Greiner, H. De Meer and K.S. Trivedi :Queuing Networks and Markov Chains: Modelling and
Performance Evaluation with Computer Science Applications, ISBN 0-471-19366-6, 2nd edition, John Wiley &
Sons, Wiley-Interscience, New York, NY, USA, 1998.
[3] K.S. Trivedi, D.S. Kim, A. Roy,, Medhi, D. :Dependability and Security Models, Proc. DRCN 2009.
[4] W. Kuo,M.J. Zuo : “Optimal Reliability Modeling: Principles and Applications”, ISBN 0-471-39761-X, John
Wiley & Sons,Inc. Hoboken, New Jersey, 2003.
[5] W. Zou, M. Janic, R. Kooij and F. Kuipers :”On the availability of networks”, In BroadBand Europe 2007,
Antwerp, Belgium, 3-6 December 2007.
17
Download