2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) © (2011) IACSIT Press, Singapore Dependability Analysis in Redundant Communication Networks using Reliability Importance Almir Guimarães1, Paulo Maciel 2, Rubens Matos Jr2. and Kádna Camboim2 1 Campus Arapiraca, Federal University of Alagoas Arapiraca, AL, Brazil. Email: almir@arapiraca.ufal.br 2 Center of Informatics, Federal University of Pernambuco Recife, PE, Brazil. Email: apg2, prmm, rsmj, kmac@cin.ufpe.br Abstract. In this paper, we investigate the dependability modeling in redundant communication networks. We use Stochastic Petri Net as an enabling modeling approach for analytical evaluation of complex scenarios. We apply our proposed modeling approach in a case study to evaluate the availability of enterprise networks in different architectures. These architectures consider different redundancy mechanisms. A combination of dependability models and reliability importance are used to analyze the system availability according to the most important components. We also show the variation of system availability in accordance with Mean Time To Failure in different architectures. Keywords: Dependability; Availability; Reliability Importance; Stochastic Petri Net; Communication Networks; 1. Introduction In an ideal world, a communication network would be working perfectly at any time. In the real world, this is not the case. Random failures affect the network and may cause service outages. Failures may be due to physical causes, like fiber cuts, power outages, fires and earthquakes. Other types of failures may be software failures or failures resulting from unintentional human errors. The possibility of avoiding failures, which may put the business at risk, is an interesting track to be followed by organizations. The design, deployment and management of communication networks infrastructure ought to meet such requirements. In the last few years, much has been done to deal with issues relating to dependability of communication networks. Researchers have used different approaches to deal with these problems. [1] presents a systematic approach for quantifying the reliability of enterprise VoIP networks. [1] provides an enhanced method and procedure of reliability calculation, using a network matrix representation and RBD (Reliability Block Diagram). [5] discusses algorithmic methods to obtain network availability values in a given topology and presents two tools for computation of network availability in large and complex networks. [3] presents a new classification of dependability and security models for systems and networks. And then it presents several individual model types such as availability, confidentiality, integrity, performance, reliability, survivability, safety and maintainability. The dependability/security models can be represented as combinatorial models, state-space models, and hierarchical models. In this paper we focus on dependability aspects of redundant communication networks. Three architectures and their redundancy mechanisms are evaluated in different scenarios using dependability Stochastic Petri Net (SPN) [2] models. These architectures were constructed to evaluate the system availability according to different parameters and network equipments. The model parameters used were 12 obtained from measurements in a real system. We also used values provided by manufactures of network equipments. The rest of the paper is organized as follows: Section 2 describes the architectures used in this work. Section 3 describes the proposed dependability models. Section 4 presents the combination of dependability models analysis and reliability importance values for a range of different architectures. Finally, Section 5 discusses the results of this study and introduces further ideas for future research. 2. Platform Description Our work is based on three architectures which are shown below to obtain dependability results. 2.1. First and Second Architectures In the first architecture, the testbed is composed of two machines, a switch and two routers (they are named input router, R0, and output router, R2) that are connected by a single link (L0 - see Figure 1). If a component fails, the system goes down. In the second architecture, the testbed is composed of two machines, a switch and two routers that are connected by redundant links (L0 and L1 - see Figure 1). They are named input router (R0) and output router (R2). When the primary link (L0) fails, the spare link (L1) assumes the role of the primary one. After restoration, the system returns to the initial condition. Fig. 1: First and Second Architectures. 2.2. Third Architecture In this architecture, the testbed is composed of two machines, a switch and three routers (see Figure 2). They are named input routers (R0 and R1) and output router (R2). When one of the primary components (R0 or L0) fails, the spare components (R1 and L1) assume the role of the primary components. This switchover process takes a time period that represents the spare components starting operation. After restoration, the system returns to the initial condition. Fig. 2: Third Architecture. 3. Proposed Dependability Models In this section we present three dependability SPN models (Figures 3, 4 and 5). The dependability models can be evaluated using TimeNet (http://www.tu-ilmenau.de/fakia/8086.html) tool. 3.1. Dependability Model – First Architecture Fig. 3: Dependability Model - First Architecture. 13 The model considers a system that has no redundancy. The model includes six places, which are R0_ON, R0_OFF, with corresponding places of L0 and R2 components. Places R0_ON and R0_OFF, along with their corresponding pairs, model component’s activity and inactivity states, respectively. These components have two parameters, namely Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR), which represent delays associated to corresponding exponential transitions X_MTTF and X_MTTR. In this section, label ”X” must be instantiated according to the component name (see Figure 3). This model can compute the availability of the system through SAFA variable (System_Availability_First_Architecture - SAFA = P{(#R0_ON>0) AND (#L0_ON>0) AND (#R2 _ON>0)}). This expression represents the probability that the system is up. 3.2. Dependability Model – Second Architecture The model considers a system that has redundancy at the link level based on the so-called warm-standby redundancy approach. The model includes ten places, which are R0_ON, R0_OFF, with corresponding places of L0, ASL0 (Active_Spare_L0), SL0 (Spare_L0) and R2 components. SL0 and ASL0 represent the spare component of L0 in following situations: SL0 represents the spare component of L0 in active and not operational state. ASL0 represents the spare component of L0 in active and operational state. Places R0_ON and R0_OFF, along with their corresponding pairs, model component’s activity and inactivity states, respectively. These components have two parameters, namely MTTF and MTTR, which represent delays associated to exponential transitions X_MTTF and X_MTTR (see Figure 4). The spare component, in active and not operational state, has a value of delay (SL0_MTTF) 50% higher than exponential transition ASL0_MTTF. Exponential transition ACTSP represents the spare component starting operation. This time period (delay), in hours, is named Mean Time to Activate (MTA). As the primary component fails, the transition ACTSP is fired. This transition firing represents the spare component taking over the failed one. In turn, immediate transition DCTSP represents the return to normal operation. Fig. 4: Dependability Model - Second Architecture. This model can compute the availability of the system through SASA variable (System_Availability_Second_Architecture - SASA = P{(#R0_ON=1) AND (#L0_ON=1 OR #ASL0_ON=1) AND (#R2_ON=1)}). This expression represents the probability that the system is up. 3.3. Dependability Model – Third Architecture The model includes ten relevant places, which are R0_ON, R0_OFF, with corresponding places of R1, R2, L0 and L1 components. Places R0_ON and R0_OFF, along with their corresponding pairs, model component’s activity and inactivity states, respectively. These components have two parameters, namely MTTF and MTTR, which represent delays associated to corresponding exponential transitions X_MTTF and X_MTTR (see Figure 5). Exponential transitions ACTSPRT and ACTSPLK represent the spare components starting operation. This time period, in hours, is named MTA. As the primary components fail, the transitions ACTSPRT or ACTSPLK are fired. A firing in one of these transitions represent the spare component taking over the failed one. In turn, immediate transition DCTSP represents the return to normal operation. This model can compute the availability of the system through SATA variable (System_Availability_Third_Architecture - SATA = P{((#R0_ON=1 AND #L0_ON=1) OR (#R1_ON=1 AND #L1_ON=1)) AND (#R2_ON=1)}). This expression represents the probability that the system is up. 14 Fig. 5: Dependability Model - Third Architecture. 4. Case Study In this section, we consider the architectures in Section 2. Our goal is to compute the network dependability using the proposed SPN models (see Section 3). The network equipment MTTF used in this work is: Component 1, 131,000 hours. The value of MTTR of twelve hours and a WAN link MTTF of 1,188 hours are used. For MTA, a value of 0.0027 hours for the second architecture and a value of 0.0416 for the third architecture are used. These values shall be considered as the base case throughout this section, unless another value is specified in each scenario. Since the router failures occur much less frequently than WAN link failures, the link MTTF is an important point for the whole system availability. Figure 6 shows the system availability as a function of the link MTTF, in hours, in different architectures (see Section 2). First architecture shows the lowest values of system availability. However, as the link MTTF grows, the effects of redundancy will be less visible. Av ailabi lity 1 .0 0 0 0 .9 9 5 0 .9 9 0 0 .9 8 5 1 s t A r c h i t e c t u r e 2 n d A r c h i t e c t u r e 3 r d A r c h i t e c t u r e 0 .9 8 0 0 .9 7 5 400 500 600 700 M T 800 T F 900 ( H ) 1000 1100 1200 Fig. 6: System availability according to link MTTF for different architectures. (a) 1st architecture (b) 2nd architecture (c) 3rd architecture Fig. 7: RBD Models of each Architecture. 1 .0 0 Avai l abi l i t y 0 .9 8 0 .9 6 0 .9 4 0 .9 2 0 .9 0 0 .8 8 0 2 00 4 00 6 00 8 00 M T T F ( L0 ) 100 0 120 0 Fig. 8: System availability in accordance with MTTF (L0) - First Architecture. 15 System reliability is directly related to MTTF parameter, and the system availability is directly related to MTTF and MTTR parameters. Since MTTF >> MTTR, then we can also evaluate the system availability in terms of the components that have the highest Reliability Importance [6] values. In this work, Reliability Importance is calculated using Astro software package (http://www.cin.ufpe.br/~bs/DesdacToolDownload) considering the system stationary state 1 in each architecture. Figure 7 shows the RBD models used to calculate the Reliability Importance values. Figures 7(a), 7(b) and 7(c) show the RBD model of the first, second and third architecture, respectively. Table 1 shows the Reliability Importance values. In the considered time, all the architectures were in stationary state. Link L0, in the first architecture, and links L0/L1 in second and third architectures, assume the greatest Reliability Importance values in system stationary state. Any change in these components will have a major impact on system availability. Using the dependability SPN models (see Section 3), we show the system availability variation in accordance with the primary components parameters. The importance of links L0 and L1 is confirmed in Figures 8 (Link L0 – First architecture) and 9 (Links L0/L1 – Architecture 2, scenario 1, L0 variation, scenario 2, L1 variation, and architecture 3, scenario 3, L0 variation, scenario 4, L1 variation). In turn, Figure 10 shows that routers have a smaller impact on system availability. 5. Conclusions In this paper, we propose SPN models to evaluate several dependability aspects of communication networks in different architectures. A combination of dependability models and reliability importance are used to analyze the system availability according to the most important components. We also show the variation of system availability in accordance with MTTF. For future work, we plan to extend these models to include software failures and different repair policies. Av ailability 0 .9 9 9 8 0 .9 9 9 7 0 .9 9 9 6 S S S S 0 .9 9 9 5 0 .9 9 9 4 500 600 700 800 M T T 900 1000 1100 F ( L0 / L1 ) c c c c e e e e n n n n 1200 a a a a r r r r i i i i o o o o 1 2 3 4 1300 1400 Fig. 9: System availability in accordance with L0/L1 MTTF – Second and third architectures. Availability 1 .0 0 0 0 .9 9 8 0 .9 9 6 S c e n a r io 1 S c e n a r io 2 S c e n a r io 3 0 .9 9 4 0 .9 9 2 0 .9 9 0 13 0 0 0 0 1 4 0 00 0 1 5 0 00 0 1 6 0 00 0 M T T F ( R0 / R0 / R2 ) 1 7 0 00 0 1 8 0 00 0 Fig. 10: System availability in accordance with router MTTF – First, second and third architectures. Table 1: Reliability Importance Values Architecture 1 2 3 IiB(L0) 1.0 1.0 1.0 IiB(L1) -1.0 1.0 IiB(R0) 0.00126 0.00253 0.00126 IiB(R1) 0.00126 6. References 1 State in which the order of the Reliability Importance values will not change 16 IiB(R2) 0.00126 0.00253 0.00253 [1] C.H.K. Chu, H. Pant, S.H. Richman and P. Wu :Enterprise VoIP Reliability”, Proc. 12th International Telecommunications Network Strategy and Planning Symposium, NETWORKS 2006, pages 1 - 6, 2007. [2] G. Bolch, S. Greiner, H. De Meer and K.S. Trivedi :Queuing Networks and Markov Chains: Modelling and Performance Evaluation with Computer Science Applications, ISBN 0-471-19366-6, 2nd edition, John Wiley & Sons, Wiley-Interscience, New York, NY, USA, 1998. [3] K.S. Trivedi, D.S. Kim, A. Roy,, Medhi, D. :Dependability and Security Models, Proc. DRCN 2009. [4] W. Kuo,M.J. Zuo : “Optimal Reliability Modeling: Principles and Applications”, ISBN 0-471-39761-X, John Wiley & Sons,Inc. Hoboken, New Jersey, 2003. [5] W. Zou, M. Janic, R. Kooij and F. Kuipers :”On the availability of networks”, In BroadBand Europe 2007, Antwerp, Belgium, 3-6 December 2007. 17