Proceedings of the joint ICVRAM ISUMA UNCERTAINTIES conference
Florianópolis, SC, Brazil, April 8-11, 2018

Petri Net Based Reliability Analysis of Thermoelectric Plant Cooling Tower System: Effects of Operational Strategies on System Reliability and Availability

A. Caminada Netto1, A. H. A. Melani1, C. A. Murad1, S. I. Nabeta1, G. F. M. Souza1
1 Escola Politécnica de São Paulo, Av. Prof. Mello Moraes, 2231 – CEP 05508-010, adherbal@usp.br, melani@usp.br, carlos.murad@usp.br, gfmsouza@usp.br

Keywords. Reliability Analysis, Petri Net, Cooling Tower, Coal-Fired Power Plant.

Abstract. The reliability analysis is a set of studies that characterize the behavior of a system, with respect to the occurrence of failures. One of the ways to carry out such analysis is through the development of a model of the system under study. The results obtained from the reliability analysis contribute to the decision making related to both the systems design and maintenance planning.
For systems with great number of components, where usually components failure cannot be considered independent, a more complex reliability analysis tool must be used such as state machines. Stochastic Petri Net (SPN) is a mathematical modeling language widely used in different areas of knowledge, especially when it is necessary to represent nondeterministic phenomena. SPN is currently being used for systems reliability (R), availability (A), risk and performance analysis in situations where the components failure and repair rates cannot be modeled as time independent. This paper proposes the development of a reliability analysis based on Stochastic Petri Nets for the cooling tower system of a coal based thermoelectric power plant. The cooling tower system plays a fundamental role in the operation of a coal based thermoelectric plant, since it is one of the systems responsible for the thermodynamic balance of the plant. This system failure usually affects the power plant output. There are other techniques that can be used in reliability analysis, but this paper will also deal with Functional Tree and Functional Description, in order to support the Petri net model development. The cooling tower system is composed of identical cells for which components failure modes and reliability can be estimated. The obtained SPN model will be tested and validated through failures historical data. Once the model is developed, the possible operational strategies of the cooling tower system can be evaluated to attend the heat load as design intent. This evaluation aims at the optimization of the system reliability. Uncertainties in loading conditions and units failure modes are evaluated considering variations in cells reliability. Aiming at long term availability analysis, the influence of number of repair maintenance teams and application of predictive maintenance policy are also evaluated. 
The analyses allow the definition of the most suitable operational configuration and repair resources aiming at achieving target availability. 1 INTRODUCTION Supplying electricity has become very competitive over the years. Maintenance, availability and reliability are some of the most important factors of steam-process plant. Power plants are regarded as the core module of power systems and are responsible for producing power to be transmitted and distributed to the end consumers. Power plants need to be available and reliable all the time, and so do their constituent components, (Sabouhi, et al., 2016). If the power generation stations are not well maintained and reliably operated, a significant amount of damages would be possibly imposed to the society as a consequence of power shortage. Any improvement in system reliability is connected to an imposed amount of cost; the reliability enhancement is justified to the extent that the system unavailability cost is more than that of the normal service provided. The main goal of maintenance consists in finding balances between costs and time of maintenance or the most appropriate moment to execute maintenance, (Burhanuddin, et al., 2014). In order to decrease costs of electricity production, companies have to take every new methodology available to apply on their internal procedures to achieve better key performance indicators (KPI’s) and therefore, improve efficiency, quality and reliability of its repairable systems. In other words, produce the same amount of energy with fewer expenses. A repairable system is defined as a system that can be restored to fully satisfactory operation by parts replacements or adjustments when failing to perform one or more of its functions satisfactorily, (Pamuk and Uyaroglu, 2012). There are some attempts in the literature to develop more realistic techniques using simulation models for reliability and availability analysis of systems, (Rao and Naikan, 2014). 
This study first builds an understanding of the system under study by using a Functional Tree (FT) and a Functional Description (FD), and then applies a Stochastic Petri Net (SPN). It intends to advance the understanding of Petri net application for reliability analysis and to demonstrate how maintenance team availability can play an important role in a steam-process plant scenario (Lee and Lu, 2012). Such models are useful for simulation purposes in the case of complex systems; they are often defined as stochastic Petri nets. For this reason, the identification of event systems is based on the analysis of state sequences that can be observed when the system is working (Leclercq, et al., 2009).

2 COOLING TOWER BASICS

Cooling towers are heat exchangers that use water and air to transfer heat. Water to be cooled is distributed in the tower by spray nozzles, splash bars or film-type fill, which exposes a very large water surface area to atmospheric air. In a coal-fired power plant, this water passes through the condenser to condense the exhaust steam from the low pressure turbine. As the hot water flows through the tower, heat is rejected to the ambient air through the evaporative process; as a result, the hot water is cooled, first through evaporation of a small percentage of the total water flow. Evaporation is a process by which heat is absorbed by air, and the remaining condenser water is cooled to the desired exit temperature, as specifically designed for the cooling tower location (Stanford III, 2003). Cooling towers are often neglected by operation and maintenance technicians, most of the time because access to the tower is difficult, which results in low cooling efficiency.
They are common in industries such as oil refining, chemical processing, power plants, steel mills, and many different manufacturing processes that require cooled water (Kiran and Muthukumar, 2017). The make-up water source is used to replenish water lost to evaporation. Hot water from heat exchangers is also sent to the cooling tower. The water exits the cooling tower and is sent back to the exchangers or to other units for further cooling. Figure 1 shows a schematic arrangement of a cooling tower application in a coal fired power plant with six cells only, where it is used to condensate exhaust steam from the low pressure turbine. Figure 1: Cooling Tower System. According to (SPX Cooling Technologies, 2017), each cell contains the basic components such as concrete structure or frame to support the exterior enclosures fan, motor, speed reducers (gearboxes). Most towers employ fills (made of plastic, wood or even ceramic) to promote the heat transfer between air and water. The cold-water basin is located at the bottom of the tower and receives the cooled water that flows down through the tower and fills. Drift eliminators capture water droplets entrapped in the air stream so it will not be lost to the atmosphere. An air inlet area is where the air enters the tower. Nozzles spray the warm water through the fills uniformly, which is essential to achieve a good heat transfer. Fans are used to move large volumes of air efficiently and with minimum vibration. Speed reducers are used because optimum speed of a cooling tower fan seldom coincides with the most efficient of the driver (motor). Electric motors are used to drive the fans on cooling towers on this study; they must be reliable under extremely adverse conditions. The efficiency of cooling towers depends on the heat rejection load with which the tower must operate. 
During the design phase the manufacturer considers the heat transfer surface area, the ambient wet bulb temperature, the time the water is exposed to the air flow and the volume of airflow to the water (Carazas and Souza, 2009). During operation, the quality of the water is a very important control that the plant maintenance staff needs to keep in mind; proper treatment of the circulating water for biological control and corrosion must be in accordance with accepted industry practice. When water is evaporated or lost from a cooling tower, the solids and chemicals used to treat the water remain in the system. When water is bled from the system, the chemicals lost through the bleed need to be replaced so the system remains protected. If the water is left unchecked, solids build up in the system and cause scale, corrosion, biological growth and sludge, not to mention a loss of heat transfer efficiency. The main components of the cooling tower are:
-Water Distribution System (Hot and Cooled): hot water from the heat exchangers is delivered to the top of the cooling tower by the condenser pump through distribution piping. Hot water is sprayed through nozzles onto the heat transfer surface (fill) inside the cooling tower; it falls through the fill and reaches the basin cooled, ready to go back into the cycle.
-Heat Transfer (Fill): hot water from the heat exchangers is slowed down and spread out over the fill. Some of the hot water is evaporated in the fill area, which cools the water. Cooling tower fill is typically arranged in packs of thin corrugated plastic sheets supported by a framework of spaced bars.
-Air Flow System: the cooling tower fan generates large volumes of air flowing through the heat transfer surface (fill). The size of the fan and the air flow rate are selected to achieve the design conditions of temperature, water flow rate and wet bulb temperature.
-Water Treatment System: cooling water must be regularly treated with chemicals to prevent the growth of bacteria, minimize corrosion, and inhibit the buildup of scale on the fill and piping system.
The cooling tower is one of the components of large-scale water cooling in coal-fired power plants; its thermal efficiency can influence the coal consumption to a large extent. Efficient operation and thermal performance of a cooling tower depend not only on mechanical maintenance, but also on the cleanliness of the entire system.

3 METHODOLOGY PROPOSAL

The proposed methodology performs a reliability analysis of the cooling tower in a coal-fired power plant. Figure 2 shows the steps proposed for this study. Reliability methods have been established to take into account the uncertainties involved in the analysis of an engineering problem.

Figure 2: Methodology Proposed for Reliability Analysis.

The failure probability and the reliability index are used to quantify risks and therefore evaluate the consequences of failure. A component functions for a certain period of time, and then it fails. It is repaired and then put back into operation, and the whole process is repeated. A component failure occurs when it cannot perform its required function; the presence of this failure may cause the whole system to deviate from its required operation. Let MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) represent the expected time to failure and the expected duration of the repair of the component, respectively. Both metrics can be expressed by Eq. (1) and Eq. (2):

\[ \mathrm{MTBF} = \frac{\text{Total up time}}{\text{Number of breakdowns}} \tag{1} \]

\[ \mathrm{MTTR} = \frac{\text{Total down time}}{\text{Number of breakdowns}} \tag{2} \]

MTBF is a basic measure of a system’s reliability. It is typically expressed in hours. The higher the MTBF, the higher the reliability of the product or system.
Equation (3) illustrates this relationship.

\[ R(t) = e^{-t/\mathrm{MTBF}} \tag{3} \]

MTBF impacts both reliability and availability. Before MTBF methods can be explained, it is important to understand these concepts. The difference between reliability and availability is often unknown or misunderstood. High availability and high reliability often go hand in hand, but they are not interchangeable terms (Torell and Avelar, 1997). According to IEEE (1990), reliability and availability can be defined as:

Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time. It is the likelihood that the system or component will succeed within its identified mission time, with no failures.

Availability: the degree to which a system or component is operational and accessible when required for use. It is the likelihood that the system or component is in a state to perform its required function under given conditions at a given instant in time.

In short, availability describes how time is used, and reliability describes the failure-free interval. Both are expressed as percentages. MTTR is the expected time to recover a system from a failure. This may include the time it takes to diagnose the problem, the time it takes to get a repair technician onsite, and the time it takes to physically repair the system; it is expressed in hours. Many people look at the MTBF of a product and make assumptions. For example, if an MTBF is very high in hours, one may think that the system will run for a long time before it has to be replaced. That is not the case. MTBF is the average time between system failures over the entire sample population, which means that a component can experience failure before or after this average. Assumptions are required to simplify the process of estimating MTBF, but they must be realistic.
It would be nearly impossible to collect the data required to calculate an exact number. There are some traps when dealing with MTBF and MTTR calculation. This type of analysis always requires the removal of some dirty data, such as planned inspections or unplanned stop due to any other reason other than the cooling tower, and even the root cause analysis needs to be considered. Another common mistake is that, some companies do not consider a failure anything that takes less than a work day to fix. It is a misguided way to conduct this indicator and it will never show the truth about the system or equipment. Organizations seek to achieve higher MTBF and lower MTTR and these two metrics are important key performance indicators to measure a system performance. The failure data for the six cells in the cooling tower were obtained from the coal-fired power plant maintenance records and a computerized maintenance management system, this way MTBF and MTTR could be calculated. In complex and repairable systems, failures are considered to be those when the system does not meet the design intent. Even working within their correct operating environment, individual components fail randomly. It puts the system out of service and places it into a state of repair. Before starting any reliability analysis, one needs to fully understand the system to be analyzed, so the first phase of this methodology is the construction of a Functional Tree. The objective of the Functional Tree is to structure, in a logical and hierarchical way, the interdependence between the different components of the system under study, in order to expose how each one performs its functions, and then the functions for all equipments are listed in the Functional Description. After that the system SPN model can be tested by using MTBF and MTTR data and the software TimeNet4.0 for reliability analysis and availability. 
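As a minimal sketch of Eqs. (1) to (3), the snippet below computes MTBF and MTTR from totals taken out of a maintenance log and then evaluates R(t). The function names and the log totals (8000 h up, 400 h down, 4 breakdowns) are hypothetical and are not the plant data; the only modeling assumption is the exponential form of Eq. (3).

```python
import math


def mtbf_mttr(total_up_hours, total_down_hours, n_breakdowns):
    """Eq. (1) and Eq. (2): average up time and down time per breakdown."""
    if n_breakdowns <= 0:
        raise ValueError("at least one breakdown is needed")
    return total_up_hours / n_breakdowns, total_down_hours / n_breakdowns


def reliability(t_hours, mtbf_hours):
    """Eq. (3): probability of running t hours without failure."""
    return math.exp(-t_hours / mtbf_hours)


# Hypothetical log totals, for illustration only.
mtbf, mttr = mtbf_mttr(8000.0, 400.0, 4)
print(mtbf, mttr)                         # 2000.0 100.0
print(round(reliability(mtbf, mtbf), 3))  # 0.368
```

Under Eq. (3), the probability of surviving up to t = MTBF is e^-1, about 37%, which is why a high MTBF should not be read as a guaranteed failure-free period.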
3.1 Functional Tree and Functional Description

Although all cooling towers have essentially the same systems, such as air circulation, hot water supply, fill, cold water supply, cooling tower structure (most of the time a concrete or steel structure) and water treatment, each cooling tower design has specific characteristics introduced by the design team. Therefore, a functional tree must be developed for each specific cooling tower. A functional tree is a diagram that shows the interdependencies among systems, breaking them down into single components. Its purpose is to structure the system in a logical and hierarchical way, in order to expose how each of the components performs its functions. Figure 3 shows a shortened version of the functional tree for the cooling tower under study (Souza, 2012).

Figure 3: Basic Functional Tree for a Cooling Tower.

If failures occur at the bottom of the functional tree, they can affect a subsystem, and the cooling tower will then not deliver water at the design temperature. In other words, the water will be warmer, and when it passes through the condenser it will not remove heat as designed. This causes a pressure increase in the condenser (vacuum loss), which can lead to higher coal consumption. The next phase of this study is to obtain the functional description of each of these components. At each level of system resolution, the system engineer needs to understand the full implications of the goals and constraints in order to formulate a representative system. This is accomplished by performing a functional description. Functional description is the systematic process of identifying, describing, and relating the functions a system must perform to fulfill its goals and objectives.
It is a method for understanding product functions in complex systems by converting the activities performed in a system into functions (NASA, 1995). In the functional description, the main function of each component and subsystem is listed, according to the functional tree. As an example, Fig. 4 shows the description of only one system of the cooling tower: the Hot Water Supply System.

Figure 4: Functional Description of the Hot Water Supply System.

3.2 Stochastic Petri Net (SPN)

Petri nets were introduced by Carl Adam Petri in 1962; they were initially used as a general purpose mathematical tool to describe the causal relationships between conditions and events in computer systems (Gu, 2002). The original PN did not include the concept of time, so a transition would fire immediately. However, starting in the late 1970s a timed PN called Stochastic Petri Net (SPN) was presented (O’Connor and Kleyner, 2012). An SPN allows timed transitions, which are associated with exponentially distributed firing delays. It uses some basic symbols to describe the relations existing between conditions and events. Both PN and SPN can represent and analyze the behavior of a variety of systems. A PN is a modeling method that can show a system’s structure and dynamic behavior in a formal graph, and it can model the flow of information and control of systems. Initially, it was developed for modeling and analysis of computer hardware and software, but during the last few years PN has been increasingly used in the area of system reliability (Reddy, et al., 1993). The graphical representation of a PN employs the following notation: Places (drawn as circles), Transitions (drawn as boxes or bars), Arcs (which connect Places to Transitions and vice versa), and Tokens (drawn as black dots), as shown in Fig. 5.
It also shows a simple Petri net with an immediate transition (t1) and one with a timed transition (t2).

Figure 5: Graphical Structure Model of a Simple Petri Net.

Translating these symbols to reliability analysis, Places (P1, P2, P3 and P4) correspond to states of the system or conditions in the process; Transitions (t1 and t2) correspond to events that cause the system state to change, such as component faults or maintenance; and Arcs connect Places to Transitions, expressing the relationship between state and event. The state of a Petri net is represented by its Tokens. The system model should include elements for the marking states and for the events that cause the states to change (Mehrez, 1995). Each Place may hold zero or a positive number of Tokens. The dynamic behavior of a Petri net is described by a sequence of transition firings. Firing a Transition moves one Token from one Place to another; in this study the timed transitions have exponentially distributed firing times. A stochastic Petri net is a class of Petri net in which the firing times are random variables (Lee, et al., 2003). Figure 6 is a simple view of a Petri net that models a repairable system. Figure 6(a) shows the system in the operating state (the Place “SYSTEM OPERATING” holds the Token); this is the situation before firing. When the precondition for the transition is met, here the Mean Time Between Failures (MTBF), the system changes its state from “SYSTEM OPERATING” to the Place “SYSTEM IN FAILURE”; this is the situation after firing (the Token has moved from SYSTEM OPERATING to SYSTEM IN FAILURE), Fig. 6(b). As explained earlier, MTBF does not mean that a system or piece of equipment will fail at exactly this time; it is an average, so the failure may occur some time before or after this value.
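The place, transition and token vocabulary can be illustrated with a small token game. The sketch below is not the TimeNET model used in the study: the `PetriNet` class, the transition names `fail` and `repair`, and the `simulate_repairable` helper are hypothetical, and the numeric values are the plant MTBF and MTTR reported further on in the paper. It assumes exponentially distributed firing delays, as stated for timed transitions.

```python
import random


class PetriNet:
    """Places hold token counts; firing a transition moves tokens along arcs."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> number of tokens
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (tuple(inputs), tuple(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1


def simulate_repairable(mtbf, mttr, horizon, seed=0):
    """Fraction of the horizon spent with the token in SYSTEM OPERATING."""
    rng = random.Random(seed)
    net = PetriNet({"SYSTEM OPERATING": 1, "SYSTEM IN FAILURE": 0})
    net.add_transition("fail", ["SYSTEM OPERATING"], ["SYSTEM IN FAILURE"])
    net.add_transition("repair", ["SYSTEM IN FAILURE"], ["SYSTEM OPERATING"])
    mean_delay = {"fail": mtbf, "repair": mttr}
    t, up = 0.0, 0.0
    while True:
        # Exactly one of the two transitions is enabled at any time.
        name = next(n for n in net.transitions if net.enabled(n))
        delay = rng.expovariate(1.0 / mean_delay[name])
        if name == "fail":
            up += min(delay, horizon - t)
        if delay >= horizon - t:
            return up / horizon
        t += delay
        net.fire(name)


print(simulate_repairable(2253.0, 84.0, horizon=1_000_000.0))
```

For long horizons the estimate should approach MTBF / (MTBF + MTTR), the standard steady-state availability of a two-state exponential model.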
In the next phase of the PN in Fig. 6, the system is repaired during the Mean Time to Repair (MTTR), which is the expected time to recover a system, and finally the system goes back to the operating state (the Token moves back to the Place SYSTEM OPERATING).

Figure 6: The dynamic behavior of a Petri Net.

The basic lifetime model has only two states, failed or working. In some cases, however, before the component fails it shows abnormal behavior between good and failed, which can be detected by sensor readings or inspections. These defective states help to schedule inspections and preventive maintenance. The delay time, shown in Fig. 7, divides the time to failure into two stages: the first is the normal working stage, and the second is the failure delay time stage, in which maintenance must take action to prevent the failure from occurring. When analyzing the data for one year of operation (8760 hours), it was observed that, after a certain point, sensor measurements started to increase before the failure occurred. This region started at between 75% and 80% of the time to failure. For the purpose of this study, it is assumed that MTBF1 = 0.8 × MTBF, with MTBF2 being the remainder, as shown in Fig. 7. This assumption is used later to model the Petri net for the whole cooling tower. MTBF1 is the average time at which a signal is sent to the control room as an alarm, so that operators understand that the maintenance teams must be warned about the increase in the sensor reading; it means that a tendency toward a defect was observed and maintenance activities can be carried out to prevent the failure from occurring.

Figure 7: MTBF1 Assumption.

After analyzing all the data for one year of operation, MTBF = 2253 hours and MTTR = 84 hours.
Therefore, MTBF1 = 1802 hours, and the remaining time to reach the total MTBF is MTBF2 = 451 hours. Even though the cooling tower has a single design for all the cells, it is subject to many uncertainties arising from its daily operation: variations in the quality of the components used, in the knowledge of the workers who assembled these components on the cooling tower and, more particularly, in the operational durations. When analyzing the data from the different sensors in the cooling tower, it was observed that, despite the same design and the same supplier (same fans, gearboxes, electric motors, pipes, gaskets, flanges, etc.), the cells behaved differently. Some cells experienced more failures than others. It is difficult to deal with the uncertainty in the relationships among the components of the mechanical system. Data collection can also be an important source of error or uncertainty in this process. An error is defined as a discrepancy between an observed value and the true correct value or condition (Yang and Liu, 1998). Sometimes only the sensor fails to produce the correct signal for a given stimulus (hardware degradation, inaccurate readings, and environmental changes), while the cell is still working as designed and supplying cooled water at the right temperature. Sometimes values for a certain period are either too high or too low, which is easily recognized as an error in the sensor readings. From a practical standpoint this is a major disadvantage, since sensors include some degree of uncertainty. These uncertainties need to be clearly understood and removed from the study so the model can be as close as possible to the real cooling tower performance. Figure 8 represents, as an example, two cells of the cooling tower and only one maintenance team available to fix any failure that may occur.

Figure 8: Two Cooling Tower Cells and Maintenance Team.
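The delay-time split can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: MTBF1 is taken as 80% of the total MTBF, rounded to whole hours, which is the reading consistent with the reported 1802 and 451 hours. The helper name `split_mtbf` and its `alarm_fraction` parameter are hypothetical.

```python
def split_mtbf(mtbf_hours, alarm_fraction=0.8):
    """Split MTBF into time-to-alarm (MTBF1) and alarm-to-failure (MTBF2)."""
    mtbf1 = round(mtbf_hours * alarm_fraction)
    mtbf2 = mtbf_hours - mtbf1
    return mtbf1, mtbf2


print(split_mtbf(2253))        # (1802, 451)
print(split_mtbf(2253, 0.75))  # (1690, 563)
```

The second call uses the lower end of the observed 75% to 80% range, to show how sensitive the alarm window MTBF2 is to that choice.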
To explain the Petri net logic applied in this study, the example starts with two cells and then moves on to the full set of cooling tower cells. The system is modeled with places and transitions; the inputs to a transition are the preconditions of the corresponding event. The transition occurs if the precondition is met. For every transition a precondition must be written so the Petri net can run. The logic for this system works as follows:
• Cells 1 & 2: Tokens are in the places “EQUIPMENT OPERATING-1” and “EQUIPMENT OPERATING-2”, respectively.
• Maintenance Team: the token is in the place “AVAILABLE”.
• Suppose cell 1 reaches “MTBF1”; its token then moves (fires) to the place “ALARM-1”. Since the maintenance team is available, the token moves (fires) straight to the place “REPAIR-1” through the immediate transition “T1”. The precondition for transition “T1” was met.
• Maintenance Team: the token also moves (fires) from the place “AVAILABLE” to “UNAVAILABLE” through the immediate transition “T5”. The precondition for transition “T5” was met.
• Suppose that, while the maintenance team is in the place “REPAIR-1”, cell 2 reaches “MTBF3”. Its token goes to the place “ALARM-2”, and since the maintenance team is not available, this cell keeps working until it reaches “MTBF4”; the token then moves to the place “EQUIPMENT IN FAILURE-2” and stays there until the maintenance team is available.
• Meanwhile, the maintenance team working on cell 1 reaches “MTTR1”, and the token moves to the place “EQUIPMENT OPERATING-1”.
• After that, the maintenance team becomes available again, so its token moves from the place “UNAVAILABLE” to the place “AVAILABLE” through the immediate transition “T6”. The precondition for transition “T6” was met.
• The maintenance team is available and ready to work again, so the token in the place “EQUIPMENT IN FAILURE-2” moves to the place “REPAIR-2” and the maintenance team token moves to the place “UNAVAILABLE” through the immediate transition “T5”. The precondition for transition “T5” was met.
• The maintenance team working on cell 2 reaches “MTTR2”, and the token moves (fires) to the place “EQUIPMENT OPERATING-2”.
• If both systems experience a failure at the same time, the software TimeNET4.0 randomly selects one system to repair. As mentioned earlier, firing is a random variable in the SPN.
• Finally, the whole process can start again.

Figure 9 shows six cells of the cooling tower system and one maintenance team available for repairs. All the cells are in the place “EQUIPMENT OPERATING”, which means the cooling tower is supplying cooled water as designed.

Figure 9: Cooling Tower: Six Cells Working (One Maintenance Team).

This first attempt shows the results for reliability and availability of the entire system considering only one maintenance team for repairs. The data for this Petri net model were extracted from the power plant data center. The following values are used to run the model: MTBF1 = 1802 hours, MTBF2 = 451 hours, and MTTR = 84 hours, as shown earlier. In Fig. 9, all cell tokens are in the place “EQUIPMENT OPERATING” and the maintenance team is in the place “AVAILABLE”. It means that the cooling tower is supplying cooled water at the right temperature, as designed. The same logic explained earlier in this study applies, this time considering the whole cooling tower (all six cells) and one maintenance team. Figure 10 shows, as an example, a different situation in which some cooling tower cells have experienced some sort of failure (chosen randomly) and the maintenance team is already busy performing another repair.
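The cell and maintenance-team logic described above can be sketched as a small stochastic simulation. This is not the TimeNET4.0 model from the study and is not expected to reproduce its R(t) and A values. The function `simulate_cells`, its parameters, and the output metric (fraction of cell-hours spent operating or in alarm) are assumptions made for illustration; delays are taken as exponential, and a free team picks among waiting cells at random.

```python
import random

# Cell states, named after the places in the paper's net.
OPERATING, ALARM, FAILED, REPAIR = range(4)


def simulate_cells(n_cells=6, n_teams=1, mtbf1=1802.0, mtbf2=451.0,
                   mttr=84.0, horizon=8760.0, seed=0):
    """Return the fraction of cell-hours spent OPERATING or in ALARM."""
    rng = random.Random(seed)
    rate = {OPERATING: 1.0 / mtbf1, ALARM: 1.0 / mtbf2,
            FAILED: 0.0, REPAIR: 1.0 / mttr}
    # FAILED has rate 0: it only leaves through a team dispatch below.
    successor = {OPERATING: ALARM, ALARM: FAILED,
                 FAILED: FAILED, REPAIR: OPERATING}
    state = [OPERATING] * n_cells
    t, up_hours = 0.0, 0.0
    while True:
        rates = [rate[s] for s in state]
        total = sum(rates)
        # With exponential delays, the time to the next firing is exponential
        # with the summed rate; nothing can fire if the total rate is zero.
        dt = rng.expovariate(total) if total > 0 else float("inf")
        n_up = sum(s in (OPERATING, ALARM) for s in state)
        if t + dt >= horizon:
            up_hours += (horizon - t) * n_up
            break
        up_hours += dt * n_up
        t += dt
        # Choose which timed transition fires, in proportion to its rate.
        i = rng.choices(range(n_cells), weights=rates)[0]
        state[i] = successor[state[i]]
        # Immediate transitions: free teams take waiting cells at random.
        free = n_teams - state.count(REPAIR)
        waiting = [j for j, s in enumerate(state) if s in (ALARM, FAILED)]
        rng.shuffle(waiting)
        for j in waiting[:free]:
            state[j] = REPAIR
    return up_hours / (n_cells * horizon)


for teams in (1, 2):
    print(teams, simulate_cells(n_teams=teams, horizon=10 * 8760.0))
```

Because all delays are exponential, the net can be advanced one firing at a time by sampling the waiting time from the summed rate. A single run is one random sample; averaging over many seeds would be needed before comparing staffing levels.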
Figure 10: Cooling Tower Cells Experiencing Failures (Considering One Maintenance Team).

The Petri net logic in Fig. 10 works as follows:
• Suppose Cell 2 was the first one to experience a failure in this example. Its token fired to the place “ALARM-2” and, since the maintenance team was available, the token fired again to the place “REPAIR-2” through the immediate transition “T3”; the maintenance team token then fired from the place AVAILABLE to UNAVAILABLE through its immediate transition “T13”.
• Cell 5 was the second one to experience a failure; it reached its “MTBF9”. Its token fired to the place “ALARM-5” and, since the maintenance team was unavailable, the cell kept working until either the maintenance team became available and the token fired to the place “REPAIR-5”, or the cell reached “MTBF10” and the token fired to the place “EQUIPMENT IN FAILURE-5”, which is the case in this example.
• The cooling tower will experience other failures and the PN logic will continue to work as follows.
• Cell 1 reached its “MTBF1” and its token fired to the place “ALARM-1”. Since the maintenance team was still unavailable, this cell kept working until either the maintenance team becomes available and the token fires to the place “REPAIR-1”, or the cell reaches “MTBF2” and the token fires to the place “EQUIPMENT IN FAILURE-1”.
• Cell 6 reached its “MTBF11” and its token fired to the place “ALARM-6”. Since the maintenance team was still unavailable, this cell kept working until it reached its “MTBF12”, and the token fired to the place “EQUIPMENT IN FAILURE-6”, just like the example above.
• When cell 2 reaches its “MTTR”, its token fires to the place “EQUIPMENT OPERATING-2” and the maintenance team becomes available to repair another cell, chosen randomly, and so on for the other cells.
• Finally, the process continues to run as explained above.

Figure 11 shows the reliability curve obtained for the cooling tower, considering one maintenance team for repairs.
Figure 11: Cooling Tower – Reliability Curve (One Maintenance Team).

The reliability (R) and availability (A) obtained for this model are as follows: R(t) = 83.38% and A = 98.50%.

In the next SPN scenario, the cooling tower system gains a second maintenance team for repairs, so two maintenance teams are now available, as shown in Fig. 12.

Figure 12: Cooling Tower: Six Cells Working (Two Maintenance Teams)

Following the same logic presented earlier, some cells will experience failures, as shown in Fig. 13.

Figure 13: Cooling Tower Cells Experiencing Failures (Two Maintenance Teams).

The Petri net logic in Fig. 13 works as follows:
• Suppose Cell 2 is the first to experience a failure in this example. Its token fires to the place “ALARM-2” and, since both maintenance teams are available, fires again to the place “REPAIR-2” through immediate transition “T3”. One maintenance team, chosen at random, then fires from the place “AVAILABLE” to “UNAVAILABLE” through its immediate transition.
• Cell 4 is the second to experience a failure. Its token fires to the place “ALARM-4” and, since one maintenance team is still available, fires again to the place “REPAIR-4” through immediate transition “T7”. Now neither maintenance team is available for any repair.
• The cooling tower will experience further failures, and the PN logic continues as follows.
• Cell 1 reaches its “MTBF1” and its token fires to the place “ALARM-1”. Since neither maintenance team is available, the cell keeps working until either a maintenance team becomes available and the token fires to the place “REPAIR-1”, or the cell reaches “MTBF2” and the token fires to the place “EQUIPEMENT IN FAILURE-1”.
• Cell 5 reaches its “MTBF9” and its token fires to the place “ALARM-5”. Neither maintenance team is available, so the cell keeps working until it reaches “MTBF10”, and the token then fires to the place “EQUIPEMENT IN FAILURE-5”. It stays there until a maintenance team becomes available to repair this cell.
• Cell 6 reaches its “MTBF11” and its token fires to the place “ALARM-6”. Since neither maintenance team is available, the cell keeps working until either a maintenance team becomes available and the token fires to the place “REPAIR-6”, or the cell reaches “MTBF12” and the token fires to the place “EQUIPEMENT IN FAILURE-6”.
• When Cell 2 or Cell 4 reaches its “MTTR”, its token fires to the place “EQUIPMENT OPERATING” and one of the maintenance teams becomes available to repair any other cell, chosen at random.
• The process then continues to run as explained above.

Figure 14 compares the reliability curves obtained for the cooling tower system with one and two maintenance teams.

Figure 14: Cooling Tower – Reliability Curve (One and Two Maintenance Teams)

The new values for reliability and availability are as follows: R(t) = 92.41% and A = 100%.

Finally, Tab. 1 shows the results obtained from both analyses made in this study.

Table 1: Reliability and Availability (One & Two Maintenance Teams)

It shows that having a maintenance team available to fix the cooling tower as needed improves the performance of the entire system.

After determining the reliability R(t) and availability (A) values in Tab. 1, a further proposal is to determine how improvements in equipment design and maintenance plans over the years would affect R(t) and A. As an example, the SPN is run with MTTR and MTBF improved by 5% per year for three years. Figure 15 shows the reliability values for three consecutive years, considering a 5% improvement over the previous year.
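The 5% yearly improvement can be made concrete with a small arithmetic sketch. It assumes that "improvement" means MTBF grows by 5% and MTTR shrinks by 5% relative to the previous year. It also uses the textbook single-component relations A = MTBF / (MTBF + MTTR) and R(t) = exp(-t / MTBF), which hold for a constant failure rate, rather than the SPN itself. The baseline values in the usage note are hypothetical and are not taken from the plant data.

```python
import math


def improved_indices(mtbf0, mttr0, rate=0.05, years=3):
    """Return (year, MTBF, MTTR, steady-state availability) for year 0..years."""
    rows = []
    for year in range(years + 1):
        mtbf = mtbf0 * (1.0 + rate) ** year   # assumed: MTBF grows by `rate` each year
        mttr = mttr0 * (1.0 - rate) ** year   # assumed: MTTR shrinks by `rate` each year
        availability = mtbf / (mtbf + mttr)   # single repairable component
        rows.append((year, mtbf, mttr, availability))
    return rows


def reliability(t, mtbf):
    """Exponential reliability R(t) = exp(-t / MTBF), assuming a constant failure rate."""
    return math.exp(-t / mtbf)
```

For instance, `improved_indices(1000.0, 24.0)` lists the indices for a hypothetical component with a 1000 h MTBF and a 24 h MTTR, and `reliability(8760.0, mtbf)` evaluates the corresponding one-year reliability for each row.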
During these three years, reliability growth becomes very close for one and two maintenance teams, so management could decide not to keep two teams, given the low reliability gain. However, to achieve those improvements, component design and maintenance plans will need to change as well, which usually means more investment. Before changing component design, a good strategy is to improve the maintenance plan by increasing the number of periodic inspections on the cooling tower system, covering items such as the gearbox (oil level in reservoir, gaskets, and fasteners), pumps (flow and noise), motors (overheating, vibration, and fasteners), fans (blade vibration, noise), water quality (control of corrosion, deposition, and microbiological growth), and hot and cold water piping (leaks, flow, and cleanliness). Experience shows that pinpointing minor problems during periodic inspections and giving them proper priority before they become major problems can be useful. Another way to improve those indices is to implement a diagnostic system in which failures are pointed out automatically. It would reduce the time spent on root cause analysis, which can shorten the repair time.

Figure 15: Reliability Improvement over the Years.

4 SUGGESTIONS FOR REFINEMENT

The purpose of this study is to introduce reliability and availability analysis using the SPN concept. To illustrate the study, a cooling tower system was chosen for the analysis. The SPN model considered two scenarios: first, only one maintenance team available for repairs, and second, two maintenance teams available. As a result, reliability and availability were determined for both scenarios. After that, another analysis was made considering a 5% improvement in MTBF and MTTR over the following three years.
However, the SPN model did not consider all the components that form this cooling tower, referred to here by the authors as the “System”. To bring the model closer to a real cooling tower system, it would be necessary to add more components to the SPN model (more places and transitions). This is feasible because sensor readings are available for most components, and where there are no sensor readings, records of regular inspections are kept by maintenance staff. The same data allow more systems to be added to the study, such as the condenser, vacuum pumps, condensate extraction pumps, and feed water pumps. Even without considering all components in the cooling tower analysis, the model has proven to be useful and practical.

5 CONCLUSION

This study has attempted to model a cooling tower system in a coal-fired power plant. The proposal is based on a Stochastic Petri Net model as an alternative way to determine reliability and availability for this system. Petri net models provide a powerful modeling tool for representing complex systems. Control room operators at the power station need a system that supports their decisions during critical situations and reduces the time delay between alarm and failure. This helps maintenance management plan how to take action faster. A Petri net model for the cooling tower system is proposed to deal with alarms and to make visible when to take action before a failure occurs. The study assumed one and two maintenance teams performing all repairs at the cooling tower. However, one has to keep in mind that maintenance teams serve the entire power plant, whereas this study considered them exclusively for the cooling tower. MTTR was used as extracted from the database system. On a daily basis, management assigns priorities to the maintenance and repair jobs to be performed by these teams, which could introduce some uncertainty into the SPN model itself.
MTTR for the cooling tower might increase because of this repair prioritization. In a future study, management could take a deeper look at the system to learn the accurate number of failures and their frequency. Based on the results, management can build business cases for hiring more professionals or for contracting outsourced suppliers who can come in for emergency repairs. The results show that the more maintenance teams are available, the better the reliability and availability of the cooling tower system. Establishing the relationship between reliability improvement and coal consumption (the fuel used in this plant) would be another suggestion for future study. Spare parts arrival and availability for maintenance play an important role in planning the operation sequences, since they strongly affect the MTTR index. The technical knowledge of the maintenance teams is also an important variable in this scenario: the better trained the teams are, the better the repair and, as a consequence, the better the reliability and availability indices. Another limitation of this model is that, in the real world, the number of components in the cooling tower is far larger than the quantity considered in this Petri net model. The complex system structure and operation options will need more transitions and places in the model. From the results obtained in this study, it is clear that maintenance team availability is one way to improve the quality of power plant operation.

ACKNOWLEDGEMENTS

The research reported here was supported in part by both the Foundation for Engineering Technological Development (FDTE) and the CAPES Foundation. The authors are deeply grateful for this support.
RESPONSIBILITY NOTICE

The authors are solely responsible for the printed material included in this paper.