A White Paper from the Experts in Business-Critical Continuity™

Reliability of Air Moving Fans, and Their Impact on System Reliability

Summary

Air-moving fans are used to remove waste heat in all UPS systems except those with the very smallest ratings. The use of air-moving fans reduces the hot spot temperature of the electrical components within the system by as much as 30°C compared with the hot spot temperature rise resulting from natural convection cooling. Quantifying the expected life of a large electrical assembly requires extensive modeling, laboratory testing and field verification. Without preventive maintenance (PM), the volumetric airflow rate for forced air-cooling will slowly decrease, resulting in an increase in the case or surface temperature of the individual components within the large electrical assembly. In a UPS system, air-moving fans are part of a set of components which need to be serviced and replaced in order to achieve the expected life of the system. This white paper briefly discusses the role of temperature in insulation aging, the difference between natural and forced air convection, reliability modeling for air-moving fans, and finally how PM increases the reliability of a UPS system. Liebert recommends replacing small flat fans after 5 years of service and larger squirrel-cage fans after 8 years. In addition, Liebert recommends checking air filters 4 times a year.

Introduction

Waste heat is transferred from the individual components to the moving ambient air stream by convection. This heat transport process leads to a temperature rise between the ambient air temperature and the component case or surface temperature. For UPS systems, the amount of waste heat can be very large. As an example, a 100 kVA UPS system can generate 5 kW to over 10 kW of heat, and even a 10 kVA UPS system can generate 500 watts to over 1000 watts.

All electrical components used in electrical assemblies like UPS systems generate waste heat.
The one exception is superconducting wire, but this technology has not yet surfaced in large electrical assemblies. Typical components in question are shown in the list below:
• Inductors and transformers (both coreless designs and designs which use a core)
• Power semiconductors (switches and rectifiers)
• Capacitors
• Resistors
• Relays
• Breakers
• Contactors
• LEDs (pilot lights)
• Fuses
• Wire and cables
• I/O power connectors
• DC power supplies (control power)
• Etc.

Removing this waste heat requires heat transport from the individual electrical components inside the assembly to the ambient air outside the electrical assembly. This process results in a hot spot temperature rise inside each individual component. The hot spot temperature rise breaks down into two pieces:
A. Temperature rise from the case of the component to the hot spot inside the component, which drives the heat to the surface of the component.
B. Temperature rise from the local air close to the component surface to the component surface itself, which drives the heat from the component surface to the moving air stream.

Waste heat determines the efficiency of the UPS system. The efficiency of the components in a UPS system can be as high as 99.9 percent and as low as 85 percent, with resistors as a special case of 0 percent: a resistor converts 100 percent of its power into waste heat. Film capacitors are at the top of this list with 99.9 percent efficiency; iron-core inductors are normally 95 percent to 97 percent efficient; iron-core transformers are normally 92 percent to 95 percent efficient; and DC power supplies are at the bottom of the list, with efficiencies of 85 percent to 92 percent. The efficiency of a large electrical assembly like a UPS can easily be well under 95 percent, with less than 90 percent being very typical. This means at least 5 percent to over 10 percent of the throughput power is converted to heat.
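The efficiency arithmetic above can be sketched in a few lines. This is a minimal illustration, not a Liebert sizing method: it treats waste heat as throughput power multiplied by (1 − efficiency), and it assumes unity power factor so that a kVA rating can be read directly as kW. The function name `waste_heat_watts` is hypothetical.

```python
def waste_heat_watts(throughput_watts: float, efficiency: float) -> float:
    """Power converted to heat when throughput_watts passes through a stage
    with the given efficiency (expressed as a fraction between 0 and 1)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return throughput_watts * (1.0 - efficiency)


# Assumption: unity power factor, so a kVA rating is read directly as kW.
for rating_kva in (10, 100):
    for efficiency in (0.95, 0.90):
        heat_w = waste_heat_watts(rating_kva * 1000.0, efficiency)
        print(f"{rating_kva} kVA at {efficiency:.0%} efficiency: about {heat_w:,.0f} W of heat")

# A resistor is the special case of 0 percent efficiency: all of its power becomes heat.
print(waste_heat_watts(50.0, 0.0))
```

Under these assumptions, 95 and 90 percent efficiency reproduce the 5 kW to 10 kW range quoted for a 100 kVA system and the 500 W to 1000 W range quoted for a 10 kVA system.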
The temperature rise between the ambient air temperature inside the electrical assembly and the case (or surface) of the component can easily exceed 40°C when the cooling process is limited to natural convection. In addition, there can be another 5°C to 15°C temperature rise from the component surface to the hot spot inside the component. With forced air-cooling, the 40°C case temperature rise can drop below 10°C. Radiation is not a significant factor at these temperatures, and active cooling (circulating a fluid through a chiller and a heat sink) is normally limited to very large systems (>2 MVA). Forced air-cooling is used on almost all medium (5 kVA to 100 kVA) and large (100 kVA to 2 MVA) UPS systems to minimize the hot spot temperature rise in the individual components.

Temperature is not the only aging driver: humidity and cycling, among others, can also increase field aging. This white paper has been limited to the impact of temperature and air-moving fan reliability.

Fans come in many different designs and many different ratings. Air-moving fans age in the field like all other electrical components. Fans can be purchased today with claimed expected lives ranging from 1 to 10 years. The fan manufacturer usually supplies an expected life number at the fan's maximum rated operating conditions; this number is typically established through laboratory measurements and depends on the design of the fan system. The fan system consists of the housing, the air-moving blade, and the motor, including the bearings and lubrication system (plus speed control if one is used). Many of these fan components have been moving from metal to plastic.
The expected life of a specific fan in a specific application can be calculated from the fan's expected life at max rated operating conditions, a reliability model, information about the operating conditions of the specific application, and the life de-rating formulas, which are also supplied by the fan manufacturer. In almost all cases the fans are operated below their max rated conditions.

The next section briefly explains why a higher component hot spot temperature leads to reduced component expected life and therefore reduced UPS system expected life. The following sections then explain why forced air convection cooling is much more effective than natural air convection cooling at reducing the thermal gradient, briefly review air-moving fan reliability modeling, and finally review how air-moving fan reliability impacts the reliability of the complete UPS system.

The Impact of Hot Spot Temperature Rise on Component Reliability

All polymeric insulation and many solid-state materials degrade with temperature. The end of an insulation material's expected life is normally marked by a large increase in leakage current along with a large decrease in insulation resistance. Many different aging processes lead to this increase in leakage current and decrease in insulation resistance, but oxidation (normally traced to humidity and dissolved air), polymeric molecular splitting and ionic contaminant migration are three of the major reaction mechanisms. The reaction rates and diffusion coefficients for these processes are all partially driven by the magnitude of the hot spot temperature. It is impossible to remove all oxygen, moisture and ionic contaminants from materials using reasonable industrial processing; even in a laboratory environment, producing pure materials is almost impossible.
The expected life (technically referred to as reliability) of the individual components within a large electrical assembly, and therefore the expected life of the electrical assembly itself, is partially dependent on the magnitude of the individual component case (and therefore hot spot) temperature. Other parameters beyond hot spot temperature also impact the reliability of large electrical assemblies, for example electric fields across insulation material and vibration (due to the fan and to magnetic components which contain a high-permeability core).

A widely used guideline assumes a factor of 2 reduction in service life for every 10°C increase in the max case operating temperature. Many component manufacturers provide temperature life multipliers for their specific components, and there is a large range of numbers for every 10°C. These numbers vary from 1.6 to around 3, i.e., $L(T_0 - 10^\circ\mathrm{C}) = k\,L(T_0)$, where k can vary from 1.6 to 3. Another issue is the location of the 10°C range in question: the service life may decrease by a factor of 2 when the case temperature increases from 70°C to 80°C, but the change in service life can be much smaller for a 10°C increase from 30°C to 40°C. Some references in the industry claim there is almost no temperature degradation below 40°C. The actual temperature service life multiplier can be quite different for the various components listed in the introduction.

Arrhenius modeling of thermal degradation has been used in the industry for over 60 years to relate the degradation due to increased hot spot temperature to a useful service life number. The Arrhenius model is shown in Equation 1.

$$K(T) = A\,\exp\left(-\frac{E}{R\,T}\right) \qquad (1)$$

Where K(T) is the reaction rate coefficient (which is a function of temperature), A is a constant, E is the activation energy for the specific reaction or diffusion process, R is the universal gas constant and T is the absolute temperature of the hot spot.
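The two temperature models in this section can be sketched as follows. `expected_life` applies the factor-of-k-per-10°C multiplier (k = 2 being the common industry guideline, with manufacturer values of 1.6 to 3), and `arrhenius_acceleration` takes the ratio of Equation 1 at two temperatures so that the constant A cancels. The function names, the 10,000-hour component at an 85°C rated case temperature, and the 70 kJ/mol activation energy are hypothetical values chosen only for illustration; real activation energies depend on the specific process.

```python
import math

R_J_PER_MOL_K = 8.314  # universal gas constant R, J/(mol*K)


def expected_life(l_t0_hours: float, t0_c: float, t_c: float, k: float = 2.0) -> float:
    """Expected life at case temperature t_c, given life l_t0_hours at rated case
    temperature t0_c, assuming life changes by a factor k for every 10 degC."""
    return l_t0_hours * k ** ((t0_c - t_c) / 10.0)


def arrhenius_acceleration(e_j_per_mol: float, t_low_c: float, t_high_c: float) -> float:
    """Ratio K(T_high) / K(T_low) from Equation 1; the constant A cancels out."""
    t_low_k = t_low_c + 273.15  # Arrhenius uses absolute temperature
    t_high_k = t_high_c + 273.15
    return math.exp((e_j_per_mol / R_J_PER_MOL_K) * (1.0 / t_low_k - 1.0 / t_high_k))


# Hypothetical component rated for 10,000 h at an 85 degC case temperature,
# run 30 degC cooler (the forced-air benefit quoted in the summary).
for k in (1.6, 2.0, 3.0):
    print(f"k={k}: {expected_life(10_000, 85, 55, k):,.0f} h")

# Hypothetical activation energy of 70 kJ/mol, case temperature 70 degC -> 80 degC.
print(f"Arrhenius rate ratio: {arrhenius_acceleration(70_000, 70, 80):.2f}")
```

With E = 70 kJ/mol, the Arrhenius ratio between 70°C and 80°C comes out at about 2.0, in line with the factor-of-2 guideline; other activation energies or temperature bands give different ratios.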
The problem comes in trying to establish the activation energy for specific processes. The Arrhenius equation becomes a straight line when it is plotted on a ln(K(T)) vs. 1/T graph. Many attempts to measure the activation energy for electrical insulation degradation reactions and ionic diffusion coefficients can be found in the literature, and the data from these studies are often plotted using this linear relationship. These studies are far from consistent even after 60 years of work. In general, a factor of 2 for every 10°C guideline is used in the electrical industry. This relationship is shown in Equation 2.

$$L(T) = L(T_0)\cdot 2^{(T_0 - T)/10} \qquad (2)$$

Where L(T) is the temperature-related expected life for a component operating at a max case temperature T (in °C), and L(T0) is the expected service life provided by the manufacturer of the component, normally at the max rated case temperature T0.

There is no debate in the industry that increasing the case operating temperature for most components used in UPS systems leads to a reduction in service life. As a result, there is a definite reduction in the service life of the complete UPS system as the internal ambient air temperature increases, because that drives an increase in the component case or surface operating temperature. There may be industry debate on how to model the service life temperature multiplier for a complete UPS system, but there is no debate that increasing the internal operating temperature of the UPS system will lead to a definite decrease in reliability. Correspondingly, decreasing the operating temperature of the UPS system will lead to a definite increase in UPS system reliability.

Documented history showing a relationship between the reliability of large electrical assemblies and fan cooling performance has existed in open literature for well over 50 years, with one of the better papers dating to 1993 (Hogan, J.M. (1993). The Effect of Fan-Reliability and Cooling-Performance on Electronic-Chassis Reliability. IEEE Transactions on Reliability, Vol. 42 (1), 172-174.). The next section focuses on how natural and forced air convection affect the component case operating temperature and the cooling air stream.

Natural vs. Forced Air Convection Cooling

Radiation is not a major factor in heat transport at the actual operating temperatures inside a typical UPS system; operating temperatures have to approach the 400°C range before radiation plays much of a role. Because of its complexity and potential reliability issues, active cooling is normally not used for UPS systems under 2 MVA. Cooling is therefore limited to natural and forced air convection. In both cases, heat transport can be estimated using Equation 3. The major challenge is to estimate the volumetric airflow rate for forced vs. natural air convection, so for now the volumetric airflow rate is left in the relationship as a parameter so that the equation can be evaluated.

$$Q = \rho\, C_p\, F\, \Delta T \qquad (3)$$

Where Q is the heat transport in watts, F is the volume flow rate in cubic feet per minute (CFM; converted to m³/s for the calculation, with 1 CFM ≈ 4.72 × 10⁻⁴ m³/s), Cp is the specific heat capacity of air in J/(g·K), ρ is the air density in g/m³, and ΔT is the difference between the temperature of the air entering the UPS system and the hot air exiting it.

Both the specific heat capacity and the density of air vary with the local temperature. In order to develop a simple picture, these parameters are assumed to be constant. Equation 3 has been plotted in Figure 1 for four different delta temperature values and a large range of airflow rates. The airflow rates are shown in CFM because these are the units typically used by fan manufacturers. The three sets of negative-slope curves are curves of constant heat transport (Q). As can be seen, when the airflow rate is reduced, the difference between the incoming and outgoing air temperature must increase in order to maintain a constant heat transport rate. The graph in Figure 1 also shows that several kilowatts of waste heat require hundreds of CFM of volumetric airflow to maintain a fairly small thermal gradient (10°C) and therefore a small temperature rise inside the component.

Figure 1. Heat transport curves for different mass flow rates and different ambient temperature heat rise gradients (difference between the temperature of the air entering the UPS system and the hot air exiting the UPS system). The second set of curves (solid lines) show curves of constant heat transport as a function of mass flow rate and temperature difference.

The volumetric airflow rate is the big difference between forced and natural convection cooling. With forced air-cooling, the air-moving fans generate a pressure difference by compressing the air, which drives the air movement, and the airflow rate can be substantial (>200 CFM). In the case of natural convection, the thermal gradients cause a variation in the gas density, which also leads to pressure gradients and volumetric airflow, but the resulting airflow rates are much smaller than those generated by fans. Studies of small, simple chimneys, which are similar in size to a large 18-inch rack assembly, show that a thermal gradient much larger than 40°C is required to remove heat in the hundreds-of-watts range. The problem becomes even more complicated because most large electrical assemblies do not have an unobstructed airflow path.

Air-Cooling Fans

Air-cooling fans come in many different sizes and configurations and are driven by many different motor technologies. The two most common configurations are flat fans and squirrel cage fans. Examples of these two configurations can be seen in Figure 2.

Figure 2. Typical flat fan (left) and typical squirrel cage fan (right).
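Returning to the heat transport relationship of the previous section, Equation 3 can be solved for F to estimate how much airflow a given heat load needs. The sketch below keeps air properties constant, as the text does for Figure 1; the density (1200 g/m³, about 1.2 kg/m³) and specific heat (1.005 J/(g·K)) are assumed room-temperature values, and the function names are hypothetical.

```python
# Unit conversion: 1 ft^3 = 0.028316846592 m^3, so 1 CFM = that / 60 m^3/s.
M3_PER_S_PER_CFM = 0.028316846592 / 60.0

# Assumed constant air properties near room temperature (illustrative values).
AIR_DENSITY_G_PER_M3 = 1200.0  # rho, g/m^3 (about 1.2 kg/m^3)
AIR_CP_J_PER_G_K = 1.005  # Cp, J/(g*K)


def heat_transport_w(flow_cfm: float, delta_t_k: float) -> float:
    """Equation 3: Q = rho * Cp * F * dT, with F converted from CFM to m^3/s."""
    return AIR_DENSITY_G_PER_M3 * AIR_CP_J_PER_G_K * flow_cfm * M3_PER_S_PER_CFM * delta_t_k


def required_flow_cfm(heat_w: float, delta_t_k: float) -> float:
    """Equation 3 solved for F: airflow (CFM) needed to carry heat_w at a given air rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (AIR_DENSITY_G_PER_M3 * AIR_CP_J_PER_G_K * M3_PER_S_PER_CFM * delta_t_k)


for heat_w, rise_c in ((10_000, 10), (5_000, 10), (300, 40)):
    print(f"{heat_w} W at a {rise_c} degC air rise -> {required_flow_cfm(heat_w, rise_c):.0f} CFM")
```

With these assumed properties, removing 10 kW at a 10°C air temperature rise needs roughly 1,760 CFM, while 300 W at a 40°C rise needs only about 13 CFM.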
Some air-moving fans are mounted inside the UPS assembly and are designed to move air over heat sinks, which are normally extruded aluminum pieces with flat surfaces for component mounting and ribs for increased surface area. Power semiconductors are often mounted directly to this heat sink where very low thermal resistance is a requirement. Other air-cooling fans are mounted on the inside surface of the outer panels of the electrical assembly package and are used either to push cool ambient air into the assembly or to pull hot air out of it. Alternative mounting arrangements also exist, such as fan trays mounted near the top or bottom of the electrical chassis, or individual fans mounted to the system frame close to an opening in the chassis. These fans can be located on the top, bottom or side of the electronic assembly. An outline drawing of a typical Liebert UPS system is shown in Figure 3A and 3B; it shows a number of different fans and fan locations within the typical UPS system.

The motor technologies used for air-moving fans also cover a broad range. In very general terms, very small fans (less than 1/200 HP, about 0.005 HP) normally use shaded-pole motors; medium-size fans use sub-fractional HP motors (1/200 to 1/12 HP, about 0.005 to 0.083 HP), which tend to be DC brushless motors, split-phase capacitor motors or simple induction motors; and larger fans (1/12 to 1.5 HP) use single-phase induction motors. Motor speed controls (referred to in the industry as VSDs, variable speed drives) are also starting to appear.

The major mechanical design parameter impacting fan reliability is the bearing technology. Again in very simple terms, motor bearings come in two different forms: sleeve bearings or ball bearings plus a sealed lubrication system. The dividing range between use of sleeve bearings vs.
ball bearings is roughly 100 to 200 CFM (this is a very general boundary and many exceptions are found in the industry).

Air-moving fans have both mechanical and electrical failure modes. The mechanical failure modes are primarily due to bearing wear but can also be due to the fan blades and fan housings, both of which can distort over time. More and more metal fan enclosures are being replaced with plastic enclosures, and the same is true for the fan blade itself. Electrical failures are due to motor coils, which age and eventually fail turn to turn or coil to ground. Mechanical failures normally account for more fan failures than electrical failures, with a 65 percent/35 percent split being reported by organizations which have studied many fan failures (Tian, X. (2006). Cooling Fan Reliability: Failure Criteria, Accelerated Life Testing, Modeling and Qualification. Reliability and Maintainability Symposium, 2006, 380-384.). Liebert's field service data agrees with the fan data appearing in these open literature studies.

Figure 3A and 3B. Outline drawing for a typical Liebert UPS system, showing the location of a number of fans.

It is very common to use small mechanical filters on the fans to reduce the amount of dust and particles pulled into the air stream. Air moving in electrical equipment often leads to charge accumulation and electrostatic forces. These electrostatic forces, along with the fine dust particles that exist in all but clean room environments, lead to decreased filter porosity and eventually to reduced airflow rates. Without servicing and replacement, forced airflow inside the UPS will decrease over time. The reduction in volumetric airflow rate begins to develop in as little as a few years, and in some cases airflow stops completely well inside of ten years.

Reliability Modeling for Air-Cooling Fans

Motors and fan assemblies all age in the field and fail over time, and this is true for any and all fan assembly designs. The failure is normally gradual, not abrupt: during the gradual aging period, the speed of the fan, and therefore the volumetric airflow rate, continually decreases. Some fan designs may fail earlier than others, but if sufficient time passes, the volumetric airflow moved by any fan design will decrease, and the fan will eventually fail and stop moving air altogether. Fans therefore have a specific service life and a failure probability.

A histogram of the failure times for a group of fans all operating under the same field conditions is a very broad and nonsymmetrical distribution. This distribution is referred to as the probability density function and is commonly modeled using a Weibull mathematical function. The Weibull distribution has emerged as the fan reliability model of choice. Its widespread use for component reliability modeling (including fan reliability modeling) can be attributed to three reasons: the Weibull distribution is included in Excel (as a closed-form equation), the Weibull function has closed-form solutions for its moments, and the Weibull function is now included in a number of simple QC software packages. Bottom line, it is now very easy to model fan reliability using the Weibull distribution.

The Weibull model is characterized by two parameters: a characteristic life, which is the time by which about 63.2 percent of the population has failed, and a shape factor, which determines the width and asymmetry of the distribution. Just for completeness, the equation for a Weibull distribution is shown in Equation 4.

$$p(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] \qquad (4)$$

Where α is the characteristic life and β is the shape factor. The product p(t)Δt is the probability that a fan from the total population will fail within a small time interval Δt around time t. The cumulative function $P(t) = \int_0^t p(\tau)\,d\tau$ is the probability that a component will have failed after the total population of components has operated for a length of time t. If this function, P(t), is subtracted from 1, the new function, S(t), is the probability that components will survive after the total population has operated for a time interval t. This function, S(t), is shown in Equation 5.

$$S(t) = 1 - P(t) = \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] \qquad (5)$$

A plot of two different cumulative failure probability curves, each with the same shape factor (3.25) but with two different characteristic lives, is shown in Figure 4. The main point of the graph in Figure 4 is to show that over the range from 5 to 8 years (about 43,800 to 70,080 hours) there can be a 5 percent to 10 percent probability that a fan in the UPS will fail. The curves in Figure 4 are based on nominal numbers and are used only as an example; specific environmental information and fan reliability information are required for a specific reliability curve.

Figure 4. Two cumulative failure probability curves: one for a characteristic time (CT) of 75,000 hours and one for a characteristic time (CT) of 150,000 hours, both with a shape factor (SF) of 3.25.

Fan failures have been studied extensively in the past twenty years, and characteristic times and shape factors are available from some of the larger air-moving fan suppliers as well as from some large users such as HP. Liebert has also studied its own field fan failure history, and its data is very similar to the data which can be found in the open literature. In general, the characteristic times are around 75,000 hours for fans with sleeve bearings and around 150,000 hours for fans with ball bearings, with a shape factor of 3.25 being a good representative number. Liebert strives for fans with a characteristic time close to 75,000 hours for sleeve bearing fans and close to 150,000 hours for ball bearing fans.
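Equations 4 and 5 are straightforward to evaluate directly. The sketch below assumes the example parameters used for Figure 4 (shape factor 3.25, characteristic lives of 75,000 and 150,000 hours) and continuous operation at 8,760 hours per year; the mean life uses the closed-form first moment α·Γ(1 + 1/β), which is the Weibull counterpart of an MTBF-style average. The function names are hypothetical, and the printed values depend entirely on the assumed parameters.

```python
import math

HOURS_PER_YEAR = 8760.0  # continuous operation assumed


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """Equation 4: failure probability density p(t); taken as 0 for t <= 0
    (exact for beta > 1, which covers the fan shape factors discussed here)."""
    if t <= 0:
        return 0.0
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(-((t / alpha) ** beta))


def weibull_survival(t: float, alpha: float, beta: float) -> float:
    """Equation 5: S(t), the probability that a fan is still running at time t."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / alpha) ** beta))


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """P(t) = 1 - S(t): probability that a fan has failed by time t."""
    return 1.0 - weibull_survival(t, alpha, beta)


def weibull_mean(alpha: float, beta: float) -> float:
    """Closed-form mean life alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


BETA = 3.25
for alpha in (75_000.0, 150_000.0):
    for years in (5, 8):
        t = years * HOURS_PER_YEAR
        print(f"alpha={alpha:,.0f} h, {years} years: P(t) = {weibull_cdf(t, alpha, BETA):.1%}")
    print(f"alpha={alpha:,.0f} h: mean life = {weibull_mean(alpha, BETA):,.0f} h")
```

The printed failure probabilities can be compared against the curves in Figure 4; changing α or β shows how strongly the 5-to-8-year window depends on fan quality.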
Reported fan shape factors tend to range from 2.0 to 5.

A third function which is often referenced in the literature is the MTBF (mean time between failures), which is briefly discussed in the Liebert white paper, The Effect of Regular, Skilled Preventive Maintenance on Critical Power System Reliability. The MTBF is an average value used to characterize the macroscopic service life picture of a specific component, but the actual reliability curve (based on Weibull or an equivalent distribution model) is required if a microscopic failure picture is being studied.

The Impact of Fan Reliability on the Reliability of a Large Electrical Assembly

Quantifying the reliability of a large electrical assembly requires extensive modeling, laboratory testing and field verification. The point of this white paper is to demonstrate that without PM, the volumetric airflow rate for forced air-cooling will slowly decrease, resulting in an increase in the case or surface temperature of the individual components within the large electrical assembly. Over time, this will reduce the effective service life of the individual components, with capacitors and power semiconductors being the most sensitive to small increases in surface or case temperature. The reduced volumetric airflow develops due to: dirt accumulation in the filters; loss of bearing lubrication; increased motor friction; distortion of the plastic fan parts, which can also lead to increased friction; and electrical degradation, which can reduce the power delivered by the motor or cause the motor to stop rotating completely.

Field Service Recommendation for Air Moving Fans

Liebert's experience shows that proper maintenance and servicing of the system results in a significant improvement in the expected life (reliability) of the UPS.
Years of field data have been used to develop the recommended field servicing schedule shown below:

Liebert's Recommended Field Servicing for Air Moving Fans
Component | Expected life at max rated conditions (hours/years) | Recommended replacement or servicing (hours/years)
Low-profile fans (flat fans) | 61,320 / 7 | 43,800 / 5
Squirrel-cage fans | 87,600 / 10 | 70,080 / 8
Air filter | 4 times per year | 4 times per year

This white paper provides the basis for these recommendations.

Emerson Network Power
1050 Dearborn Drive
P.O. Box 29186
Columbus, Ohio 43229
800.877.9222 (U.S. & Canada Only)
614.888.0246 (Outside U.S.)
Fax: 614.841.6022
EmersonNetworkPower.com
Liebert.com

While every precaution has been taken to ensure accuracy and completeness in this literature, Liebert Corporation assumes no responsibility, and disclaims all liability for damages resulting from use of this information or for any errors or omissions. Specifications subject to change without notice. ©2007 Liebert Corporation. All rights reserved throughout the world. Trademarks or registered trademarks are property of their respective owners. ®Liebert and the Liebert logo are registered trademarks of the Liebert Corporation. Business-Critical Continuity, Emerson Network Power and the Emerson Network Power logo are trademarks and service marks of Emerson Electric Co. ©2007 Emerson Electric Co.

WP153-117 SL-24660 (04/11)

Emerson Network Power. The global leader in enabling Business-Critical Continuity™.