Reliability of Air Moving Fans, and Their Impact on System Reliability

A White Paper from the Experts
in Business-Critical Continuity™
Summary
Air-moving fans are used to remove waste heat on all UPS systems with the exception
of those with the very smallest ratings. The use of air-moving fans reduces the hot spot
temperature of all the electrical components within the system by as much as 30°C
compared to the hot spot temperature rise resulting from natural convection cooling.
Quantifying the expected life of a large electrical assembly requires extensive modeling,
laboratory testing and field verification. Without preventive maintenance (PM), the
volumetric air flow rate for forced air-cooling will slowly decrease, resulting in an increase in
the case or surface temperature of the individual components within the large electrical
assembly. Air moving fans, in a UPS system, are part of a set of components which need to
be serviced and replaced in order to achieve the expected life for a UPS system.
This white paper briefly discusses the role of temperature on insulation aging, the difference
between natural vs. forced air convection, reliability modeling for air-moving fans and
finally how PM increases the reliability of a UPS system. Liebert recommends replacing small
flat fans after 5 years of service and larger squirrel-cage fans after 8 years. In addition, Liebert
recommends checking air filters 4 times a year.
Introduction
All electrical components used in electrical assemblies like UPS systems generate waste heat. The one exception is superconducting wire, but this technology has not yet surfaced in large electrical assemblies. Typical components in question are shown in the list below:
• Inductors and transformers (both coreless and designs which use a core)
• Power semiconductors (switches and rectifiers)
• Capacitors
• Resistors
• Relays
• Breakers
• Contactors
• LEDs (pilot lights)
• Fuses
• Wire and cables
• I/O power connectors
• DC power supplies (control power)
• Etc.

Removing this waste heat requires heat transport from the individual electrical components inside of the assembly to the ambient air outside of the electrical assembly. This process results in a hot spot temperature rise inside of each individual component. The hot spot temperature rise breaks down into two pieces:
A. Temperature rise from the case of the component to the hot spot inside of the component, which drives the heat to the surface of the component.
B. Temperature rise from the local air close to the component surface to the component surface itself, which drives the heat from the component surface to the moving air stream.

Waste heat determines the efficiency of the UPS system. The efficiency of the components in a UPS system can be as high as 99.9 percent and as low as 85 percent, with resistors as a special case of 0 percent: the resistor generates 100 percent waste heat. Film capacitors are at the top of this list with a 99.9 percent efficiency, inductors (iron core) are normally in the range of 95 percent to 97 percent efficient, transformers (iron core) are normally in the range of 92 percent to 95 percent efficient and DC power supplies are at the bottom of the list with efficiencies in the range of 85 percent to 92 percent. A typical efficiency number for a large electrical assembly like a UPS can easily be well under 95 percent with less than 90 percent being very typical. This means at least 5 percent to over 10 percent of the throughput power is converted to heat. This waste heat is transferred from the individual component to the moving ambient air stream by convection. The heat transport process leads to a temperature rise between the ambient air temperature and the case or surface temperature. For UPS systems, the amount of waste heat can be very large. As an example, a 100 kVA UPS system can generate 5 kW to over 10 kW of heat. Even a 10 kVA UPS system can generate 500 Watts to over 1000 Watts.

The temperature rise between the ambient air temperature inside of the electrical assembly and the case (or surface) of the component can easily exceed 40°C when the cooling process is limited to natural convection. In addition there can be another 5°C to 15°C temperature rise from the component surface to the hot spot inside of the component. By using forced air-cooling, the 40°C case temperature rise can drop to a number below 10°C. Radiation is not a significant factor at these temperatures and active cooling (circulating a fluid through a chiller and a heat sink) is normally limited to systems which are very large (>2 MVA). Forced air-cooling is used on almost all medium (5 kVA to 100 kVA) and large (100 kVA to 2 MVA) UPS systems to minimize the hot spot temperature rise in the individual components.
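
The waste-heat figures quoted in this introduction (5 kW to over 10 kW for a 100 kVA system) follow directly from the stated efficiency range. The short Python sketch below reproduces that arithmetic. It assumes the kVA rating can be treated as throughput power in kW (unity power factor) and that heat is simply the non-delivered fraction of throughput; the function name `waste_heat_kw` is hypothetical and used for illustration only.

```python
def waste_heat_kw(throughput_kw: float, efficiency: float) -> float:
    """Estimate waste heat as the fraction of throughput power not delivered.

    Assumption (not from the white paper): throughput in kW equals the kVA
    rating, i.e. unity power factor.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return throughput_kw * (1.0 - efficiency)


if __name__ == "__main__":
    # Efficiency range quoted in the text: under 95 percent, often under 90 percent.
    for rating_kva in (10.0, 100.0):
        for eff in (0.95, 0.90):
            heat = waste_heat_kw(rating_kva, eff)
            print(f"{rating_kva:6.0f} kVA at {eff:.0%} efficiency: {heat:.2f} kW of heat")
```

A system with several cascaded stages would multiply the stage efficiencies before applying the same calculation.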
Fans come in many different designs and many different ratings. Air-moving fans age in the field like all other electrical components. Fans can be purchased today with claims of an expected life covering the range of 1 to 10 years. The fan manufacturer normally supplies an expected life number, normally provided at maximum rated operating conditions for the fan. This number is normally established through laboratory measurements and depends upon the design of the fan system. The fan system consists of the housing, the air moving blade, and the motor including the bearings and lubrication system (plus speed control if one is used). Many of these fan components have been moving from metal to plastic. The expected life of a specific fan in a specific application can be calculated based on the fan expected life at max rated operating conditions, a reliability model, information about the operating conditions for the specific application and the life de-rating formulas, also supplied by the fan manufacturer. In almost all cases the fans are operated below their max rated conditions.

The next section will briefly explain why the magnitude of the component hot spot temperature leads to reduced component expected life and therefore reduced UPS system expected life. The last few sections will briefly explain why forced air convection cooling is much more effective than natural air convection cooling at reducing the thermal gradient, will briefly review air moving fan reliability modeling and finally will review how air moving fan reliability impacts the reliability of the complete UPS system.
The Impact of Hot Spot Temperature Rise on Component Reliability
All polymeric insulation and many solid-state materials degrade with temperature. The expected life of the insulation material is normally tied to a large increase in the leakage current along with a large decrease in the insulation resistance. There are many different aging processes which lead to an increase in the leakage current along with a decrease in the insulation resistance, but oxidation (normally traced to humidity and dissolved air), polymeric molecular splitting and ionic contaminant migration are three of the major reaction mechanisms. The reaction rate and diffusion coefficient for these processes are all partially driven by the magnitude of the hot spot temperature. It is impossible to remove all oxygen, moisture and ionic contaminants from materials using reasonable industrial processing. Even in a laboratory environment, producing pure materials is almost impossible.

The expected life (technically referred to as reliability) of the individual components within a large electrical assembly and therefore the expected life (also referred to as reliability) of the electrical assembly itself are partially dependent on the magnitude of the individual component case (and therefore hot spot) temperature. Other parameters beyond just hot spot temperature also impact the reliability of large electrical assemblies. Electric fields across insulation material, vibration (due to the fan and the magnetic components which contain a high permeability core), humidity, and cycling are just a few of the other drivers which can increase field aging. This white paper has been limited to the impact of temperature and air moving fan reliability.
Arrhenius modeling of thermal degradation has been used in the industry for over 60 years to relate the degradation due to increased hot spot temperature with a useful service life number. The Arrhenius model is shown in Equation 1.

$$K(T) = A\,e^{-E/(RT)} \qquad (1)$$

Where K(T) is the reaction rate coefficient (which is a function of temperature), A is a constant, E is the activation energy for the specific reaction or diffusion process, R is the universal gas constant and T is the absolute temperature of the hot spot.

The problem comes in trying to establish the activation energy number for specific processes. The Arrhenius equation becomes a straight line when it is plotted on a ln(K[T]) vs. 1/T graph. Many attempts to measure the activation energy for electrical insulation degradation reactions and ionic diffusion coefficients can be found in the literature and the data from the studies are often plotted using the linear relationship. These studies are far from consistent even after 60 years of work. In general a factor of 2 for every 10°C guideline is used in the electrical industry. This relationship is shown in Equation 2.

$$L(T) = L(T_0)\cdot 2^{(T_0 - T)/10} \qquad (2)$$

Where L(T) is the temperature related component expected life for a component which is operating at a max case temperature T (in °C). L(T0) is the service life provided by the manufacturer of the component, normally at max case rated conditions (case temperature T0).

The relationship in Equation 2 predicts a factor of 2 reduction in service life for every 10°C increase in the max case operating temperature. Many component manufacturers provide temperature life multipliers for their specific components and there is a large range of numbers for every 10°C. These numbers vary from 1.6 to around 3, i.e., L(T0 - 10°C) / L(T0) = k, where k can vary from 1.6 to 3. Another issue is the location of the 10°C range in question, i.e., the service life may decrease by a factor of 2 when the case temperature increases from 70°C to 80°C but the change in service life can be much smaller for a 10°C increase when the temperature increases from 30°C to 40°C. There are references in the industry which claim there is almost no temperature degradation below 40°C. The actual temperature service life multiplier can be quite different for the various components listed in the introduction section.

There is no debate in the industry that increasing the case operating temperature for most components used in UPS systems leads to a reduction in the service life. As a result there is a definite reduction in the service life for the complete UPS system as the internal air ambient temperature increases, which drives an increase in the component case or surface operating temperature. There may be industry debate on how to model the service life temperature multiplier for a complete UPS system but there is no debate that increasing the internal operating temperature of the UPS system will lead to a definite decrease in the reliability. Correspondingly, decreasing the operating temperature of the UPS system will lead to a definite increase in the UPS system reliability. Documented history showing a relationship between the reliability of large electrical assemblies and fan cooling performance has existed in open literature for well over 50 years with one of the
better papers dating to 1993 (Hogan, J.M. (1993). The Effect of Fan-Reliability and Cooling-Performance on Electronic-Chassis Reliability. IEEE Transactions on Reliability, Vol. 42 (1), 172-174.). The next section will focus on the relationship between natural and forced air convection on the component case operating temperature and the cooling air stream.
Natural vs. Forced Air Convection Cooling
Radiation is not a major factor in heat transport for actual operating temperatures inside of a typical UPS system. The operating temperatures have to approach values in the 400°C range before radiation plays much of a role. Due to all the complexities and their potential reliability issues, active cooling is normally not used for UPS systems under 2 MVA. Cooling is limited to natural and forced air convection cooling. In both cases heat transport can be estimated using Equation 3. The major challenge is to estimate the volumetric airflow rate for forced air vs. natural air convection. For now the volumetric airflow rate is being left in the relationship as a parameter so the equation can be evaluated.

$$Q = F\,\rho\,C_p\,\Delta T \qquad (3)$$

Where
Q is the heat transport in Watts
F is the volume flow rate in cubic feet per minute (CFM)
Cp is the specific heat capacity for air in J/(g·K)
ρ is the air density in g/m³
ΔT is the difference between the outgoing and the incoming air temperature

Both the specific heat capacity and the density for air vary with the local temperature. In order to develop a simple picture, these parameters are assumed to be constant. Equation 3 has been plotted in Figure 1, for four different delta temperature values, and a large range of air flow rates. The airflow rates are shown in CFM because these are the units typically used by fan manufacturers. The three sets of negative slope curves are curves of constant heat transport (Q). As can be seen, for a reduction in the airflow rate the difference between the incoming and the outgoing air temperature must increase in order to maintain the constant heat transport rate. The graph in Figure 1 also shows that kilowatts of waste heat require hundreds of CFM of volumetric air flow in order to maintain a fairly small thermal gradient (10°C) and therefore a small temperature rise inside of the component.

Figure 1. Heat transport curves for different mass flow rates and different ambient temperature heat rise gradients (difference between the air temperature for the air entering the UPS system and the hot air exiting the UPS system). The second set of curves (solid lines) show curves of constant heat transport as a function of mass flow rate and temperature difference.
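
The heat transport relationship described above (heat equals volume flow times density times specific heat times air temperature difference) can be evaluated numerically. The Python sketch below is a minimal illustration, not the calculation used to produce Figure 1. It assumes constant air properties of 1200 g/m³ and 1.005 J/(g·K), typical near room temperature at sea level, and converts CFM to m³/s so the units are consistent. The function names are hypothetical.

```python
# 1 cubic foot per minute expressed in cubic meters per second.
CFM_TO_M3_PER_S = 0.3048 ** 3 / 60.0

# Assumed constant air properties (not taken from the white paper).
AIR_DENSITY_G_M3 = 1200.0
AIR_CP_J_G_K = 1.005


def heat_transport_w(flow_cfm: float, delta_t_c: float) -> float:
    """Heat carried away (W) by an air stream of flow_cfm with a delta_t_c rise."""
    flow_m3_s = flow_cfm * CFM_TO_M3_PER_S
    return flow_m3_s * AIR_DENSITY_G_M3 * AIR_CP_J_G_K * delta_t_c


def required_flow_cfm(heat_w: float, delta_t_c: float) -> float:
    """Air flow (CFM) needed to remove heat_w while holding the rise to delta_t_c."""
    if delta_t_c <= 0.0:
        raise ValueError("delta_t_c must be positive")
    watts_per_cfm = CFM_TO_M3_PER_S * AIR_DENSITY_G_M3 * AIR_CP_J_G_K * delta_t_c
    return heat_w / watts_per_cfm


if __name__ == "__main__":
    for heat_w in (500.0, 1000.0, 5000.0):
        for delta_t in (5.0, 10.0, 20.0, 40.0):
            cfm = required_flow_cfm(heat_w, delta_t)
            print(f"{heat_w:6.0f} W at a {delta_t:4.1f} C rise needs about {cfm:7.1f} CFM")
```

Under these assumed properties, removing 1 kW with a 10°C air temperature rise takes on the order of a couple of hundred CFM, in line with the statement that kilowatts of heat need hundreds of CFM.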
Figure 2. Typical Flat fan (left) and typical squirrel cage fan (right).
The volumetric airflow rate is the big
difference between forced and natural
convection cooling. With forced air-cooling,
the air moving fans generate a pressure
drop by compressing the air, which drives
the air movement, and the airflow rate can
be substantial (>200 CFM). In the case of
natural convection the thermal gradients
cause a variation in the gas density, which
also leads to pressure gradients and
volumetric airflow, but the resulting airflow
rates are much smaller than those generated by
air fans. Studies of small simple chimneys,
which are similar in size to a large 18˝ rack
assembly, show a thermal gradient much
larger than 40°C is required to remove
heat in the 100’s of W range. The problem
becomes even more complicated because
most large electrical assemblies do not have
an unobstructed air flow path.
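
To put the difference between natural and forced convection in service-life terms, the factor-of-2-per-10°C guideline from the previous section can be applied to the case temperature rises quoted in the introduction (over 40°C with natural convection, under 10°C with forced air). The Python sketch below is illustrative only: the function name `life_factor` and the 25°C ambient are assumptions, and the white paper itself cautions that the multiplier k ranges from about 1.6 to 3 and may be smaller at low temperatures.

```python
def life_factor(case_temp_c: float, ref_temp_c: float,
                k: float = 2.0, step_c: float = 10.0) -> float:
    """Ratio L(T) / L(T0) under the 'factor of k per step_c degrees' guideline.

    case_temp_c is the operating case temperature T and ref_temp_c is the
    reference case temperature T0, both in degrees C.
    """
    return k ** ((ref_temp_c - case_temp_c) / step_c)


if __name__ == "__main__":
    ambient_c = 25.0                    # assumed internal ambient, illustration only
    forced_case_c = ambient_c + 10.0    # forced air: rise below about 10 C
    natural_case_c = ambient_c + 40.0   # natural convection: rise above about 40 C

    # Multiplier range quoted by component manufacturers: 1.6 to about 3.
    for k in (1.6, 2.0, 3.0):
        ratio = life_factor(natural_case_c, forced_case_c, k=k)
        print(f"k = {k}: life with natural convection is {ratio:.3f} x life with forced air")
```

With k = 2, a 30°C higher case temperature corresponds to one eighth of the service life, which is why the choice of cooling method matters so much for component reliability.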
Air-Cooling Fans
Air-cooling fans come in many different
sizes, many different configurations
and are driven by many different motor
technologies. The two most common
configurations are flat fans and squirrel cage
fans. Examples of these two configurations
can be seen in Figure 2. Some airflow fans
are mounted inside of the UPS assembly
and are designed to move air over heat
sinks, which normally contain ribs (extruded
aluminum pieces which contain flat
surfaces for component mounting and
ribs) for increased surface area. Power
semiconductors are often mounted directly
to this heat sink where very low thermal
resistance is a requirement. Other air-cooling fans are mounted on the inside
surface of the outer panels of the electrical
assembly package, and are used to either
push cool ambient air into the assembly or
pull hot air out of the assembly. Alternative
mounting arrangements also exist such as
fan trays mounted near the top or near the
bottom of the electrical chassis, or individual
fans mounted to the system frame close to
an opening in the chassis. These fans can be
located on top of the electronic assembly,
on the bottom of the electronic assembly
or on the side of the electronic assembly.
An outline drawing of a typical Liebert UPS
system is shown in Figure 3A and 3B. The
Liebert outline drawing shows a number
of different fans and different fan locations
within the typical UPS system.
The motor technologies used for air moving
fans also cover a broad range. In very
general terms, very small fans (less than
0.005 [1/200] HP) normally use shaded pole
motors, medium size fans use sub-fractional
HP motors (0.005 [1/200] < motor rating
< 0.083 [1/12] HP) which tend to be DC
brushless motors, split phase capacitor
motors or simple induction motors and
larger fans (0.083 [1/12] to 1.5 HP) use
single-phase induction motors. Motor
speed controls (referred to in the industry
as VSD, variable speed drives) are also
starting to appear in the industry. The major
mechanical design parameter impacting
the fan reliability is the bearing technology.
Again in very simple terms, motor bearings
come in two different forms: sleeve bearings
or ball bearings plus a sealed lubrication
system. The dividing range between use of
sleeve bearings vs. ball bearings is roughly
100 to 200 CFM (this is a very general
boundary and many exceptions are found in
the industry).
Air-moving fans have both mechanical
as well as electrical failure modes. The
mechanical failure modes are primarily
due to wear in the bearing but can also be
due to fan blades and fan housings, both of
which can distort over time. More and more
metal fan enclosures are being replaced with
plastic enclosures. This is also true for the
fan blade itself. Electrical failures are due to
motor coils, which age and eventually fail
turn to turn or coil to ground. Mechanical failures normally account for more fan failures than electrical failures with a split of 65 percent/35 percent being reported by organizations which have studied many fan failures (Tian, X. (2006). Cooling Fan Reliability: Failure Criteria, Accelerated Life Testing, Modeling and Qualification. Reliability and Maintainability Symposium, 2006. 380-384.). Liebert’s field service data agrees with fan data appearing in the open literature studies.
Figure 3A and 3B. Outline drawing for a typical Liebert UPS system shows location of a number of fans.
It is very common to use small mechanical filters on the fans to reduce the amount of dust and particles which would be pulled into the air stream. Air moving in electrical equipment often leads to charge accumulation and electrostatic forces. The electrostatic forces along with the fine dust particles that exist in all but the clean room environments lead to decreased filter porosity and eventually to reduced airflow rates. Without servicing and replacement, forced airflow inside the UPS will decrease over time. The reduction in the air volumetric flow rate begins to develop in as little as a few years and in some cases airflow stops completely well inside of ten years.
Reliability Modeling for Air-Cooling Fans
Motors and fan assemblies all age in the field and fail over time. This is true for any and all fan assembly designs. The failure is normally gradual and not abrupt. During the gradual aging period, the speed of the fan and therefore the volume air-flow rate continually decreases.

A histogram of the failure times for a group of fans all operating under the same field conditions is a very broad and non-symmetrical distribution. This distribution is referred to as the probability density function and is commonly modeled using a Weibull mathematical function. Some fan designs may fail earlier than others, but if sufficient time passes, the volumetric air flow being moved by any fan design will reduce and the fan will eventually fail and will stop moving any air. Fans have a specific service life and a failure probability.

The Weibull distribution has emerged as the fan reliability model of choice. The widespread use of the Weibull function for component reliability modeling (including fan reliability modeling) can be attributed to three reasons: the Weibull distribution is included in Excel (as a closed form equation), the Weibull function has closed form solutions for moments and the Weibull function is now included in a number of simple QC software packages. Bottom line, it is now very easy to model fan reliability using the Weibull distribution. The Weibull model is characterized by two parameters, a characteristic life similar to the mode in a non-symmetrical distribution and a shape factor, which determines the width and non-symmetry of the distribution. Just for completeness, the equation for a Weibull distribution is shown in Equation 4.

$$p(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} e^{-(t/\alpha)^{\beta}} \qquad (4)$$

Where α is the characteristic life and β is the shape factor.

The function p(t), multiplied by a small time interval ∆t, is the probability that a fan from a total population will fail within ∆t around the time t. The cumulative function P(t) [integration of p(τ) from τ = 0 to τ = t] is the probability that a component will have failed by the time the total population of components has operated for a length of time t. If this function, P(t), is subtracted from 1, the new function, S(t), is
the probability that components will survive after the total population of components have operated for a time interval of t. This function labeled S(t) is shown in Equation 5.

$$S(t) = 1 - P(t) = e^{-(t/\alpha)^{\beta}} \qquad (5)$$

Fan failures have been studied extensively in the past twenty years and characteristic times and shape factors are available from some of the larger air moving fan suppliers as well as some large users like HP. Liebert has also studied its own field fan failure history and its data is very similar to the data which can be found in the open literature. In general, the characteristic times for fans with sleeve bearings are around 75,000 hours and around 150,000 hours for fans with ball bearings. The shape factors tend to be in the range from 2.0 to 5 with a value of 3.25 being a good representative number. Liebert strives for fans with a characteristic time closer to 75,000 hours for sleeve bearing fans and close to 150,000 hours for ball bearing fans.

A plot of two different failure probability curves, each for the same shape factor (3.25), but two different characteristic life numbers is shown in Figure 4. The main point of the graph in Figure 4 is to show that over the range from 5 to 8 years (approximately 43,000 hours to approximately 70,000 hours) there can be a 5 percent to 10 percent probability that a fan in the UPS will fail. The curve in Figure 4 is based on nominal numbers and is only being used as an example. Specific environmental information and fan reliability information are required for a specific reliability curve.

Figure 4. Two cumulative failure probability curves: one for a characteristic time (CT) of 75,000 hours and one for a characteristic time (CT) of 150,000 hours, both with a shape factor (SF) of 3.25.

The third function, which is often referenced in the literature, is the MTBF (mean time
between failure), which is briefly discussed in the Liebert white paper, The Effect of Regular, Skilled Preventive Maintenance on Critical Power System Reliability. The MTBF is an average value which is used to characterize the macroscopic service life picture of a specific component but the actual reliability curve (based on Weibull or an equivalent distribution model) is required if a microscopic failure picture is being studied.
The Impact of Fan Reliability on the Reliability of a Large Electrical Assembly
Quantifying the reliability of a large electrical assembly requires extensive modeling, laboratory testing and field verification. The point of this white paper is to demonstrate that without PM, the volumetric air flow rate for forced air-cooling will slowly decrease, resulting in an increase in the case or surface temperature of the individual components within the large electrical assembly. Over time, this will reduce the effective service life of the individual components, with capacitors and power semiconductors being the most sensitive to small increases in the surface or case temperature. The reduced volumetric airflow develops due to: dirt accumulation in the filters; loss of bearing lubrication; increased motor friction; distortion of the plastic fan parts, which can also lead to increased friction; and electrical failures, which can reduce the power delivery of the motor or can cause the motor to completely stop rotating.
Field Service Recommendation for Air Moving Fans
Liebert’s experience shows that significant improvement in the expected life (reliability) of the UPS results from proper maintenance and servicing of the system. Years of field data have been used to develop the recommended field servicing schedule shown below:

Liebert’s Recommended Field Servicing for Air Moving Fans

Component                      Expected life at max rated     Recommended replacement
                               conditions (Hrs/Yrs)           or servicing (Hrs/Yrs)
Low-profile fans (flat fans)   61,320/7                       43,800/5
Squirrel-cage fans             87,600/10                      70,080/8
Air filter                     4 times per year               4 times per year

This white paper provides the basis for these recommendations.
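
The Weibull model described in the reliability modeling section can be evaluated at the recommended replacement intervals to see what cumulative failure probability they correspond to. The Python sketch below is a minimal illustration using the nominal parameters quoted in the text (characteristic lives of 75,000 and 150,000 hours, shape factor 3.25). The function names are hypothetical, the combinations in the loop are illustrative rather than Liebert data, and the mean-life formula (characteristic life times the gamma function of 1 + 1/shape) is the standard Weibull mean, included here as one way to obtain an MTBF-style average.

```python
import math


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """Failure probability density p(t) for characteristic life alpha, shape beta."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(-((t / alpha) ** beta))


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """Cumulative failure probability P(t): fraction failed by time t."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_survival(t: float, alpha: float, beta: float) -> float:
    """Survival probability S(t) = 1 - P(t)."""
    return math.exp(-((t / alpha) ** beta))


def weibull_mean(alpha: float, beta: float) -> float:
    """Mean life of the distribution (standard closed-form Weibull moment)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


if __name__ == "__main__":
    shape = 3.25  # representative shape factor quoted in the text
    for char_life_h in (75_000.0, 150_000.0):
        print(f"characteristic life {char_life_h:,.0f} h, "
              f"mean life {weibull_mean(char_life_h, shape):,.0f} h")
        for hours in (43_800.0, 70_080.0):  # 5-year and 8-year replacement points
            failed = weibull_cdf(hours, char_life_h, shape)
            print(f"  after {hours:,.0f} h: {failed:.1%} cumulative failure probability")
```

Actual fan data sheets and site conditions would replace the nominal parameters before drawing any conclusion about a specific installation.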
Emerson Network Power
1050 Dearborn Drive
P.O. Box 29186
Columbus, Ohio 43229
800.877.9222 (U.S. & Canada Only)
614.888.0246 (Outside U.S.)
Fax: 614.841.6022
EmersonNetworkPower.com
Liebert.com
While every precaution has been taken to ensure accuracy
and completeness in this literature, Liebert Corporation
assumes no responsibility, and disclaims all liability for
damages resulting from use of this information or for
any errors or omissions. Specifications subject to change
without notice. ©2007 Liebert Corporation. All rights
reserved throughout the world. Trademarks or registered
trademarks are property of their respective owners.
®Liebert and the Liebert logo are registered trademarks
of the Liebert Corporation. Business-Critical Continuity,
Emerson Network Power and the Emerson Network Power
logo are trademarks and service marks of Emerson Electric
Co. ©2007 Emerson Electric Co.
WP153-117
SL-24660 (04/11)
Emerson Network Power.
The global leader in enabling Business-Critical Continuity™.