Australian Government
Data Centre Strategy 2010-2025
Better Practice Guide: Data Centre Cooling
November 2013
Contents

1. Introduction
   Purpose
   Scope
   Policy Framework
   Related documents
2. Discussion
   Overview
   Fundamentals
   Common Concepts and Definitions
   Common Cooling Problems
   Improving Cooling Efficiency
   Potential Work Health Safety Issues
   Capacity Planning for Cooling Systems
   Operations Effectiveness and Continuous Improvement
   Optimising Cooling Systems
   Maintenance
   Sustainability
   Trends
   Conclusion
3. Better Practices
   Operations
   Planning
4. Conclusion
   Summary of Better Practices
1. Introduction
The purpose of this guide is to advise Australian Government agencies on ways to
improve operations relating to data centre cooling. Many government functions
are critically dependent upon information and communication technology (ICT)
systems based in data centres.
The principal purpose of the data centre’s cooling systems is to provide conditioned
air to the ICT equipment with the optimum mix of temperature, humidity and
pressure. In larger, purpose built data centres, the cooling system also cools the
support equipment and the general office area.
Cooling is typically a lower priority for management attention. The ICT equipment
and power are given more attention because when they fail, the impacts are more
immediate and disruptive. However, extended cooling system failures can be more
damaging than power failures. Further, inefficient or ineffective cooling is a major
cause of waste in data centre operations, and contributes to increased hardware
failures. Historically, making cooling systems efficient has resulted in reducing data
centre operating costs by up to 50 per cent. Ineffective cooling results in some ICT
equipment running at higher temperatures and so failing sooner. Sound operations
practices are essential to efficient and effective data centre cooling.
This guide on cooling forms part of a set of better practice guides for data centres.
Purpose
The cooling system is a major cost driver for data centres, and interacts with many
other data centre systems. The intent of this guide is to assist managers to assess
how well a cooling system meets their agency’s needs, and to reduce the capital and
operating costs relating to data centre cooling.
Scope
This guide addresses the operations and processes required to achieve better
results from data centre cooling technology.
This guide does not consider data centre cooling technology in detail, as information
at this level is widely available, subject to rapid change, contentious and specialised.
Industry will be able to supply agencies with advice and cooling technology to meet
their specific needs.
The discussion will be restricted to cooling technology and operations that are
relevant to data centres used by Australian Government agencies.
Policy Framework
The guide has been developed within the context of the Australian public sector’s
data centre policy framework. This framework applies to agencies subject to the
Financial Management and Accountability Act 1997 (FMA Act). The data centre
policy framework seeks financial, technical and environmental outcomes.
The Australian Government Data Centre Strategy 2010 – 2025 (data centre strategy)
describes actions that will avoid $1 billion in future data centre costs. The data
centre facilities panel, established under the coordinated procurement policy,
provides agencies with leased data centre facilities.
The Australian Government ICT Sustainability Plan 2010 – 2015 describes actions
that agencies are to take to improve environmental outcomes. The ICT sustainability
plan sets goals for power consumption in data centres, which is the key factor
driving the need for cooling.
The National Construction Code was created in 2011 by combining the Building Code
of Australia and the Plumbing Code of Australia. The National Construction Code
controls building design in Australia, and may be further modified by State
Government and council regulations.
The Technical Committee 9.9 (TC9.9) of the American Society of Heating,
Refrigeration and Air-conditioning Engineers (ASHRAE) publishes guidance on data
centre cooling1.
The Australian Refrigeration Council (ARC) is the organisation appointed by the
Australian Government to accredit individuals to handle refrigerants and
related chemicals.
The Australian Institute of Refrigeration, Air conditioning and Heating (AIRAH) is a
specialist membership association for air conditioning, refrigeration, heating and
ventilation professionals. AIRAH provides continuing professional development,
accreditation programs and technical publications.
Related documents
Information about the data centre strategy, the Data Centre Optimisation Target (DCOT) and related guidance can be
obtained from the Data Centre section (datacentres@finance.gov.au).
The data centre better practice guides also cover:
• Power: the data centre infrastructure supplying power safely, reliably and efficiently to the ICT equipment and the supporting systems.
• Structure: the physical building design provides for movement of people and equipment through the site, floor loading capacity, and reticulation of cable, air and water. The design also complements the fire protection and security better practices.
• Data Centre Infrastructure Management: the system that monitors and reports the state of the data centre. Also known as the building management system.
• Fire protection: the detection and suppression systems that minimise the effect of fire on people and the equipment in the data centre.
1 The reports of the Technical Committee 9.9 are widely referenced:
http://tc99.ashraetcs.org/documents/ASHRAE%20Networking%20Thermal%20Guidelines.pdf
http://tc99.ashraetcs.org/documents/ASHRAE%202012%20IT%20Equipment%20Thermal%20Management%20and%20Controls_V1.0.pdf
• Security: the physical security arrangements for the data centre. This includes access controls, surveillance and logging throughout the building, as well as perimeter protection.
• Equipment racks: this guide brings together aspects of power, cooling, cabling, monitoring, fire protection, security and structural design to achieve optimum performance for the ICT equipment.
• Environment: this guide examines data centre sustainability, including packaging, electronic waste, water use and reducing greenhouse gas generation.
2. Discussion
Overview
This section discusses key concepts and practices relating to data centre cooling.
The focus is on operations, as this is essential for efficient, reliable and effective
performance of the information and communication technology (ICT) in the data
centre. If more background is needed on cooling systems design, there is a vast
amount of publicly available information, from the general to the very detailed.
AIRAH2 provides material relating to Australia’s general refrigeration industry. The
Green Grid3 and ASHRAE TC9.9 are industry bodies that provide vendor-independent
information relating to data centre cooling.
The refrigeration industry has been operating for over 150 years, and the air
conditioning industry for over 100 years. As data centres are comparatively recent, designers
first adapted existing air conditioning systems to suit data centre needs. In the last 15
years, purpose-built equipment has been developed. The result is diversity and
innovation: there is a range of designs, a wide choice of equipment makes and
models, and constant innovation in cooling designs and technology.
Cooling a data centre is a continuing challenge. Data centres have dynamic thermal
environments. This requires regular monitoring and adjustments due to interactions
between ICT, power and cooling systems. External factors, such as the weather, time
of day and seasons, also contribute to the challenge.
The better practice is to ensure that the overhead costs due to cooling are
minimised over the data centre’s life.
Fundamentals
The basic physics of a data centre is that electricity is converted to heat and noise.
The cooling system must transfer enough heat quickly enough from the ICT
equipment to prevent equipment failures.
There are two types of air conditioning systems, comfort and precision. Comfort
systems are designed for human use while precision systems are designed for ICT
equipment. Comfort systems have a lower capital cost, but have a higher operating
cost as they are less efficient than precision systems in cooling data centres.
Units
The widespread use of archaic units of measurement is due to the longevity of the
refrigeration industry. As the core issue the cooling system is intended to handle is
energy, this guide uses the International System of Units (SI) unit, the watt. This allows
comparison between the power consumed in the data centre and the cooling
required. Agencies are encouraged to use the watt as the basic unit.

2 www.airah.org.au
3 www.thegreengrid.org
Heating, Cooling and Humidity
In a simplified model of data centre cooling there are three major elements.
Figure 1: A Simplified Model for Data Centre Cooling (supply, transport and demand)
Supply: this creates the cooling. Common technologies are refrigeration, chillers, and
use of the ambient temperature. Cooled air is very commonly used, due to overall
cost and current ICT designs. Liquids are much more effective at transferring heat (10 to 80 times more than air).
Typically, smaller cooling systems use air only (with a sealed refrigerant) while
larger cooling systems use a combination of air and liquid.
Transport: ensures that enough cooling is delivered soon enough, and enough heat is
removed quickly enough, to maintain the ICT equipment at the correct temperature.
Typically, air is used to deliver the cooling and carry away the heat at the equipment
rack, and fans are used to move the air.
Demand: the sources that create the heat, mostly due to ICT. The other sources
include the people that work in the data centre, the external ambient temperature
that transfers into the data centre, and the other support systems. The cooling
system itself generates heat. The uninterruptible power supply (UPS), in particular its batteries, also generates heat both in standby and in operation.
The cooling system is also required to adjust the humidity of the air and to remove
particles. Depending upon the climate at a data centre’s location, moisture may need
to be added or removed. Similarly, the types and amount of particles to be removed
from the air are determined by the location and external events.
ICT Equipment Heat Generation
Each item of ICT hardware in the data centre needs an adequate supply of cooling
air to be supplied to it, and the heated air removed. Different types of equipment
have different needs.
Server and network ICT equipment depend on high performance computer chips.
Most computer chips are able to generate enough heat to damage themselves and
nearby components. Chips are designed to operate reliably at between 55°C and
70°C, with fans blowing cooling air over the chips to remove heat. Manufacturers specify
the range of inlet air temperature needed for the equipment to operate reliably.
Other ICT devices, such as disk storage and tape drives, consume much less
electricity and generate much less heat. Although these types of devices also have
chips, they are designed to operate at lower temperatures.
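As a rough illustration of the relationship between heat load and cooling air, the airflow needed to carry heat away can be estimated from the sensible heat of air. The sketch below is a minimal example only; the rack load and the temperature rise across the equipment are assumed values for illustration, not figures from this guide.

```python
# Estimate the airflow needed to remove a given heat load with air.
# Q = m_dot * c_p * dT, where m_dot = rho * V_dot (mass flow of air).
AIR_DENSITY = 1.2         # kg/m^3, approximate at sea level and ~20 degrees C
AIR_SPECIFIC_HEAT = 1005  # J/(kg.K)

def required_airflow_m3_per_s(heat_load_watts: float, delta_t_celsius: float) -> float:
    """Volumetric airflow (m^3/s) needed to absorb heat_load_watts with a
    temperature rise of delta_t_celsius between inlet and exhaust air."""
    return heat_load_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_celsius)

# Example (assumed figures): a 20 kW rack with a 12 degree C rise between
# inlet and exhaust air needs roughly 1.4 cubic metres of cooling air per second.
print(round(required_airflow_m3_per_s(20_000, 12), 2))
```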
Common Concepts and Definitions
This section illustrates commonly used terms and concepts in data centres operated
by APS agencies. Note that this is not a comprehensive or definitive list. There are
many other ways to implement cooling systems in data centres.
Figure 2 shows one type of data centre cooling system. A chiller on the left receives
water warmed in the data centre, chills the water and pumps it back to the data hall.
The Computer Room Air Handler (CRAH) uses the chilled water to create cool air.
The cool air is pushed into the under floor plenum to the ICT racks. The ICT
equipment is cooled by the air. The warm air is drawn back to the CRAH, to be
cooled again.
Figure 2: Data Centre Cooling using CRAH and chilled water
Figure 3 shows another common type of cooling system. The Computer Room Air
Conditioner (CRAC) creates cool air and pushes this into the under floor plenum to
the ICT racks. The ICT equipment is cooled by the air. The warm air is drawn back to
the CRAC, to be cooled again.
Figure 3: Cooling system using a CRAC (with exchanger and economizer)
CRACs may use refrigerants (as shown in Figure 3) or chilled water (similar to
Figure 2) to cool the air. The refrigerant is pumped to the exchanger to remove the
excess heat, before being returned to the CRAC. A common technology for the
exchanger is the cooling tower, which uses water vapour to cool and humidify
external air as it is drawn in, while warm air is expelled to the outside air.
Free air cooling is a technique of bringing air that has a suitably low temperature
into the data centre, either directly to the ICT rack or indirectly to chill the water.
This is also known as an ‘economizer’.
Figure 4 shows the direct expansion (DX) cooling system. The DX unit has the same
mode of operation as a CRAC, and uses refrigerants for cooling. DX units may also
have free air cooling.
Figure 4: DX cooling system
Dedicated chilled water systems generate more cost effective cooling for larger data
centres. Smaller data centres will typically use refrigeration or share chilled water
with the building air conditioning system.
The cooling system must also manage the humidity, the amount of water vapour in
the air. The preferred range is 20% to 80% relative humidity. Below 20%, there is a
risk of static electricity causing ICT equipment failure. Above 80%, there is a risk of
condensation.4
A blanking panel is a solid plate put in a rack to prevent cool and warm air from
mixing. Blanking panels are placed as required, and provide a 1 to 5 per cent
improvement in efficiency.
The inlet air temperature is the temperature measured at the point where air enters a
piece of ICT equipment. Knowing the inlet air temperature across all equipment is important
when maximising the data centre’s efficiency.
The examples in this section showed an under floor plenum to reflect the design
most commonly used in APS data centres. Data halls first began using raised floors
(under floor plenums) in 1965. Many recently built data centres use solid concrete
slabs rather than raised floors. There are merits in both approaches.
Common Cooling Problems
The figure below illustrates many common problems in data centre cooling. The
cooling air rises from the under-floor plenum for racks 1 and 2. However, rack 3 is
drawing the warmed air from rack 2 directly into the equipment in rack 3. This
increases the rate of equipment failure in rack 3.
Before entering rack 2, the cooling air mixes with warmed air from rack 1. Thus,
rack 2 is not cooled as effectively. In both rack 1 and 2, the warm air is moving back
from right to left inside the racks. Blanking panels would stop this air mixing.
4 Large data centres can have issues: http://www.theregister.co.uk/2013/06/08/facebook_cloud_versus_cloud/
As well as the increased rate of equipment failure, the energy used to cool the air has
been wasted by allowing the warm air leaving a rack to mix with the cooling air before it enters the next rack.
Figure 5: Common cooling problems
Another common problem occurs when two or more CRACs have misaligned target
states. For example, if one CRAC is set for a relative humidity of 60% and another
CRAC is set to 40%, then both ‘battle’ to reach their preferred humidity level. As
both CRACs will operate for far longer than necessary, electricity and water will be
wasted.
Older CRACs use belts to drive the fans. The belts wear and shed dust throughout
the data hall. This dust passes into the ICT equipment’s cooling systems, reducing
efficiency slightly. It is cost effective to replace belt-driven fans, as newer drives have lower
operating costs due to lower power use and lower maintenance requirements.
Noise is acoustic energy, and can be used as a proxy measure of data centre
efficiency. Higher levels of noise mean that more energy is being lost from various
data centre components and converted to acoustic energy. Two key sources are fans
and air-flow blockages. Fans are essential to air movement. However, lowering the
inlet air temperature means the fans need to operate less frequently, or at lower
power.
Another common issue is blockages in transporting the cool or hot air. Blockages
mean that more pressure is needed to transport the air. Pressure is generated by the
fans (for example, in the CRACs or CRAHs), which must use more power to create
the greater pressure. Bundles of cables in the under-floor plenum, or cables forming a
curtain in the equipment racks, are very common faults that are easily remedied.
Ductwork with several left and right turns is another common issue. Smooth curves
minimise turbulence and allow for more efficient air movement.
Improving Cooling Efficiency
There are several simple, inexpensive techniques that have consistently improved
cooling system performance by 10 to 40 per cent.
Hot / Cold Aisle
Figure 6: Hot aisle configuration
Hot aisle / cold aisle: aligning the ICT equipment in the racks so that all the cold air
is drawn in from one side of the rack and the hot air is expelled from the other side. The racks are
then aligned in rows so that the hot air from two facing rows blows into a shared hot aisle,
while in the alternate aisle the cold air is drawn into the racks on either side. Changing from
randomly arranged equipment to this arrangement reduces cooling costs by 15% to
25%.
Hot / Cold Aisle Containment
Enclosing either the hot or the cold aisles gives greater efficiencies by further
preventing hot and cold air from mixing. Cooling costs are reduced by another 10%
over hot / cold aisle alignment.5
Figure 7: Cold or hot aisle containment
Hot or cold aisle containment is nearly always cost effective in data centres that
have not implemented hot / cold aisle alignment, and purpose-built containment
solutions are available. Containment can be retrofitted into existing data halls, using
an inexpensive material such as plywood, MDF or heavy plastic. However, due care
is necessary, for example to ensure that the fire suppression system will still operate
as designed.
5 Moving from random placement to hot / cold aisle containment means 25% to 35% reductions in cooling costs.
Raising the Temperature
Since 2004, ASHRAE TC 9.9 has published guidance on the appropriate temperature
of a data centre. Many organisations have reported major reductions in cooling costs
by operating the data centre at temperatures in the mid-twenties.
The most recent advice was released in October 2011, and subsequently
incorporated into a book in 2012. The following table presents the recommended
and allowable conditions for different classes of ICT equipment.

Equipment Environmental Specifications

Recommended (applies to all A classes; individual data centres can choose to expand this range based upon the analysis described in ASHRAE documents):
  Dry-bulb temperature: 18 to 27°C
  Humidity range: 5.5°C DP to 60% RH and 15°C DP

Allowable (product operations):
  Class  Dry-Bulb Temp (°C)  Humidity Range, Non-Condensing  Max Dew Point (°C)  Max Elevation (m)  Max Rate of Change (°C/hour)
  A1     15 to 32            20% to 80% RH                   17                  3050               5/20
  A2     10 to 35            20% to 80% RH                   21                  3050               5/20
  A3     5 to 40             -12°C DP and 8% RH to 85% RH    24                  3050               5/20
  A4     5 to 45             -12°C DP and 8% RH to 90% RH    24                  3050               5/20
  B      5 to 35             8% RH to 80% RH                 28                  3050               NA
  C      5 to 40             8% RH to 80% RH                 28                  3050               NA

Product power off (all classes): dry-bulb temperature 5 to 45°C, relative humidity 8% to 80%, maximum dew point 27°C (classes A1 to A4) or 29°C (classes B and C).
ASHRAE offers two cautions, around noise and operating life. It is not enough to
raise the temperature of the air entering the data hall. The facilities and ICT staff
must know the temperature of the air entering the ICT equipment, and how the
equipment will respond. Beyond a certain temperature, any savings made in the
cooling system will be lost as fans in the ICT equipment work harder and longer.
ASHRAE also advises that raising the temperature does reduce operating life.
However, for many types of ICT equipment the operating life is significantly longer
than the economic life. Agencies should monitor the life of their ICT assets, including
failure rates. Agencies should also discuss their plans with the ICT equipment
manufacturer.
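One simple way to apply the table in operations is to compare each measured inlet temperature against the recommended range and the allowable range for the equipment's class. The sketch below is illustrative only: the class limits are transcribed from the table above, while the function, the default class and the example readings are assumptions, not part of the ASHRAE guidance.

```python
# Classify an ICT equipment inlet air temperature against the ASHRAE
# recommended range (18-27 C for all A classes) and the allowable
# dry-bulb range for the equipment's class (from the table above).
ALLOWABLE_DRY_BULB = {
    "A1": (15, 32),
    "A2": (10, 35),
    "A3": (5, 40),
    "A4": (5, 45),
}
RECOMMENDED = (18, 27)

def classify_inlet_temp(temp_c: float, equipment_class: str = "A2") -> str:
    low, high = ALLOWABLE_DRY_BULB[equipment_class]
    if RECOMMENDED[0] <= temp_c <= RECOMMENDED[1]:
        return "recommended"
    if low <= temp_c <= high:
        return "allowable"    # investigate and correct, per the better practices
    return "out of range"     # raise an alarm

# Example (assumed readings): 24 C is recommended; 30 C is only allowable for class A2.
print(classify_inlet_temp(24.0), classify_inlet_temp(30.0))
```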
Potential Work Health Safety Issues
The cooling system can pose health risks to staff and members of the public. With
planning, these risks can be treated and managed. The common risks are noise,
bacteria and heat.
Heat is a possible risk once the data centre design includes zones in which the
temperature is intended to reach over 35°C. Hot aisle containment systems can
routinely have temperatures over 40°C. Staff must follow procedures to monitor
their environmental temperature, including the duration of exposure. Hydration, among other
mitigation steps, will be required.
A typical data centre is noisy, and this is a potential risk to data centre management,
staff and visitors that must be managed.6 Australian work health and safety regulations
identify two types of harmful noise.7 Harmful noise can cause gradual hearing loss
over a period of time, or be so loud that it causes immediate hearing loss. Hearing
loss is permanent.
The exposure standard for noise is defined as an LAeq,8h of 85 dB(A) or an LC,peak of
140 dB(C).
• LAeq,8h means the eight hour equivalent continuous A-weighted sound pressure level in decibels, referenced to 20 micropascals, determined in accordance with AS/NZS 1269.1. This is related to the total amount of noise energy a person is exposed to in the course of their working day. An unacceptable risk of hearing loss occurs at LAeq,8h values above 85 dB(A).
• LC,peak means the C-weighted peak sound pressure level in decibels, referenced to 20 micropascals, determined in accordance with AS/NZS 1269.1. It usually relates to loud, sudden noises such as a gunshot or hammering. LC,peak values above 140 dB(C) can cause immediate damage to hearing.
Guidance and professional services are available to manage risks due to noise.8
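Where a noise survey provides A-weighted levels for different parts of the working day, they can be combined into an LAeq,8h figure and compared with the 85 dB(A) exposure standard. The sketch below is a minimal illustration using assumed sample values only; a formal assessment must be carried out in accordance with AS/NZS 1269.1 and with professional advice.

```python
import math

# Combine A-weighted noise levels measured over parts of the working day
# into an eight-hour equivalent continuous level, LAeq,8h:
#   LAeq,8h = 10 * log10( (1/8) * sum(hours_i * 10**(level_i / 10)) )
def laeq_8h(samples: list) -> float:
    """samples: (level_dBA, duration_hours) pairs covering the working day."""
    energy = sum(hours * 10 ** (level / 10) for level, hours in samples)
    return 10 * math.log10(energy / 8)

# Example (assumed values): 2 hours in the data hall at 92 dB(A) plus
# 6 hours in a quieter office at 60 dB(A) already exceeds the standard.
exposure = laeq_8h([(92.0, 2.0), (60.0, 6.0)])
print(round(exposure, 1), "dB(A)",
      "exceeds the exposure standard" if exposure > 85 else "within the exposure standard")
```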
Capacity Planning for Cooling Systems
Planning for cooling capacity requires taking several perspectives of the data centre.
The data centre can be broken down into a set of volumes of space. The first volume
is the ICT equipment in the rack. The second is groups of racks, or pods (if used).
The third is the data hall, and the last is the whole data centre.
For each volume, the key questions are:
• How much heat is being generated?
• How is the heat distributed in that volume of space?
• How much cooling can be delivered to that volume, and at what rate?
Figure 8 shows a simplified representation of the data centre power systems. These
are the components that create the heat that requires cooling. Each component
subsystem has its own cooling needs and characteristics.
6 Safework Australia, "Managing Noise and Preventing Hearing Loss in the Workplace: Code of Practice", http://www.safeworkaustralia.gov.au/sites/SWA/about/Publications/Documents/627/Managing_Noise_and_Preventing_Hearing_Loss_at_Work.pdf
7 Safework SA, "Noise in the Workplace", http://www.safework.sa.gov.au/uploaded_files/Noise.pdf
8 Australian Hearing, "Protecting Your Hearing", http://www.hearing.com.au/digitalAssets/9611_1256528082241_NFR2157-October-09.pdf
Figure 8: Conceptual view of data centre power systems (local distribution lines to the building, main switchboard, backup generator, UPS, DB, equipment racks, ICT equipment, office area, fire protection system, HVAC system, security system and Data Centre Infrastructure Management)
A simple approach to determine the total demand is:
• Measure the total data centre power used at the main switchboard. Most of this power will be converted to heat.
• If the UPS uses chemical batteries, then these create heat throughout their life, even if the UPS is not supplying power. This heat must be added to the demand.
• The backup generator will require cooling when operating. This must be included in the total demand. If a substantial amount of the ICT equipment is turned off when operating on backup power, then the cooling demand during backup operation will be correspondingly lower.
• If the office area uses the data centre cooling, then this must be included.
Each person generates roughly 100 W of heat.
• A margin for peak demand, such as hot weather, must be added.
• Headroom for growth in demand must be added. To control capital expenditure, the headroom should be decided more on the length of time needed for a capacity upgrade, and not the life of the data centre.
Using the data centre power as measured at the main switchboard is preferable over
adding up all the name plate power required by the ICT equipment. The ICT
equipment name plate describes the maximum amount of cooling required. As the
ICT equipment is usually operating at less than maximum power, using the sum of
all the name plate ratings of the equipment will lead to over provisioning of the
cooling system. Headroom can then be created, based on expected growth.
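The simple approach above can be expressed as a short calculation. The sketch below is a hypothetical illustration only: the function, its parameters and all input figures are assumed example values, and agencies should substitute their own measured data and margins.

```python
# Rough total cooling demand, following the simple approach described above.
# All names and figures are assumed examples; units are kilowatts.
def total_cooling_demand_kw(
    switchboard_load_kw: float,       # measured at the main switchboard; mostly becomes heat
    ups_battery_heat_kw: float,       # batteries emit heat even when the UPS is on standby
    generator_heat_kw: float,         # cooling needed while operating on backup power
    people: int,                      # occupants cooled by the data centre system
    heat_per_person_kw: float = 0.1,  # assumed sensible heat per person
    peak_margin: float = 0.10,        # allowance for peak demand, such as hot weather
    growth_headroom: float = 0.20,    # sized to the capacity-upgrade lead time, not the site life
) -> float:
    base = (switchboard_load_kw + ups_battery_heat_kw
            + generator_heat_kw + people * heat_per_person_kw)
    return base * (1 + peak_margin) * (1 + growth_headroom)

# Example with assumed figures: 400 kW at the switchboard, 5 kW of battery heat,
# generator not running, four staff, default margins.
print(round(total_cooling_demand_kw(400, 5, 0, 4), 1), "kW")
```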
The total cooling demand can then be allocated to the various halls and rooms. The
number of CRACs (or other types of technology) to supply this cooling should be
enough so that at least one CRAC can be shut down for maintenance while the full
cooling capacity is delivered to the data hall. (This is known as N+1 redundancy).
The equipment racks will house different classes of ICT equipment, and so the
cooling needs will vary between racks. Some racks may use 20kW of power, and so
need 20kW of cooling, while other racks use 1kW. It is necessary to consider the
upper and lower cooling needs to ensure the cooling is distributed appropriately.
Operations Effectiveness and Continuous Improvement
A program to consistently improve operational effectiveness requires measurement
and standard reporting.
The level of investment is influenced by the data centre power bill and state of the
data centre. Data centres that follow no better practices may reduce their power bill
by over 50 per cent. Measurement and reporting will achieve these savings sooner.
Measurement
There are many options and possible measurement points in a data centre. Many
types of data centre equipment now include thermal sensors and reporting. The
precision and accuracy of these sensors is variable and should be checked.
Agencies may choose to sample at various points in the data centre, and extrapolate
for similar points. This approach reduces costs, at the expense of accuracy. Sampling
may be unsuitable in data centres with a high rate of change, or when trialling
higher data hall temperatures.
The following points should be considered for reporting:
• Inlet air temperature for each device.
• Exit point from the cooling unit.
• Base of the rack.
• Top of the rack.
• Return to the cooling unit.
• External ambient temperature.
• Chilled water loop.
For liquid cooling, this needs to be extended for the transfer points from the liquid
to the ICT equipment and back again.
A FLIR camera can be useful in identifying the air temperature and movement. The
camera captures small temperature gradients. This information can be used to find
hot spots, as the basis for efficiency improvements and removing causes of faults.
Data centres over medium size, or those supplying critical services, should use an
automated data collection and reporting product. Readings should be recorded at least
every 15 minutes, and ideally every 5 minutes.
There must be two thermal alarms: a warning for temperatures approaching the point
at which equipment may fail, and a second alarm for when temperatures exceed
equipment operating thresholds.
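The two alarm levels can be implemented as simple thresholds in whatever monitoring product is used. The sketch below is a generic illustration; the threshold values and sensor names are assumptions to be replaced by the manufacturer's specified inlet limits for each piece of equipment.

```python
from typing import Optional

# Two-level thermal alarm on inlet air temperature readings.
# Thresholds are assumed example values; use the manufacturer's limits.
WARNING_C = 30.0   # approaching the point at which equipment may fail
CRITICAL_C = 35.0  # exceeds the equipment operating threshold

def check_inlet_temperature(sensor_id: str, temp_c: float) -> Optional[str]:
    if temp_c >= CRITICAL_C:
        return f"CRITICAL: {sensor_id} at {temp_c:.1f} C exceeds operating threshold"
    if temp_c >= WARNING_C:
        return f"WARNING: {sensor_id} at {temp_c:.1f} C approaching operating threshold"
    return None  # within the normal range; record the reading only

# Example readings, as might be collected every 5 to 15 minutes.
for sensor, reading in [("rack-12-inlet", 28.5), ("rack-07-inlet", 31.2)]:
    alarm = check_inlet_temperature(sensor, reading)
    if alarm:
        print(alarm)
```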
NABERS and PUE
Agencies that are planning to control data centre costs should use a consistent
metric.
The APS Data Centre Optimisation Target policy specifies the use of the Power Usage
Effectiveness (PUE) metric, and sets a target range of 1.7 to 1.9. The National
Australian Built Environment Rating System (NABERS) power for data centre metric
was launched in February 2013. NABERS should, over time, replace PUE for APS
data centres. A rating of 3 to 3.5 stars is equivalent to DCOT’s PUE target.
A key difference between NABERS and PUE is the ability to compare different data
centres. NABERS is explicitly designed for the purpose of comparing different data
centres. PUE is intended to be a metric for improving the efficiency of an individual
data centre, not for comparing data centres.
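PUE is the ratio of the total power entering the data centre to the power delivered to the ICT equipment, so a value of 1.8 means 0.8 W of overhead (much of it cooling) for every watt of ICT load. The sketch below is a minimal illustration; the metered figures are assumed examples.

```python
# Power Usage Effectiveness: total facility power divided by ICT power.
# A PUE of 1.0 would mean no overhead at all; the DCOT target range is 1.7 to 1.9.
def pue(total_facility_kw: float, ict_kw: float) -> float:
    return total_facility_kw / ict_kw

# Example (assumed metered values): 540 kW at the main switchboard,
# 300 kW delivered to the ICT equipment.
value = pue(540, 300)
status = ("within the DCOT target range of 1.7 to 1.9"
          if 1.7 <= value <= 1.9 else "outside the DCOT target range")
print(f"PUE = {value:.2f} ({status})")
```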
Optimising Cooling Systems
The Plan, Do, Check, Act approach (Deming Cycle) is suitable for optimising cooling
systems. The measurement and reporting systems can establish the baseline and
report on the effect of the changes. Typical changes include:
• Reduce the heat to be removed by reducing the electricity used by the ICT equipment. There are many actions that can be taken, including virtualisation, consolidation, using modern equipment, and turning off idle hardware.
• Reduce the amount of energy needed to cool the air by using the external environment. Free air cooling can be retrofitted to most existing data centres. Common techniques are to draw in external air, and to pipe the chilled water through the external environment. Rarer examples include using rivers, seas and underground pipes.
• Reduce the energy used to move the air to and from the ICT equipment. One approach is to prevent cold and hot air from mixing, using blanking panels, containment and hot / cold aisle alignment. Another approach is to remove barriers, allowing the air to move more freely. A third approach is to use the fans less. Possible actions include using variable speed fans (replacing fixed speed fans), using a cycle of pushing very cold air into the data hall then turning the fans off and letting the hall warm up, and using larger, more efficient fans.
Once the effect of the trial has been measured, any beneficial changes can be made,
the new baseline established and the next range of actions planned.
Maintenance
The efficiency and performance of the cooling system is tied to the maintenance
regime. Skimping on routine maintenance usually incurs higher running costs and
reduces the reliability of the data centre.
At minimum, the routine maintenance regime should follow the manufacturer’s
specifications. Most modern air conditioning units provide historical information
which is useful in monitoring overall performance, and when optimising cooling
system performance.
Larger data centres, and those with more stringent reliability needs, are likely to
require preventive and active maintenance. Preventive (or
predictive) maintenance is the scheduled replacement of components and units
before they are expected to fail. This type of maintenance relies on advice from the
manufacturer (which may change from time to time based on field experience) and
on the performance of the systems in the data centre.
Active maintenance involves replacing equipment once any warning signs are
noticed. Active maintenance relies on detailed monitoring of all components, and
being able to set up parameters for ‘normal’ and ‘abnormal’ operation.
Managers should note anecdotal evidence of ‘car park servicing’, a form of fraud in
which maintenance work is claimed to have been done but has not. This risk can be
minimised by escorting service providers, and by monitoring the operating history.
Sustainability
Two considerations for sustainable cooling operations are refrigerants and water.
Most refrigeration and air-conditioning equipment uses either ozone depleting or
synthetic greenhouse gases, which are legally controlled in Australia. The Ozone
Protection and Synthetic Greenhouse Gas Management Act 1989 (the Act) controls the
manufacture, import and export of a range of ozone depleting substances and
synthetic greenhouse gases. The import, export and manufacture of these
'controlled substances', and the import and manufacture of certain products
containing or designed to contain some of these substances, is prohibited in
Australia unless the correct licence or exemption is held. More information is
available here: http://www.environment.gov.au/atmosphere/ozone/index.html.
The Australian Government ICT Sustainability Plan 2010-2015 describes a control
process which should be used for reducing water use in cooling systems. There is no
specific target for water use.
http://www.environment.gov.au/sustainability/government/ictplan/index.html
Some cooling technology uses significant quantities of water9, and their use may be
banned under extreme drought conditions. The Green Grid has developed the Water
Usage Effectiveness (WUE) metric10, to assist data centre operators to develop
controls to manage their water use. Agencies should note that there are cooling
system designs that have closed water loops, which need only tens of litres of top-up water. Some data centres also capture rainfall to improve their sustainability.
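The Green Grid defines WUE as the annual site water use divided by the energy delivered to the ICT equipment, expressed in litres per kilowatt-hour. The sketch below is a minimal calculation with assumed annual figures.

```python
# Water Usage Effectiveness: annual water use (litres) per kWh of ICT energy.
def wue(annual_water_litres: float, annual_ict_energy_kwh: float) -> float:
    return annual_water_litres / annual_ict_energy_kwh

# Example (assumed annual figures): 2.5 million litres of water and
# 1.8 million kWh of ICT energy gives a WUE of about 1.4 L/kWh.
print(round(wue(2_500_000, 1_800_000), 2), "L/kWh")
```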
Trends
There are several trends that are likely to affect data centre cooling systems:
• Power efficiency: there is a steady reduction in the amount of power used in all classes of ICT equipment, even as price/performance improves. In some racks the amount of power needed will fall, meaning less cooling is required. LAN switches with copper interfaces are an example of this.
• Densification: there is a steady reduction in size for some classes of ICT equipment, notably servers and storage. Racks with servers are likely to consume more power: as the servers become physically smaller, more servers will fit into a rack, and while each server uses less power, the greater number of servers means more power is needed.
9 Evaporative cooling towers can use millions of litres per year: http://www.airah.org.au/imis15_prod/content_files/bestpracticeguides/bpg_cooling_towers.pdf
10 The Green Grid: http://www.thegreengrid.org/en/Global/Content/white-papers/WUE
• Cloud computing: this is likely to slow down the rate of expansion of data centre ICT capacity, and may significantly reduce the ICT systems in the data centre.
Conclusion
Cooling is an essential overhead once the ICT equipment uses more than about
10kW of power. Ideally, the cooling systems keep the temperature and humidity
within a range that preserves the life of the ICT equipment.
As cooling is typically the largest overhead, agencies should concentrate on making
the cooling efficient. However, this challenging and complex work is a lower priority
activity than ensuring the power supply and managing ICT moves and changes in
the data centre.
Key points are:
• Design and operate for the specific and the whole. Consider all aspects of the data centre when upgrading or tuning the cooling systems. Ensure that the consequences of changes are considered for other data centre equipment, not only the ICT equipment. The design must consider the whole data centre, and not a subset. Extrapolating from the likely performance of the cooling system at a single rack is likely to produce errors. Instead, model the likely air flow for the entire data centre, and then measure it.
• Air inlet temperature is a key metric. This is the temperature of the air cooling the ICT equipment. Being able to measure the air temperature at this point is central to understanding how well the cooling system is working.
• Change the ICT equipment, change the air flow. The cooling system behaviour will change as the hardware, racks, cables and so on change. This means monitoring and tuning the cooling system is a continual task.
• Raise the temperature, cautiously. Raising the data centre temperature to more than 24°C has proven effective in many sites for reducing energy use and saving money. However, the ASHRAE guidance clearly advises taking care when doing so. As the temperature rises, there may be changes in the air flow, resulting in new hot spots. As well, different makes and models of ICT equipment may require different temperature and humidity ranges to operate reliably. Agencies must confirm these details against the equipment manufacturer’s specifications.
• When things go wrong. The operations, disaster recovery and business continuity plans need to explicitly consider minor and major failures, and the time needed to restart. In a major cooling failure, the ICT equipment can continue to warm the data centre air. Time will be needed to remove this additional heat once the cooling system restarts.
• Safety. Noise, heat and bacteria are all potential issues with a data centre cooling system. Good operating procedures and training will address these risks.
3. Better Practices
Operations
The data hall has been arranged in hot and cold aisles, or in a containment solution.
All obstructions to the air flow are removed. In particular, cables do not share air
ducts or lie across the path that air is intended to follow. This includes to and from
the data hall, within the racks, and under the raised floor.
The temperature of the essential equipment in the data centre is monitored and
recorded. The humidity of the data centre is monitored and recorded. All deviations
from ‘Recommended’ to ‘Allowable’ (or worse) are analysed, corrected and
reported. The actions to keep the temperature and humidity to recommended levels
are documented and practiced. There is a routinely applied process for finding,
investigating and if needed, removing hot spots from the data centre. A FLIR camera
may assist in this process.
The noise levels in the data centre are monitored and reported. A noise management
plan is operating.
The agency has an energy efficiency target that involves the data centre’s cooling
system. The target may be based on PUE or NABERS. The progress towards this
target is being tracked and reported to the agency’s executive monthly.
There is routine maintenance of all cooling system elements, as per the
manufacturer’s requirements. The maintenance includes pipes and ducts as well as
major equipment. There is a control plan in place to ensure that the maintenance
has been performed adequately.
The data centre is monitored for dust and other particles. Sources of particles,
including unfiltered outside air, wearing belts and older floor tiles, are removed
over time. Filter paper may be used as an interim measure to improve equipment
reliability by removing particles.
The disaster recovery and business continuity plans include the impacts and
controls for the partial or complete failure of the cooling system. There are
rehearsals of the limitation, bypass and recovery activities for cooling system
operations.
There is a plan for managing leaks and spills in the data centre. This plan is
rehearsed from time to time.
There is a method for ensuring that procedures are followed and documentation is
maintained. This method may be based on ISO 9000 or another framework. There are
training and/or communications processes in place to ensure staff know and follow
procedures.
Planning
A plan is being followed for:
• Cooling systems asset replacement.
• Altering the capacity and distribution of the cooling.
• Noise management.
A plan exists for the actions following the failure of the cooling system. The length of
time to restore the operating temperature is known and updated from time to time.
Agencies with larger data centres may conduct a computational fluid dynamics (CFD) analysis of
the data centre from time to time.
All planning work involves the ICT, facilities and property teams. The plans are
approved by the senior responsible officer, and included in agency funding.
Fundamental
⊠ Measuring and reporting the power consumption of the cooling system.
⊠ Measuring and reporting the inlet temperature of ICT equipment.
⊠ Monitoring the outlet temperature of ICT equipment.
⊠ Maintaining hot and cold aisle alignment for racks and ICT equipment.
⊠ If hot/cold aisle containment has not been implemented, evaluate the business case for containment.
⊠ The temperature of the inlet air has been raised to at least 22°C.
⊠ There are active operations processes to raise the data centre temperature to reduce energy costs. The temperature range in which the cooling systems and the ICT systems use the least amount of energy has been identified.
⊠ The water use by the cooling system is measured and reported.
⊠ There are active operations processes to minimise water use.
⊠ The work health and safety plans include noise management. There are plans to reduce noise. The relationship between raising temperature and noise levels is measured and used in operations and capacity planning.
⊠ There is a capacity plan for the cooling system. Options for upgrading or replacing parts of the cooling system are documented.
⊠ The cooling equipment is maintained according to manufacturers’ specifications. All works are inspected and verified as being conducted as required. The operating hours of key equipment (e.g. CRACs) are tracked.
⊠ The cooling system is cleaned according to manufacturer’s specifications and government regulations.
⊠ There is a plan to manage leaks and spills in the data centre.
⊠ There is a plan, endorsed by senior management, for changing the cooling systems capacity.
4. Conclusion
Agencies that use better practices in their data centres can expect lower costs, better
reliability and improved safety than they would otherwise achieve. Implementing the better practices
will give managers more information about data centre cooling, enabling better
decisions. Overall, the data centre will be better aligned to the agency’s strategic
objectives and the total cost of ownership will be lower.
Agencies will also find it simpler and easier to report against the mandatory
objectives of the data centre strategy. The key metric is avoided costs, that is, the
costs that agencies did not incur as a result of improvements in their data centres.
Capturing avoided costs is most effective when done by an agency in the context of a
completed project that has validated the original business case.
Summary of Better Practices
Cooling is an essential overhead once the ICT equipment uses more than about
10kW of power. Cooling is typically the largest data centre overhead and agencies
should ensure that the cooling system is efficient.
Key points are:
• Design and operate for the specific and the whole. The design must consider the whole data centre, and how each piece of equipment is cooled. The likely air flow through the data centre should be modelled and measured routinely.
• Air inlet temperature is a key metric. Being able to measure the air temperature at this point is central to understanding how well the cooling system is working.
• Change the ICT equipment, change the air flow. The cooling system behaviour will change as the data centre configuration changes. Monitoring and tuning the cooling system is a continual task.
• Raise the temperature, cautiously. Raising the data centre temperature has proven effective in many sites for reducing energy use and so saving money. The ASHRAE guidance clearly advises taking care when doing so.
• When things go wrong. In a major cooling failure, the ICT equipment can continue to warm the data centre air. Time will be needed to remove this additional heat once the cooling system restarts.
• Safety. Noise, heat and bacteria are all potential issues with a data centre cooling system.