Holistic Design at Cardiff

Holistic, Energy Efficient Design @ Cardiff
Going Green Can Save Money
Dr Hugh Beedie, CTO, ARCCA & INSRV
Introduction
• The Context
• Drivers to Go Green
• Where does all the power go?
  – Before the equipment
  – In the equipment
• What should we do about it?
• What Cardiff University is doing about it
The Context (1)
• Cardiff University receives £3M grant to purchase a new supercomputer
• A new room is required to house it, with appropriate power, cooling, etc.
• 2 tenders:
  – Data Centre construction
  – Supercomputer
The Context (2)
• INSRV Sustainability Mission: To minimise CU's IT environmental impact and to be a leader in delivering sustainable information services.
• Some current & recent initiatives:
  – University INSRV Windows XP image default settings
  – Condor – saving energy, etc. compared to a dedicated supercomputer
  – ARCCA & INSRV new Data Centre
  – PC power saving project – standby 15 minutes after logout (being implemented this session)
Drivers – Why do Green IT?
• Increasing demand for CPU & storage
• Lack of space
• Lack of power
• Increasing energy bills (oil prices doubled)
• Enhancing the reputation of Cardiff University & attracting better students
• Sustainable IT
• Because we should (for the planet)
Congress Report Aug 2007
• US data centre electricity demand doubled 2000-2006
• Trends toward 20kW+ per rack
• Large scope for efficiency improvement:
  – Obvious – more efficiency at each stage
  – Holistic approach necessary – facility and component improvements
  – Less obvious – virtualisation (up to 5×)
Where does all the power go? (1)
• "Up to 50% is used before getting to the Server" – Report to US Congress, Aug 2007
• Loss = £50,000 p.a. for every 100kW supplied to the room (sanity check below)
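
A quick sanity check on that loss figure, as a minimal sketch in Python; the electricity tariff is not stated in the slides, and roughly £0.11/kWh is assumed here:

# Rough check of the "£50,000 p.a. per 100 kW" figure.
# Assumption (not on the slide): electricity at roughly £0.11 per kWh.
supplied_kw = 100        # power supplied to the room
wasted_fraction = 0.5    # "up to 50% used before getting to the server"
hours_per_year = 8760
price_per_kwh = 0.11     # GBP, assumed

wasted_kwh = supplied_kw * wasted_fraction * hours_per_year   # 438,000 kWh
annual_cost = wasted_kwh * price_per_kwh                      # about £48,000
print(f"Energy lost before the servers: {wasted_kwh:,.0f} kWh per year")
print(f"Approximate cost: £{annual_cost:,.0f} per year")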
Where does all the power go? (2)
Where does all the power go? (3)
How?
• Power conversion – before power even reaches your room, some is lost in the HV→LV transformer
  – A low-loss transformer is 98% efficient rather than 95% (worked example below)
Return on investment (ROI)?
• New installation: ROI = 1 month
• Replacement: ROI = 1 year
• Lifetime of investment = 20+ years
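
A rough illustration of the transformer saving; the delivered load and tariff below are assumptions for the sketch, not figures from the slides:

# Annual saving of a 98%-efficient HV->LV transformer over a 95% one.
# Assumptions (not from the slides): ~300 kW delivered load, ~£0.11/kWh.
load_kw = 300
hours_per_year = 8760
price_per_kwh = 0.11

def input_kw(load_kw, efficiency):
    # Power drawn on the HV side to deliver load_kw at the given efficiency
    return load_kw / efficiency

saving_kw = input_kw(load_kw, 0.95) - input_kw(load_kw, 0.98)    # ~9.7 kW
saving_gbp = saving_kw * hours_per_year * price_per_kwh          # ~£9,300 p.a.
print(f"Continuous saving: {saving_kw:.1f} kW, roughly £{saving_gbp:,.0f} per year")
# Comparable to the £10k p.a. transformer saving quoted in the cost summary.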
Where does all the power go? (4)
How?
• Cooling infrastructure (illustration below)
  – Typical markup 75%
  – Lowest markup 25-30%?
  – Est. ROI 2-3 years (lifetime 8 years)
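
What the markup difference means in running cost, sketched with an assumed 100 kW IT load and ~£0.11/kWh tariff (neither figure is from the slides):

# Cooling "markup" = cooling power drawn per kW of IT load removed.
it_load_kw = 100
hours_per_year = 8760
price_per_kwh = 0.11   # GBP, assumed

typical_cooling_kw = it_load_kw * 0.75   # typical 75% markup
best_cooling_kw = it_load_kw * 0.30      # best case 25-30% markup
saving_gbp = (typical_cooling_kw - best_cooling_kw) * hours_per_year * price_per_kwh
print(f"Cooling saving: roughly £{saving_gbp:,.0f} per year per 100 kW of IT load")
# ~£43k p.a. per 100 kW - the same order as the £50k cooling estimate later on.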
Where does all the power go? (5)
How?
• Backup power (UPS) – efficiency varies with % load (sketch below)
  – Efficiency = 80-95%
  – Est. ROI for a new installation: <1 year
  – Replacement not so good – a UPS's life is only 3-5 years?
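
The UPS efficiency range translated into money, again with an assumed 100 kW protected load and ~£0.11/kWh:

# Loss difference between an 80%-efficient and a 95%-efficient UPS.
protected_load_kw = 100
hours_per_year = 8760
price_per_kwh = 0.11   # GBP, assumed

loss_at_80 = protected_load_kw / 0.80 - protected_load_kw   # 25.0 kW dissipated
loss_at_95 = protected_load_kw / 0.95 - protected_load_kw   # ~5.3 kW dissipated
saving_gbp = (loss_at_80 - loss_at_95) * hours_per_year * price_per_kwh
print(f"UPS saving: roughly £{saving_gbp:,.0f} per year per 100 kW protected")
# ~£19k p.a., close to the £20k p.a. UPS figure in the cost summary.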
Where does it go? – Bull View
[Chart: cumulative data centre power consumption – loads (≈40% of the total), then power delivery, then cooling, totalling 100%]
Where does it go? – Intel View
Component                            Power    Share of total
Load (CPU, memory, drives, I/O)      100W     36.4%
PSU losses                            50W     18.2%
Voltage regulators                    20W      7.3%
Server fans                           15W      5.5%
UPS + PDU                             20W      7.3%
Room cooling system                   70W     25.5%
Total                                275W    100%
Source: Intel Corp.
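
The percentages are simply each item's share of the 275W total; a small sketch reproduces them and highlights how little of the input power reaches the compute load:

# Reproduce the Intel breakdown: watts per item and share of the total.
breakdown_w = {
    "Load (CPU, memory, drives, I/O)": 100,
    "PSU losses": 50,
    "Voltage regulators": 20,
    "Server fans": 15,
    "UPS + PDU": 20,
    "Room cooling system": 70,
}
total_w = sum(breakdown_w.values())   # 275 W
for name, watts in breakdown_w.items():
    print(f"{name:33s} {watts:4d} W  {100 * watts / total_w:5.1f}%")
print(f"Fraction of input power reaching the load: {100 * 100 / total_w:.1f}%")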
Where does it go? – APC View
Server component       Power consumption
PSU losses             38W
Fan                    10W
CPU                    80W
Memory                 36W
Disks                  12W
Peripheral slots       50W
Motherboard            25W
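
From the APC numbers one can back out the total server draw and the implied PSU efficiency (derived values, not stated on the slide):

# Implied totals from the APC per-component breakdown.
components_w = {
    "PSU losses": 38, "Fan": 10, "CPU": 80, "Memory": 36,
    "Disks": 12, "Peripheral slots": 50, "Motherboard": 25,
}
total_input_w = sum(components_w.values())                 # 251 W at the plug
delivered_w = total_input_w - components_w["PSU losses"]   # 213 W inside the box
print(f"Server draw: {total_input_w} W")
print(f"Implied PSU efficiency: {100 * delivered_w / total_input_w:.0f}%")   # ~85%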
Options for Cardiff (1)
• Carry on as before
  – Dual core HPC solutions
• Wait for quad core
  – Saves on flops/watt
  – Saves on infrastructure (fewer network ports)
  – Saves on management (fewer nodes)
  – Saves on space
  – Saves on power
Options for Cardiff (2)
• High density solution
  – Needs specialist cooling over 8kW per rack
  – Probable higher TCO
• Carry on as before (6-8kW per rack)
• Low density solution (typically)
  – BT – free air cooling
  – Allow wider operating temperature range – warranty issues?
  – Not applicable here (no space)
What did Cardiff do? (1)
• Ran 2 projects:
  – HPC equipment
  – Environment
• TCO as key evaluation criterion
  – Plus need to measure and report on usage
• Problems:
  – Finger pointing (strong project management)
  – Scheduling (keep everyone in the loop)
Timetable
Tender element    Date of issue   Date of order                  Reason for delay
Room tender       April 2007      July 2007 (to Comtec)          No delay
HPC tender        January 2007    December 2007                  Waiting for quad core
HV transformer    August 2007     March 2008 (Cardiff Estates)   Long lead times on low-loss transformers
What did Cardiff do? (2)
• Bought Bull R422 servers for HPC
  – 80W quad-core Harpertown
  – 2 dual-socket, quad-core servers in 1U – common PSU
  – Larger fans (not on CPU)
• Other project in same room
  – IBM BladeCentres
  – Some pizza boxes
Back Up Power
• APC 160kW Symmetra UPS
  – Full and half load efficiency 92%
  – Scaleable & modular – could grow as we grew
  – Strong environment management options
    · Integrated with cooling units
    · SNMP
  – Bypass capability
• Bought 2 (not fully populated)
  – 1 for compute nodes
  – 1 for management and another project
• Enhanced existing standby generator
Cooling Inside the Room
• APC NetShelter airflow
• APC InLine RC units
  – Provide residual cooling to the room
  – Resilient against loss of an RC unit
  – Cool hot air without mixing it with cold air
[Diagram: row layout with servers and InRow cooling units alongside]
Cooling – Outside the Room
• 3 Airedale 120kW chillers (Ultima Compact Free Cool 120kW)
  – Quiet model
  – Variable speed fans
• N+1 arrangement
Free-cooling vs Mechanical cooling
[Chart: cooling load vs ambient temperature (roughly -7°C to 12.5°C): system on 100% free-cooling for 12% of the year, partial free-cooling for 50% of the year, mechanical cooling only for 38% of the year]
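
A minimal sketch of how those year fractions could be estimated from hourly ambient temperature data; the changeover temperatures below are assumptions chosen to match the chart, not published setpoints:

# Classify cooling mode by ambient temperature and estimate annual fractions.
# Assumed thresholds: full free-cooling below 3.5 degC, partial free-cooling
# up to 12.5 degC, mechanical-only cooling above that.
def cooling_mode(ambient_c):
    if ambient_c < 3.5:
        return "free"
    if ambient_c < 12.5:
        return "partial"
    return "mechanical"

def mode_fractions(hourly_temps_c):
    counts = {"free": 0, "partial": 0, "mechanical": 0}
    for t in hourly_temps_c:
        counts[cooling_mode(t)] += 1
    return {mode: n / len(hourly_temps_c) for mode, n in counts.items()}

# Fed with a Cardiff-like hourly temperature series, this comes out near the
# chart's 12% free / 50% partial / 38% mechanical split.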
Cost Savings Summary
• Low-loss transformer: £10k p.a.
• UPS: £20k p.a.
• Cooling: £50k p.a. (estimated)
• Servers: 80W parts – £20k p.a.
  – Quad core – same power but twice the 'grunt'
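
Taken at face value, the quoted figures add up to roughly £100k a year (a straight sum of the slide's numbers, nothing more):

# Straight sum of the savings quoted above (GBP per year).
savings_gbp = {"Low-loss transformer": 10_000, "UPS": 20_000,
               "Cooling (estimated)": 50_000, "80W server parts": 20_000}
print(f"Combined saving: roughly £{sum(savings_gbp.values()):,} per year")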
Lessons learned from SRIF3 HPC Procurement – Summary
• Strong project management essential
• IT and Estates liaison essential but difficult
• Good supplier relationship essential
• Major savings possible:
  – Infrastructure (power & cooling)
  – Servers (density and efficiency)