Holistic, Energy Efficient Design @ Cardiff
Going Green Can Save Money
Dr Hugh Beedie, CTO, ARCCA & INSRV

Introduction
- The context
- Drivers to go green
- Where does all the power go?
  - Before the equipment
  - In the equipment
- What should we do about it?
- What Cardiff University is doing about it

The Context (1)
- Cardiff University receives a £3M grant to purchase a new supercomputer
- A new room is required to house it, with appropriate power, cooling, etc.
- 2 tenders:
  - Data Centre construction
  - Supercomputer

The Context (2)
- INSRV Sustainability Mission: to minimise CU's IT environmental impact and to be a leader in delivering sustainable information services
- Some current & recent initiatives (University and INSRV):
  - Windows XP image default settings
  - Condor – saving energy, etc. compared to a dedicated supercomputer
  - ARCCA & INSRV new Data Centre
  - PC power saving project – standby 15 minutes after logout (being implemented this session)

Drivers – Why do Green IT?
- Increasing demand for CPU & storage
- Lack of space
- Lack of power
- Increasing energy bills (oil prices doubled)
- Enhancing the reputation of Cardiff University & attracting better students
- Sustainable IT
- Because we should (for the planet)

Congress Report, Aug 2007
- US data centre electricity demand doubled 2000-2006
- Trends toward 20kW+ per rack
- Large scope for efficiency improvement
  - Obvious – more efficiency at each stage; holistic approach necessary – facility and component improvements
  - Less obvious – virtualisation (up to 5x)

Where does all the power go? (1)
- "Up to 50% is used before getting to the server" – Report to US Congress, Aug 2007
- Loss = £50,000 p.a. for every 100kW supplied to the room (a rough worked check appears after the 'Options for Cardiff' slides)

Where does all the power go? (2)

Where does all the power go? (3)
- How? Power conversion – before the power gets to your room, you lose in the HV->LV transformer
- Efficiency = 98%, not 95%
- Return on investment (ROI)?
  - New installation: ROI = 1 month
  - Replacement: ROI = 1 year
  - Lifetime of investment = 20+ years

Where does all the power go? (4)
- How? Cooling infrastructure
- Typical markup 75%
- Lowest markup 25-30%?
- Est. ROI 2-3 years (lifetime 8 years)

Where does all the power go? (5)
- How? Backup power (UPS)
- Efficiency = 80-95% (varies with % load)
- Est. ROI for new installation < 1 year
- Replacement not so good – UPS life is only 3-5 years

Where does it go? – Bull View
- Chart: cumulative data centre power consumption across loads, power delivery and cooling (values shown: 40%, 80%, 100%)

Where does it go? – Intel View
- CPU, memory, drives, I/O (useful load): 100W (36.4%)
- PSU: 50W (18.2%)
- Voltage regulators: 20W (7.3%)
- Server fans: 15W (5.5%)
- UPS + PDU: 20W (7.3%)
- Room cooling system: 70W (25.5%)
- Total: 275W
- Source: Intel Corp.

Where does it go? – APC View
- Server component power consumption:
  - PSU losses: 38W
  - Fan: 10W
  - CPU: 80W
  - Memory: 36W
  - Disks: 12W
  - Peripheral slots: 50W
  - Motherboard: 25W

Options for Cardiff (1)
- Carry on as before – dual core HPC solutions
- Wait for quad core:
  - Saves on Flops/watt
  - Saves on infrastructure (fewer network ports)
  - Saves on management (fewer nodes)
  - Saves on space
  - Saves on power

Options for Cardiff (2)
- Carry on as before (6-8kW per rack)
- High density solution
  - Needs specialist cooling over 8kW per rack
  - Probable higher TCO
- Low density solution (typically BT-style free air cooling)
  - Allows a wider operating temperature range – warranty issues?
  - Not applicable here (no space)
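Aside: to put rough numbers on the "Where does all the power go?" slides and the Intel breakdown above, here is a minimal back-of-envelope sketch in Python. The wattages are taken from the Intel view; the electricity tariff is an assumption (it is not stated in this deck), so the outputs are order-of-magnitude only.

```python
# Back-of-envelope check on the "Where does all the power go?" figures.
# Wattages come from the Intel breakdown above; the electricity tariff is an
# assumption (not given in the deck) and is the main free parameter.

HOURS_PER_YEAR = 24 * 365  # 8760 h

# Intel view: watts drawn for every 100 W of useful IT load
breakdown_w = {
    "CPU, memory, drives, I/O": 100,  # the useful load
    "PSU losses": 50,
    "Voltage regulators": 20,
    "Server fans": 15,
    "UPS + PDU": 20,
    "Room cooling": 70,
}
total_w = sum(breakdown_w.values())  # 275 W
print(f"Fraction of input power reaching the IT load: {100 / total_w:.0%}")  # ~36%

# "Up to 50% is used before getting to the server": for every 100 kW supplied
# to the room, roughly 50 kW is overhead (cooling, UPS, PSU losses, ...).
overhead_kwh_per_year = 50 * HOURS_PER_YEAR  # ~438,000 kWh p.a.
for tariff_gbp_per_kwh in (0.06, 0.11):  # assumed tariffs, £/kWh
    cost = overhead_kwh_per_year * tariff_gbp_per_kwh
    print(f"Overhead cost at £{tariff_gbp_per_kwh:.2f}/kWh: £{cost:,.0f} p.a.")
# The deck's "£50,000 p.a. per 100 kW" figure corresponds to a tariff of
# roughly £0.11/kWh applied to that 50 kW of overhead.
```

Under these assumptions, the overhead alone is worth tens of thousands of pounds per year, which is what motivates the infrastructure choices described next.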
What did Cardiff do? (1)
- Ran 2 projects, with TCO as the key evaluation criterion:
  - HPC equipment
  - Environment
- Plus the need to measure and report on usage
- Problems:
  - Finger pointing (strong project management needed)
  - Scheduling (keep everyone in the loop)

Timetable
- Room Tender: issued April 2007; ordered July 2007 (to Comtec); no delay
- HPC Tender: issued January 2007; ordered December 2007; delayed waiting for quad core
- HV Transformer: issued August 2007; ordered March 2008 (Cardiff Estates); delayed by long lead times on low loss transformers

What did Cardiff do? (2)
- Bought Bull R422 servers for HPC
  - 80W quad core Harpertown
  - 2 dual-socket, quad-core servers in 1U – common PSU
  - Larger fans (not on the CPU)
- Other project in the same room:
  - IBM BladeCentres
  - Some pizza boxes

Back-up Power
- APC 160kW Symmetra UPS
  - Full and half load efficiency 92%
  - Scalable & modular – could grow as we grew
  - Strong environment management options (integrated with cooling units, SNMP)
  - Bypass capability
- Bought 2 (not fully populated)
  - 1 for compute nodes
  - 1 for management and another project
- Enhanced existing standby generator

Cooling Inside the Room
- APC NetShelter racks with APC InRow RC cooling units
  - Provide residual cooling to the room
  - Resilient against loss of an RC unit
  - Cool hot air without mixing it with the cold air
- (Diagram: airflow across the server fronts, with InRow cooling units between the racks)

Cooling – Outside the Room
- 3 Airedale 120kW chillers (Ultima Compact Free Cool)
  - Quiet model
  - Variable speed fans
  - N+1 arrangement
- Free-cooling vs mechanical cooling (see the sketch at the end of this deck):
  - 100% free-cooling: 12% of the year
  - Partial free-cooling: 50% of the year
  - Mechanical cooling only: 38% of the year
- (Chart: cooling load vs ambient temperature, with changeover points marked at -7°C, 3.5°C and 12.5°C)

Cost Savings Summary
- Low loss transformer: £10k p.a.
- UPS: £20k p.a.
- Cooling: £50k p.a. (estimated)
- Servers (80W part): £20k p.a.
- Quad core – same power but twice the 'grunt'
- (A rough check of these figures appears at the end of this deck)

Lessons Learned from the SRIF3 HPC Procurement – Summary
- Strong project management essential
- IT and Estates liaison essential but difficult
- Good supplier relationship essential
- Major savings possible:
  - Infrastructure (power & cooling)
  - Servers (density and efficiency)
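Aside: the free-cooling split quoted on the 'Cooling – Outside the Room' slide (12% / 50% / 38% of the year) can be estimated from hourly ambient temperature data. The sketch below is illustrative only: the changeover temperatures are assumptions read off the chart (full free-cooling below roughly 3.5°C, mechanical-only above roughly 12.5°C), and in practice they depend on the cooling load and the chiller control setup.

```python
# Illustrative sketch of how the free-cooling split could be derived from
# hourly ambient temperature data. The changeover temperatures are assumptions
# read off the Airedale chart, not figures stated in the deck.

from collections import Counter

FULL_FREE_COOLING_BELOW_C = 3.5   # assumed changeover point
MECHANICAL_ONLY_ABOVE_C = 12.5    # assumed changeover point

def cooling_mode(ambient_c: float) -> str:
    """Classify one hourly ambient temperature reading into a cooling regime."""
    if ambient_c < FULL_FREE_COOLING_BELOW_C:
        return "100% free-cooling"
    if ambient_c <= MECHANICAL_ONLY_ABOVE_C:
        return "partial free-cooling"
    return "mechanical only"

def regime_split(hourly_temps_c: list[float]) -> dict[str, float]:
    """Return the fraction of the year spent in each cooling regime."""
    counts = Counter(cooling_mode(t) for t in hourly_temps_c)
    return {mode: n / len(hourly_temps_c) for mode, n in counts.items()}

# Example with made-up readings; a real analysis would use a full year of
# hourly ambient temperature data for Cardiff (8760 values).
print(regime_split([1.0, 2.0, 5.0, 8.0, 11.0, 14.0, 16.0, 20.0]))
```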
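Aside: a rough cross-check of the first two lines of the Cost Savings Summary. The efficiencies (95% vs 98% for the transformer, ~92% for the Symmetra UPS) come from the slides; the site loads, the 'legacy' UPS efficiency and the tariff are assumptions, so the results are order-of-magnitude only and will not match the quoted figures exactly.

```python
# Order-of-magnitude check on the transformer and UPS savings quoted above.
# Efficiencies come from the slides; the loads, the legacy UPS efficiency and
# the tariff are assumptions.

HOURS_PER_YEAR = 24 * 365
TARIFF_GBP_PER_KWH = 0.10  # assumed

def annual_saving_gbp(load_kw, eff_old, eff_new, tariff=TARIFF_GBP_PER_KWH):
    """Annual cost saved by feeding load_kw through a more efficient
    conversion stage (input power = load / efficiency)."""
    return (load_kw / eff_old - load_kw / eff_new) * HOURS_PER_YEAR * tariff

# HV->LV transformer: standard ~95% vs low-loss ~98%, assumed ~400 kW site load
print(f"Transformer: ~£{annual_saving_gbp(400, 0.95, 0.98):,.0f} p.a.")

# UPS: assumed legacy ~85% vs Symmetra ~92%, assumed ~150 kW protected load
print(f"UPS:         ~£{annual_saving_gbp(150, 0.85, 0.92):,.0f} p.a.")
```

Both estimates land in the same £10k-£20k p.a. range as the deck's figures, consistent with the message that infrastructure efficiency, not just server choice, drives the savings.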