Power Management for Computer Systems and Datacenters
Karthick Rajamani, Charles Lefurgy, Soraya Ghiasi, Juan C. Rubio, Heather Hanson, Tom Keller
{karthick, lefurgy, sghiasi, rubioj, hlhanson, tkeller}@us.ibm.com
IBM Austin Research Labs
ISLPED 2008 © 2008 IBM Corporation

Overview of Tutorial
1. Introduction and Background
2. Power Management Concepts
• New focus on power management - why, who, what?
• Understanding the problem
  - Diverse requirements: WHAT is the problem?
  - Understanding variability
• Basic solutions
• Advanced solutions
  - Building blocks - sensors and actuators
  - Feedback-driven and model-assisted solution design
3. Industry Solutions
4. Datacenter
• Sample solutions
• In-depth solutions
• Facilities management
  - Anatomy of a datacenter
  - Improving efficiency

Scope of this Tutorial
Power management solutions for servers and the data center:
• Problems
• Solution concepts, characteristics and context
• Tools and approaches for developing solutions
• Brief overview of some industrial solutions
Outside scope:
• Power-efficient microprocessor design, circuit and process technologies
• Power-efficient middleware, software and compiler technologies
• Embedded systems

Servers and Storage Heat Density Trends
[Figure: heat density trends for server and storage equipment.]

Cost of Power and Cooling
[Figure: worldwide server market - new server spending and power-and-cooling spending (US$B) charted against the installed base (M units); power and cooling spending grows to rival spending on new servers.]
Source: IDC, The Impact of Power and Cooling on Data Center Infrastructure, May 2006

Government and Organizations in Action
[Timeline 2005-2008: Japan Energy Savings Law, US Congress HR5646, ECMA TC38-TG2 EEP, EPA EnergyStar for Computers, SPECpower.]
Regulation and standardization of computer energy efficiency (e.g., Energy Star for computers), driven by
• Spiraling energy and cooling costs at data centers
• Environmental impact of high energy consumption
Japan, Australia, the EU and the US EPA are working out joint/global regulation efforts.
Benchmarks for power and performance
• Metrics have evolved: GHz, BW, FLOPS → SPECint, SPECfp, transactions/sec → performance/power

Benchmarking Power-Performance: SPECpower_ssj2008
Based on SPECjbb, a Java performance benchmark; generalization to other workloads and blade/cluster environments is underway.
• Self-calibration phases determine peak throughput on the system under test.
• The benchmark consists of 11 load levels - 100% of peak throughput down to idle, in 10% steps - with a fixed time interval per load level.
• Random transaction arrival times mimic realistic variations within each load level.
• The full range of load levels is exercised, not just peak.
• Primary benchmark metric: the sum of throughput across the load levels divided by the sum of average power across the load levels, including idle.
[Figure: target load, average utilization, and normalized power over time as the benchmark steps through calibration and the 100%-to-idle load levels.]
Source: Heather Hanson, IBM
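As a minimal sketch of how a SPECpower-style overall score is computed from per-level measurements - the per-level throughput and power numbers below are made-up placeholders, not published results:

```python
# Sketch: computing a SPECpower_ssj2008-style overall metric.
# The benchmark reports sum(throughput) / sum(power) over all load
# levels, including active idle. Values below are illustrative only.

# (ssj_ops, average watts) measured at each target load level, 100% .. idle
measurements = [
    (420_000, 280), (378_000, 262), (336_000, 245), (294_000, 228),
    (252_000, 210), (210_000, 193), (168_000, 176), (126_000, 160),
    (84_000, 145), (42_000, 132), (0, 120),   # final entry: active idle
]

total_ops = sum(ops for ops, _ in measurements)
total_power = sum(watts for _, watts in measurements)

# Overall score: performance per watt aggregated across the whole load line
overall_ops_per_watt = total_ops / total_power
print(f"overall ssj_ops/watt = {overall_ops_per_watt:.1f}")
```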
User Requirements
Users pursue different goals:
• High performance
• Safe and reliable operation
• Low operating costs
But a solution for one goal can work against another:
• Increasing frequency for higher performance can lead to unsafe operating temperatures.
• Lowering power consumption by operating in lower-power active states can aid safe operation while increasing execution time - and potentially total energy costs.

Is a Single Sub-system the Main Problem? No.
[Figure: normalized server power budget breakdown - processor/cache, memory subsystem, I/O and disk storage, cooling, and power subsystem - across server classes (node, blade, HPC, high-end, mainframe). Budget estimates for specific machine configurations, for illustration purposes.]
• The biggest power consumer varies with server class.
• It is important to understand the composition of the target class to deliver targeted solutions.
• It is important to address power consumption and efficiency in all main subsystems.

Workloads and Configuration Dictate Power Consumption
Estimated power breakdown for two petaflop-class supercomputer HPC node designs, each configuration tailored to an application class:
[Figure: power breakdown for a "Big Science" system running HPL and a "Big Simulation" system running Stream, split among processors, memory DIMMs, optics, package, distribution losses, DC-DC losses, and AC-DC losses.]
Source: Karthick Rajamani, IBM
• Depending on configuration and usage, either the processor or the memory sub-system turns out to be dominant.
• Power distribution/conversion losses are a significant fraction.
Note 1: Room A/C energy costs are not captured - significant at the HPC data center level.
Note 2: I/O and storage can also be significant power consumers in commercial computing installations.

Power Variability across Real Systems
Power measured on 33 IBM HS21 blades across processor clock modulation and voltage/frequency scaling settings:
[Figure: power (W) vs. processor effective frequency (MHz) for LINPACK, Swim (SPEC CPU2000), and idle.]
Source: Charles Lefurgy, IBM
• Large variation between applications: 2x between LINPACK and idle; LINPACK about 20% higher than the memory-bound SPEC CPU2000 swim benchmark.
• Smaller power variation between individual blades (8%).

Impact of Process Variation
[Figure: normalized power of five parts, ranging from 1.00 to 1.10.]
Source: Juan Rubio, Karthick Rajamani, IBM
10% power variation in a random sample of five 'identical' Intel Pentium M processor chips running LINPACK.
Variations from Design
[Figure: normalized current/power for DDR2 memory parts from five vendors - maximum active (IDD7) and maximum idle (IDD3N).]
Source: Karthick Rajamani, IBM
Memory power specifications for DDR2 parts with identical performance specifications show a 2x difference in active power and a 1.5x difference in idle power across vendors.

Environmental Conditions: Ambient Temperature
[Figure: a ~5°C drop in ambient temperature produces a ~5°C drop in CPU temperature over time, at constant frequency.]
Source: Heather Hanson, Coordinated Power, Energy, and Temperature Management, Dissertation, 2007.

Take Away
• Power management solutions need to target the computer system as a whole, with mechanisms to address all major sub-systems. Further, function-oriented design and a workload's usage of a system can create dramatic differences in the distribution of power across sub-systems.
• User requirements impose a diverse set of constraints which can sometimes be contradictory.
• There is increasing variability - and even unpredictability - of power consumption due to manufacturing technologies, incorporation of circuit-level power reduction techniques, workload behavior and environmental effects.
• Power management solutions need to be flexible and adaptive to accommodate all of the above.

Basic Solutions - Save Energy
Function definition: reduce power consumption of the computer system when it is not in active use.
Conventional implementation:
• Exploiting architected idle modes of the processor, entered by executing special code/instructions, with interrupts causing exit.
  - Multiple modes with increasing power reduction, usually at increased entry/exit latency.
  - Exploit circuit-level clock gating of significant portions of the chip and, more recently, voltage-frequency reductions for idle states.
• Exploiting device idle modes through logic in the controller or device driver, e.g.
  - standby state for disks
  - self-refresh for DRAM in embedded systems

Basic Solutions - Avoid System Failure
Function definition: maintain continued computer operation in the face of power or cooling emergencies.
Conventional implementation:
• Redundant components in the power distribution and cooling sub-systems (N+M solutions).
• Throttling of components under thermal overload: employing thermal sensors, components are throttled when temperatures exceed pre-set thresholds.
• Fan speed adjusted based on the heat load detected with thermal sensors, balancing acoustics, power and cooling considerations.

Adaptive Solutions: Key to Addressing Variability and Diverse Conditions
Feedback-driven control provides
1. the capability to adapt to environment, workload and varying user requirements, and
2. regulation to desired constraints even with imperfect information.
Models provide the ability to
1. estimate unmeasured quantities, and
2. predict the impact of a change.
The feedback-driven and model-assisted control framework is the weapon against variability and unpredictability (a minimal control-loop sketch follows):
• Sensors provide real-time feedback: power, temperature, performance (activity), stability.
• Actuators regulate component states (e.g., voltage/frequency, DRAM power-down) and activity (instruction/request throughput); they manage CPU, memory, fans and disk, tuning power-performance levels to environment, workload and constraints.
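As a concrete illustration of the feedback-driven framework, here is a minimal power-capping control loop in the spirit described above. It is a sketch, not any product's implementation: the P-state table is assumed, and the sensor is simulated where a real system would read an instrumented power supply or service processor.

```python
import random
import time

# DVFS operating points (MHz), fastest first -- an assumed table
P_STATES = [3000, 2800, 2600, 2400, 2200, 2000]
state = 0  # index of the current operating point

def set_pstate(index):
    """Placeholder actuator: a real system would program DVFS here,
    e.g. through the service processor or an OS frequency governor."""
    global state
    state = index

def read_power():
    """Placeholder sensor: simulate measured watts roughly proportional
    to frequency, with noise. A real system would read a power sensor."""
    return 60.0 + 0.04 * P_STATES[state] + random.uniform(-3, 3)

def power_cap_loop(cap_watts, iterations=20, period_s=0.05):
    """Simple feedback controller: step frequency down when measured
    power exceeds the cap, step back up when there is headroom."""
    for _ in range(iterations):
        power = read_power()
        if power > cap_watts and state < len(P_STATES) - 1:
            set_pstate(state + 1)        # over budget: slow down one step
        elif power < cap_watts - 5 and state > 0:
            set_pstate(state - 1)        # headroom (5 W guard band): speed up
        print(f"{power:5.1f} W -> {P_STATES[state]} MHz")
        time.sleep(period_s)             # loop rate limited by sensor access

power_cap_loop(cap_watts=165.0)
```

The guard band below the cap keeps the controller from oscillating on sensor noise, trading a little performance for stable regulation.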
Thermal Sensors
Key characteristics:
• Accuracy and precision - lower values require higher tolerance margins in thermal control solutions.
• Accessibility and speed - these affect the placement of control and the rate of response.
Ambient measurement sensors
• Located on-board: inlet temperature, outlet temperature, at the fan; e.g., the National Semiconductor LM73 on-board sensor with ±1°C accuracy.
• Relatively slower response time - observing larger thermal-constant effects.
• Standard access interfaces include PECI, I2C, SMBus, and 1-Wire.
On-chip/on-component sensors
• Measure temperatures at specific locations on the processor or in specific units.
• Need more rapid response time, feeding faster actuations, e.g. clock throttling.
• Proprietary interfaces for on-chip control; standard interfaces for off-chip control.

Power Measurement Sensors
AC power
• External components - intelligent PDU, SmartWatt
• Intelligent power supplies - PSMI standard
• Instrumented power supplies
DC power
• Most laptops - battery discharge rate
• IBM Active Energy Manager - system power
• Intel Foxton technology - processor power (1, 2)
The sensor must suit the application:
• Access rate (seconds, ms, µs)
• Accuracy
• Precision
• Accessibility (I2C, Ethernet, open-source driver)
(1) C. Poirier, R. McGowen, C. Bostak, S. Naffziger, "Power and Temperature Control on a 90nm Itanium-Family Processor", ISSCC 2007
(2) HotChips - http://www.hotchips.org/archives/hc17/3_Tue/HC17.S8/HC17.S8T3.pdf

Activity Monitors
'Performance' counters
• Traditionally part of the processor performance monitoring unit.
• Can track microarchitecture and system activity of all kinds.
• Provide fast feedback on activity, and have also been shown to serve as potential proxies for power and even thermals (see the model-fitting sketch after this section).
Resource utilization metrics in the operating system
• Serve as useful input to resource-state scheduling solutions for power reduction.
Application performance metrics
• The best feedback for assessing power-performance trade-offs.

Actuators for Processor Power Control
• Processor pipeline throttling in IBM POWER4 and follow-ons.
• Clock throttling in x86 architectures.
• Dynamic voltage and frequency scaling (DVFS) in modern Intel, AMD and IBM POWER6 processors.
[Figure: power vs. performance for DFS (fixed 1.4 V) and DVFS on an IBM LS-20 blade that uses AMD Opteron processors.]
Source: Karthick Rajamani, IBM
• DVFS is significantly more efficient than DFS.
• DVFS trade-offs are also near-linear at the system level.
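Activity counters become power proxies via a fitted model, as noted above. Below is a hedged sketch of fitting a simple linear power model to counter data with ordinary least squares; the two activity proxies and the training data are illustrative assumptions, not measurements.

```python
import numpy as np

# Illustrative training data: per-interval activity rates and measured power.
# Columns: CPU activity proxy, memory-bandwidth proxy (both normalized 0..1).
activity = np.array([
    [0.10, 0.05], [0.40, 0.10], [0.70, 0.30],
    [0.90, 0.60], [0.30, 0.50], [0.60, 0.20],
])
measured_watts = np.array([112.0, 131.0, 156.0, 178.0, 140.0, 146.0])

# Fit P = c0 (static/standby) + c1*cpu_activity + c2*mem_activity
X = np.column_stack([np.ones(len(activity)), activity])
coeffs, *_ = np.linalg.lstsq(X, measured_watts, rcond=None)
c0, c1, c2 = coeffs
print(f"P_est = {c0:.1f} + {c1:.1f}*cpu + {c2:.1f}*mem")

# Estimate power for a new interval from counters alone -- no power sensor
new_interval = np.array([1.0, 0.55, 0.25])
print(f"estimated power: {new_interval @ coeffs:.1f} W")
```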
Memory Systems Power Management
• DRAM power is linearly related to memory bandwidth, so request throttling is an effective means to limit memory power, with linear power-performance trade-offs.
• Incorporating DRAM idle power-down modes can significantly lower memory power.
  - Can be implemented in the memory controller.
  - Can increase performance in power-constrained systems, by reducing throttling on active ranks since idle ranks consume less power.
[Figure: normalized MP Stream-Copy power and performance for 16-, 8- and 4-rank configurations, with and without power-down management, for 1- and 2-channel-group configurations; a companion chart shows normalized bus bandwidth for commercial and technical best/worst cases, normal vs. with power management.]
The combination of 2-channel grouping and power-down management attacks both active and idle power consumption, yielding the best results.
Source: Karthick Rajamani, IBM

Low-power Enterprise Storage
Better components
• Use disks with fewer, higher-capacity spindles.
• Flash solid-state drives.
• Variable-speed disks - with disk power roughly proportional to the square of rotational speed, tailoring speed to the required performance could improve efficiency.
Massive Array of Idle Disks (MAID) - turn on only 25% of the array at one time
• Targets the "persistent data" layer between highly available, low-latency disk storage and tape.
• May need to tolerate some latency when a disk turns on; keep some disks always on.
• Improve performance by tuning applications to target only 25% of the array.
'Virtualization'
• "Thin provisioning" (over-subscription of storage) can drive utilization toward 100%, delaying the purchase of additional storage.
• Physical storage is dedicated only when data is actually written.

Adaptive Power Management Demo
POWER6 blade prototype power capping demo
• Demonstrates the ability to adapt at runtime to workload changes and user input (power cap limit) while maintaining blade power below the specified value.
POWER6 blade power savings demo
• Demonstrates the ability to adapt to load characteristics, providing increased power savings by matching the power-performance level to load demand:
  - adapting to load level
  - adapting to load type

Advanced Solutions - Performance Maximization
Usage: when cooling or power resources are shared by multiple entities, dynamic partitioning of the resource among the sharing entities can increase utilization of the shared resource, enabling higher performance. We term this technique power shifting.
[Figure: CPU power (W) vs. memory power (W) for execution intervals of many workloads with no limit on power budget. A 'Static' dotted rectangle encloses the intervals that run unthrottled under a 40 W budget partitioned statically (27 W CPU, 13 W memory); a 'Dynamic' dashed triangle encloses the intervals that run unthrottled under power shifting with the same 40 W budget. Power shifting gives better performance, as a much larger set of intervals runs unthrottled.]

Power Shifting between Processor and Memory
Models (DPC is the processor dispatch rate, BW is memory bandwidth):
  P_cpu = DPC * C1 + P_stby_cpu
  P_mem = BW * M1 + P_stby_mem
Budgets are re-adjusted each interval based on feedback; with DPC0 and BW0 the previous interval's measurements:
  P_dynamic = P_budget - P_stby_cpu - P_stby_mem
  P_est = DPC0 * C1 + BW0 * M1
  Budget_cpu = (DPC0 * C1 / P_est) * P_dynamic + P_stby_cpu
  Budget_mem = (BW0 * M1 / P_est) * P_dynamic + P_stby_mem
The new budgets determine new throughput and bandwidth limits.
[Figure: increase in execution time under a constrained power budget for SPEC CPU2000 workloads, comparing Proportional-Last-Interval - a particular power-shifting algorithm - against Static, a common budget allocation using average consumption across workloads.]
Reference: A Performance Conserving Approach for Reducing Peak Power in Server Systems - W. Felter, K. Rajamani, T. Keller, C. Rusu, ICS 2005
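A sketch of one budget-apportioning step from the model above; the constants C1, M1 and the standby powers are made-up values for demonstration, not the paper's calibration:

```python
# Sketch of the proportional power-shifting budget calculation above.
C1, M1 = 30.0, 25.0                  # watts per unit DPC / per unit BW (assumed)
P_STBY_CPU, P_STBY_MEM = 8.0, 5.0    # standby (activity-independent) watts (assumed)

def shift_budgets(p_budget, dpc0, bw0):
    """Apportion the dynamic budget in proportion to each sub-system's
    estimated dynamic power in the last interval."""
    p_dynamic = p_budget - P_STBY_CPU - P_STBY_MEM
    p_est = dpc0 * C1 + bw0 * M1           # estimated dynamic power, last interval
    budget_cpu = (dpc0 * C1 / p_est) * p_dynamic + P_STBY_CPU
    budget_mem = (bw0 * M1 / p_est) * p_dynamic + P_STBY_MEM
    return budget_cpu, budget_mem          # -> new throughput/bandwidth limits

# Example: 40 W total budget, CPU-heavy last interval
cpu_w, mem_w = shift_budgets(40.0, dpc0=0.8, bw0=0.2)
print(f"CPU budget {cpu_w:.1f} W, memory budget {mem_w:.1f} W")  # sums to 40 W
```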
Take Away
Advanced functions extend basic power management capabilities by
• providing adaptive solutions for tackling variability in systems, environment and requirements, and
• enabling dynamic power-performance trade-offs.
They can be implemented with methodologies incorporating
• targeted sensors and actuators,
• feedback-control systems, and
• model-assisted frameworks.

Virtualization - Opportunities and Challenges for Power Reduction
Different studies have shown significant under-utilization of compute resources in large data centers. Virtualization enables multiple low-utilization OS images to occupy a single physical server (a consolidation sketch follows).
• Multi-core processors with virtualization support and large SMP systems provide a growing infrastructure that facilitates virtualization-based consolidation.
The common expectation is
• a net reduction in energy costs, and
• lower infrastructure costs for power delivery and cooling.
However:
• Processor capacity is not the only resource workloads need - memory, I/O, ...
• Workloads on partitions sharing a system might have different power-performance needs.
Isolating and understanding the characteristics, and consequent power management needs, of each partition is non-trivial, requiring
• additional instrumentation and monitoring capabilities, and
• augmented VM managers for coordinating, facilitating or managing power management functions.
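To make the consolidation argument concrete, here is a first-fit-decreasing bin-packing sketch estimating how many physical servers a set of low-utilization images needs after consolidation. The utilization numbers and the single-dimension (CPU-only) packing are simplifying assumptions - as noted above, real placement must also consider memory and I/O.

```python
def consolidate(utilizations, capacity=0.7):
    """First-fit-decreasing packing of per-image CPU utilizations onto
    servers, each filled up to `capacity` to leave headroom for peaks.
    Returns the per-server loads."""
    servers = []
    for u in sorted(utilizations, reverse=True):
        for i, load in enumerate(servers):
            if load + u <= capacity + 1e-9:   # epsilon guards float rounding
                servers[i] += u
                break
        else:
            servers.append(u)                 # no existing server fits: add one
    return servers

# 12 lightly loaded OS images, one per physical server before consolidation
images = [0.12, 0.08, 0.25, 0.05, 0.18, 0.30, 0.10, 0.07, 0.22, 0.15, 0.09, 0.11]
packed = consolidate(images)
print(f"{len(images)} servers before, {len(packed)} after; loads: {packed}")
```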
Managing Server Power in a Data Center
http://dilbert.com/strips/comic/2008-02-12
http://dilbert.com/strips/comic/2008-02-14

Efficiency Improvements - Computer Industry Response
[Figure: efficiency techniques arrayed from component to data-center level: idle modes, DVFS, thermal sensors, efficient power supplies, virtualization, power capping, adaptive power savings, power measurement, cluster power trending, in-rack cooling, liquid cooling, hot-aisle containment, free cooling, DC-powered data center, Dynamic Smart Cooling, mobile data center.]

In-depth Examples
1. Active Energy Manager - for cluster-wide system monitoring and policy management.
2. EnergyScale - for power management of POWER6 systems.

IBM Systems Director Active Energy Manager
• Provides a single view of actual power usage across multiple platforms.
• Measures, trends, and controls energy usage of all managed systems.
[Figure: a management server running IBM Systems Director with the Active Energy Manager module connects over the network to rack servers (via their service processors), to blade enclosures (x86, Cell and Power blades via the management module), and - for legacy systems or systems without compatible instrumentation - to iPDUs.]

Rack-mount Server Management
• Measure and trend: power, thermals.
• Control: power cap, energy savings.

Intelligent Power Distribution Unit (iPDU)
Web interface exposes:
• User-defined receptacle names
• Current
• Power
• Cumulative use

EnergyScale Elements and Functionality
Functions: thermal/power measurement; system health monitoring/maintenance; power/thermal capping; power saving; performance-aware power management.
[Figure: IBM Director with the Active Energy Manager module and the hardware management interface interact with the POWER6 system's Flexible Service Processor (FSP) and EnergyScale controller, which run system-level power management firmware and power measurement and control the power modules.]

EnergyScale Sensors
• On-chip digital temperature sensors (internal use) - POWER6 chip temperature: 24 digital temperature-sensitive ring oscillators, 8 per core and 8 in the nest.
• On-chip analog temperature sensors - POWER6 chip temperature: 3 metal thermistors, 1 per core and 1 in the nest.
• Critical path monitors (internal use) - POWER6 operational stability: 24 sensors providing real-time timing-margin feedback.
• Voltage Regulation Module (VRM) sensors - component/voltage-rail power: voltage and current for each voltage rail in the system, plus temperatures of VRM components.
• On-board power measurement sensors - system and component power: calibrated sensors with accurate, real-time feedback.
• Discrete temperature sensor - system-level ambient temperature: reports the temperature of the air seen by components.
• Dedicated processor activity counters - core activity and performance: per-core activity information.
• Dedicated memory controller activity counters - DRAM usage: per-controller activity and power management information.

Temperature Sensors on POWER6
Digital thermal sensors
• Quick response time.
• Placed at hot-spots identified during simulation and early part characterization.
Metal thermistors
• Very accurate.
• Used in previous designs.
• Large area for a 65nm chip.
[Figure: POWER6 die showing sensor placement across Core 0, Core 1, the two memory controllers and the SMP coherency fabric.]
Source: M. Floyd et al., IBM J. R&D, November 2007
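A minimal sketch of how readings from many on-chip sensors, like those listed above, might feed a throttling decision: hot-spot protection keys off the hottest sensor, with hysteresis so the actuator does not chatter. The trip points are illustrative assumptions, not POWER6 parameters.

```python
THROTTLE_ON_C = 85.0    # assumed trip point
THROTTLE_OFF_C = 80.0   # lower release point provides hysteresis

def hot_spot_throttle(sensor_temps_c, currently_throttled):
    """Return True if the core should be (or remain) throttled, based on
    the hottest of its per-core thermal sensor readings."""
    hottest = max(sensor_temps_c)
    if not currently_throttled:
        return hottest >= THROTTLE_ON_C
    return hottest > THROTTLE_OFF_C   # stay throttled until safely below

# Example: 8 per-core digital sensor readings; one hot-spot trips throttling
print(hot_spot_throttle([71.2, 78.9, 86.4, 74.0, 70.3, 77.7, 81.5, 72.8], False))
```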
EnergyScale Actuators
• Dynamic voltage and frequency scaling (DVFS) - controlled by the service processors: variable-frequency oscillator control and voltage control for array and logic domains, system-wide.
• Processor pipeline throttling - controlled by on-chip thermal protection circuitry and the service processors: 6 different modes of pipeline throttling, manageable on a per-core basis.
• Processor standby modes - controlled by the OS and hypervisor: nap/active.
• Memory throttling - controlled by the service processors: 4 different modes of activity/request throttling.
• Memory standby modes - controlled by the memory controller: per-rank power-down mode, enabled by the FSP/dedicated microcontroller.
• Fan speeds - controlled by the service processors, based on ambient temperature.
(Service processors: the FSP and the EnergyScale controller.)

Water Cooling
• Air: thermal conductivity 0.0245 W/(m·K); volumetric heat capacity 1.27 kJ/(m³·K)
• H2O: thermal conductivity 0.6 W/(m·K); volumetric heat capacity 4176 kJ/(m³·K)
Water-cooled chips: the NCAR Bluefire supercomputer uses IBM p575 hydro-clusters. Images courtesy of the UCAR-maintained Bluefire web gallery.
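The numbers above are why water cooling is attractive: per unit volume, water absorbs thousands of times more heat than air for the same temperature rise. A quick worked check using the volumetric heat capacities from the list (the 100 W load and 10 K rise are illustrative choices):

```python
# Volumetric flow needed to remove heat: flow = P / (C_vol * dT)
C_AIR = 1.27e3     # J/(m^3*K), i.e. 1.27 kJ/(m^3*K) from the list above
C_H2O = 4.176e6    # J/(m^3*K), i.e. 4176 kJ/(m^3*K) from the list above

P_watts = 100.0    # heat load to remove (illustrative)
dT = 10.0          # allowed coolant temperature rise, kelvin (illustrative)

for name, c in [("air", C_AIR), ("water", C_H2O)]:
    flow = P_watts / (c * dT)                 # m^3/s
    print(f"{name}: {flow * 1000:.3f} L/s")
# air needs ~7.9 L/s; water needs ~0.002 L/s for the same 100 W at dT = 10 K
```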
Typical Industry Solutions in 2008: Other Related
• On demand - purchase cycles on demand, avoiding ownership of idle resources. Example: Amazon Elastic Compute Cloud (EC2).
• Data center assessment - measure power/thermal/airflow trends; use computational fluid dynamics to model the data center; recommend changes to air flow, equipment placement, etc. Examples: IBM, HP, Sun, and many others.
• Certification for carbon offsets - a 3rd party verifies the energy reduction of facilities; certificates are traded for money on a certificate trading market. Example: Neuwing Energy Ventures.
• Utility rebates - encourage data centers to use less power (e.g., by using virtualization). Example: PG&E.
Solutions shown are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding the superiority of any example over any alternatives.

Take Away
• There is significant effort from industry on power and temperature management of computer systems.
• The current intent is not only to make individual components more energy efficient, but also to
  - enhance functionality in certain components, and
  - integrate them into system-level power management solutions.

The Data Center Raised Floor
No two are the same.

A Typical Data Center Raised Floor
• Racks (computers, storage, tape)
• Networking equipment (switches)
• Secured vault
• Network operating center
• Fiber connectivity terminating on a frame-relay switch

Power Delivery Infrastructure for a Typical Large Data Center (30K sq ft of raised floor and above)
• Power feed substation
• Transfer panel switch
• Power Distribution Units (PDUs)
• Uninterruptible Power Supply (UPS) modules and UPS batteries
• Diesel generators and diesel tanks
• Several pounds of copper

Cooling Infrastructure for a Typical Large Data Center (30K sq ft of raised floor and above)
• Computer Room Air Conditioning (CRAC) units
• Water pumps
• Water chillers
• Cooling towers

Sample Data Center Energy Consumption Breakdown
Fans in the servers already consume 5-20% of the computer load.
Reference: Tschudi, et al., "Data Centers and Energy Use - Let's Look at the Data", ACEEE 2003

Data Center Efficiency Metrics
Metrics are needed to indicate the energy efficiency of the entire facility; they do not capture the quality of the IT equipment. The most commonly used metrics:
  Power Usage Effectiveness (PUE) = Total facility power / IT equipment power
  Data Center Efficiency (DCE) = IT equipment power / Total facility power
[Figure: PUE for a study of 22 sample data centers, spanning roughly the minimum up to 3.0.]
Fallacy: cooling power equals IT power. Reality: data center efficiency varies.
Reference: Tschudi, et al., "Measuring and Managing Data Center Energy Use", 2006
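A trivial sketch of the two metrics; note that DCE is simply the reciprocal of PUE. The sample readings are invented for illustration:

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

def dce(total_facility_kw, it_equipment_kw):
    """Data Center Efficiency: IT / total -- the reciprocal of PUE."""
    return it_equipment_kw / total_facility_kw

# Illustrative facility: 1 MW at the meter, 500 kW reaching IT equipment
print(pue(1000, 500))   # 2.0 -> each IT watt costs another watt of overhead
print(dce(1000, 500))   # 0.5
```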
Data Center Power Distribution

Maintainability and Availability vs. Energy Efficiency
Distributing power across a large area requires a decentralized/hierarchical architecture:
• Power transformers (13.2 kV to 480 V or 600 V)
• Power distribution units (PDUs)
• And a few miles of copper
Maintaining the uptime of a data center requires redundant components:
• Uninterruptible Power Supplies (UPS)
• Emergency Power Supply (EPS) - e.g., diesel power generators
• Redundant configurations (N+1, 2N, 2(N+1), ...) to guarantee power and cooling for IT equipment
Both introduce energy losses.

Data Center Power Delivery
[Figure: one-line diagram from the 230 kV power grid through a substation with upstream and downstream taps and circuit breakers; redundant 13.2 kV 'A' and 'B' feeds to the data center; EPS generators; UPS system with batteries and static, maintenance and alternate bypasses; normally-closed/normally-open tie breakers; and 480 V transfer switch gear A/B feeding the PDUs and the critical load.]
The substation usually receives power from 2 or more points and provides 2 redundant feeds to the data center. If utility power is lost:
1. The UPS can support the critical load for at least 15 minutes.
2. The EPS generators can be online in 10-20 seconds.
3. The transfer switch switches to the EPS generators for power.
4. The EPS generators are diesel fueled and can run for an extended period of time.

Data Center Power Conversion Efficiencies
• UPS: 88-92% (1)
• Power distribution: 98-99% (2)
• Power supply: 55-90% (3, 4)
• DC/DC conversion: 78-93% (5)
The heat generated by the losses at each step of power conversion requires additional cooling power.
(1) http://hightech.lbl.gov/DCTraining/graphics/ups-efficiency.html
(2) N. Rasmussen, "Electrical Efficiency Modeling for Data Centers", APC White Paper, 2007
(3) http://hightech.lbl.gov/documents/PS/Sample_Server_PSTest.pdf
(4) "ENERGY STAR® Server Specification Discussion Document", October 31, 2007
(5) IBM internal sources

Stranded Power in the Data Center
• The data center must be wired for nameplate power, yet real workloads do not use that much power. The result: available power is stranded and cannot be used.
• Example: IBM HS20 blade server - nameplate power is 308 W, but the maximum across real workloads (SPEC CPU2000, SPECjbb, LINPACK) is 56 W lower.
[Figure: measured server power (W) for idle and each workload, all well below the 308 W nameplate; stranded power: 56 W.]
Source: Lefurgy, IBM

Data Center Direct Current Power Distribution
Goal: reduce unnecessary conversion losses.
Approach:
• Distribute power from the substation to the rack as DC.
• Distribute at a higher voltage than with AC to address voltage drops in transmission lines.
Challenges:
• Requires conductors with very low resistance to reduce losses.
• Potential changes to server equipment.
Prototype: Sun, Berkeley Labs and other partners. (1)
(1) http://hightech.lbl.gov/dc-powering/
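Per-stage efficiencies multiply, which is why removing or improving conversion stages matters. A quick sketch using the ranges from the conversion-efficiency list above:

```python
# Wall-to-chip delivery efficiency is the product of per-stage efficiencies.
stages_best = {"UPS": 0.92, "distribution": 0.99, "power supply": 0.90, "DC/DC": 0.93}
stages_worst = {"UPS": 0.88, "distribution": 0.98, "power supply": 0.55, "DC/DC": 0.78}

def chain_efficiency(stages):
    eff = 1.0
    for e in stages.values():
        eff *= e
    return eff

best, worst = chain_efficiency(stages_best), chain_efficiency(stages_worst)
print(f"best case:  {best:.0%} of wall power reaches the load")   # ~76%
print(f"worst case: {worst:.0%} of wall power reaches the load")  # ~37%
# Every watt lost along the way must also be removed by the cooling plant.
```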
AC System Losses Compared to DC
[Figure: a conventional 480 VAC chain - bulk power supply, double-conversion UPS (AC/DC then DC/AC), PDU, server PSU (AC/DC), and on-board VRMs stepping 12 V down to 5 V, 3.3 V, 1.8 V, 1.2 V and 0.8 V for legacy- and silicon-voltage loads - compared with a 380 VDC chain in which a DC UPS or rectifier feeds the server PSU (DC/DC) directly. Measured improvements: 9% from the facility-level conversion changes, plus a further 2-5% within the server.]

Typical Industry Solutions in 2008: Power Consumption
• Configurator - estimate the power/thermal load of a system before purchase. Example: Sun Sim Datacenter.
• Measurement - servers with built-in sensors measure power, inlet temperature and outlet temperature. Example: HP server power supplies that monitor power.
• Power capping - set power consumption limits for individual servers to meet rack/enclosure constraints. Example: IBM Active Energy Manager.
• Energy savings - performance-aware modeling to enable energy-saving modes with minimal impact on application performance. Example: IBM POWER6 EnergyScale.
• Power off - turn off servers when idle, based on user-defined policies (load, time of day, server interrelationships). Example: Cassatt Active Response.
• Virtualization - consolidate computing resources for increased efficiency, freeing idle resources to be shut down or kept in low-power modes. Example: VMware ESX Server.
• DC-powered data center - use DC power for equipment and eliminate AC-DC conversions. Example: Validus DC Systems.
• Component-level control - enable control of power-performance trade-offs for individual components in the system. Examples: AMD PowerNow, Intel Enhanced SpeedStep.
Solutions shown are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding the superiority of any example over any alternatives.

Data Center Cooling

Raised Floor Cooling
• Thermodynamic part of cooling: hot spots (high inlet temperatures) hurt CRAC efficiency (~1.7% per °F).
• Transport part of cooling: low CRAC utilization hurts CRAC blower efficiency (~3 kW per CRAC).
Source: Hendrik Hamann, IBM

Impact of Raised Floor Air Flow on Server Power
When there is not enough cold air coming from the perforated tiles:
• Air pressure drops in front of the machine.
• Server fans need to work harder to move cold air across the components.
• Additionally, hot air can create a high-pressure area and overflow into the cold aisle.
Basic experiment:
• Create an enclosed micro-system - a rack, 2 perforated tiles and a path to the CRAC.
• Run LINPACK on a single server in the bottom half of the rack, other servers idle.
• Adjust the air flow from the perforated tiles.
[Figure: system power (roughly 300-340 W) vs. air flow at the perforated tile (0-300 CFM); server power rises as tile air flow drops.]
Source: J. Rubio, IBM

Air Flow Management
Equipment
• Laid out to create hot and cold aisles.
• Standard tiles are 2' x 2'; perforated tiles are placed according to the amount of air the servers need.
• Cold aisles are usually 2-3 tiles wide; hot aisles are usually 2 tiles wide.
Under-floor
• The floor cavity height sets the total cooling capability; 3' height in new data centers.
Reference: R. Schmidt, "Data Center Airflow: A Predictive Model", 2000
Modeling the Data Center
Computational Fluid Dynamics (CFD)
• Useful for the initial planning phase and what-if scenarios.
• Input parameters such as rack flows are very difficult and expensive to come by (the garbage-in, garbage-out problem).
• The coupled partial differential equations require long-running CFD calculations.
Measurement-based
• Finds problems in existing data centers.
• Measures the temperature and air flow throughout the data center.
• Highlights differences between the actual data center and the ideal data center modeled by CFD.

Analysis for Improving Efficiencies, MMT
[Figure: (a) CFD model results at 5.5 feet, (b) experimental data at 5.5 feet, and (c) the difference between model and data; the model/data contours span 12.6°C to 36.0°C, the differences span -19°C to 9°C.]
Source: Hendrik Hamann, IBM

Typical Data Center Raised Floor: Problems!
[Figure: thermal map of a raised floor annotated with ten problems: #1 flow control, #2 rack layout, #3 leaks, #4 flow control (overprovisioned), #5 hot air, #6 hot/cold aisle problem, #7 layout, #8 intermixing, #9 recirculation, #10 CRAC layout.]
Source: Hendrik Hamann, IBM

HP Dynamic Smart Cooling
• Deploy air temperature sensor(s) on each rack and collect the readings at a centralized location.
• Apply a model to determine the CRAC fan settings that maximize cooling of the IT equipment.
Challenges:
• It is difficult to determine the impact of a particular CRAC on the temperature of a given rack - offline principal component analysis (PCA) and online neural networks assist the logic engine.
• Requires CRACs with variable frequency drives (VFDs) - not standard in most data centers, but becoming available with time.
Estimated savings by data center category:
• Small (air cooling), typically 10K sq ft: 40% of cooling costs, ~5,300 MWh saved
• Medium (air and chilled-water cooling), typically 30K sq ft: 30% of cooling costs, ~9,100 MWh saved
• Large (air and chilled-water cooling), >35K sq ft: 15% of cooling costs, ~10,500 MWh saved
(1) C. Bash, C. Patel, R. Sharma, "Dynamic Thermal Management of Air Cooled Data Centers", HPL-2006-11, 2006
(2) L. Bautista and R. Sharma, "Analysis of Environmental Data in Data Centers", HPL-2007-98, 2007
(3) http://www.hp.com/hpinfo/globalcitizenship/gcreport/energy/casestudies.html
Commercial Liquid Cooling Solutions for Racks
Purpose:
• Localized heat removal - attack the heat before it reaches the rest of the computer room.
• Allows installing equipment with high power densities in a room not designed for it.
Implementation:
• Self-contained air cooling solution (water or glycol carries heat away from the air).
• Air movement within the enclosure.
Types:
• Enclosures - create a cool microclimate for selected 'problem' equipment, e.g. the Liebert XDF™ enclosure. (1)
• Sidecar heat exchangers - address rack-level hotspots without increasing the HVAC load, e.g. the APC InfraStruXure InRow RP. (2)
(1) "Liebert XDF™ High Heat-Density Enclosure with Integrated Cooling", http://www.liebert.com/product_pages/ProductDocumentation.aspx?id=40
(2) "APC InfraStruXure InRow RP Chilled Water", http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=ACRP501

Chilled Water System
Two separate water loops:
Chilled water (CW) loop
• Chiller(s) cool water, which the CRAC(s) use to cool down the air.
• Chilled water usually arrives at the CRACs near 45°F-55°F.
Condensation water loop
• Usually ends in a cooling tower.
• Needed to remove heat out of the facilities.
[Figure: sample chilled water circuit - CRAC, CW pump, chiller, condensation water pump, cooling tower.]

Air-Side and Water-Side Economizers (a.k.a. Free Cooling)
Air-side economizer (1)
• A control algorithm brings in outside air when it is cooler than the raised-floor return air (a decision sketch follows the table below).
• Needs to consider air humidity and particle counts.
• One data center showed a reduction of ~30% in cooling power.
Water-side economizer (2)
• Circulate chilled water through an external cooling tower (bypassing the chiller) when the outside air is significantly cold.
• Usually suited to climates with wet-bulb temperatures lower than 55°F for 3,000 or more hours per year, and to chilled water loops designed for 50°F and above.
Thermal energy storage (TES) (3)
• Create chilled water (or even ice) at night.
• Use it to assist in generating CW during the day, reducing the overall electricity cost of cooling.
• The reservoir can behave as another chiller, or be part of the CW loop.
(1) A. Shehabi, et al., "Data Center Economizer Contamination and Humidity Study", March 13, 2007. http://hightech.lbl.gov/documents/DATA_CENTERS/EconomizerDemoReportMarch13.pdf
(2) Pacific Gas & Electric, "High Performance Data Centers: A Design Guidelines Sourcebook", January 2006. http://hightech.lbl.gov/documents/DATA_CENTERS/06_DataCenters-PGE.pdf
(3) "Cool Thermal Energy Storage", ASHRAE Journal, September 2006

Typical Industry Solutions in 2008: Cooling
• Hot aisle containment - close hot aisles to prevent mixing of warm and cool air; add doors to the ends of the aisle and ceiling tiles spanning the aisle. Example: American Power Conversion Corp.
• Sidecar heat exchange - uses water/refrigerant to optimize hot/cold-aisle air flow; closed systems re-circulate cooled air within the cabinet, preventing mixing with room air. Example: Emerson Network Power Liebert XD products.
• Air flow regulation - control rack inlet/outlet temperatures by regulating CRAC airflow, modeling the relationship between individual CRAC airflow and rack temperature. Example: HP Dynamic Smart Cooling.
• Cooling economizers - use the cooling tower to produce chilled water when the outside air temperature is favorable, turning off the chiller's compressors. Example: Wells Fargo & Co. data center in Minneapolis.
• Cooling storage - generate ice or cool fluid with help from the external environment, or while energy rates are reduced. Example: IBM ice storage.
• Modular data center - design the data center for high-density physical requirements: a data center in a shipping container, with rack-to-rack airflow and heat exchangers in between. Example: Sun Project Blackbox.
Solutions shown are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding the superiority of any example over any alternatives.
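A toy version of the air-side economizer control decision described above; the thresholds, margin and inputs are assumptions for illustration - real controllers also weigh particle counts, dew point and equipment limits:

```python
def use_outside_air(outside_temp_c, return_air_temp_c,
                    outside_rh_pct, max_rh_pct=60.0, margin_c=1.0):
    """Air-side economizer decision sketch: open the outside-air dampers
    when outside air is cooler than the raised-floor return air by at
    least `margin_c`, and humidity is acceptable."""
    cooler = outside_temp_c < return_air_temp_c - margin_c
    humidity_ok = outside_rh_pct <= max_rh_pct
    return cooler and humidity_ok

print(use_outside_air(12.0, 30.0, 45.0))  # True: cool, dry outside air
print(use_outside_air(12.0, 30.0, 80.0))  # False: too humid, keep dampers closed
```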
Next Generation Solutions - Component to Datacenter
• Integrated fan control for thermal management
• Partition-level power management
• Datacenter modeling and optimization
• Dynamic and deterministic performance boost
• On-chip power measurement
• Integrated IT and facilities management
• Power-aware workload management
• Process technologies
• Enhanced processor idle states
• Hardware acceleration
• Fine-grained, faster clock scaling
• Power-aware microarchitecture
• Power shifting
• Power-aware virtualization for resource and server consolidation
• Enhanced memory power modes
• Combined firmware-OS power management

Conclusion
• There is a lot of work going on in industry and academia to address power and cooling issues - it's a very hot topic! Much of it has been done in the last few years, and we are nowhere near solving all the issues.
• The scope of the problem is vast, from thermal failures of individual components to the efficiency of data centers and beyond.
• There is no silver bullet - the problem has to be attacked all the way from better manufacturing technologies to coordinated facilities and IT management.
• The key lies in adaptive solutions that are driven by real-time information and incorporate an adequate understanding of the interplay between diverse requirements, workloads and system characteristics.