MANAGING DISTRIBUTED UPS ENERGY FOR EFFECTIVE POWER CAPPING IN DATA CENTERS
Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis*, D. Tullsen, T. Rosing
UCSD; *Google
ISCA 2012

Datacenter market is growing
- The world is becoming more IT dependent.
- Internet users increased from 16% to 30% of the world population in 5 years [Internet World Stats].
- Smartphones are projected to jump from 500M in 2011 to 2B in 2015 [International Telecommunication Union].
- The Internet depends heavily on datacenters.
- Datacenter power will double in 5 years.
- Expected worldwide datacenter investment in 2012: $35B (equivalent to the GDP of Lithuania) [DataCenterDynamics].
- It is important to build cost-effective datacenters.

Power Oversubscription - Opportunity
- Total Cost of Ownership (TCO) per server = one-time capital expenses (servers, supporting equipment) + recurring costs.
- With power oversubscription, more servers are placed under the same infrastructure, which makes datacenters more cost-effective.

Power Oversubscription - Opportunity (cont.) [Barroso et al. + APC TCO calculator]
Assumptions:
- Server cost: $1500
- 28,000 servers (10 MW)
- Energy: 4.7 cents/kWh
- Power (utility peak): $12/kW
- Amortization time: DC 10 years, servers 4 years
- Distributed LA-based UPS
Model available at: http://cseweb.ucsd.edu/~tullsen/DCmodeling.html

TCO breakdown (pie chart):
  Server depreciation      40.6%
  Rest                     11.9%
  Utility energy           11.7%
  DC opex                   9.9%
  Power infrastructure      7.9%
  Utility peak              5.5%
  Facility space            4.5%
  Cooling infrastructure    3.3%
  PUE overhead              2.6%
  Server opex               2.0%
  UPS (LA)                  0.2%

Power Oversubscription using Stored Energy
[Figure: the diurnal power profile (M to Su) is shaped into a pulse model of one high-power and one low-power pulse per day; UPS stored energy shaves the peak-power pulse, yielding a peak power reduction.]
- Leverage the diurnal patterns of web services.
- Discharge UPS batteries during high activity (once per day).
- Recharge them during low activity (once per day).

Centralized UPS
- Used in most small/medium data centers
- Scales poorly
- High losses in AC-DC-AC conversion (5-10%)
- Centralized single point of failure, which requires redundancy
- Increasingly cost-inefficient for large data centers

Distributed UPS
- Used in large data centers (Facebook, Google)
- Scales with data center size
- Avoids AC-DC-AC conversion
- Distributed points of failure
- Cheaper UPS solution

Related work and our proposal
[Figure: power hierarchy from the utility and diesel generator through the UPS to PDUs and racks.]
- Centralized UPSs for power capping [Govindan, ISCA 2011]
- Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]
- Our proposal: provision distributed UPSs for peak power capping
  - Different battery technology
  - Shave power on a daily basis
  - Place more servers under the same power infrastructure
  - Better amortize capex costs

Outline
- Introduction
- Choosing the right battery for power shaving
- Datacenter workload and power modeling
- Policies and results
- Conclusions

Competing Battery Technologies
- Lead Acid (LA)
- Lithium Cobalt Oxide (LCO)
- Lithium Iron Phosphate (LFP)

Electric Metrics
Backup UPS batteries are rarely used (3-4 times per year). Proper metrics:
- Cost: Wh / $
- Size: volumetric density (Wh / liter)
Backup + peak-shaving UPS batteries are used on a daily basis. Proper metrics:
- Charge cycles and cost: Wh * cycles / $
- Size: volumetric density (Wh / liter)
- Recharge speed: % charge / hour

Battery Technology Comparison
- Backup: Lead Acid (cheaper)
- Backup + peak shaving: Lithium Iron Phosphate (more cost-effective)

Battery Capacity-Cost Estimation
[Figure: power vs. time; the shaved energy E_shaved is the peak reduction sustained over the peak duration; battery sizing compared for LFP and lead acid.]

Assumptions
- Number of servers: 28K
- Server type: custom Sun Fire X4270, Intel Xeon (8-core), 8 GB memory; idle power 175 W, max power 350 W
- PSU efficiency: 80%
- Workload: pulse model, 50% utilization
- Batteries: LFP ($5/Ah), LA ($2/Ah)

TCO savings with peak duration
[Figure: TCO savings vs. peak duration for LFP and LA, with each technology's size constraint marked.]
- The more we shave, the more we gain!
- LFP is more space- and energy-efficient than LA, so it can shave more.

TCO savings with battery DoD
[Figure: shaving the same energy with a low vs. a high depth of discharge (DoD), for (a) LA and (b) LFP.]
- There is a sweet DoD spot for TCO savings (LA: 40%, LFP: 60%).

Key points for battery selection
When using batteries for peak power shaving:
- Shave as much power as possible (with a reasonably sized battery).
- There is a DoD sweet spot that maximizes TCO savings.
- LFP is the better technology because it offers:
  - many recharge cycles
  - more efficient discharge
  - higher energy density
  - lower expected cost in the future
What if:
- Servers have unbalanced load?
- Demand varies from day to day?

Workload Modeling
- Whole-year traffic data from the Google Transparency Report
- Weights applied according to web presence: Search 29.2%, Social Networking 55.8%, MapReduce 15%
- Results presented for the 3 worst consecutive days (11/17/2010-11/19/2010)

Workload Modeling (cont.)
- Model a 1000-machine cluster with 5 PDUs, 10 racks per PDU, and 20 servers (2U) per rack.
- We simulate load with M/M/8 queues and scale the inter-arrival time according to workload traffic.
[Figure: jobs arrive at a given inter-arrival time at a scheduler (round robin or load-aware), which dispatches them to servers with 8 cores (consumers) each; each job has a service time.]

Policy goals
- Guarantee the power budget at a specific level of the power hierarchy
- Discharge only during high activity; charge only during low activity
- Be effective irrespective of job scheduling
- Use batteries uniformly

Uncoordinated Policy
[Figure: per-server battery state machine with states Available, In Use, and Not Available; transitions labeled "Power over Threshold", "Power below Threshold", "Reached DoD Goal", "Recharge", "(Power + Bat. Recharge Power) below Threshold", and "Recharge Complete".]
- Applied at the server level
- Easy to implement
- Runs independently on each server
- DoD goal set to 60% of battery capacity (LFP)

Uncoordinated Policy Results
Round-robin scheduling:
- Batteries discharge when not required.
- Batteries recharge during the peak.
- The policy fails to guarantee the budget.
[Figure: power trace showing a budget violation.]

Uncoordinated Policy Results (cont.)
Load-aware scheduling:
- Batteries all discharge together (wasteful).
- Batteries all recharge together (violates the budget).
- The policy fails to guarantee the budget.
- Coordination is required!
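The uncoordinated policy can be sketched as a small per-server state machine in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It reads the state diagram as: Available to In Use when server power exceeds the threshold; In Use back to Available when power falls below it; In Use to Not Available once the DoD goal is reached; recharging only while power plus recharge power stays below the threshold; and Not Available to Available when recharge completes. That reading of the arcs, the class and method names, the rule that the battery covers only the load above the threshold, and the constant recharge power are all assumptions.

```python
from enum import Enum, auto


class BatteryState(Enum):
    AVAILABLE = auto()
    IN_USE = auto()
    NOT_AVAILABLE = auto()  # drained to the DoD goal; waits for a safe recharge window


class UncoordinatedBattery:
    """Hypothetical per-server controller for the uncoordinated policy."""

    def __init__(self, capacity_wh, threshold_w, recharge_w, dod_goal=0.6):
        self.capacity_wh = capacity_wh
        self.threshold_w = threshold_w  # per-server power threshold
        self.recharge_w = recharge_w    # assumed constant recharge power
        self.dod_goal = dod_goal        # 60% DoD goal for LFP, per the slide
        self.charge_wh = capacity_wh    # start fully charged
        self.state = BatteryState.AVAILABLE

    def step(self, server_power_w, dt_h):
        """Advance one interval of dt_h hours; return the power drawn upstream (W)."""
        floor_wh = self.capacity_wh * (1 - self.dod_goal)

        if self.state is BatteryState.AVAILABLE and server_power_w > self.threshold_w:
            self.state = BatteryState.IN_USE

        if self.state is BatteryState.IN_USE:
            if server_power_w <= self.threshold_w:
                self.state = BatteryState.AVAILABLE  # keep the remaining charge
                return server_power_w
            # Discharge only enough to cover the load above the threshold.
            usable_wh = self.charge_wh - floor_wh
            drawn_wh = min((server_power_w - self.threshold_w) * dt_h, usable_wh)
            if drawn_wh >= usable_wh:
                self.charge_wh = floor_wh
                self.state = BatteryState.NOT_AVAILABLE  # reached DoD goal
            else:
                self.charge_wh -= drawn_wh
            return server_power_w - drawn_wh / dt_h

        if self.state is BatteryState.NOT_AVAILABLE:
            # Recharge only if it does not push this server over the threshold.
            if server_power_w + self.recharge_w < self.threshold_w:
                missing_wh = self.capacity_wh - self.charge_wh
                added_wh = min(self.recharge_w * dt_h, missing_wh)
                if added_wh >= missing_wh:
                    self.charge_wh = self.capacity_wh
                    self.state = BatteryState.AVAILABLE  # recharge complete
                else:
                    self.charge_wh += added_wh
                return server_power_w + added_wh / dt_h

        return server_power_w
```

Each server runs this logic on its own readings only, so nothing prevents many batteries from discharging or recharging at the same moment. That is the failure mode the results above show for both schedulers.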
Coordinated Control
- Applied at higher levels of the hierarchy (PDU, cluster)
- Requires remote battery enable/disable and remote recharge initiation
- The number of batteries enabled is proportional to the peak magnitude
- The batteries used are spatially distributed
[Figure: overall power over Day1-Day3 with peak magnitudes annotated in server equivalents (0 to 300), and battery usage spread across rack1 and rack2.]

Coordinated Policies
PDU-level and cluster-level:
- Power cap close to the ideal, i.e., the average power of 250 W
- Peak power reduction of 19%
- 23% more servers
- 6.2% TCO/server reduction

Discussion: Energy proportionality
- Modern servers have sharper, thinner peaks.
- We can shave more power with the same stored energy.
[Figure: overall power over Day1-Day3 for energy-proportional servers.]
- Peak power reduction of up to 37.5% with the 40Ah LFP battery

Concluding remarks
- Provisioning batteries in distributed UPS topologies to cap power and oversubscribe the data center is beneficial.
- It is critical to reconsider battery properties (technology, capacity, DoD).
- Coordination of charges and discharges is required.
- We cap peak power by 19%, allow 23% more servers, and better amortize capex costs.
- We achieve a 6.2% reduction in TCO/server ($15M for a 28K-server DC).

BACKUP SLIDES

TCO savings with battery cost
- LA is a stable technology.
- LFP advancements are expected, driven by electric vehicles.
- TCO savings with LFP increase over time!

When things go wrong?
Scenario 1: unexpected daily traffic
- We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%).
Scenario 2: batteries are not replaced immediately
- With 50% of batteries dead, we can still reduce the peak by 15%.
- Grouping battery maintenance/replacement for cost savings is possible.

Exploration of Dead Batteries

Discussion: DVFS
To DVFS or not DVFS?
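- Datacenter SLA violations are likely during peak load.
- DVFS is bad during high demand.
- DVFS is great during low demand: it creates higher margins for aggressive battery capping.
[Figure: overall power over Day1-Day3; using DVFS at the peak risks a potential SLA violation, while with no DVFS an SLA violation is unlikely.]

Battery Capacity-Cost Estimation
[Figure: power vs. time; E_shaved is the peak reduction sustained over the peak duration. LFP vs. lead acid (lead acid needs about twice the volume).]

$$E_{\text{Datacenter,shaved}} = P_{\text{PeakReduction}} \times T_{\text{PeakDuration}}$$

$$E_{\text{server,shaved}} = \frac{E_{\text{Datacenter,shaved}} \times \text{PSU}_{\text{Eff}}}{\#\text{servers}}$$

$$C_{\text{battery}} = \frac{1}{\text{DoD}} \times \frac{1}{0.8} \times I^{\,PE-1} \times \frac{E_{\text{server,shaved}}}{V}$$

$$\text{Cost}_{\text{batteries}} = C_{\text{battery}} \times \text{CostPerAh} \times \#\text{servers}$$

$$\text{UPS}_{\text{depreciation}} = \min\left(\text{ServiceLife},\ \frac{\text{Cycles}(\text{DoD})}{30}\right)$$

Battery Related Assumptions

Workload partitioning

Distributed Algorithm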
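The capacity-cost estimation above can be transcribed into a short Python sketch that mirrors the five formulas term by term. The function names are hypothetical. The constant 0.8 is carried over from the slide as-is. Reading PE as a Peukert exponent and Cycles(DoD)/30 as a depreciation period in months (one discharge per day) are assumptions. In the usage block, only the server count, PSU efficiency, LFP cost per Ah, and 60% DoD come from the slides. The peak reduction, peak duration, voltage, current, Peukert exponent, service life, and cycle count are placeholders, not the study's values.

```python
"""Battery capacity/cost estimate following the slide's formulas (sketch)."""


def datacenter_shaved_energy_wh(peak_reduction_w, peak_duration_h):
    # E_Datacenter,shaved = P_PeakReduction * T_PeakDuration
    return peak_reduction_w * peak_duration_h


def server_shaved_energy_wh(dc_shaved_wh, psu_eff, n_servers):
    # E_server,shaved = E_Datacenter,shaved * PSU_Eff / #servers
    return dc_shaved_wh * psu_eff / n_servers


def battery_capacity_ah(server_shaved_wh, dod, voltage_v, current_a, peukert_exp):
    # C_battery = (1/DoD) * (1/0.8) * I^(PE-1) * E_server,shaved / V
    # (peukert_exp is our assumed reading of the slide's "PE")
    return (1.0 / dod) * (1.0 / 0.8) * current_a ** (peukert_exp - 1.0) * server_shaved_wh / voltage_v


def total_battery_cost(capacity_ah, cost_per_ah, n_servers):
    # Cost = C_battery * CostPerAh * #servers
    return capacity_ah * cost_per_ah * n_servers


def ups_depreciation_months(service_life_months, cycles_at_dod):
    # Assumes one charge/discharge cycle per day, so cycles / 30 ~ months.
    return min(service_life_months, cycles_at_dod / 30.0)


if __name__ == "__main__":
    # From the slides: 28K servers, 80% PSU efficiency, LFP at $5/Ah, 60% DoD.
    n_servers, psu_eff, dod, cost_per_ah = 28_000, 0.8, 0.6, 5.0
    # Hypothetical placeholders, NOT values from the study:
    peak_reduction_w, peak_duration_h = 1.5e6, 4.0
    voltage_v, current_a, peukert_exp = 12.0, 20.0, 1.05
    service_life_months, cycles_at_dod = 120, 3000

    e_dc = datacenter_shaved_energy_wh(peak_reduction_w, peak_duration_h)
    e_srv = server_shaved_energy_wh(e_dc, psu_eff, n_servers)
    c_ah = battery_capacity_ah(e_srv, dod, voltage_v, current_a, peukert_exp)
    print(f"Per-server battery: {c_ah:.1f} Ah")
    print(f"Total battery cost: ${total_battery_cost(c_ah, cost_per_ah, n_servers):,.0f}")
    print(f"Depreciation period: {ups_depreciation_months(service_life_months, cycles_at_dod):.0f} months")
```

Keeping each formula in its own function makes it easy to sweep one input at a time, for example DoD, which trades battery size against cycle life through both C_battery and Cycles(DoD).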