MANAGING DISTRIBUTED UPS ENERGY FOR
EFFECTIVE POWER CAPPING IN DATA CENTERS
Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun,
E. Pettis*, D. Tullsen, T. Rosing
*Google
ISCA 2012
UCSD
Datacenter market is growing
World is becoming more IT dependent.
Internet users increased from 16% to 30% of world population
in 5 years [Internet World Stats]
Smart phones are projected to jump from 500M in 2011
to 2B in 2015 [Inter.Telecom.Union]
Internet heavily depends on Datacenters
Data center power will double in 5 years
Expected worldwide Datacenter Investment in 2012: 35B$
(equivalent to GDP of Lithuania) [DataCenterDynamics]
Important to build cost-effective Datacenters
Power Oversubscription - Opportunity
[Figure: Total Cost of Ownership (TCO) / Server, without and with oversubscription. Cost is split into one-time capital expenses (servers, supporting equipment) and recurring costs.]
No oversubscription vs. with oversubscription: more servers in the datacenter on the same infrastructure
Power oversubscription: more cost-effective data centers
Power Oversubscription – Opportunity
[Barroso et al. + APC TCO calc]
Assumptions:
Server cost: 1500$
28,000 servers (10 MW)
Energy: 4.7 c/kWh
Power: 12 $/kW
Amortization time: data center 10 years, servers 4 years
Distributed lead-acid (LA) based UPS
Available at:
http://cseweb.ucsd.edu/~tullsen/DCmodeling.html
[Figure: TCO breakdown pie chart]
Utility Peak: 5.5%
Facility Space: 4.5%
Utility Energy: 11.7%
Power Infrastructure: 7.9%
Cooling Infrastructure: 3.3%
PUE overhead: 2.6%
Server Opex: 2.0%
Rest: 11.9%
DC opex: 9.9%
Server Depreciation: 40.6%
UPS LA: 0.2%
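The reasoning behind oversubscription can be sketched numerically: costs that do not grow with server count are spread over more servers. The following is a minimal sketch, not the TCO calculator linked above. The function name and the choice of which components count as shared are assumptions made for illustration, and the sketch ignores the cost of the larger batteries, so it is not expected to match the TCO figures reported later in the deck.

```python
# Percent shares of TCO/server, copied from the breakdown above.
TCO_SHARES = {
    "utility_peak": 5.5, "facility_space": 4.5, "utility_energy": 11.7,
    "power_infrastructure": 7.9, "cooling_infrastructure": 3.3,
    "pue_overhead": 2.6, "server_opex": 2.0, "rest": 11.9, "dc_opex": 9.9,
    "server_depreciation": 40.6, "ups_la": 0.2,
}


def tco_per_server_reduction(shares, shared_keys, extra_servers):
    """Fractional drop in TCO/server when the `shared_keys` costs are
    spread over (1 + extra_servers) times as many servers."""
    total = sum(shares.values())
    shared = sum(shares[key] for key in shared_keys) / total
    return shared * extra_servers / (1.0 + extra_servers)


# ASSUMPTION: only these three components are treated as fixed infrastructure.
SHARED = ("facility_space", "power_infrastructure", "cooling_infrastructure")

reduction = tco_per_server_reduction(TCO_SHARES, SHARED, extra_servers=0.23)
print(f"TCO/server reduction under these assumptions: {reduction:.1%}")
```

The result is very sensitive to which components are placed in `SHARED`, which is why the choice is left to the caller.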
Power Oversubscription using Stored Energy
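[Figure: power vs. time. Left: diurnal power profile over a week (M, Tu, W, ..., Su) with its peak power marked. Right: pulse model of the same profile, alternating a peak power pulse and a low power pulse. Shaping with UPS stored energy (batteries) gives the peak power reduction.]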
Leverage diurnal patterns of web services
Discharge UPS batteries during high activity (once per day)
Recharge during low activity (once per day)
Centralized UPS
Used in most small / medium data centers
Scales poorly
High losses in AC-DC-AC conversion (5-10%)
Centralized single point of failure, requires redundancy
Increasingly cost-inefficient for large data centers
Distributed UPS
Used in large data centers
Scales with data center size
Avoids AC-DC-AC conversion
Distributed points of failure
Cheaper UPS solution
[Figure: distributed UPS examples labeled Facebook and Google]
Related work and our proposal
[Figure: power delivery hierarchy: utility and diesel generator, UPS (batteries), PDUs, ..., racks]
Centralized UPSs for power capping [Govindan, ISCA 2011]
Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]
Our proposal:
Provision distributed UPS for peak power capping
Different battery technology
Shave power on daily basis
Place more servers under same power infrastructure
Better amortize capex costs
Outline
Introduction
Choosing the right battery for power shaving
Datacenter workload and power modeling
Policies and results
Conclusions
Competing Battery Technologies
Lead Acid (LA)
Lithium Cobalt Oxide (LCO)
Lithium Iron Phosphate (LFP)
Electric
Metrics
Backup
UPS batteries rarely used (3-4 times per year)
Proper metrics:
Cost: Wh / $
Size: Volumetric Density (Wh / liter)
Backup + peak shaving
UPS batteries used on daily basis
Proper metrics:
Charge cycles and cost: Wh * cycles / $
Size: Volumetric Density (Wh / liter)
Recharge speed: % charge / hour
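To make the difference between the two cost metrics above concrete, here is a minimal sketch. The two batteries and all of their numbers are hypothetical placeholders chosen only to show that the ranking can flip; they are not measurements of any real technology and are not taken from the deck.

```python
def wh_per_dollar(capacity_wh, cost_dollars):
    """Backup-only cost metric: energy stored per dollar."""
    return capacity_wh / cost_dollars


def lifetime_wh_per_dollar(capacity_wh, cycles, cost_dollars):
    """Backup + peak shaving cost metric: Wh * cycles / $."""
    return capacity_wh * cycles / cost_dollars


# HYPOTHETICAL batteries: same capacity, B costs more but lasts more cycles.
battery_a = {"capacity_wh": 480.0, "cost_dollars": 80.0, "cycles": 500}
battery_b = {"capacity_wh": 480.0, "cost_dollars": 200.0, "cycles": 2500}

for name, b in (("A", battery_a), ("B", battery_b)):
    backup = wh_per_dollar(b["capacity_wh"], b["cost_dollars"])
    daily = lifetime_wh_per_dollar(b["capacity_wh"], b["cycles"], b["cost_dollars"])
    print(name, backup, daily)
```

With these placeholder inputs, battery A wins on Wh / $ while battery B wins on Wh * cycles / $, which is the kind of reversal the daily-use metric is meant to expose.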
Battery Technology Comparison
Backup: Lead Acid (cheaper)
Backup+Peak Shaving: Lithium Iron Phosphate (cost effective)
Battery Capacity-Cost Estimation
[Figure: power vs. time for one peak; the shaved energy (E shaved) is the area set by the peak reduction and the peak duration. Battery pictures: LFP, Lead Acid.]
Assumptions
Number of servers: 28K
Server type: Custom Sun Fire X4270 (Intel Xeon 8-core, 8 GB memory; idle power 175W; max power 350W)
PSU efficiency: 80%
Workload: Pulse model, utilization 50%
Batteries: LFP (5$/Ah), LA (2$/Ah)
TCO savings with peak duration
[Figure: TCO savings vs. peak duration for LFP and LA, with the LFP and LA size constraints marked]
The more we shave, the more we gain!
LFP is more space- and energy-efficient than LA, so it can shave more!
TCO savings with battery DoD
When shaving same energy:
[Figure: TCO savings from low DoD to high DoD for (a) LA and (b) LFP]
Sweet DoD spot for TCO savings (LA: 40%, LFP: 60%)
Key points for battery selection
When using batteries for peak power shaving:
Shave as much power as possible (reasonably sized battery)
There is a DoD sweet spot, maximizing TCO savings
LFP is the better technology because it:
supports many more recharge cycles
discharges more efficiently
has higher energy density
is expected to get cheaper in the future
What if: - Servers with unbalanced load?
- Day-to-day variation in demand?
Outline
Introduction
Choosing the right battery for power shaving
Datacenter workload and power modeling
Policies and results
Conclusions
Workload Modeling
Whole year traffic data from Google Transparency Report
Apply weights according to web presence:
(Search 29.2%, Social Networking 55.8%, Map Reduce 15%)
Present results for 3 worst consecutive days
(11/17/2010-11/19/2010)
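The weighting step above can be sketched as a weighted sum of per-service traffic traces. This is a minimal sketch under assumptions: the function name, the dictionary layout, and the sample trace values are hypothetical, and only the three weights come from the slide.

```python
# Weights from the slide (web presence of each service class).
WEIGHTS = {"search": 0.292, "social_networking": 0.558, "map_reduce": 0.15}


def mix_traffic(traces, weights=WEIGHTS):
    """Combine per-service traffic traces (equal-length lists of
    normalized traffic samples) into one weighted trace."""
    lengths = {len(trace) for trace in traces.values()}
    if len(lengths) != 1:
        raise ValueError("all traces must have the same length")
    length = lengths.pop()
    return [
        sum(weights[name] * traces[name][t] for name in weights)
        for t in range(length)
    ]


# HYPOTHETICAL three-sample traces, normalized to each service's own peak.
example = {
    "search": [0.4, 1.0, 0.6],
    "social_networking": [0.3, 0.9, 1.0],
    "map_reduce": [1.0, 1.0, 1.0],
}
print(mix_traffic(example))
```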
Workload Modeling (cont.)
Model a 1000-machine cluster with 5 PDUs, 10 racks per PDU, and
20 servers (2U) per rack.
We simulate load based on M/M/8 queues and scale the inter-arrival
time according to workload traffic
[Figure: queueing model. Jobs arrive separated by an interarrival time; a scheduler (Round Robin or Load-aware) dispatches them to servers, each with 8 cores (consumers); each job holds a core for its service time.]
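An M/M/8 queue for a single server can be simulated in a few lines. The following is a minimal sketch, not the simulator used for the talk: the function name, the rates, and the job count are hypothetical, jobs are served first-come first-served, and traffic scaling is modeled simply by multiplying the arrival rate (equivalently, shrinking the mean inter-arrival time).

```python
import heapq
import random


def simulate_mmc(arrival_rate, service_rate, cores=8, num_jobs=20000, seed=0):
    """Simulate an M/M/c FIFO queue and return average core utilization."""
    rng = random.Random(seed)
    free_at = [0.0] * cores          # min-heap: time at which each core frees up
    heapq.heapify(free_at)
    clock = 0.0
    busy_time = 0.0
    last_finish = 0.0
    for _ in range(num_jobs):
        clock += rng.expovariate(arrival_rate)        # exponential inter-arrival
        start = max(clock, heapq.heappop(free_at))    # wait for earliest free core
        service = rng.expovariate(service_rate)       # exponential service time
        finish = start + service
        heapq.heappush(free_at, finish)
        busy_time += service
        last_finish = max(last_finish, finish)
    return busy_time / (cores * last_finish)


# Scale the arrival rate with a (hypothetical) normalized traffic level.
for traffic in (0.25, 0.5, 1.0):
    utilization = simulate_mmc(arrival_rate=6.0 * traffic, service_rate=1.0)
    print(traffic, round(utilization, 2))
```

For a stable queue the measured utilization should sit close to arrival_rate / (cores * service_rate), which gives a quick sanity check on the simulation.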
Outline
Introduction
Choosing the right battery for power shaving
Datacenter workload and power modeling
Policies and results
Conclusions
Policy goals
Guarantee the power budget at a specific level of the power hierarchy
Discharge only during high activity, charge only during low activity
Effective irrespective of job scheduling
Make battery usage uniform
Uncoordinated Policy
[Figure: per-server battery state machine with states Available, In Use, Not Available, and Recharge. Transition labels: Power over Threshold; Power below Threshold; Reached DoD Goal; (Power + Bat. Recharge Power) below Threshold; Recharge Complete.]
Applied at the server level
Easy to implement
Runs independently per server
DoD goal set to 60% of battery capacity (LFP)
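The per-server state machine on this slide can be sketched as a single transition function. This is a minimal sketch under assumptions: the slide text gives the state names and the transition labels, but which label connects which pair of states is my reading of the figure, and the function and parameter names are hypothetical.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    IN_USE = "in_use"
    NOT_AVAILABLE = "not_available"
    RECHARGE = "recharge"


def next_state(state, power, threshold, recharge_power, discharged, dod_goal=0.6):
    """One step of the uncoordinated policy for a single server.

    `discharged` is the fraction of battery capacity used so far.
    Edge assignments are an ASSUMPTION based on the slide's labels.
    """
    if state is State.AVAILABLE and power > threshold:
        return State.IN_USE                      # Power over Threshold
    if state is State.IN_USE:
        if discharged >= dod_goal:
            return State.NOT_AVAILABLE           # Reached DoD Goal
        if power < threshold:
            return State.AVAILABLE               # Power below Threshold
    if state is State.NOT_AVAILABLE and power + recharge_power < threshold:
        return State.RECHARGE                    # room to recharge under budget
    if state is State.RECHARGE and discharged <= 0.0:
        return State.AVAILABLE                   # Recharge Complete
    return state


print(next_state(State.AVAILABLE, power=320, threshold=300,
                 recharge_power=40, discharged=0.0))
```

Because each server evaluates this function using only its own power draw, nothing prevents many batteries from discharging or recharging at the same moment.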
Uncoordinated Policy Results
Round Robin Scheduling
Batteries discharge when not required
Batteries recharge during peak
Fails to guarantee budget
[Figure: power over time with the budget violation marked]
Uncoordinated Policy Results (cont.)
Load-aware Scheduling
Batteries discharge all together (wasteful)
Recharge all together (violates budget)
Fails to guarantee budget
Coordination is required!!
[Figure: power over time with the budget violation marked]
Coordinated Control
Applied at higher levels (PDU, Cluster)
Requires remote battery enable/disable, initiate recharge
Number of batteries enabled proportional to peak magnitude
Batteries in use are spatially distributed
[Figure: overall power over Day1 to Day3 for rack1 and rack2, annotated with battery usage in server equivalents (300, 100, 200, 200, and 0 server equivalent)]
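The two ideas on this slide, enabling a number of batteries proportional to the peak magnitude and spreading them spatially, can be sketched as follows. This is a minimal sketch under assumptions, not the controller from the talk: all names are hypothetical, each enabled battery is assumed to take one server's draw (`watts_per_server`) off the utility feed, and picking the fullest batteries first is my own way of keeping battery usage uniform.

```python
import math
from itertools import zip_longest


def batteries_to_enable(total_power, budget, watts_per_server):
    """Number of batteries needed to bring total_power under budget."""
    excess = total_power - budget
    if excess <= 0:
        return 0
    return math.ceil(excess / watts_per_server)


def pick_batteries(charge, count):
    """Pick `count` batteries, rotating across racks, fullest first.

    `charge` maps (rack, server) -> remaining charge fraction.
    """
    by_rack = {}
    for key, level in charge.items():
        by_rack.setdefault(key[0], []).append((key, level))
    per_rack = [
        sorted(by_rack[rack], key=lambda item: -item[1])   # fullest first
        for rack in sorted(by_rack)
    ]
    picked = []
    for row in zip_longest(*per_rack):                      # one per rack per round
        for item in row:
            if item is not None and len(picked) < count:
                picked.append(item[0])
    return picked


charge = {("rack1", "s1"): 1.0, ("rack1", "s2"): 0.5,
          ("rack2", "s1"): 0.9, ("rack2", "s2"): 0.8}
needed = batteries_to_enable(total_power=1100, budget=1000, watts_per_server=40)
print(needed, pick_batteries(charge, needed))
```

Unlike the per-server policy, this decision needs a view of aggregate power at the PDU or cluster level, which is why remote battery enable/disable is a requirement.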
Coordinated Policies
PDU-level
Cluster-level
Power cap close to the average power (ideal) of 250W
Peak power reduction of 19% allows 23% more servers
6.2% TCO/server reduction
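The step from a 19% peak reduction to 23% more servers follows from simple arithmetic, assuming the provisioned power budget stays fixed and each server's peak draw on the infrastructure drops by the same fraction. A minimal sketch (the function name is hypothetical):

```python
def extra_server_fraction(peak_reduction):
    """Extra servers that fit under a fixed power budget when each
    server's peak draw shrinks by `peak_reduction` (a fraction)."""
    return 1.0 / (1.0 - peak_reduction) - 1.0


print(f"{extra_server_fraction(0.19):.1%}")
```

With a reduction of 0.19 the server count scales by 1 / 0.81, or about 1.23.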
Discussion: Energy proportionality
Sharper, thinner peaks
We can shave more power with the same stored energy
[Figure: overall power over Day1 to Day3 for modern servers and energy-proportional servers]
Peak power reduction of up to 37.5% with the 40Ah LFP battery
Concluding remarks
Provisioning the batteries of distributed UPS topologies to cap power
and oversubscribe the data center is beneficial
Critical to reconsider battery properties
(technology, capacity, DoD)
Coordination of charges and discharges is required
We cap peak power by 19%, allow 23% more servers, and
better amortize capex costs
Achieve 6.2% reduction in TCO/server ($15M for a 28k-server DC)
BACKUP SLIDES
TCO savings with battery cost
LA is stable technology
LFP advancements expected, due to electric vehicles
TCO savings increase over time with LFP!
What if things go wrong?
Scenario 1: Unexpected daily traffic
We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%)
Scenario 2: Batteries are not replaced immediately
With 50% of batteries dead we can still reduce peak by 15%
Grouping battery maintenance/replacement for cost savings is possible
Exploration of Dead Batteries
Discussion: DVFS
To DVFS or not DVFS?
Datacenter SLA violations likely during peak load
DVFS bad during high demand
Great during low demand
Creates higher margins for aggressive battery capping
[Figure: overall power over Day1 to Day3 with and without DVFS, annotated "Potential SLA violation" and "SLA violation unlikely"]
Battery Capacity-Cost Estimation
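[Figure: power vs. time for one peak; E shaved is the area set by the peak reduction and the peak duration]
\[ E_{Datacenter,shaved} = PeakReduction \times PeakDuration \]
\[ E_{server,shaved} = \frac{E_{Datacenter,shaved} \times PSUEff}{\#servers} \]
\[ C_{battery} = \frac{E_{server,shaved}}{V} \times I^{PE-1} \times \frac{1}{DoD} \times \frac{1}{0.8} \]
\[ UPSdepreciation = \frac{C_{battery} \times CostperAh \times \#servers}{\min(servicelife,\ DoD(cycles) / 30)} \]
[Figure: battery pictures: LFP, Lead Acid (~twice volume)]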
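The capacity and depreciation formulas on this slide translate directly into code. The following is a minimal sketch under assumptions: energy is taken in Wh and capacity in Ah, PE is read as the Peukert exponent, DoD(cycles) is read as the number of charge cycles the battery sustains at the chosen DoD (so dividing by 30 gives months at one cycle per day), and the 0.8 factor is copied from the slide without interpretation. All function names, parameter names, and sample inputs are hypothetical.

```python
def server_energy_shaved_wh(peak_reduction_w, peak_duration_h, psu_eff, servers):
    """E_server,shaved from datacenter-level peak reduction and duration."""
    datacenter_wh = peak_reduction_w * peak_duration_h
    return datacenter_wh * psu_eff / servers


def battery_capacity_ah(e_server_wh, voltage, current, peukert, dod, factor=0.8):
    """C_battery = (E / V) * I^(PE - 1) * (1 / DoD) * (1 / 0.8)."""
    return (e_server_wh / voltage) * current ** (peukert - 1) / dod / factor


def ups_depreciation(capacity_ah, cost_per_ah, servers,
                     service_life_months, cycles_at_dod):
    """Battery cost for all servers divided by battery lifetime in months."""
    lifetime_months = min(service_life_months, cycles_at_dod / 30)
    return capacity_ah * cost_per_ah * servers / lifetime_months


# HYPOTHETICAL inputs, chosen only to exercise the formulas.
energy = server_energy_shaved_wh(peak_reduction_w=1.4e6, peak_duration_h=3.0,
                                 psu_eff=0.8, servers=28000)
capacity = battery_capacity_ah(energy, voltage=12.0, current=10.0,
                               peukert=1.05, dod=0.6)
monthly = ups_depreciation(capacity, cost_per_ah=5.0, servers=28000,
                           service_life_months=120, cycles_at_dod=3000)
print(round(energy, 1), round(capacity, 1), round(monthly))
```

The `min` in the depreciation term captures that a battery is replaced either when its service life ends or when daily cycling wears it out, whichever comes first.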
Battery Related Assumptions
Workload partitioning
Distributed Algorithm