No Power Struggles: Coordinated multi

Ramya (UCSB), Parthasarathy et al (HP Labs)
Power delivery, consumption and
cooling problems in a data center are
being tackled currently by several
systems that address “separate” aspects
of these problems either locally/globally,
in hardware/software.
 When these systems are deployed
simultaneously, the policies of one tend
to interfere with those of the others.
The lack of coordination amongst such
systems leads to undesirable
 This paper proposes a “Global Power
Management Solution” that coordinates
these individual solutions.
Classifying the existing power
management solutions..
Approach used: localized/distributed
resource management, VMs
 Power control : voltage scaling, power
states, turning off machines
 Implementation scope:
server/cluster/data center level
 Optimization requirements and
constraints: accept performance loss?
allow power budget violation?
In a nutshell..
“Tracking” problem – optimize power
consumption while delivering
 “Capping” problem – Optimize power
provisioning and cooling so as not to
violate the power budget.
 “Optimization” problem – maximize
power saving while minimizing
performance loss. (ACPIs, VMs, etc)
Representative Power
Management Solutions
Efficiency Controller (EC – tracking) –
optimize per-server avg. power
consumption. Adjusts ACPI P-states
based on past resource usage to
manage “estimated” future demand.
 Server Manager (SM – capping) –
Reduce P-state of a server on violation
of Power budget.
Representative solutions..
Enclosure Manager (EM) – thermal
power capping at blade level
 Group Manager (GM) – at rack or data
center level
 These two monitor power usage on sets
of machines and re-provision power to
maintain group power budget
(determined manually or mandated by
higher level power managers)
Representative solutions..
Virtual Machine Controller (VMC) –
reduce average power usage across a
set of machines by workload
consolidation, turning off idle machines,
Power Struggles..
What happens if these solutions are deployed
simultaneously ?
Power Struggles - examples
EC and the SM both operate on the same
knob/actuator (P-state) but for different metrics. If
uncoordinated, the EC can potentially override
the SM's setting, leading to power budget violations and
eventual thermal failover! – A correctness issue.
If the VMC and group cappers are
uncoordinated, the VMC can consolidate
more capacity onto a collection of servers
than allowed by the group power budget.
 In addition to excessive performance
violations (inefficiency), the VMC can
potentially react to the lower utilization
(because of power capping) and pack even
more workloads onto the server, leading to
a vicious cycle and system instability
Design Challenges of a
Coordination System
Interaction between different controllers
(EC, SM, EM, etc) must maintain
“correctness, stability and efficiency”.
 Global Awareness of the “presence” of
other controllers while having
minimal/zero knowledge of their
 Adaptability and Scalability – new
controllers with same/different
properties, new applications, etc.
Design Challenges - Sensitivity
Overlapping functionalities and policies
of controllers – can they be mitigated ?
 Is the Coordinated Management System
agnostic to the deployed systems and
applications (workloads) ?
The Design
The Design..
Use of feedback control loops.
Measure the required “metric”, compare
with the “reference” value and manipulate
the actuator based on the error so that the
output follows the reference.
 Efficiency Controller EC:
 Reference utilization r_ref
 Actual utilization r_i
 If r_i < r_ref, adjust actuator A (the P-state), i.e. step down
from, say, P0 to P4, resulting in higher utilization
and lower power usage.
 Server Manager SM:
 Power Capping by measuring per server
power consumption
 If current consumption exceeds the “power
budget”, the SM increases r_ref, thereby
allowing the EC to reduce the P-State of the
 In effect, the EC and SM use r_ref as a
communication channel.
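The EC/SM hand-off described above can be sketched in a few lines. This is a minimal illustration, not the authors' controllers: the P-state table, the one-state-per-step EC policy, the fixed `step` increment and all names are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical P-state table: relative frequency per state (P0 is fastest).
P_STATE_FREQ = [1.0, 0.85, 0.7, 0.55, 0.4]


@dataclass
class Server:
    r_ref: float = 0.75   # EC utilization target; also the SM -> EC channel
    p_state: int = 0      # index into P_STATE_FREQ


def ec_step(server: Server, utilization: float) -> None:
    """Efficiency controller: move one P-state toward the utilization target."""
    if utilization < server.r_ref and server.p_state < len(P_STATE_FREQ) - 1:
        server.p_state += 1      # under target: slow down to save power
    elif utilization > server.r_ref and server.p_state > 0:
        server.p_state -= 1      # over target: speed up


def sm_step(server: Server, power: float, budget: float, step: float = 0.05) -> None:
    """Server manager: never writes the P-state, it only raises r_ref."""
    if power > budget:
        server.r_ref = min(1.0, server.r_ref + step)
```

Because only the EC writes the P-state, the two controllers cannot overwrite each other: with a measured utilization of 0.78 the EC stays at P0 under the default target, but moves to P1 once the SM has raised `r_ref` after a budget violation.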
EM & GM:
 Same principle as SM. Compare current
power usage against ref. power budget and
assign new values to lower level servers
(EM -> SM, GM -> EM) based on some
policy (FIFO, random, etc).
 The lower level servers pick the “minimum of
upper level recommendation and their own
local power budget”.
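The "minimum" rule can be written down directly. The function name and the example wattages below are hypothetical, chosen only to show how a budget flows from GM to EM to SM.

```python
from typing import Optional


def effective_budget(local_budget: float,
                     parent_recommendation: Optional[float] = None) -> float:
    """Budget a controller enforces: the tighter of its own local budget
    and whatever the level above recommended (if anything)."""
    if parent_recommendation is None:
        return local_budget
    return min(local_budget, parent_recommendation)


# Illustrative chain with made-up numbers (watts): GM -> EM -> SM.
enclosure_cap = effective_budget(local_budget=1000.0, parent_recommendation=900.0)
server_cap = effective_budget(local_budget=250.0, parent_recommendation=280.0)
```

Taking the minimum means a higher-level controller can only tighten a lower-level budget, never loosen it.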
 Use Actual utilization instead of “apparent”
utilization (100% at P0 is not same as 100% at
 Supplied with data about approx power budget
at various levels.
 Also supplied with data about current power
budget violations at various levels (through CIM)
 The above three enable the VMCs to
consolidate the right workloads while making sure
that the consolidated servers neither violate the
power budgets nor fall into the vicious cycle
mentioned earlier.
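One way to see why the VMC needs actual rather than apparent utilization is to normalize the reading to full-speed capacity. The sketch below assumes that server capacity scales linearly with clock frequency, which is a simplification introduced here; the function name is hypothetical.

```python
def actual_utilization(apparent: float, freq: float, max_freq: float) -> float:
    """Utilization expressed as a fraction of full-speed (P0) capacity.

    Assumes capacity is proportional to frequency (a simplification).
    """
    return apparent * freq / max_freq
```

Under that assumption, a server that looks 100% busy while running at half its maximum frequency is doing only 50% of the work it could do at P0, so treating it as full would misjudge how much load it can absorb.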
Summary of changes to the
Modeling the Controllers
Power – Performance Model – run
actual workloads on hardware at
different utilization levels and measure
the power and performance.
 Through curve-fitting of the simulation
data, obtain linear models that represent
the controller behavior.
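The curve-fitting step can be illustrated with an ordinary least-squares line fit. The sample points below are placeholders invented for the example, not measurements from the paper, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder samples for one P-state: utilization (%) vs. power (W).
utilization = [0, 25, 50, 75, 100]
power = [100, 125, 150, 175, 200]
slope, intercept = fit_line(utilization, power)
```

Repeating the fit once per P-state (and once more for performance instead of power) would give the family of linear models the slide refers to.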
EC - scaled up or down by λ (changes
proportional to error in utilization).
 r_ref is increased by the SM when the local power
budget cap_loc is violated, resulting in the EC
lowering the power states of the machines.
SM: manipulates r_ref of the EC if the server's power
consumption violates cap_loc, subject to a
cap determined by the β_loc factor.
EM & GM – operate on a fair share policy,
power allocated to a component is
proportional to power consumed in last
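The fair-share rule can be sketched as a proportional split, assuming "last" means the most recent measurement interval. The function name, the equal-split fallback for an all-idle group and the example wattages are assumptions made for illustration.

```python
def fair_share(total_budget, last_power):
    """Split a group budget in proportion to each member's recent power draw.

    last_power maps a member name to the power it consumed in the previous
    interval. If nothing was consumed, fall back to an equal split.
    """
    if not last_power:
        return {}
    total = sum(last_power.values())
    if total == 0:
        share = total_budget / len(last_power)
        return {name: share for name in last_power}
    return {name: total_budget * p / total for name, p in last_power.items()}


# Made-up example: a 600 W enclosure budget shared by three blades.
budgets = fair_share(600, {"blade1": 100, "blade2": 200, "blade3": 100})
```

Each recipient would then still apply the minimum rule against its own local budget before enforcing the result.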
VMCs – Constrained Optimization
Problem to map n VMs to m servers
(decision variable matrix X).
 Include total power consumption and
migration overhead (αM ) in the calculation
Consider Server capacity constraints
Modeling VMCs..
Consider local, enclosure and group
level power budget constraints
The level of consolidation is tuned by tuning
the power budget buffers based on the
violations at different levels.
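The ingredients listed above (assignment matrix X, power plus migration cost, capacity and three levels of budget constraints) can be assembled into one possible formulation. This is a sketch written for these notes, not a transcription of the authors' equations; the symbols d_i (VM demand), C_j (server capacity), P_j (server power), X^0 (current placement), E_e (servers in enclosure e) and the buffers b are all notation assumed here.

```latex
\begin{aligned}
\min_{X}\quad & \sum_{j=1}^{m} P_j(X) \;+\; \alpha_M \sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij}\,\bigl(1 - x^{0}_{ij}\bigr) \\
\text{s.t.}\quad
& \sum_{j=1}^{m} x_{ij} = 1 && \forall i \quad \text{(each VM placed once)} \\
& \sum_{i=1}^{n} d_i\, x_{ij} \le C_j && \forall j \quad \text{(server capacity)} \\
& P_j(X) \le \mathrm{cap}_{loc,j} - b_{loc} && \forall j \quad \text{(local budget)} \\
& \sum_{j \in E_e} P_j(X) \le \mathrm{cap}_{enc,e} - b_{enc} && \forall e \quad \text{(enclosure budget)} \\
& \sum_{j=1}^{m} P_j(X) \le \mathrm{cap}_{grp} - b_{grp} && \text{(group budget)} \\
& x_{ij} \in \{0, 1\}
\end{aligned}
```

Raising a buffer b after violations at its level shrinks the usable budget and therefore makes the VMC consolidate less aggressively at that level.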
Modeling VMCs..
Equations 1 to 6 depict a 0-1 integer
optimization problem.
 The authors use a greedy bin packing
algorithm that yields an approximately
optimal solution for the placement of
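A greedy packing heuristic of this general kind can be sketched as first-fit decreasing with a power check. This is a generic illustration under stated assumptions, not the authors' algorithm: the linear power model (`idle_power + watts_per_unit * load`), the `Host` class and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float               # compute capacity, in percent of one server
    power_budget: float           # watts
    idle_power: float = 100.0     # hypothetical linear power model:
    watts_per_unit: float = 1.0   # power = idle_power + watts_per_unit * load
    load: float = 0.0
    vms: list = field(default_factory=list)

    def fits(self, demand: float) -> bool:
        """True if adding this demand respects capacity and power budget."""
        new_load = self.load + demand
        power = self.idle_power + self.watts_per_unit * new_load
        return new_load <= self.capacity and power <= self.power_budget


def place(vm_demand, hosts):
    """First-fit decreasing: biggest VMs first, first host that fits.

    Returns the names of VMs that could not be placed anywhere.
    """
    unplaced = []
    for vm, demand in sorted(vm_demand.items(), key=lambda kv: kv[1], reverse=True):
        target = next((h for h in hosts if h.fits(demand)), None)
        if target is None:
            unplaced.append(vm)
            continue
        target.vms.append(vm)
        target.load += demand
    return unplaced


hosts = [Host("h1", 100, 185.0), Host("h2", 100, 200.0)]
leftover = place({"a": 50, "b": 40, "c": 30, "d": 20}, hosts)
```

In this made-up run the tighter power budget on `h1`, not its compute capacity, is what pushes VMs `b` and `d` onto `h2`.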
 Real-time deployment in a data center or a
full-system simulation?
○ Impractical; the actual system being tested
limits the set of use-case scenarios that
can be studied
 Use of trace-driven simulation – use real
world traces of enterprise deployments that
would enable detailed workload modeling
and evaluation of tradeoffs at policy and
system levels. -?
Metrics used
Aggregate Power Saving, performance
loss and power budget violation at SM,
EM and GM levels.
 No peak power saving is measured.
 No workload queuing, i.e. if the workload
exceeds capacity, there is performance
loss due to power capping. No demand
carry-over.
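The no-queuing assumption makes the metrics easy to compute from a trace. The definitions below are one plausible reading (loss as the fraction of demand left unserved, violation as the fraction of intervals over budget), not necessarily the paper's exact formulas, and the function names are hypothetical.

```python
def performance_loss(demand, capacity):
    """Fraction of total demand left unserved.

    Demand above capacity in an interval is dropped, not carried over
    to the next interval.
    """
    unmet = sum(max(0.0, d - c) for d, c in zip(demand, capacity))
    total = sum(demand)
    return unmet / total if total else 0.0


def violation_rate(power, budget):
    """Fraction of intervals in which measured power exceeded the budget."""
    if not power:
        return 0.0
    return sum(1 for p in power if p > budget) / len(power)
```

For instance, with demands of 50, 120 and 80 against a capped capacity of 100 per interval, only the 20 units of excess in the second interval count as loss; nothing spills into the third.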
180 workload traces (databases, web
servers, remote desktops, e-commerce,
 Create different types of mixes (real & synthetic)
from this set to exercise different utilization
SUT – A low power Blade server A and an
entry level 2U server B.
 Experiment with different power budgets
and also study the sensitivity of this
architecture by varying the time constants.
Power – Performance models for
Blade A and Server B
Baseline: No power management
Base Results:
 Coordinated – 64% reduction in power
consumption, 3% performance degradation
and 5% power budget violation
 Uncoordinated – 12 % performance loss and
7% budget violation.
Sensitivity towards different Systems:
 Blade A - 5 P-states over a higher power range
 Server B - 6 P-states over a low power range.
 Blade A’s absolute power saving > Server B.
○ Implies, “Range of Power control is more
important than its granularity”
Variation for different workloads
At low utilization, VMC is major contributor to
savings (assuming idle machines are “turned off”).
As utilization increases, benefits of VMC decrease
while the combination of EC & VMC is better (i.e. a
Coordinated Solution is better than a single one).
If idle machines are not switched off, savings drop