Power and Performance Management of Virtualized Computing Environments via Lookahead Control

Dara Kusic¹, Jeffrey O. Kephart², James E. Hanson², Nagarajan Kandasamy¹, and Guofei Jiang³

1 - Drexel University, Philadelphia, PA 19104
2 - IBM T.J. Watson Research Center, Hawthorne, NY 10532
3 - NEC Labs America, Princeton, NJ 08540

Presented by Tongping Liu
OUTLINE
• Motivation and problem statement
• Description of the experimental testbed
• Problem formulation and controller design
• Performance results
• Conclusions
DATA-CENTER ENERGY COSTS
• Server energy consumption is growing at 9% per year
• Data centers are projected to surpass the airline industry in CO2 emissions by 2020

McKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
[Chart] Carbon dioxide emissions as a percentage of the world total, by industry: data centers 0.3%, airlines 0.6%, shipyards 0.8%, steel plants 1.0%

[Chart] Carbon emissions by country (metric tons of CO2 per year): data centers 170, Argentina 142, Netherlands 146, Malaysia 178
SERVER UTILIZATION IN DATA CENTERS
• Server utilization averages about 6%, accounting for idle servers
• Up to 30% of servers are idle!

[Chart] Peak daily server utilization (%) vs. average daily server utilization (%)

McKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
VIRTUALIZATION AS THE ANSWER
• Performance-isolated platforms, called virtual machines (VMs), allow resources (e.g., CPU, memory) to be shared on a single server
• Enables consolidation of online services onto fewer servers
• Increases per-server utilization and mitigates "server sprawl"
• Enables on-demand computing, a provisioning model where resources are dynamically provisioned as per workload demand

Technique | Efficiency Impact
Selectively turn off core components to increase remaining unit efficiency | 3-5%
Deploy virtualization for existing and new demand | 25-30%
Implement free cooling | 0-15%
Introduce greener and more power-efficient servers | 10-20%

McKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
PROBLEM STATEMENT
• We address combined power and performance management in a virtualized computing environment
– The problem is posed as one of sequential optimization under uncertainty and solved using limited look-ahead control (LLC)
– The notion of risk is encoded explicitly in the problem formulation
• Summary of main results
– A server cluster managed using LLC saves 26% in power-consumption costs over a 24-hour period when compared to an uncontrolled system
– Power savings are achieved with very few SLA violations (1.6% of the total number of requests)
OUTLINE
• Motivation and problem statement
• Description of the experimental testbed
• Problem formulation and controller design
• Performance results
• Conclusions
THE EXPERIMENTAL TESTBED
• The testbed is a two-tier architecture with front-end application servers and back-end databases
• It hosts two online services (Gold and Silver)
• Servers are virtualized
• Performance goals
– Minimize power consumption
– Minimize SLA violations
• We target the application and the database tiers
• Controllable parameters: hosts on/off N(k), VMs on/off nᵢ(k), workload fractions λᵢ(k), and per-VM CPU shares fᵢⱼ(k)

[Diagram] Workload arrivals enter through a dispatcher on Bacchus, which splits the Gold and Silver request streams λ₁(k), λ₂(k) across Gold and Silver WebSphere VMs in the application tier (hosts such as Chronos and Demeter) and Gold and Silver DB2 VMs in the database tier (hosts such as Apollo, Poseidon, and Eros); the LLC sets the per-VM CPU shares fᵢⱼ(k), and hosts that are not needed are put to sleep (powered down)
EXPERIMENTAL SYSTEM
• Six Dell servers (models 2950 and 1950) comprise the experimental testbed
• Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0
• Virtual machines run SUSE Enterprise Linux Server Edition 10
• Control directives use the VMware API, Linux shell commands, and IPMI
• Silver application is Trade6 only; Gold application is Trade6 + extra CPU load

Host name | CPU speed | # of CPU cores | Memory
Apollo | 2.3 GHz | 8 | 8 GB
Bacchus | 2.3 GHz | 8 | 8 GB
Chronos | 1.6 GHz | 8 | 4 GB
Demeter | 1.6 GHz | 8 | 4 GB
Eros | 1.6 GHz | 8 | 4 GB
Poseidon | 2.3 GHz | 8 | 8 GB
CHARACTERISTICS OF THE INCOMING WORKLOAD
• We assume a session-less workload, i.e., incoming requests are independent of each other
• The transaction mix is fixed to a constant proportion of browse/buy requests
• The workload to the computing system is time-varying and shows significant variability over short time periods

[Chart] Arrivals per time instance (150-second increments) for the Gold and Silver streams, Workload 1
APPLICATION ENVIRONMENT
• Online services are enabled by enterprise applications
• The application server benchmark Trade6 is an example
– It is a transaction-based stock-trading application from IBM
– It can be hosted across one or more servers in a multi-tier architecture

[Diagram] Web clients invoke Trade servlets, Trade actions, Trade services, and Trade server pages inside WebSphere Application Server, backed by a DB2 database
OUTLINE
• Motivation and problem statement
• Description of the experimental testbed
• Problem formulation and controller design
• Performance results
• Conclusions
PROBLEM FORMULATION
• The power/performance management problem is posed as a dynamic resource provisioning problem under dynamic operating constraints
• Objectives
– Maximize the profit generated by the system (i.e., minimize SLA violations and the power consumption cost)
• Decisions to be optimized
– The number of servers to turn on or off
– The number of VMs to provision to each service
– The CPU share given to each VM
– The distribution of the incoming workload across servers
PROBLEM FORMULATION (Contd.)

$$\max_{u} \sum_{k} \big[ R(x(k), u(k)) - O(u(k)) - S(u(k)) \big]$$

• Dollars generated (revenue), R
– Obtained as per a (nonlinear) reward-refund curve specified by the SLA
– The reward is defined by the SLA for each service class; violation of the SLA results in a refund to the client (a stepwise sketch follows)

[Chart] A stepwise pricing SLA for the online services: revenue (dollars) vs. response time (ms); the Gold SLA pays up to 7e-5 per request and the Silver SLA up to 5e-5, with revenue turning negative (refunds) once response times exceed the SLA limits
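For concreteness, here is a hedged sketch of such a stepwise reward-refund curve. Only the peak rewards (7e-5 for Gold, 5e-5 for Silver) and the refund-on-violation shape come from the slide; the response-time targets, the grace band, and the refund amounts are illustrative placeholders, not the paper's values.

```python
def sla_revenue(response_ms, gold=True):
    """Per-request revenue in dollars as a step function of response time."""
    reward = 7e-5 if gold else 5e-5       # peak per-request rewards (from the slide)
    target_ms = 200 if gold else 300      # hypothetical response-time targets
    if response_ms <= target_ms:
        return reward                     # SLA met: full reward
    if response_ms <= target_ms + 100:
        return 1e-5                       # hypothetical grace band: reduced reward
    return -3e-5 if gold else -1e-5       # SLA violated: refund to the client
```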
PROBLEM FORMULATION (Contd.)

$$\max_{u} \sum_{k} \big[ R(x(k), u(k)) - O(u(k)) - S(u(k)) \big]$$

• O: the power consumption costs of operating servers
• S: switching costs
– Opportunity cost lost due to the unavailability of servers/VMs involved in provisioning decisions
– Transient power consumption costs
• Key characteristics of the control problem
– Some control actions have (long) dead times; e.g., switching on a server, instantiating VMs, migrating VMs
– Decisions must be optimized over a discrete domain
– Optimization must be performed quickly, given the dynamics of the input
• We use a limited look-ahead control (LLC) concept
THE LLC FRAMEWORK
• LLC is analogous to model predictive control, but it operates over a discrete domain and must execute quickly
• Advantages
– Uses predictions to improve control performance
– Robust (iterative feedback) even in dynamic operating conditions
– Inherent compensation for dead times
– Multi-objective and non-linear optimization in the discrete domain under explicit constraints

[Diagram] The LLC loop: workload arrivals feed a predictive filter that produces a workload forecast; the system model combines the forecast with the observed/estimated state to evaluate candidate control inputs; the optimizer selects the control input applied to the system
THE LLC FRAMEWORK (Contd.)
• Use a system model to estimate future system states x̂(k+1), ..., x̂(k+h) over a prediction horizon [k+1, k+h]
• Obtain an "optimal" sequence of control inputs
• Apply the first control input in the sequence at time k+1; discard the rest (see the sketch below)
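The receding-horizon procedure above can be summarized in a short sketch. This is a minimal illustration, not the authors' implementation: `system_model`, `profit`, and the discrete set `control_inputs` are assumed to be supplied, and the exhaustive search stands in for whatever pruning the real controller uses.

```python
from itertools import product

HORIZON = 3  # prediction horizon h, in control steps

def llc_step(x_k, workload_forecast, control_inputs, system_model, profit):
    """Pick the control input to apply at time k+1 by exhaustively
    evaluating all input sequences over the prediction horizon."""
    best_seq, best_value = None, float("-inf")
    for seq in product(control_inputs, repeat=HORIZON):
        x, value = x_k, 0.0
        for j, u in enumerate(seq):
            x = system_model(x, u, workload_forecast[j])  # estimated next state
            value += profit(x, u)                         # R(x,u) - O(u) - S(u)
        if value > best_value:
            best_seq, best_value = seq, value
    return best_seq[0]  # apply only the first input; discard the rest
```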
WORKLOAD ESTIMATION USING A PREDICTIVE FILTER
• A Kalman filter is used to estimate the workload over the prediction horizon (a minimal filter sketch follows)
• The prediction error is about 8%

[Chart] Kalman filter workload estimates, Workload 1: estimated (dotted) vs. actual (solid) arrivals for the Gold and Silver streams in 150-second increments, after an initial training phase
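As an illustration, a one-dimensional Kalman filter that treats the arrival rate as a random walk could produce such estimates. The noise variances `q` and `r` below are illustrative guesses, not the paper's values.

```python
import numpy as np

def kalman_forecast(arrivals, q=50.0, r=500.0):
    """One-step-ahead workload estimates from a scalar Kalman filter.
    q: process-noise variance, r: measurement-noise variance (both assumed)."""
    x_hat, p = arrivals[0], 1.0
    estimates = []
    for z in arrivals:
        p = p + q                          # predict: random walk carries state over
        k_gain = p / (p + r)               # Kalman gain
        x_hat = x_hat + k_gain * (z - x_hat)  # update with observed arrivals z
        p = (1.0 - k_gain) * p
        estimates.append(x_hat)
    return np.array(estimates)
```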
CONSTRUCTING THE SYSTEM MODEL
• The system model maps the observed state, the control input, and the estimated workload to the next state
• The behavior of each application is captured using simulation-based learning and stored in an approximation structure (e.g., lookup table, neural network); this is done OFFLINE (a lookup-table sketch follows the chart)

[Chart] Measured average response time for the Gold application vs. workload (arrivals per second), for VMs with 3 GHz, 4.5 GHz, and 6 GHz CPU shares, against the Gold SLA limit
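As an illustration of the lookup-table option, the sketch below stores measured (arrival rate, response time) pairs per CPU-share setting and answers queries from the nearest measured point; the real model would also fold in the observed state and estimated workload as described above. The class name, data layout, and nearest-neighbor policy are all assumptions.

```python
import bisect

class ResponseTimeModel:
    """Sketch of the offline approximation structure: measured
    (arrival rate, response time) curves, one per CPU-share setting."""

    def __init__(self, curves):
        # curves: {cpu_share_ghz: [(arrival_rate, response_ms), ...]},
        # each list sorted by arrival rate
        self.curves = curves

    def predict(self, cpu_share_ghz, arrival_rate):
        curve = self.curves[cpu_share_ghz]   # assumes a measured share setting
        rates = [r for r, _ in curve]
        i = min(bisect.bisect_left(rates, arrival_rate), len(curve) - 1)
        return curve[i][1]                   # response time at nearest point at/above

# e.g., model = ResponseTimeModel({3.0: [(10, 90), (22, 200), (25, 420)]})
#       model.predict(3.0, 23)  ->  420
```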
CONSTRUCTING THE SYSTEM MODEL (Contd.)
• Example 1: Given a 3 GHz CPU share and 1 GB of memory, how many requests can a Gold VM handle before incurring SLA violations?
• Note that an average response time below the limit does not mean there are no violations

[Chart] Same response-time curves as above, highlighting the 3 GHz CPU-share VM against the Gold SLA limit
CONSTRUCTING THE SYSTEM MODEL (Contd.)
• Example 2: Given a 6 GHz CPU share and 1 GB of memory, how many requests can a Gold VM handle before incurring SLA violations?

[Chart] Same response-time curves as above, highlighting the 6 GHz CPU-share VM against the Gold SLA limit
Observation – non-linear behavior
• A 3 GHz share handles 22 requests while a 6 GHz share handles 29 requests: why can't we achieve a 2x speedup when the CPU share is doubled?
• Possibly because memory and I/O are not considered in the model

[Chart] Measured average response time for the Gold application vs. workload (arrivals per second), for 3 GHz, 4.5 GHz, and 6 GHz CPU-share VMs
CONSTRUCTING THE SYSTEM MODEL (Contd.)
• Power = current x voltage
• Host states measured: 1 - standby; 2 - boot, first 20 sec; 3 - boot, remaining 2 min 35 sec; 4 - idle, 0 VMs; 5 - boot 1st VM; 6 - idle, 1 VM; 7 - boot 2nd VM; 8 - idle, 2 VMs; 9 - workload on 1 VM; 10 - workload on 2 VMs
• Two observations:
– Power consumption during boot is higher than in the idle state
– Power consumption with VMs running is not much higher than in the idle state

[Chart] Power consumption (watts) per host machine state for the Dell PowerEdge 2950 and 1950, across the ten host states
CONSTRUCTING THE SYSTEM MODEL (Contd.) – Power consumption
• Power consumption is closely related to CPU usage

Does an increase in CPU utilization increase power consumption?
• The higher the CPU utilization, the more signals are generated and processed by the CPU
• Consequently, higher CPU utilization means a greater energy requirement
• Since energy = power x time, we can conclude that the greater the CPU utilization, the greater the power consumption
Key Observations
• (1) An idle machine consumes 70% or more of the power drawn at full utilization
– Conclusion (1): Power down machines to achieve maximum power savings
• (2) The intensity of the workload at the VMs does not affect power consumption or CPU utilization
– Conclusion (2): Only the number of VMs affects power consumption
• (3) The power consumed by a server is a function of the number of VMs instantiated on it (a simple model is sketched below)
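A hedged sketch of the per-host power model these observations imply: draw is driven by host state and the number of instantiated VMs, not by workload intensity. All wattages below are placeholders in the spirit of the measured chart, not the measured values.

```python
IDLE_W, FULL_W, BOOT_W, PER_VM_W = 210.0, 300.0, 310.0, 10.0  # illustrative

def host_power(state, n_vms=0):
    """Estimated draw in watts for one host."""
    if state == "standby":
        return 15.0
    if state == "boot":
        return BOOT_W            # booting draws more than idling
    # Idle or serving: roughly idle draw plus a small per-VM increment,
    # independent of how heavily each VM is loaded (observation 2).
    return min(IDLE_W + n_vms * PER_VM_W, FULL_W)
```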
EXPERIMENTAL SYSTEM
• Six Dell servers (models 2950 and 1950) comprise the experimental testbed
• Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0
• Virtual machines run SUSE Enterprise Linux Server Edition 10
• Control directives use the VMware API, Linux shell commands, and IPMI
• Silver application is Trade6 only; Gold application is Trade6 + extra CPU load

Host name | CPU speed | # of CPU cores | Memory
Apollo | 2.3 GHz | 8 | 8 GB
Bacchus | 2.3 GHz | 8 | 8 GB
Chronos | 1.6 GHz | 8 | 4 GB
Demeter | 1.6 GHz | 8 | 4 GB
Eros | 1.6 GHz | 8 | 4 GB
Poseidon | 2.3 GHz | 8 | 8 GB
CPU scheduling mode
• Work-conserving mode (WC-mode): keeps the server resources well utilized
– Under WC-mode, the shares are merely guarantees; the CPU is idle if and only if there is no runnable work
• Non-work-conserving mode (NWC-mode):
– Under NWC-mode, the shares are caps, i.e., each client owns its fraction of the CPU
– This means that if a VM is assigned a 3 GHz CPU share, it cannot use more than that even if the host offers 10 GHz and no other VM is running at all (see the toy illustration below)
• Assumptions:
– The ESX server operates in non-work-conserving mode
– The CPU assignment does not exceed the maximum capacity of the hardware
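A toy illustration of the difference between the two modes for a single VM; the function name and the numbers in the example are purely illustrative.

```python
def cpu_granted(demand_ghz, share_ghz, spare_ghz, work_conserving):
    """CPU actually granted to one VM under the two scheduling modes."""
    if work_conserving:
        # WC-mode: the share is a guarantee; the VM may borrow idle capacity
        return min(demand_ghz, share_ghz + spare_ghz)
    # NWC-mode: the share is a hard cap, even when the host is otherwise idle
    return min(demand_ghz, share_ghz)

# e.g., a VM with a 3 GHz share demanding 5 GHz on an otherwise idle 10 GHz host:
# WC-mode grants 5 GHz, NWC-mode grants only 3 GHz.
```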
DEVELOPING THE OPTIMIZER
• Issue 1: Risk-aware control
– Due to the energy and opportunity costs incurred when switching hosts and VMs on/off, excessive switching caused by workload variability may actually reduce profits
– Opportunity cost: the cost that accumulates during the time a server is being turned on but is unavailable to perform any useful service
– We need to encode a notion of risk in the cost function
RISK-AWARE CONTROL
• Environment-input estimates will have prediction errors
• We encode a notion of risk in the optimization problem
– Generate a set of expected next states for a range of predicted environment inputs (a sampling sketch follows)
• Construct an uncertainty bound for the environment input of interest:

$$\hat{\lambda}_i(j) - \Delta\lambda_i(j) \;\le\; \lambda_i(j) \;\le\; \hat{\lambda}_i(j) + \Delta\lambda_i(j)$$

where $\Delta\lambda_i(j)$ is the averaged past observed error between the actual and forecasted arrival rates
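A sketch of generating candidate environment inputs inside this bound; the sample count and the uniform spacing are arbitrary choices, not the paper's.

```python
import numpy as np

def sample_workloads(lam_hat, delta, n=20):
    """Candidate arrival rates within the uncertainty bound
    [lam_hat - delta, lam_hat + delta], where delta is the averaged past
    error between actual and forecasted arrival rates."""
    return np.linspace(lam_hat - delta, lam_hat + delta, n)
```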
RISK-AWARE CONTROL (Contd.)
• A utility function encodes risk into the objective function
• Apply a mean-square variance model of utility with a tunable risk-preference parameter β (uncertainty enters as the variance):

$$U_{R_i} = A \cdot \mathrm{mean}\big(\hat{X}_i(j)\big) - \beta \left[ \mathrm{var}\big(\hat{X}_i(j)\big) + \mathrm{mean}^2\big(\hat{X}_i(j)\big) \right], \qquad A > 2 \cdot \mathrm{mean}\big(\hat{X}_i(j)\big)$$

– β < 0: risk-seeking
– β > 0: risk-averse
– β = 0: risk-neutral
• Formulate a utility maximization problem: maximize utility over the prediction horizon and the client classes (a sketch follows)

$$\max_{\{u\}} \sum_{j=k+1}^{k+h} \sum_{i} U\big(\hat{X}_i(j),\, u_i(j)\big)$$
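A sketch of the utility computation over the states predicted for the sampled workloads. It follows the reconstruction above, i.e., the expected quadratic utility E[Ax - βx²] = A·mean - β(var + mean²); treat the exact form as an assumption.

```python
import numpy as np

def risk_aware_utility(xhat_samples, A, beta):
    """Mean-variance utility of the predicted states xhat_samples.
    beta > 0 penalizes variance (risk-averse), beta < 0 rewards it
    (risk-seeking), beta = 0 reduces to the risk-neutral A * mean."""
    m = np.mean(xhat_samples)
    v = np.var(xhat_samples)
    return A * m - beta * (v + m * m)   # E[A x] - beta * E[x^2]
```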
DEVELOPING THE OPTIMIZER (Contd.)
• Issue 2: Execution-time overhead of the controller
– "Curse of dimensionality": the problem shows an exponential increase in worst-case complexity with more control options and longer prediction horizons
• We use a control hierarchy to reduce execution-time overhead (sketched below)
– An L0 controller decides the CPU share to assign to VMs
– An L1 controller decides the number of VMs for each service and the number of servers to keep powered on
– The average execution time of the L1 controller is about 10 seconds
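A minimal sketch of how the two layers interleave, using the sampling periods given on the experimental-parameters slide; the controller objects and their step() methods are assumed, not part of the paper.

```python
L1_PERIOD_S, L0_PERIOD_S = 150, 30  # sampling periods from the slides

def control_tick(t_seconds, l1_controller, l0_controller):
    """Run each layer of the hierarchy at its own sampling period."""
    if t_seconds % L1_PERIOD_S == 0:
        l1_controller.step()  # hosts to power on/off, VMs per service
    if t_seconds % L0_PERIOD_S == 0:
        l0_controller.step()  # CPU share assigned to each VM
```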
OUTLINE
• Motivation and problem statement
• Description of the experimental testbed
• Problem formulation and controller design
• Performance results
• Conclusions
EXPERIMENTAL SYSTEM
• Six Dell servers (models 2950 and 1950) comprise the experimental testbed
• Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0
• Virtual machines run SUSE Enterprise Linux Server Edition 10
• Control directives use the VMware API, Linux shell commands, and IPMI
• Silver application is Trade6 only; Gold application is Trade6 + extra CPU load

Host name | CPU speed | # of CPU cores | Memory
Apollo | 2.3 GHz | 8 | 8 GB
Bacchus | 2.3 GHz | 8 | 8 GB
Chronos | 1.6 GHz | 8 | 4 GB
Demeter | 1.6 GHz | 8 | 4 GB
Eros | 1.6 GHz | 8 | 4 GB
Poseidon | 2.3 GHz | 8 | 8 GB
EXPERIMENTAL PARAMETERS

Parameter | Value
Cost per kilowatt-hour | $0.30
Time delay to power on a VM | 1 min 45 sec
Time delay to power on a host | 2 min 55 sec
Prediction horizon | L1: 3 steps, L0: 1 step
Control sampling period | L1: 150 sec, L0: 30 sec
Initial configuration for Gold service (application tier) | 3 VMs
Initial configuration for Silver service (application tier) | 3 VMs
MAIN RESULTS
• A risk-neutral controller conserves, on average, 26% more energy than a system without dynamic control, with very few SLA violations

Workload | Energy savings | % of SLA violations (Silver) | % of SLA violations (Gold)
Workload 1 | 18% | 3.2% | 2.3%
Workload 2 | 17% | 1.2% | 0.5%
Workload 3 | 17% | 1.4% | 0.4%
Workload 4 | 45% | 1.1% | 0.2%
Workload 5 | 32% | 3.5% | 1.8%

• Note that there are more SLA violations for Silver requests than for Gold requests
RESULTS (Contd.)
• CPU shares assigned to the Gold and Silver applications over a 24-hour period – L0 layer

[Chart] Total CPU cycles per second allotted to the Gold application (left) and the Silver application (right), Workload 1, in 30-second increments
RESULTS (Contd.)
• Number of virtual machines assigned to the Gold and Silver applications over a 24-hour period – L1 layer

[Chart] Switching activity (number of virtual machines on, 0-3) for the Gold application (left) and the Silver application (right), Workload 1, in 150-second increments
EFFECT OF THE RISK PREFERENCE PARAMETER
• A risk-averse (β = 2) controller conserves about the same amount of energy as a risk-neutral (β = 0) controller

Workload | Energy savings (risk-neutral control, β = 0) | Energy savings (risk-averse control, β = 2)
Workload 6 | 20.8% | 20.9%
Workload 7 | 25.3% | 25.2%
39/48
EFFECT OF THE RISK PREFERENCE
PARAMETER (Contd.)

A risk-averse controller ( = 2) maintains a higher QoS (Less violations)
than a risk-neutral ( = 0) controller by reducing switching activity
SLA violations (riskneutral control)
( = 0)
SLA violations (riskaverse control)
( = 2)
% reduction in SLA
violations
Workload 6
28,635 (2.3%)
15,672 (1.7%)
45%
Workload 7
34,201 (2.7%)
25,606 (2.0%)
25%
Switching activity
(risk-neutral control)
( = 0)
Switching activity
(risk-averse control)
( = 2)
% reduction in
switching activity
Workload 6
30
28
7%
Workload 7
40
30
25%
Workload
Workload
Best-case risk-averse controller:   2
40/48
OPTIMALITY CONSIDERATIONS
• The controller cannot achieve optimal performance
– Limited by errors in workload predictions
– Limited by constrained control inputs
– Limited by a finite prediction horizon
• To evaluate optimality, the profit gains of a risk-neutral and a best-case risk-averse controller were compared against an "oracle" controller with perfect knowledge of the future

Controller | Total energy savings | Total SLA violations | Num. times hosts switched
Risk-neutral | 25.3% | 34,201 (2.7%) | 40
Risk-averse | 25.2% | 25,606 (2.0%) | 38
Oracle | 16.3% | 14,228 (1.1%) | 32
CONCLUSIONS
• We have addressed power and performance management in a virtualized computing environment within an LLC framework
• The cost of control and the notion of risk are encoded explicitly in the problem formulation
• A server cluster managed using LLC saves 26% in power-consumption costs over a 24-hour period when compared to an uncontrolled system
• Power savings are achieved with very few SLA violations (1.6% of the total number of requests)
• Our recommendation is a risk-averse controller, since it reduces SLA violations and switching activity
Conclusion (1) – Why significant?
• Using virtualization, it implements a dynamic resource provisioning model
• It integrates power and performance management, reducing energy cost by 26% while causing few SLA (service-level agreement) violations (less than 3%)
Conclusion (2) – Alternate approach?
• Alternate approaches and their efficiency impact (from the McKinsey report cited earlier):

Technique | Efficiency Impact
Selectively turn off core components to increase remaining unit efficiency | 3-5%
Deploy virtualization for existing and new demand | 25-30%
Implement free cooling | 0-15%
Introduce greener and more power-efficient servers | 10-20%
Conclusion (3) – Improvement?
• Simplify the control logic to reduce the execution time
• Integrate memory usage when modifying the VM configuration
• Provide a mechanism to decide the granularity at which to create VMs – one 6 GHz VM can handle more requests than two 3 GHz VMs
SCALABILITY
• The execution time of the controller can be reduced through various techniques
– Approximating the control law: a neural network or regression tree can be trained to learn the decision-making behavior of the optimizer (see the sketch below)
– Implementing the controller in hardware
– Increasing the number of tiers in the control hierarchy
– Simplifying the iterative search process to "hold" a control input constant over the prediction horizon
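As an illustration of the approximation idea, a decision tree could be trained offline on logged (state, chosen input) pairs from the exact optimizer and substituted at run time. The use of scikit-learn and every name below is an assumption for this sketch, not part of the paper.

```python
from sklearn.tree import DecisionTreeClassifier

def fit_optimizer_surrogate(logged_states, logged_inputs, max_depth=8):
    """Learn the optimizer's decision-making behavior from logged examples
    so the cheap learned model can replace the exhaustive search online."""
    surrogate = DecisionTreeClassifier(max_depth=max_depth)
    surrogate.fit(logged_states, logged_inputs)  # states: 2-D array, inputs: labels
    return surrogate
```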
Scalability problem
• Scalability is poor: the current results are based on only 5 hosts, while an actual data center may contain dozens or even thousands of servers
• Controller execution time grows quickly with cluster size:
– 5 hosts: < 10 sec
– 10 hosts: 2 min 30 sec
– 15 hosts: 30 min
Questions?
Thank you!