Di Niu , Chen Feng, Baochun Li
Department of Electrical and Computer Engineering
University of Toronto
1
Part 1 A cloud bandwidth reservation
Part 2 Price such reservations
Large-scale distributed optimization
Part 3 Trace-driven simulations
2
Cloud Tenants
WWW
Problem: No bandwidth guarantee
Not good for Video-on-Demand, transaction processing web applications, etc.
3
0
10 Gbps Dedicated Network
1 2
Demand
Days
4
Good News:
Bandwidth reservations are becoming feasible between a VM and the Internet
H. Ballani, et al.
Towards Predictable Datacenter Networks
ACM SIGCOMM ‘11
C. Guo, et al.
SecondNet: a Data Center Network Virtualization
Architecture with Bandwidth Guarantees
ACM CoNEXT ‘10
5
reduces cost due to better utilization
Reservation
Demand
Days 0 1 2
Difficulty : tenants don’t really know their demand!
6
A New Bandwidth Reservation Service
A tenant specifies a percentage of its bandwidth demand to be served with guaranteed performance;
The remaining demand will be served with best effort
QoS (e.g., 95%)
Level Guaranteed
Portion
Tenant
Cloud
Provider
Workload history of the tenant
Demand
Prediction repeated periodically
Bandwidth Reservation
7
Each tenant i has a random demand D i
Assume D i is Gaussian , with mean μ i
= E [ D i
] variance σ i
2
= var [ D i
] covariance matrix Σ = [ σ ij
]
Service Level Agreement: Outage w.p.
8
Part 1 A cloud bandwidth reservation model
Part 2 Price such reservations
Large-scale distributed optimization
Part 3 Trace-driven simulations
9
Objective 1: Pricing the reservations
A reservation fee on top of the usage fee
Objective 2: Resource Allocation
Price affects demand, which affects price in turn
Social Welfare Maximization
10
(e.g., Netflix)
Tenant i can specify a guaranteed portion w i
Tenant i ’s expected utility (revenue)
Concave, twice differentiable, increasing
Utility depends not only on demand, but also on the guaranteed portion!
11
Given submitted guaranteed portions the cloud will guarantee the demands
It needs to reserve a total bandwidth capacity
Non-multiplexing :
Multiplexing : e.g.
Service cost
12
Price
Social
Welfare
Profit of the
Surplus of tenant i
Cloud
Provider
Impossible: the cloud does not know U i
13
Pricing function
Price guaranteed portion, not absolute bandwidth!
Example: Linear pricing
Under P i
( ⋅ ) , tenant i will choose
Surplus
(Profit)
14
Determine pricing policy to
Social Welfare where
Surplus
Challenge:
Cost not decomposable for multiplexing
15
Determine pricing policy to where
Since
Mean
, for Gaussian
Std
The General Case:
Lagrange Dual Decomposition
M. Chiang, S. Low, A. Calderbank, J. Doyle.
Layering as optimization decomposition: A mathematical theory of network architectures.
Proc. of
IEEE 2007
Original problem
Lagrange dual
Dual problem
17
Lagrange dual
Dual problem
Lagrange multiplier k i as price: P i
( w i
) := k i w i decompose
Subgradient Algorithm:
For dual minimization, update price: a subgradient of
18
Weakness of the Subgradient Method
4
Update
Social Welfare (SW) to increase
Cloud Provider
3 Guaranteed Portion Price
Tenant 1 . . .
Tenant i . . .
Tenant N
2
Surplus
Step size is a issue! Convergence is slow.
1
19
KKT Conditions of
2 Set
Cloud Provider
4
1
. . .
Tenant i . . .
Solve
3
Linear pricing P i
( w i
) = k i w i suffices!
20
Theorem 1 (Convergence)
Equation updates converge if for all i for all between and
21
Convergence: A Single Tenant (1-D)
Subgradient method
Equation Updates
Not converging
22
Covariance matrix: symmetric, positive semi-definite is a cone centered at 0 if is not zero and is small
Satisfies Theorem 1, algorithm converges.
23
Part 1 A cloud bandwidth reservation model
Part 2 Price such reservations
Large-scale distributed optimization
Part 3 Trace-driven simulations
24
200+ GB traces (binary) from UUSee Inc.
reports from online users every 10 minutes
Aggregate into video channels
25
Predict Expected Demand via Seasonal ARIMA
Time periods (1 period = 10 minutes)
26
Predict Demand Variation via GARCH
Time periods (1 period = 10 minutes)
27
Each tenant i has a random demand D i in each “10 minutes”
D i is Gaussian , with mean μ i
= E [ D i
] variance σ i
2
= var [ D i
] covariance matrix Σ = [ σ ij
]
28
A channel’s demand = weighted sum of factors
Find factors using Principal Component
Analysis (PCA)
Predict factors first, then each channel
29
3 Biggest Channels of 452 Channels
Time periods (1 period = 10 minutes)
30
The First 3 Principal Components
Time periods (1 period = 10 minutes)
31
Data Variance Explained
98%
8 components
Complexity Reduction:
452 channels 8 components
Number of principal components
32
Usage of tenant i : w.h.p.
Utility of tenant i (conservative estimate)
Linear revenue
Reputation loss for demand not guaranteed
33
CDF
Mean = 6 rounds
Mean = 158 rounds
Convergence Iteration of the Last Tenant
100 tenants (channels), 81 time periods ( 81 x 10
Minutes)
34
Primal/Dual Decomposition [Chiang et al.
07]
Contraction Mapping x := T ( x )
D. P. Bertsekas, J. Tsitsiklis, "Parallel and distributed computation: numerical methods"
Game Theory [Kelly 97]
Each user submits a price (bid), expects a payoff
Equilibrium may or may not be social optimal
Time Series Prediction
HMM [Silva 12], PCA [Gürsun 11], ARIMA [Niu
11]
35
A cloud bandwidth reservation model based on guaranteed portions
Pricing for social welfare maximization
Future work: new decomposition and iterative methods for very large-scale distributed optimization more general convergence conditions
36
Department of Electrical and Computer Engineering
University of Toronto http://iqua.ece.toronto.edu/~dniu
37
38
Root mean squared errors (RMSEs) over 1.25 days
Channel Index
39
i
1
Without multiplexing,
With multiplexing,
Correlation to the market, in [-1, 1]
Expected
Demand
Demand
Standard Deviation
40
Histogram of Price Discounts due to Multiplexing
Majority mean discount 44% total cost saving 35%
Risk neutralizers
Discounts of All Tenants in All Test Periods
41
Video Channel: F190E
Time periods (one period = 10 minutes)
42