The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers
Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella
1
Cloud Computing is Hot
Private Cluster
2
Key Factors for Cloud Viability
• Cost
• Performance
3
Performance Variability in Cloud
• BW variation in cloud due to contention [Schad’10 VLDB]
• Causing unpredictable performance
[Figure: bandwidth (Mbps, 0 to 1000) for Local Cluster vs. Amazon EC2]
4
Reserving BW in Data Centers
• SecondNet [Guo’10]
– Per VM-pair, per VM access bandwidth reservation
• Oktopus [Ballani’11]
– Virtual Cluster (VC)
– Virtual Oversubscribed Cluster (VOC)
5
How BW Reservation Works
• Only fixed-BW reservation: request <N, B>
[Figure: Virtual Cluster Model. N VMs attached to a virtual switch, each reserving a constant bandwidth B from time 0 to T]
1. Determine the model
2. Allocate and enforce the model
6
Network Usage for MapReduce Jobs
Time-varying network usage
Hadoop Sort, 4GB per VM
Hadoop Word Count, 2GB per VM
Hive Join, 6GB per VM
Hive Aggregation, 2GB per VM
7
Motivating Example
• 4 machines, 2 VMs/machine, non-oversubscribed network
• Hadoop Sort
– N: 4 VMs
– B: 500Mbps/VM
[Figure labels: 1Gbps, 500Mbps, "Not enough BW"]
8
Motivating Example
• 4 machines, 2 VMs/machine, non-oversubscribed network
• Hadoop Sort
– N: 4 VMs
– B: 500Mbps/VM
[Figure labels: 1Gbps, 500Mbps]
9
Under Fixed-BW Reservation Model
[Figure: Virtual Cluster Model. Bandwidth (0, 500Mbps, 1Gbps) vs. time (0 to 30); Job1, Job2, and Job3 shown]
10
Under Time-Varying Reservation Model
[Figure: TIVC Model for Hadoop Sort. Bandwidth (0, 500Mbps, 1Gbps) vs. time (0 to 30); Job1 through Job5 (J1 to J5) shown]
Doubles VM utilization, network utilization, and job throughput
11
Temporally-Interleaved Virtual Cluster (TIVC)
• Key idea: Time-Varying BW Reservations
• Compared to fixed-BW reservation
– Improves utilization of data center
• Better network utilization
• Better VM utilization
– Increases cloud provider’s revenue
– Reduces cloud user’s cost
– Without sacrificing job performance
12
Challenges in Realizing TIVC
Q1: What are the right model functions?
Q2: How to automatically derive the models?
[Figure: Virtual Cluster Model with N VMs on a virtual switch. Request <N, B> reserves constant bandwidth B from time 0 to T; request <N, B(t)> reserves bandwidth that varies over time]
13
Challenges in Realizing TIVC
Q3: How to efficiently allocate TIVC?
Q4: How to enforce TIVC?
14
Challenges in Realizing TIVC
• What are the right model functions?
• How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?
15
How to Model Time-Varying BW?
Hadoop Hive Join
17
TIVC Models
[Figure: bandwidth-vs-time plots of the Virtual Cluster model (constant B from 0 to T) and the TIVC pulse models: base bandwidth Bb with a single pulse of height B over [T1, T2], or with multiple pulses over [T11, T12], [T21, T22], [T31, T32]]
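The pulse models in the figure can be sketched as a small data structure. This is only an illustration: the class name `PulseModel`, its fields, and the units (Mbps, seconds) are assumptions made here, not names from Proteus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PulseModel:
    """Hypothetical TIVC request <N, B(t)> built from rectangular pulses."""
    n_vms: int                                  # N
    base_bw: float                              # Bb, in Mbps
    duration: float                             # T, in seconds
    pulses: List[Tuple[float, float, float]]    # (start, end, height) per pulse

    def bw_at(self, t: float) -> float:
        """Reserved per-VM bandwidth at time t (0 outside [0, T))."""
        if not 0 <= t < self.duration:
            return 0.0
        for start, end, height in self.pulses:
            if start <= t < end:
                return height
        return self.base_bw

    def reserved_volume(self) -> float:
        """Area under B(t) over [0, T], assuming non-overlapping pulses."""
        pulse_time = sum(end - start for start, end, _ in self.pulses)
        pulse_vol = sum((end - start) * h for start, end, h in self.pulses)
        return pulse_vol + (self.duration - pulse_time) * self.base_bw


model = PulseModel(n_vms=4, base_bw=100, duration=100,
                   pulses=[(10, 20, 500), (50, 70, 500)])
```

With these made-up numbers the pulse model reserves 22,000 Mbit per VM, whereas a fixed reservation at 500 Mbps for the same 100 seconds would reserve 50,000 Mbit.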
18
Hadoop Sort
19
Hadoop Word Count
20
Hadoop Hive Join
21
Hadoop Hive Aggregation
22
Challenges in Realizing TIVC
What are the right model functions?
• How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?
23
Possible Approach
• “White-box” approach
– Given source code and data of cloud application, analyze quantitative networking requirement
– Very difficult in practice
• Observation: Many jobs are repeated many times
– E.g., 40% of jobs are recurring in Bing’s production data center [Agarwal’12]
– Of course, data itself may change across runs, but size remains about the same
24
Our Approach
• Solution: “Black-box” profiling based approach
1. Collect traffic trace from profiling run
2. Derive TIVC model from traffic trace
• Profiling: Same configuration as production runs
– Same number of VMs
– Same input data size per VM
– Same job/VM configuration
How much BW should we give to the application?
25
Impact of BW Capping
[Figure callout: no-elongation BW threshold]
26
Choosing BW Cap
• Tradeoff between performance and cost
– Cap > threshold: same performance, costs more
– Cap < threshold: lower performance, may cost less
• Our Approach: Expose tradeoff to user
1. Profile under different BW caps (only those below the threshold)
2. Expose run times and cost to user
3. User picks the appropriate BW cap
27
From Profiling to Model Generation
• Collect traffic trace from each VM
– Instantaneous throughput in 10ms bins
• Generate models for individual VMs
• Combine to obtain overall job’s TIVC model
– Simplify allocation by working with one model
– Does not lose efficiency since per-VM models are roughly similar for MapReduce-like applications
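The first step, turning a per-VM trace into instantaneous throughput per 10ms bin, might look like the sketch below. The input format (integer microsecond timestamps paired with packet sizes in bytes) and the function name `bin_throughput` are assumptions for illustration; the slides do not say how the trace is stored.

```python
def bin_throughput(packets, bin_ms=10):
    """Convert (timestamp_us, size_bytes) records into Mbps per time bin.

    Timestamps are integer microseconds since the start of the profiling
    run, which keeps the bin index computation exact.
    """
    packets = list(packets)
    if not packets:
        return []
    bin_us = bin_ms * 1000
    n_bins = max(ts for ts, _ in packets) // bin_us + 1
    bytes_per_bin = [0] * n_bins
    for ts, size in packets:
        bytes_per_bin[ts // bin_us] += size
    # bits per bin divided by the bin length in microseconds gives Mbps
    return [b * 8 / bin_us for b in bytes_per_bin]


trace = bin_throughput([(0, 12500), (5000, 12500), (25000, 12500)])
```

Here two 12,500-byte packets land in the first bin and one in the third, so the second bin reports zero throughput.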
28
Generate Model for Individual VM
1. Choose Bb
2. Periods where B > Bb, set to Bcap
[Figure: BW vs. time, with levels Bb and Bcap marked]
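The two steps above can be sketched as a single pass over the binned trace. The function name `find_pulses` and the choice to report pulses as half-open bin-index ranges are assumptions made for this example.

```python
def find_pulses(trace, bb):
    """Return [start_bin, end_bin) ranges where throughput exceeds Bb.

    Each returned range is a period that the model would reserve at Bcap;
    every other bin is reserved at the base bandwidth Bb.
    """
    pulses, start = [], None
    for i, bw in enumerate(trace):
        if bw > bb and start is None:
            start = i                      # a pulse begins
        elif bw <= bb and start is not None:
            pulses.append((start, i))      # the pulse ended at bin i
            start = None
    if start is not None:                  # trace ended inside a pulse
        pulses.append((start, len(trace)))
    return pulses


pulses = find_pulses([10, 10, 400, 450, 20, 300, 300], bb=50)
```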
29
Maximal Efficiency Model
• Efficiency = Application Traffic Volume / Reserved Bandwidth Volume
• Enumerate Bb to find the maximal efficiency model
[Figure: BW vs. time, with levels Bb and Bcap marked]
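A minimal sketch of the enumeration follows. It assumes a trace with equal-width bins and takes the distinct throughput values in the trace as the candidate values for Bb; the slides do not specify the enumeration granularity. The function names are hypothetical.

```python
def efficiency(trace, bb, bcap):
    """Application traffic volume divided by reserved bandwidth volume.

    Bins above Bb are reserved at Bcap, all other bins at Bb. Bin width
    cancels out of the ratio, so plain sums suffice.
    """
    reserved = sum(bcap if bw > bb else bb for bw in trace)
    return sum(trace) / reserved if reserved else 0.0


def best_base_bw(trace, bcap):
    """Try each distinct throughput value as Bb; keep the most efficient."""
    candidates = sorted(set(trace))
    return max(candidates, key=lambda bb: efficiency(trace, bb, bcap))


trace = [10, 10, 10, 10, 500, 500, 10, 10]
bb = best_base_bw(trace, bcap=500)
```

For this synthetic trace, choosing Bb = 10 reserves exactly what the application sends, while Bb = 500 degenerates into a fixed reservation with much lower efficiency.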
30
Challenges in Realizing TIVC
What are the right model functions?
How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?
31
TIVC Allocation Algorithm
• Spatio-temporal allocation algorithm
– Extends VC allocation algorithm to time dimension
– Employs dynamic programming
• Properties
– Locality aware
– Efficient and scalable
• 99th-percentile scheduling time of 28ms when scheduling 5,000 jobs on a 64,000-VM data center
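The sketch below illustrates only the time dimension that TIVC adds to allocation: checking whether a time-varying demand fits into the residual capacity of one link. It is not the dynamic-programming algorithm from the talk, and the slot-based representation and function names are assumptions.

```python
def fits_on_link(capacity, reserved, demand, start):
    """True if `demand` (bandwidth per time slot) fits on the link when the
    job starts at slot `start`, given bandwidth already `reserved` per slot."""
    for offset, bw in enumerate(demand):
        slot = start + offset
        already = reserved[slot] if slot < len(reserved) else 0
        if already + bw > capacity:
            return False
    return True


def reserve(reserved, demand, start):
    """Return a new per-slot reservation list with `demand` added."""
    out = list(reserved) + [0] * max(0, start + len(demand) - len(reserved))
    for offset, bw in enumerate(demand):
        out[start + offset] += bw
    return out


pulse_job = [100, 100, 900, 900, 100, 100]   # Mbps per slot, made-up numbers
fixed_job = [900] * 6                        # same peak, fixed-BW reservation
```

On a 1,000 Mbps link, a second copy of `pulse_job` fits if it starts two slots later, because the peaks interleave; two copies of `fixed_job` never overlap in time.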
32
Challenges in Realizing TIVC
What are the right model functions?
How to automatically derive the models?
How to efficiently allocate TIVC?
• How to enforce TIVC?
33
Enforcing TIVC Reservation
• Possible to enforce completely in hypervisor
– Does not have control over upper level links
– Requires online rate monitoring and feedback
– Increases hypervisor overhead and complexity
• Observation: Few jobs share a link simultaneously
– Most small jobs will fit into a rack
– Only a few large jobs cross the core
– In our simulations, fewer than 26 jobs share a link in a 64,000-VM data center
34
Enforcing TIVC Reservation
• Enforcing BW reservation in switches
– Avoid complexity in hypervisors
– Can be implemented on commodity switches
• Cisco Nexus 7000 supports 16k policers
35
Challenges in Realizing TIVC
What are the right model functions?
How to automatically derive the models?
How to efficiently allocate TIVC?
How to enforce TIVC?
36
Proteus: Implementing TIVC Models
1. Determine the model
2. Allocate and enforce the model
37
Evaluation
• Large-scale simulation
– Performance
– Cost
– Allocation algorithm
• Prototype implementation
– Small-scale testbed
38
Simulation Setup
• 3-level tree topology
– 16,000 Hosts x 4 VMs
– 4:1 oversubscription
• Workload
– N: exponential distribution around mean 49
– B(t): derived from real Hadoop apps
[Figure: tree topology with 20 aggregation switches (50Gbps links), 20 ToR switches (10Gbps links), and 40 hosts (1Gbps links)]
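The topology numbers can be cross-checked with a few lines of arithmetic. The sketch reads the figure labels as 20 ToR switches per aggregation switch and 40 hosts per ToR switch, which is an interpretation rather than something the slide states outright. The job-size sampler is likewise an assumption: the slide gives only the distribution and its mean, so the rounding and the minimum of one VM are choices made here.

```python
import random

AGGR_SWITCHES, TORS_PER_AGGR, HOSTS_PER_TOR, VMS_PER_HOST = 20, 20, 40, 4
HOST_GBPS, TOR_UPLINK_GBPS, AGGR_UPLINK_GBPS = 1, 10, 50

hosts = AGGR_SWITCHES * TORS_PER_AGGR * HOSTS_PER_TOR
vms = hosts * VMS_PER_HOST

# Oversubscription = downstream capacity / upstream capacity at each level
tor_oversub = HOSTS_PER_TOR * HOST_GBPS / TOR_UPLINK_GBPS
aggr_oversub = TORS_PER_AGGR * TOR_UPLINK_GBPS / AGGR_UPLINK_GBPS


def sample_job_size(rng, mean=49):
    """Draw a job size N from an exponential distribution with the given mean."""
    return max(1, round(rng.expovariate(1 / mean)))


example_n = sample_job_size(random.Random(0))
```

Under this reading the figure is consistent with the bullets: 16,000 hosts, 64,000 VMs, and 4:1 oversubscription at both the ToR and aggregation levels.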
39
Batched Jobs
• Scenario: 5,000 time-insensitive jobs
[Figure: completion time reduction of 42%, 21%, 23%, and 35%; callout: 1/3 of each type]
All remaining results are for the mixed workload
Varying Oversubscription and Job Size
25.8% reduction for non-oversubscribed network
41
Dynamically Arriving Jobs
• Scenario: Accommodate users’ requests in shared data center
– 5,000 jobs, Poisson arrival, varying load
Rejected jobs: VC 9.5%, TIVC 3.4%
42
Analysis: Higher Concurrency
• Under 80% load
– 7% higher job concurrency
– 28% higher VM utilization
– 28% higher revenue (charge for VMs)
– Rejected jobs are large
43
Tenant Cost and Provider Revenue
• Charging model
– VM time T and reserved BW volume B
– Cost = N (kv T + kb B)
– kv = 0.004$/hr, kb = 0.00016$/GB
[Figure callouts: Amazon target utilization; 12% less cost for tenants; providers make more money]
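The charging model can be written out directly. The sketch assumes T is measured in hours and B is the reserved volume per VM in GB (decimal gigabytes), which matches the units of kv and kb; the helper that converts a constant reservation into a volume is an addition for illustration.

```python
def tenant_cost(n_vms, vm_hours, reserved_gb, kv=0.004, kb=0.00016):
    """Cost = N (kv T + kb B), with T in hours and B in GB per VM."""
    return n_vms * (kv * vm_hours + kb * reserved_gb)


def reserved_gb_fixed(mbps, hours):
    """Volume in GB reserved by a constant-rate reservation of `mbps`."""
    return mbps * 3600 * hours / 8 / 1000


fixed_volume = reserved_gb_fixed(500, 1)     # 500 Mbps held for one hour
cost = tenant_cost(n_vms=10, vm_hours=2, reserved_gb=100)
```

A time-varying reservation lowers B for the same job, which is where the tenant savings in this charging model come from.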
44
Testbed Experiment
• Setup
– 18 machines
– tc and NetFPGA rate limiter
• Real MapReduce jobs
• Procedure
– Offline profiling
– Online reservation
45
Testbed Result
• TIVC finishes the job faster than VC; Baseline finishes the fastest
• Baseline suffers elongation; TIVC achieves performance similar to VC
46
Conclusion
• Network reservations in cloud are important
– Previous work proposed fixed-BW reservations
– However, cloud apps exhibit time-varying BW usage
• We propose TIVC abstraction
– Provides time-varying network reservations
– Uses simple pulse functions
– Automatically generates model
– Efficiently allocates and enforces reservations
• Proteus shows TIVC benefits both cloud provider and users significantly
47
Backup slides
48
Adding Cushions to Model
Without cushion
With 60s cushion
49
Network Utilization
VC reserves 26.4% (absolute) more bandwidth, but has less actual utilization (8.9% vs. 20.1%)
50
BW Variability on Cloud
[Ballani’11]
51
Model Refinement
• Can we further reduce BW for low-efficiency pulses without elongation?
– This potentially allows us to fit more jobs
Hadoop Hive Join
52
Model Refinement (cont.)
• If efficiency of a pulse < γ, lower the cap so that efficiency = α
• γ = 8%, α = 20%
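The refinement rule can be sketched as follows, taking the efficiency of a pulse to be its traffic volume divided by cap times duration (the same ratio as the model-level efficiency, applied to one pulse). The function name and argument units are assumptions.

```python
def refine_pulse_cap(traffic_volume, duration, cap, gamma=0.08, alpha=0.20):
    """Lower the cap of a low-efficiency pulse.

    If volume / (cap * duration) is below gamma, return the cap at which
    the pulse's efficiency equals alpha; otherwise keep the cap unchanged.
    """
    pulse_efficiency = traffic_volume / (cap * duration)
    if pulse_efficiency < gamma:
        return traffic_volume / (alpha * duration)
    return cap


lowered = refine_pulse_cap(traffic_volume=250, duration=10, cap=500)
kept = refine_pulse_cap(traffic_volume=500, duration=10, cap=500)
```

In the first call the pulse is only 5% efficient, so the cap drops from 500 to 125; in the second it is 10% efficient and stays at 500. Because γ < α, a lowered cap is always below the original one.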
53