The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers
Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella

Cloud Computing is Hot
[Figure label: Private Cluster]

Key Factors for Cloud Viability
• Cost
• Performance

Performance Variability in Cloud
• BW variation in cloud due to contention [Schad’10 VLDB]
• Causing unpredictable performance
[Figure: bandwidth (Mbps, axis 0 to 1000) for Local Cluster vs. Amazon EC2]

Reserving BW in Data Centers
• SecondNet [Guo’10]
  – Per VM-pair, per VM access bandwidth reservation
• Oktopus [Ballani’11]
  – Virtual Cluster (VC)
  – Virtual Oversubscribed Cluster (VOC)

How BW Reservation Works
• Request <N, B>: only fixed-BW reservation
1. Determine the model
2. Allocate and enforce the model
[Figure: Virtual Cluster Model, N VMs attached to a virtual switch; bandwidth held at B over time 0 to T]

Network Usage for MapReduce Jobs
• Time-varying network usage
[Figure panels: Hadoop Sort, 4GB per VM; Hadoop Word Count, 2GB per VM; Hive Join, 6GB per VM; Hive Aggregation, 2GB per VM]

Motivating Example
• 4 machines, 2 VMs/machine, non-oversubscribed network
• Hadoop Sort
  – N: 4 VMs
  – B: 500Mbps/VM
[Figure labels: 1Gbps, 500Mbps, "Not enough BW"]

Under Fixed-BW Reservation Model
[Figure: Virtual Cluster Model; bandwidth vs. time (0 to 30) with 1Gbps and 500Mbps levels; Job1, Job2, Job3]

Under Time-Varying Reservation Model
• Hadoop Sort
• Doubles VM utilization, network utilization, and job throughput
[Figure: TIVC Model; bandwidth vs. time (0 to 30) with 1Gbps and 500Mbps levels; Job1 to Job5 interleaved; placement labels J1 to J5]

Temporally-Interleaved Virtual Cluster (TIVC)
• Key idea: Time-Varying BW Reservations
• Compared to fixed-BW reservation
  – Improves utilization of data center
    • Better network utilization
    • Better VM utilization
  – Increases cloud provider’s revenue
  – Reduces cloud user’s cost
  – Without sacrificing job performance

Challenges in Realizing TIVC
Q1: What are the right model functions?
Q2: How to automatically derive the models?
[Figure: Virtual Cluster Model with request <N, B> (bandwidth fixed at B over 0 to T) next to request <N, B(t)> (bandwidth varying over 0 to T)]

Challenges in Realizing TIVC
Q3: How to efficiently allocate TIVC?
Q4: How to enforce TIVC?

Challenges in Realizing TIVC
• What are the right model functions?
• How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?

How to Model Time-Varying BW?
[Figure: Hadoop Hive Join]

TIVC Models
[Figure: model functions plotted as bandwidth vs. time over 0 to T, with levels B and Bb and pulse boundaries T1, T2 and T11, T12, T21, T22, T31, T32; shown alongside the Virtual Cluster model]

Hadoop Sort
[Figure]

Hadoop Word Count
[Figure]

Hadoop Hive Join
[Figure]

Hadoop Hive Aggregation
[Figure]

Challenges in Realizing TIVC
✓ What are the right model functions?
• How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?

Possible Approach
• “White-box” approach
  – Given the source code and data of a cloud application, analyze its quantitative networking requirements
  – Very difficult in practice
• Observation: Many jobs are repeated many times
  – E.g., 40% of jobs are recurring in Bing’s production data center [Agarwal’12]
  – The data itself may change across runs, but its size remains about the same

Our Approach
• Solution: “Black-box” profiling-based approach
  1. Collect traffic trace from profiling run
  2. Derive TIVC model from traffic trace
• Profiling: Same configuration as production runs
  – Same number of VMs
  – Same input data size per VM
  – Same job/VM configuration
• How much BW should we give to the application?
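
As a minimal sketch of what a time-varying request <N, B(t)> from the earlier slides could look like in code, the Python below represents B(t) as a base bandwidth plus rectangular pulses and checks whether several staggered reservations fit on one link. The class name PulseModel, the helper fits_on_link, and every number in the example are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PulseModel:
    """Hypothetical per-VM reservation B(t): a base level plus rectangular pulses."""
    n_vms: int                      # N in the request <N, B(t)>
    base_mbps: float                # bandwidth reserved outside pulses
    cap_mbps: float                 # bandwidth reserved during pulses
    pulses: List[Tuple[int, int]]   # [start, end) in seconds from job start
    duration_s: int                 # job length T in seconds

    def bandwidth(self, t: int) -> float:
        """Reserved bandwidth of one VM at time t (0 outside the job's lifetime)."""
        if t < 0 or t >= self.duration_s:
            return 0.0
        if any(start <= t < end for start, end in self.pulses):
            return self.cap_mbps
        return self.base_mbps


def fits_on_link(jobs: Sequence[PulseModel], offsets: Sequence[int],
                 capacity_mbps: float, horizon_s: int) -> bool:
    """True if one VM per job, started at the given offsets, never exceeds the link."""
    for t in range(horizon_s):
        total = sum(job.bandwidth(t - off) for job, off in zip(jobs, offsets))
        if total > capacity_mbps:
            return False
    return True


# Illustrative numbers only: a 30 s job that needs 500 Mbps for 10 s.
sort_job = PulseModel(n_vms=4, base_mbps=100, cap_mbps=500,
                      pulses=[(10, 20)], duration_s=30)
jobs = [sort_job, sort_job, sort_job]

print(fits_on_link(jobs, [0, 0, 0], 1000, 60))    # all pulses start together
print(fits_on_link(jobs, [0, 10, 20], 1000, 60))  # pulses staggered by 10 s
```

With these made-up numbers, three simultaneous pulses would need 1500 Mbps and do not fit on a 1000 Mbps link, while staggering the start times keeps the sum at or below 700 Mbps. This only illustrates the interleaving idea; it is not the allocation algorithm presented later in the deck.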

Impact of BW Capping
[Figure label: No-elongation BW threshold]

Choosing BW Cap
• Tradeoff between performance and cost
  – Cap > threshold: same performance, costs more
  – Cap < threshold: lower performance, may cost less
• Our Approach: Expose tradeoff to user
  1. Profile under different BW caps
  2. Expose run times and cost to user
  3. User picks the appropriate BW cap
[Annotation: only the below-threshold caps]

From Profiling to Model Generation
• Collect traffic trace from each VM
  – Instantaneous throughput in 10 ms bins
• Generate models for individual VMs
• Combine to obtain the overall job’s TIVC model
  – Simplifies allocation by working with one model
  – Does not lose efficiency, since per-VM models are roughly similar for MapReduce-like applications

Generate Model for Individual VM
1. Choose Bb
2. Periods where B > Bb are set to Bcap
[Figure: BW vs. time with levels Bcap and Bb]

Maximal Efficiency Model
• Efficiency = Application Traffic Volume / Reserved Bandwidth Volume
• Enumerate Bb to find the maximal efficiency model
[Figure: BW vs. time with levels Bcap and Bb]

Challenges in Realizing TIVC
✓ What are the right model functions?
✓ How to automatically derive the models?
• How to efficiently allocate TIVC?
• How to enforce TIVC?

TIVC Allocation Algorithm
• Spatio-temporal allocation algorithm
  – Extends VC allocation algorithm to the time dimension
  – Employs dynamic programming
• Properties
  – Locality aware
  – Efficient and scalable
    • 99th percentile: 28 ms when scheduling 5,000 jobs on a 64,000-VM data center

Challenges in Realizing TIVC
✓ What are the right model functions?
✓ How to automatically derive the models?
✓ How to efficiently allocate TIVC?
• How to enforce TIVC?
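
The per-VM model generation and the maximal-efficiency search from the earlier slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the trace is a list of equal-width throughput bins, the function names and the candidate Bb values are hypothetical, and the sample trace is invented rather than measured.

```python
from typing import List, Sequence, Tuple


def generate_model(trace: Sequence[float], b_base: float, b_cap: float) -> List[float]:
    """Reserve b_cap in bins where observed throughput exceeds b_base, else b_base."""
    return [b_cap if x > b_base else b_base for x in trace]


def efficiency(trace: Sequence[float], reservation: Sequence[float]) -> float:
    """Application traffic volume divided by reserved bandwidth volume.

    Bins have equal width, so the ratio of sums equals the ratio of volumes.
    """
    reserved = sum(reservation)
    return sum(trace) / reserved if reserved > 0 else 0.0


def max_efficiency_model(trace: Sequence[float], b_cap: float,
                         candidates: Sequence[float]) -> Tuple[float, List[float]]:
    """Enumerate candidate Bb values and keep the one with the highest efficiency."""
    best = max(candidates,
               key=lambda bb: efficiency(trace, generate_model(trace, bb, b_cap)))
    return best, generate_model(trace, best, b_cap)


# Invented per-bin throughput in Mbps, for illustration only.
trace = [10, 20, 400, 450, 30, 10, 480, 20]
best_bb, model = max_efficiency_model(trace, b_cap=500,
                                      candidates=[10, 20, 30, 50, 100])
print(best_bb, model, round(efficiency(trace, model), 3))
```

For this sample trace the search settles on Bb = 30: a lower Bb turns low-traffic bins into full-cap pulses, and a higher Bb raises the base reservation everywhere without removing any pulse.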

Enforcing TIVC Reservation
• Possible to enforce completely in hypervisor
  – Does not have control over upper-level links
  – Requires online rate monitoring and feedback
  – Increases hypervisor overhead and complexity
• Observation: Few jobs share a link simultaneously
  – Most small jobs will fit into a rack
  – Only a few large jobs cross the core
  – In our simulations, < 26 jobs share a link in a 64,000-VM data center

Enforcing TIVC Reservation
• Enforcing BW reservation in switches
  – Avoids complexity in hypervisors
  – Can be implemented on commodity switches
    • Cisco Nexus 7000 supports 16k policers

Challenges in Realizing TIVC
✓ What are the right model functions?
✓ How to automatically derive the models?
✓ How to efficiently allocate TIVC?
✓ How to enforce TIVC?

Proteus: Implementing TIVC Models
1. Determine the model
2. Allocate and enforce the model

Evaluation
• Large-scale simulation
  – Performance
  – Cost
  – Allocation algorithm
• Prototype implementation
  – Small-scale testbed

Simulation Setup
• 3-level tree topology
  – 16,000 hosts x 4 VMs
  – 4:1 oversubscription
• Workload
  – N: exponential distribution around mean 49
  – B(t): derived from real Hadoop apps
[Figure: topology with labels 50Gbps / 20 Aggr Switch, 10Gbps / 20 ToR Switch, 1Gbps / 40 Hosts]

Batched Jobs
• Scenario: 5,000 time-insensitive jobs
• All remaining results are for the mixed workload
[Figure labels: 1/3 of each type; completion time reduction 42%, 21%, 23%, 35%]

Varying Oversubscription and Job Size
• 25.8% reduction for non-oversubscribed network

Dynamically Arriving Jobs
• Scenario: Accommodate users’ requests in shared data center
  – 5,000 jobs, Poisson arrival, varying load
• Rejected: VC 9.5%, TIVC 3.4%

Analysis: Higher Concurrency
• Under 80% load
  – 7% higher job concurrency
  – 28% higher VM utilization (rejected jobs are large)
  – 28% higher revenue (charging for VMs)

Tenant Cost and Provider Revenue
• Charging model
  – VM time T and reserved BW volume B
  – Cost = N (kv T + kb B)
  – kv = 0.004$/hr, kb = 0.00016$/GB
• 12% less cost for tenants
• Providers make more money
[Figure label: Amazon target utilization]

Testbed Experiment
• Setup
  – 18 machines
  – tc and NetFPGA rate limiters
• Real MapReduce jobs
• Procedure
  – Offline profiling
  – Online reservation

Testbed Result
• TIVC finishes jobs faster than VC; Baseline finishes the fastest
• Baseline suffers elongation; TIVC achieves performance similar to VC

Conclusion
• Network reservations in cloud are important
  – Previous work proposed fixed-BW reservations
  – However, cloud apps exhibit time-varying BW usage
• We propose the TIVC abstraction
  – Provides time-varying network reservations
  – Uses simple pulse functions
  – Automatically generates the model
  – Efficiently allocates and enforces reservations
• Proteus shows that TIVC significantly benefits both cloud providers and users

Backup slides

Adding Cushions to Model
[Figure panels: Without cushion; With 60s cushion]

Network Utilization
• VC reserves 26.4% (absolute) more bandwidth
• But has lower actual utilization (8.9% vs. 20.1%)

BW Variability on Cloud [Ballani’11]
[Figure]

Model Refinement
• Can we further reduce BW for low-efficiency pulses without elongation?
  – This potentially allows us to fit more jobs
[Figure: Hadoop Hive Join]

Model Refinement (cont.)
• If the efficiency of a pulse < γ, lower the cap so that efficiency = α
• γ = 8%, α = 20%
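
The refinement rule above can be sketched in Python as follows. This is a hedged illustration that assumes a pulse's efficiency is its traffic volume divided by cap × duration (the deck's efficiency definition applied to a single pulse). The function name refine_cap, the units, and the example numbers are hypothetical; only the thresholds γ = 8% and α = 20% come from the slide.

```python
def refine_cap(volume_mbit: float, duration_s: float, cap_mbps: float,
               gamma: float = 0.08, alpha: float = 0.20) -> float:
    """Return the (possibly lowered) cap for one pulse.

    Assumption: pulse efficiency = traffic volume / (cap * duration).
    If it falls below gamma, pick the cap at which efficiency equals alpha.
    """
    pulse_efficiency = volume_mbit / (cap_mbps * duration_s)
    if pulse_efficiency >= gamma:
        return cap_mbps                       # efficient enough, keep the cap
    return volume_mbit / (alpha * duration_s)  # solve volume / (cap * d) = alpha


# Invented example: a 60 s pulse reserved at 500 Mbps.
print(refine_cap(1500, 60, 500))  # 5% efficient, below gamma, so the cap is lowered
print(refine_cap(6000, 60, 500))  # 20% efficient, so the cap is unchanged
```

In the first call the pulse carries 1500 Mbit against a 30,000 Mbit reservation, so the cap drops to 125 Mbps, where the same volume gives 20% efficiency. Whether a lowered cap elongates the job is a separate question that this sketch does not model.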