EyeQ: (An engineer's approach to) Taming network performance unpredictability in the Cloud
Vimal, Mohammad Alizadeh, Balaji Prabhakar, David Mazières, Changhoon Kim, Albert Greenberg

What are we depending on?
• Many customers don't realise network issues: "…even in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency."
  – "5 Lessons We've Learned Using AWS", http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
• Just "spin up more VMs!" makes the app more network-dependent.
• Overhaul apps to deal with variability.

Cloud: Warehouse Scale Computer
• Multi-tenancy: to increase cluster utilisation
• Provisioning the warehouse:
  – CPU, memory, disk
  – Network?
  http://research.google.com/people/jeff/latency.html

Sharing the Network
• Policy
  – Sharing model: can we achieve "2GHz VCPU, 15GB memory, 1Gbps network"?
• Mechanism
  – Computing rates
  – Enforcing rates on entities…
    • Per-VM (multi-tenant)
    • Per-service (search, map-reduce, etc.)
[Figure: Tenant X's and Tenant Y's virtual switches, each fronting VM1 … VMn. Customer X specifies the thickness of each VM's pipe; there is no traffic matrix (the Hose Model).]

Why is it hard? (1)
• The default policy is insufficient: 1 vs. many TCP flows, UDP, etc.
• Traditional QoS mechanisms scale poorly.
• Bandwidth demands can be…
  – Random, bursty
  – Short: few-millisecond requests (10–100KB, 10–100MB)
• Timescales matter!
  – Need guarantees on the order of a few RTTs (milliseconds)

Seconds: Eternity
[Figure: one long-lived TCP flow and a bursty UDP session (ON: 5ms, OFF: 15ms) sharing a 10G pipe through a single switch]

Under the hood
[Figure: the shared switch where the two flows contend]

Why is it hard? (2)
• The switch sees contention, but lacks VM state.
• The receiver host has VM state, but does not see contention.
(1) Drops in the network: servers don't see true demand.
(2) Elusive TCP (back-off) makes true demand detection harder.

Key Idea: Bandwidth Headroom
• Bandwidth guarantees reduce to managing congestion.
• Congestion: link utilisation reaches 100%
  – At millisecond timescales
• Don't allow 100% utilisation
  – 10% headroom: early detection at the receiver
  – Run each link of capacity C at γC, with γ < 1
• Single switch: headroom works. Limit the TCP and UDP senders sharing the 10G pipe to 9G.
• What about a network (N × 10G)?

Network design: the old
• Over-subscription
http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

Network design: the new
(1) Uniform capacity across racks
(2) Over-subscription only at the Top-of-Rack
http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

Mitigating Congestion in a Network
• Load balancing: ECMP, etc. [VL2, FatTree, Hedera, MicroTE]
• Admissibility: end-to-end congestion control (EyeQ)
  – Aggregate rate into a 10Gbps pipe < 10Gbps: the fabric stays congestion-free.
  – Aggregate rate > 10Gbps: the fabric gets congested.
• Load balancing + Admissibility = Hotspot-free network core

EyeQ Platform
• TX side: untrusted VMs send through a software vswitch with adaptive rate limiters.
• RX side: untrusted VMs receive through a software vswitch with congestion detectors.
• End-to-end flow control, vswitch to vswitch, across the data centre fabric: the RX component detects congestion and returns congestion feedback; the TX component reacts (both sketched below).
[Figure: two flows, at 3Gbps and 6Gbps, crossing the fabric between TX and RX vswitches]
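The RX-side detector can be pictured with a short sketch. This is not the ~1700-line implementation linked at the end of the deck; it is a minimal userspace illustration, assuming an RCP-style controller that measures the aggregate arrival rate over each interval and nudges an advertised feedback rate toward the γC headroom target. The names (rx_meter, rx_update_feedback), the 100µs interval, and the gain ALPHA are all hypothetical.

/*
 * Minimal sketch of an EyeQ-style RX congestion detector (not the
 * real implementation). Assumption: an RCP-like controller that
 * measures the aggregate arrival rate y over a short interval and
 * steers the advertised rate toward the headroom target gamma*C.
 */
#include <stdio.h>

#define LINK_CAPACITY_GBPS 10.0
#define GAMMA              0.9   /* keep 10% headroom */
#define ALPHA              0.5   /* controller gain (hypothetical) */

struct rx_meter {
    double bytes;         /* bytes received this interval */
    double feedback_gbps; /* rate currently advertised to senders */
};

/* Called on every received packet (sketch). */
static void rx_account(struct rx_meter *m, int pkt_bytes)
{
    m->bytes += pkt_bytes;
}

/* Called once per interval (e.g., every 100us in a real system). */
static void rx_update_feedback(struct rx_meter *m, double interval_s)
{
    double target = GAMMA * LINK_CAPACITY_GBPS;
    double y = m->bytes * 8.0 / 1e9 / interval_s; /* arrival rate, Gbps */

    /* Multiplicative RCP-style adjustment: shrink the advertised rate
     * when arrivals exceed gamma*C, grow it when there is slack. */
    m->feedback_gbps *= 1.0 - ALPHA * (y - target) / target;

    if (m->feedback_gbps > LINK_CAPACITY_GBPS)
        m->feedback_gbps = LINK_CAPACITY_GBPS;
    if (m->feedback_gbps < 0.01)
        m->feedback_gbps = 0.01;

    m->bytes = 0; /* start the next interval */
    /* In EyeQ, this rate would now travel as congestion feedback
     * to the TX vswitches of the active senders. */
}

int main(void)
{
    struct rx_meter m = { 0, LINK_CAPACITY_GBPS };
    double demand_gbps = 12.0; /* a sender blasting at 12 Gbps */

    for (int i = 0; i < 10; i++) {
        /* The sender is rate-limited to the advertised feedback. */
        double sent = demand_gbps < m.feedback_gbps
                        ? demand_gbps : m.feedback_gbps;
        rx_account(&m, (int)(sent * 1e9 / 8.0 * 1e-4)); /* 100us worth */
        rx_update_feedback(&m, 1e-4);
        printf("interval %d: feedback = %.2f Gbps\n", i, m.feedback_gbps);
    }
    return 0;
}

With the loop closed, the advertised rate converges on 9 Gbps (γC), which is exactly the "limit to 9G" headroom behaviour from the single-switch slide.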
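On the TX side, one plausible way for the vswitch to enforce the advertised rate, again a sketch under assumptions rather than EyeQ's actual code, is a standard token bucket: tokens accrue at the feedback rate and a packet leaves only when enough tokens are available. rl_set_rate and rl_try_send are made-up names.

/*
 * Sketch of a TX-side adaptive rate limiter (token bucket), as one
 * way a vswitch could enforce the receiver's feedback rate.
 */
#include <stdio.h>
#include <stdbool.h>

struct rate_limiter {
    double rate_bps; /* allowed rate, installed from RX feedback */
    double tokens;   /* bytes we may send right now */
    double burst;    /* cap on accumulated tokens, in bytes */
    double last_s;   /* time of last refill, seconds */
};

/* RX feedback handler: simply install the advertised rate. */
static void rl_set_rate(struct rate_limiter *rl, double gbps)
{
    rl->rate_bps = gbps * 1e9;
}

/* Returns true if a packet of pkt_bytes may be sent at time now_s;
 * otherwise the packet stays queued and is retried later. */
static bool rl_try_send(struct rate_limiter *rl, int pkt_bytes, double now_s)
{
    rl->tokens += (now_s - rl->last_s) * rl->rate_bps / 8.0;
    rl->last_s = now_s;
    if (rl->tokens > rl->burst)
        rl->tokens = rl->burst;
    if (rl->tokens < pkt_bytes)
        return false;
    rl->tokens -= pkt_bytes;
    return true;
}

int main(void)
{
    struct rate_limiter rl = { 0, 0, 64 * 1024, 0 };
    rl_set_rate(&rl, 3.0); /* receiver granted this VM 3 Gbps */

    /* Offer a 1500-byte packet every microsecond (12 Gbps demand)
     * and count how many get through in 1 ms. */
    int sent = 0;
    for (int us = 0; us < 1000; us++)
        if (rl_try_send(&rl, 1500, us * 1e-6))
            sent++;
    printf("sent %d of 1000 packets (~%.1f Gbps)\n",
           sent, sent * 1500 * 8.0 / 1e-3 / 1e9);
    return 0;
}

Over the simulated millisecond, a 12Gbps offered load is shaved to the granted 3Gbps; in a real vswitch the same check would gate each VM's transmit queue, and the rate would track each new feedback message.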
Does it work?
[Figure: throughput without EyeQ vs. with EyeQ; with EyeQ, the TCP tenant holds 6Gbps and the UDP tenant 3Gbps]
• Improves utilisation
• Provides protection

State: only at edge
[Figure: EyeQ keeps state only at the edge, presenting the fabric as One Big Switch]

EyeQ
• Load balancing + Bandwidth headroom + Admissibility at millisecond timescales
  = Network as one big switch
  = Bandwidth sharing at the edge
• Linux and Windows implementations for 10Gbps, in ~1700 lines of C code
  http://github.com/jvimal/perfiso_10g (Linux kmod)
  No documentation, yet.

Thanks!
jvimal@stanford.edu