Network Performance Isolation in Data Centres using ConEx Congestion Policing
draft-briscoe-conex-policing-01
draft-briscoe-conex-data-centre-02
Bob Briscoe, Chief Researcher, BT
IRTF DCLC, Jul 2014
Bob Briscoe’s work is part-funded by the European Community under its Seventh Framework Programme through the Trilogy 2 project (ICT-317756)
© British Telecommunications plc

purpose of talk
• work proposal for the data centre latency control r-g
  – data centre queuing delay control
  – designed for global scope (inter-data-centre, ... Inter-net)
  – this talk: adds first step: intra-data centre
• without any new protocols
• started in the IETF congestion exposure (ConEx) w-g
• generalised for initial deployment without ConEx
  – and even without ECN end-to-end
  – now even without ECN on switches (in slides, not draft)

Network Performance Isolation in Data Centres
• An important problem
  – isolating between tenants, or departments
  – virtualisation isolates CPU / memory / storage
  – but network and I/O system is highly multiplexed & distributed
• SDN-based (edge) capacity partitioning*
  – configuration churn: nightmare at scale
  – poor use of capacity
• edge-based weighted round robin (or WFQ)
  – More common
  – but biases towards heavy hitters (no concept of time)
[Figures: bit-rate against time for each approach]
* every problem in computer science can be solved by not thinking then hiding the resulting mess under a layer of abstraction

Outline Design – First Step
edge bottlenecks by capacity design
• Edge policing like Diffserv
  – but congestion policing (per guest)
• isolation within FIFO queue
• no config on switches
[Figure: VM sender and VM receiver; congestion policer with weight w; layers labelled guest OS, hypervisor, switching; hosts and switches]

bottleneck congestion policer
• in a well-provisioned link, policer rarely intervenes
• but whenever needed, it limits queue growth
[Figure: incoming packet stream, policer, FIFO network buffer with AQM of drop probability p(t), outgoing packet stream; one congestion token bucket per user with labels w1, w2, ... wi, di(t), ci; meter]

    foreach pkt {
        i = classify_user(pkt)
        di += wi*(tnow-ti)      //fill
        ti = tnow
        di -= s * p             //drain
        if (di<0) {drop(pkt)}
    }

s: packet size
p: drop prob of AQM

actually each bucket needs to be two buckets
to limit bursts of congestion
• police if either bucket empties
• similar code
  – except 2 token buckets

    ...
    if (di1<0 || di2<0) {drop(pkt)}

[Figure: policer in front of an AQM with drop probability p(t); main congestion token bucket and congestion burst limiter; labels wi, kwi, ci, C, di(t); meter]

performance isolation outcome
• WRR or WFQ
• congestion policer
  – with unequal traffic loads
• congestion policer
  – treats equal traffic loads equivalently to WRR
[Figures: rate against time for each case]

Outline Design
edge and core queue control
• Edge policing like Diffserv
  – but congestion policing (per-guest)
• Hose model
• intra-class isolation in all FIFO queues
• FIFO ECN marking on L3 switches
• no other config on switches
[Figure: VM sender and VM receiver; TEP / congestion policer with weight w; TEP / audit; layers labelled guest OS, hypervisor, switching; hosts and switches]

trusted path congestion feedback
• Initial deployment
  – all under control of infrastructure admin
• ECN on guest hosts: optional
  – ECN enabled across tunnel infrastructure
• ConEx on guest hosts: optional
  – any ConEx-enabled packet doesn’t require tunnel feedback
• details
  – see spare slide or draft
[Figure, ConEx packets: transport sender, policer, transport receiver, audit]
[Figure, Non-ConEx packets: transport sender, TEP, policer, infrastructure, TEP, transport receiver; feedback between tunnel endpoints]

Features of Solution
• Network performance isolation between tenants
• No loss of LAN-like multiplexing benefits
  • work-conserving
• Zero (tenant-related) switch configuration
• No change to existing switch implementations
• Weighted performance differentiation
• Simplest possible contract
  • per-tenant network-wide allowance
  • tenant can freely move VMs around without changing allowance
  • sender constraint, but with transferable allowance
• Transport-Agnostic
• Extensible to wide-area and inter-data-centre interconnect

call for interest
• implementation in hypervisors
• evaluation

Network Performance Isolation in Data Centres using congestion policing
draft-briscoe-conex-policing-01
draft-briscoe-conex-data-centre-02
Q&A & spare slides

measuring contribution to congestion
= bytes weighted by congestion level
= bytes dropped (or ECN-marked)
= ‘congestion-volume’
as simple to measure as volume
[Figure: bit-rate against time at congestion levels of 1% and 0.01%; labels 3MB, 300MB, 100MB, 10GB, 1MB, 1MB]

unilateral deployment technique for data centre operator
• exploits:
  • widespread edge-edge tunnels in multi-tenant DCs to isolate forwarding
  • a side-effect of standard tunnelling (IP-in-IP or any ECN link encap)
• for e2e transports that don’t support ECN, the operator can:
  1. at encap: alter 00 to 10 in outer
  2. at interior buffers: turn on ECN
  • defers any drops until egress E
  • audit A just before egress can see packets to be dropped
• for e2e transports that don’t support ConEx, the operator can create its own trusted feedback:
  3. at decap: only for Not-ConEx packets, feedback aggregate congestion marking counters:
  • CE outer, Not-ECT inner = loss
  • CE outer, ECT inner = ECN
[Figure: tunnel from ingress (1) through a congested network element (2) to egress E (3); DS and ECN fields of inner and outer headers with codepoints 00, 10, 11; policer at ingress; audit A and drop at egress; IPFIX meter, exporter and collector]
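
The decap rule in step 3 can be sketched as a small classifier over the outer and inner ECN fields. This is only an illustration, not the draft's implementation: the names `Ecn`, `classify` and `aggregate` are hypothetical, the per-packet `is_conex` flag is an assumed input, and counting bytes rather than packets is an assumption made to match the 'congestion-volume' definition. The codepoint values are the standard ECN ones (Not-ECT 00, ECT(1) 01, ECT(0) 10, CE 11).

```python
from collections import Counter
from enum import IntEnum


class Ecn(IntEnum):
    """Standard two-bit ECN codepoints."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def classify(outer, inner):
    """Map an (outer, inner) ECN pair seen at decap to 'loss', 'ecn' or None."""
    if outer != Ecn.CE:
        return None              # no congestion signalled inside the tunnel
    if inner == Ecn.NOT_ECT:
        return "loss"            # CE outer, Not-ECT inner: packet is dropped at egress
    if inner in (Ecn.ECT_0, Ecn.ECT_1):
        return "ecn"             # CE outer, ECT inner: mark is passed on as ECN
    return None                  # inner already CE: not covered by the slide, left uncounted


def aggregate(packets):
    """Sum congestion-marked bytes for Not-ConEx packets only.

    `packets` is an iterable of (outer, inner, size_bytes, is_conex) tuples;
    this tuple layout is hypothetical.
    """
    counters = Counter()
    for outer, inner, size, is_conex in packets:
        if is_conex:
            continue             # ConEx packets need no tunnel feedback
        kind = classify(outer, inner)
        if kind is not None:
            counters[kind] += size
    return counters


if __name__ == "__main__":
    sample = [
        (Ecn.CE, Ecn.NOT_ECT, 1500, False),
        (Ecn.CE, Ecn.ECT_0, 1000, False),
        (Ecn.ECT_0, Ecn.ECT_0, 1500, False),
        (Ecn.CE, Ecn.ECT_0, 1500, True),
    ]
    print(dict(aggregate(sample)))
```

The two counters are what the egress would feed back to the ingress policer in place of end-to-end ConEx signals.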
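
The policer pseudocode from the 'bottleneck congestion policer' slides, including the two-bucket variant, can be sketched in runnable form as follows. Several details are assumptions rather than anything the slides state: buckets start full, the fill is capped at a bucket depth, the second bucket fills at `k` times the user's weight with its own depth `burst_depth`, and the class and parameter names are hypothetical. As in the slide, each bucket is drained by packet size times AQM drop probability, and the packet is dropped if either bucket goes negative.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    fill_rate: float  # congestion-bytes credited per second
    depth: float      # cap on stored tokens (assumed; not in the slide pseudocode)
    level: float      # current tokens; may go below zero after a drain

    def fill(self, dt: float) -> None:
        self.level = min(self.depth, self.level + self.fill_rate * dt)


class CongestionPolicer:
    """Per-user policer with a main bucket and a burst-limiting bucket."""

    def __init__(self, weights, depth, k, burst_depth):
        # weights: user -> fill rate w_i; all buckets start full (assumption).
        self.last_seen = {user: 0.0 for user in weights}
        self.main = {u: TokenBucket(w, depth, depth) for u, w in weights.items()}
        self.burst = {u: TokenBucket(k * w, burst_depth, burst_depth)
                      for u, w in weights.items()}

    def police(self, user, size, p, now) -> bool:
        """Return True to forward the packet, False to drop it."""
        dt = now - self.last_seen[user]
        self.last_seen[user] = now
        for bucket in (self.main[user], self.burst[user]):
            bucket.fill(dt)             # fill: d_i += w_i * (t_now - t_i)
            bucket.level -= size * p    # drain: d_i -= s * p
        return self.main[user].level >= 0 and self.burst[user].level >= 0


if __name__ == "__main__":
    policer = CongestionPolicer({"a": 1000.0}, depth=5000.0, k=4.0, burst_depth=2000.0)
    # Three back-to-back 1500-byte packets at 50% drop probability, then one a second later.
    times = [0.0, 0.0, 0.0, 1.0]
    print([policer.police("a", 1500, 0.5, t) for t in times])
    # prints [True, True, False, True]
```

With no congestion (`p` equal to zero) nothing is drained, so the policer never intervenes, which matches the claim that it rarely acts on a well-provisioned link.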