zUpdate: Updating Data Center Networks with Zero Loss
Hongqiang Harry Liu (Yale University), Xin Wu (Duke University), Ming Zhang, Lihua Yuan, Roger Wattenhofer, Dave Maltz (Microsoft)

A DCN is constantly in flux
• Switches: upgrades, reboots, new switches coming online
• Traffic: flows shift and virtual machines move

Network updates are painful for operators
Bob, an operator, is asked to perform a switch upgrade.
• Complex planning: two weeks before the update, Bob has to coordinate with application owners, prepare a detailed update plan, and review and revise the plan with colleagues.
• Unexpected performance faults: on the night of the update, Bob executes the plan by hand, but application alerts are triggered unexpectedly and switch failures force him to backpedal several times.
• Laborious process: eight hours later Bob is still stuck with the update, with no sleep overnight, numerous application complaints, and no quick fix in sight.

Congestion-free DCN update is the key
• Applications want network updates to be seamless: reachability must be preserved, network latency (propagation, queuing) must stay low, and no packets may be dropped; queuing delay and packet drops both come from congestion.
• Congestion-free updates are hard:
  • Many switches are involved
  • A multi-step plan is needed
  • Different scenarios have distinct requirements
  • Changes in the network interact with changes in traffic demand

A Clos network with ECMP
[Figure: a 3-stage Clos topology (CORE, AGG, and ToR layers), every link with capacity 1000 and every switch running Equal-Cost Multi-Path (ECMP). ToR1 and ToR5 each spread 600 units of traffic over two AGG switches (300/300); the busiest AGG-to-CORE uplink carries 620 + 150 + 150 = 920 < 1000.]

Switch upgrade: a naïve solution triggers congestion
[Figure: draining AGG1 makes ECMP shift all of ToR1's 600 units onto the other AGG, so the bottleneck uplink now carries 620 + 300 + 150 = 1070 > 1000 and is congested.]

Switch upgrade: a smarter solution seems to be working
[Figure: with weighted ECMP, ToR5 moves most of its traffic away from the bottleneck (a 500/100 split instead of 300/300), so the uplink carries 620 + 300 + 50 = 970 < 1000.]

Traffic distribution transition
• The initial traffic distribution (ToR1: 300/300, ToR5: 300/300) and the final traffic distribution (ToR1: 0/600, ToR5: 500/100) are both congestion-free.
• Is the transition between them simple? No: switch updates are asynchronous.

Asynchronous changes can cause transient congestion
• If ToR1 has already been updated but ToR5 has not yet, the bottleneck uplink carries 620 + 300 + 150 = 1070 > 1000.

Solution: introducing an intermediate step
• Insert an intermediate traffic distribution (ToR1: 200/400, ToR5: 450/150) between the initial and final ones.
• Both transitions, initial to intermediate and intermediate to final, are congestion-free regardless of the asynchronization.
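To make the example above concrete, here is a minimal Python sketch (not part of the talk) that redoes the bottleneck-link arithmetic. The 620 background load, the ToR demands, and the per-distribution splits are read off the figures; the assumption that traffic sent toward the non-drained aggregation switch (labeled AGG2 here) is further ECMP-split over two core uplinks, hence the division by 2, is mine, chosen so the sums match the slides. The worst-case rule, that each flow may still be at its old split or already at its new one, anticipates the transition condition formalized later in the deck.

```python
# Minimal sketch: recompute the bottleneck-uplink loads of the toy Clos example.
# Assumptions: 620 units of background load from flows that never move; the
# traffic a ToR sends toward the non-drained AGG (AGG2) is ECMP-split over the
# AGG's two core uplinks, so it contributes half of that amount to the uplink.

CAPACITY = 1000
BACKGROUND = 620  # load on the bottleneck uplink from flows that do not change

# Traffic each ToR sends toward AGG2 in each distribution (numbers from the slides).
distributions = {
    "initial":      {"ToR1": 300, "ToR5": 300},
    "intermediate": {"ToR1": 400, "ToR5": 150},
    "final":        {"ToR1": 600, "ToR5": 100},
}

def steady_load(dist):
    """Load on the bottleneck uplink when a single distribution is in effect."""
    return BACKGROUND + sum(v / 2 for v in dist.values())

def worst_transition_load(old, new):
    """Worst-case load while ToRs update asynchronously: each ToR may still be
    at its old split or already at its new one, so take max(old, new) per flow."""
    return BACKGROUND + sum(max(old[t], new[t]) / 2 for t in old)

for name, dist in distributions.items():
    print(f"{name:12s} steady-state load = {steady_load(dist):.0f}")

d = distributions
print("direct initial->final:        ", worst_transition_load(d["initial"], d["final"]))
print("step 1 initial->intermediate: ", worst_transition_load(d["initial"], d["intermediate"]))
print("step 2 intermediate->final:   ", worst_transition_load(d["intermediate"], d["final"]))
```

Running it gives 920, 895, and 970 for the three steady states; 1070 for the direct move, exceeding the 1000 capacity and hence transient congestion; and 970 and 995 for the two steps through the intermediate distribution, both within capacity.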
How zUpdate performs congestion-free update
• The operator gives zUpdate the update scenario and its update requirements.
• Starting from the current traffic distribution of the data center network, zUpdate computes a sequence of traffic distributions (one or more intermediate distributions followed by the target distribution) such that every transition along the sequence is congestion-free.

Key technical issues
• Describing traffic distribution
• Representing update requirements
• Defining conditions for a congestion-free transition
• Computing an update plan
• Implementing an update plan

Describing traffic distribution
• l^f_{v,u}: flow f's load on the link from switch v to switch u.
[Figure: a flow f of size 600 enters at ToR s1; it places l^f_{s1,s2} = 300 on the link to AGG s2, which in turn places l^f_{s2,s4} = 150 on the link to CORE s4.]
• A traffic distribution is the set of all such loads: D = { l^f_{v,u} : ∀f, ∀e_{v,u} }.

Representing update requirements
• To drain and upgrade switch s2: ∀f, ∀v: l^f_{v,s2} = 0 (in the example, l^f_{s1,s2} = 0).
• When s2 recovers, to restore ECMP: every flow is again split equally over its equal-cost next hops (in the example, l^f_{s1,s2} = l^f_{s1,s3}).

Switch asynchronization exponentially inflates the possible load values
[Figure: a flow f from ingress switch 1 to egress switch 8 over two paths, 1-2-4-6-8 and 1-3-5-7-8.]
• During the transition from the old to the new traffic distribution, asynchronous switch updates can result in 2^5 possible load values on link e_{7,8}.
• In a large network, it is intractable to check every possible load value against the link capacity.

Two-phase commit reduces the possible load values to two
• Packets are stamped with a version number at the ingress switch; the ingress flips to the new version only after the internal switches have installed the new rules.
• With two-phase commit, f's load on link e_{v,u} has only two possible values throughout a transition: l^{f,old}_{v,u} or l^{f,new}_{v,u}.

Flow asynchronization exponentially inflates the possible load values
• Different flows flip versions at different times. For two flows f1 and f2 sharing link e_{7,8}, the total load l^{f1}_{7,8} + l^{f2}_{7,8} can be any of the four old/new combinations.
• Asynchronous updates to N independent flows can result in 2^N possible load values on link e_{7,8}.

Handling flow asynchronization
• Basic idea: bound each flow by its worst case,
  l^{f1}_{7,8} + l^{f2}_{7,8} ≤ max{l^{f1,old}_{7,8}, l^{f1,new}_{7,8}} + max{l^{f2,old}_{7,8}, l^{f2,new}_{7,8}}.
• Congestion-free transition constraint: there is no congestion throughout a transition if and only if
  ∀e_{v,u}: Σ_f max{l^{f,old}_{v,u}, l^{f,new}_{v,u}} ≤ c_{v,u},
  where c_{v,u} is the capacity of link e_{v,u}.

Computing a congestion-free transition plan
• The plan is computed with linear programming:
  • Constant: the current traffic distribution.
  • Variables: the intermediate traffic distribution(s) and the target traffic distribution.
  • Constraints: deliver all traffic, flow conservation, the congestion-free transition constraint between every pair of consecutive distributions, and the update requirements on the target distribution.
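As a companion to the formulation above, here is a small illustrative linear program (a sketch, not zUpdate's actual solver) showing how the max terms in the congestion-free constraint can be linearized with auxiliary variables. It collapses the earlier toy example to its single bottleneck uplink and assumes the remaining links have spare capacity; the real formulation carries per-flow, per-link load variables with flow-conservation and demand-delivery constraints at every switch. The PuLP package used here is an arbitrary choice and must be installed separately.

```python
# Sketch of the LP: find an intermediate split of ToR1's and ToR5's traffic such
# that both transitions (initial -> intermediate and intermediate -> final) obey
# the congestion-free constraint on the single bottleneck uplink.  The max() in
# the constraint is linearized with auxiliary variables m1, m2.
import pulp

CAPACITY, BACKGROUND = 1000, 620
demand  = {"ToR1": 600, "ToR5": 600}   # total upward traffic per ToR
initial = {"ToR1": 300, "ToR5": 300}   # traffic sent toward AGG2 (current)
final   = {"ToR1": 600, "ToR5": 100}   # traffic sent toward AGG2 (target)

prob = pulp.LpProblem("intermediate_distribution", pulp.LpMinimize)

# Variable: intermediate traffic each ToR sends toward AGG2.
x = {t: pulp.LpVariable(f"x_{t}", lowBound=0, upBound=demand[t]) for t in demand}

# Auxiliary variables standing for max(current, intermediate) and
# max(intermediate, target) contributions to the bottleneck uplink
# (a contribution is half the traffic sent toward AGG2, as before).
m1 = {t: pulp.LpVariable(f"m1_{t}", lowBound=0) for t in demand}
m2 = {t: pulp.LpVariable(f"m2_{t}", lowBound=0) for t in demand}
for t in demand:
    prob += m1[t] >= initial[t] / 2   # m1 >= current contribution
    prob += m1[t] >= 0.5 * x[t]       # m1 >= intermediate contribution
    prob += m2[t] >= 0.5 * x[t]       # m2 >= intermediate contribution
    prob += m2[t] >= final[t] / 2     # m2 >= target contribution

# Congestion-free transition constraint for each of the two transitions.
prob += BACKGROUND + pulp.lpSum(m1.values()) <= CAPACITY
prob += BACKGROUND + pulp.lpSum(m2.values()) <= CAPACITY

# Any feasible point is a valid plan; minimize worst-case usage as a tie-breaker.
prob += pulp.lpSum(m1.values()) + pulp.lpSum(m2.values())

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status], {t: pulp.value(x[t]) for t in demand})
```

Any feasible solution is a usable intermediate split; the 400/150 intermediate distribution shown earlier in the deck is one such point (its worst-case transition loads are 970 and 995, both under the capacity of 1000).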
Implementing an update plan
• Practical concerns: computation time, switch table size limits, update overhead, failures during a transition, and traffic demand variation.
• Only critical flows, i.e., flows traversing bottleneck links, are given weighted-ECMP entries; all other flows keep default ECMP.

Evaluations
• Testbed experiments
• Large-scale trace-driven simulations

Testbed setup
[Figure: a Clos testbed of OpenFlow 1.0 switches with 10Gbps links, driven by a traffic generator; AGG1 is drained. Flow demands include 6.2Gbps from ToR6 and ToR7, 6Gbps from ToR5, and 6Gbps from ToR8.]

zUpdate achieves congestion-free switch upgrade
[Figure: the initial, intermediate, and final traffic distributions on the testbed (initial: 3Gbps/3Gbps per ToR; intermediate: 2Gbps/4Gbps and 4.5Gbps/1.5Gbps; final: 0/6Gbps and 5Gbps/1Gbps), together with the real-time utilization of links CORE1-AGG3 and CORE3-AGG4 over about 25 seconds: utilization stays at or below 1.0 throughout the update.]

One-step update causes transient congestion
[Figure: jumping directly from the initial distribution (3Gbps/3Gbps per ToR) to the final one; the real-time utilization of the monitored links (CORE1-AGG3, CORE3-AGG4) briefly rises above 1.0 during the transition.]

Large-scale trace-driven simulations
[Figure: a production DCN topology with CORE, AGG, and ToR layers; a new switch is being on-boarded. Test flows make up 1% of the flows.]

zUpdate beats alternative solutions
[Figure: bar chart of transition and post-transition loss rates (0 to 15%) for each approach.]
Number of update steps: zUpdate: 2; zUpdate-OneStep: 1; ECMP-OneStep: 1; ECMP-Planned: 300+.

Conclusion
• Switch and flow asynchronization can cause severe congestion during DCN updates.
• We present zUpdate for congestion-free DCN updates:
  • Novel algorithms to compute an update plan
  • A practical implementation on commodity switches
  • Evaluations on a real DCN topology and update scenarios

Thanks & Questions?

Updating a DCN is a painful process
• When a switch upgrade is announced, interactive applications ask the operator (Bob): Will there be any performance disruption? How bad will the latency be? How long will the disruption last? Which servers will be affected?

Network update: a tussle between applications and operators
• Applications want network updates to be fast and seamless: updates should happen on demand, with no performance disruption.
• For operators, a network update is time consuming: today it is planned and executed by hand, with roll-backs in unplanned cases.
• A network update is also risky: human errors and accidents happen.

Challenges in congestion-free DCN update
• Many switches are involved, so a multi-step plan is needed.
• Different scenarios have distinct requirements: switch upgrade and failure recovery, new switch on-boarding, load balancer reconfiguration, and VM migration.
• Changes in routing (the network) and in traffic demand (the applications) must be coordinated.

Related work
• SWAN [SIGCOMM'13]: maximizes network utilization; tunnel-based traffic engineering.
• Reitblatt et al. [SIGCOMM'12]: control-plane consistency during network updates; per-packet and per-flow consistency cannot guarantee freedom from congestion.
• Raza et al. [ToN'11] and Ghorbani et al. [HotSDN'12]: each targets a specific scenario (IGP updates, VM migration) and handles one link-weight change or one VM migration at a time.