SWAN: Software-driven wide area network Ratul Mahajan Partners in crime Chi-Yao Hong Vijay Gill Srikanth Kandula Rohan Gandhi Ratul Mahajan Xin Jin Mohan Nanduri Harry Liu Ming Zhang Roger Wattenhofer Inter-DC WAN: A critical, expensive resource Dublin Seattle New York Seoul Barcelona Hong Kong Los Angeles Miami But it is highly inefficient One cause of inefficiency: Lack of coordination Another cause of inefficiency: Local, greedy resource allocation B C D A B E H G Local, greedy allocation D A E H F C G F Globally optimal allocation [Latency inflation with MPLS-based traffic engineering, IMC 2011] SWAN: Software-driven WAN Goals Key design elements Highly efficient WAN Flexible sharing policies Coordinate across services Centralize resource allocation [Achieving high utilization with software-driven WAN, SIGCOMM 2013] Software-defined networking: Primer SDNs • Streamlined switches • Control plane: centralized, off-board • Data plane: direct configuration SWAN overview SWAN controller Traffic demand BW allocation Service broker Rate limiting Service hosts Topology, traffic Network config. Network agent WAN Key design challenges Scalably computing BW allocations Avoiding congestion during network updates Working with limited switch memory Scalably computing allocation Goal: Prefer higher-priority traffic and max-min fair within a class Challenge: Network-wide fairness requires many MCFs Approach: Bounded max-min fairness (fixed number of MCFs) Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 demand 𝑑2 Geometrically partition the demand space with parameters α and 𝑈 Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 demand 𝑑2 Maximize throughput while limiting all allocations below α𝑈 Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 demand 𝑑2 Maximize throughput while limiting all allocations below α𝑈 Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 demand 𝑑2 Fix the allocation for smaller flows Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 demand 𝑑2 Continue until all flows fixed Bounded max-min fairness α3𝑈 demand 𝑑5 demand 𝑑4 α2𝑈 demand 𝑑3 α𝑈 demand 𝑑1 Fairness demand 𝑑2 1 bound: Each flow is within , 𝛼 of 𝛼 max(𝑑𝑖) Number of MCFs: log 𝛼 𝑈 its fair rate Relative Relative deviation deviation SWAN computes fair allocations MPLS TE SWAN (α = 2) Flows sorted by demand In practice, only 4% of the flows deviate more than 5% Key design challenges Scalably computing BW allocations Avoiding congestion during network updates Working with limited switch memory Congestion during network updates Congestion-free network updates Computing congestion-free update plan Leave scratch capacity 𝑠 on each link • Ensures a plan with at most 1 𝑠 − 1 steps Find a plan with minimal number of steps using an LP • Search for a feasible plan with 1, 2, …. max steps Use scratch capacity for background traffic Complementary CDF SWAN provides congestion-free updates Oversubscription ratio Extra traffic (MB) Key design challenges Scalably computing BW allocations Avoiding congestion during network updates Working with limited switch memory Working with limited switch memory Working with limited switch memory Install only the “working set” of paths Use scratch capacity to enable disruption-free updates to the set Throughput (relative to optimal) SWAN comes close to optimal SWAN SWAN w/o rate control MPLS TE Deploying SWAN in production WAN Data center Lab prototype Partial deployment WAN Data center Full deployment Key lesson from using SDNs Loops in SDNs Dependencies are worse than you might think Ongoing work: Robust, fast network updates Understand fundamental limits Develop practical solutions What are the minimal dependencies for a desired consistency property? [On consistent updates in software-defined networks, HotNets 2013] Network update pipeline Routing policy Consistency property Network characteristics Robust rule generation Dependency graph generation Update execution Robust rule generation: Example Capacity Capacity Capacity of Links: of Links: of 10 Links: 10 10 S4 S4 S4 S4 S4 S4 S4 S4 S4 10 10 10 10 10 10 5 5 5 10 10 10 10 10 10 5 5 5 S3 S S3 S3 S3 S1 S1 S1 S3 S3 S3 S1 S1 S1 S1 S1 5 5 5 Demands: Demands: Demands: Demands: Demands: Demands: ands: Demands: Demands:5 5 5 S4->S3:10 S4->S3:10 S4->S3:10 S4->S3:10 S4->S3:10 S4->S3:10 S3:10 S4->S3:10 S4->S3:10 5 5 5 10 10 10 5 5 5 S2->S3:10 S2->S3:10 S2->S3:10 S2->S3:10 S2->S3:10 S2->S3:10 S3:10 S2->S3:10 S2->S3:10 S2 S2 S2 S2 S2 S2 S2 S2 S2 S1->S3:10 S1->S3:10 S1->S3:10 S1->S3:10 S1->S3:10 S1->S3:10 S3:0 S1->S3:0 S1->S3:0 Capacity of Links: 10 (c) (c) Congestion Congestion when when S2 when fails S2 fa S2 (b) Unreliable (b) (b) Unreliable Unreliable new TEnew (TE-1) TE TE (TE-1) (TE-1)(c) Congestion (a) Initial (a) (a) Initial TE Initial TE TE S4 S4new to update to update to update to TE-1 to TE-1 to TE-1 10 10 5 5 S3 S1 S1 5 Demands: Demands: S4->S3:10 S4->S3:10 5 10 S2->S3:10 S2->S3:10 S2 S2 S1->S3:10 S1->S3:10 Robust rule generation Goal: No congestion if any k or fewer switches fail to update Challenge: Too many failure combinations 𝑛 1 + …+ 𝑛 𝑘 Approach: Use a sorting network to identify worst k failures – O(kn) constraints Robust rule generation: Preliminary results Normalized throughput 1 0.8 0.6 0.4 0.2 0 Network update pipeline Routing policy Consistency property Network characteristics Robust rule generation Dependency graph generation Update execution Dependency graph generation Add @𝑢 ∗𝑤 Del @𝑢 ∗𝑣 Add @𝑣 ∗𝑢 Del @𝑤 ∗𝑢 Add @𝑤 ∗𝑣 Del @𝑣 ∗𝑤 Update execution Updates without parents can be applied right away Break any cycles and shorten long chains Add @𝑤 𝑣 𝑣 Add @𝑢 ∗𝑤 Del @𝑢 ∗ 𝑣 Add @𝑣 ∗𝑢 Del @𝑤 ∗𝑢 Add @𝑤 ∗𝑣 Del @𝑣 ∗ 𝑤 Update execution: Preliminary results Summary SWAN yields a highly efficient and flexible WAN – Coordinated service transmissions and centralized resource allocation – Bounded fairness for scalable computation – Scratch capacities for “safe” transitions Change is hard for SDNs – Need to understand fundamental limits, develop practical solutions