Proteus: A Topology Malleable Data Center Network
Ankit Singla (University of Illinois Urbana-Champaign)
Atul Singh, Kishore Ramachandran, Lei Xu, Yueping Zhang (NEC Labs, Princeton)

Data Centers
Data centers: foundation of Internet services and enterprise operation
– Need good bandwidth connectivity between servers

“Good” Bandwidth Connectivity
Connect all servers at full bandwidth?
– Fat-trees [SIGCOMM 2008], VL2 [SIGCOMM 2009]
POWER CONSUMPTION? CABLING COMPLEXITY? UPGRADE TO 40/100-GIGE?

Oversubscribed Networks
Is all-to-all full bandwidth connectivity always necessary?
– Small number of ‘hot’ ToR-ToR connections
  • Flyways [HotNets 2009]
– >90% of bytes flow in ‘elephant flows’
  • VL2 [SIGCOMM 2009]
– ~60% of ToRs see <20% change in traffic over 1.6-2.2 sec
  • The Case for Fine-grained TE in Data Centers [WREN 2010]
Flyways [HotNets 2009], c-Through and Helios [SIGCOMM 2010]: supplement the electrical network with wireless/optics
– Wireless/optical connections are set up between hot ToRs
– Some flexibility to adjust to changes in the traffic matrix

Proteus
Proteus is a novel interconnect
A NEW DESIGN POINT: ALL-OPTICS above the ToR layer
[Figure: servers attach to ToRs; the ToRs attach to a single optical interconnect]
– Topology adjusts to traffic demands
– Low cabling complexity
– Easier migration to 40/100-GigE
– Low power consumption
Proteus is an oversubscribed network with topology malleability

Malleability
[Figure: example network of eight ToRs (A-H) with tables of ToR pairs and values (e.g. B H 10, C E 10, B D 10); callouts cover topology, routes, and capacity changes]

Optics: Perfect Fit
[Figure: MEMS and WSS examples over nodes A-D]
– 1 Gigabit × 64,000 vs. 64 Terabits* × 1
  * Achieved by NEC Labs and AT&T
– Low complexity, reconfigurability, low power consumption
Callouts: CIRCUIT SETUP TIME, TOPOLOGY MANAGEMENT, LIMITED WAVELENGTHS
MEMS = Micro-Electro-Mechanical Switch
WSS = Wavelength Selective Switch

Problem Setting: Container-sized DCN
Proteus-2560: connect 80 ToRs, each with 32 servers
Typical container size in containerized data center architectures
Image adapted from: www.sun.com/blackbox

ToR Perspective
[Figure: non-blocking ToR with 32 ports for servers and 32 ports towards the optical interconnect]

ToR Perspective
[Figure: non-blocking ToR; callouts: LIMITED BY TOR PORT CAPACITY, TRANSCEIVERS WITH UNIQUE WAVELENGTHS, CROSS-RACK TRAFFIC, TRANSIT TRAFFIC (HOP-BY-HOP), INTRA-RACK TRAFFIC]
(O-E-O conversions add sub-nanosecond latency at each hop)

[Figure: ToR1 attached through optical components to neighboring ToRs over incoming and outgoing links; neighbor labels ToR67/13, ToR11/21, ToR29/45, ToR55/73; callouts: CHANGE TOPOLOGY, CHANGE CAPACITY, LOW CAPACITY LINK, HIGH CAPACITY LINK]

[Figure: optical components between the ToRs and a MEMS switch (320 ports): MUX, DEMUX, WSS, coupler, circulators; callouts: TOPOLOGY (MEMS), CAPACITY (WSS), BI-DIRECTIONALITY (CIRCULATORS)]

Proteus-2560 Properties
Build any 4-regular ToR topology
Each link’s capacity varies in each direction
– Capacity ∈ {10, 20, 30, …, 320} Gbps
– Provided sum of capacities of 4 links <= 320 Gbps
– (Also avoid wavelength contention)
Use hop-by-hop connections to other ToRs
– Transit traffic doesn’t interfere with intra-ToR traffic

Topology Management
[Figure: nodes A-D; MEMS, WSS, and hop-by-hop routing settings shown as unknowns]
COMPLEX PROBLEM: ALL CONFIGURATIONS ARE INTERDEPENDENT
We formulate the problem as a mixed-integer linear program
Describe a heuristic approach backed by graph-theoretic insights
– Likely to take under a couple of hundred milliseconds

Heuristic Approach – Key Ideas
Topology: Weighted 4-matching over hot ToR-ToR connections
– Check and correct for connectivity
Routing: Can use shortest paths
– Ideally, need low-congestion routing schemes
Capacities: Graph edge-coloring over wavelengths
– Ensure each link carries at least one wavelength

Preliminary Analysis
Cabling: #Fibers ≈ 1/5th #cables in a fat-tree
Ease of upgrade: When ToRs move to 40/100-GigE, nothing else changes!
Cost: similar to a fat-tree
– Optics is yet to benefit from commoditization
– To some extent, dispels the “optics is expensive” myth
Power: 50% of fat-tree power consumption
Fat-tree is also fault tolerant though

Conclusion, Ongoing Work
A novel data center architecture
– Unprecedented topology flexibility
– Reduced cabling complexity
– Easier migration to 40/100-GigE
– Reduced power consumption
– Explores a new design point: all-optics
Ongoing work
– Experimental evaluation
– Incremental update heuristics
– Mega-data-center scale
– Fault tolerance
TRANSIENT BEHAVIOR? ROUTING? SYNCHRONIZATION?

Thank You! Questions?

Extras / Backup

Hop-by-hop Through ToRs
MEMS: limited end-to-end circuits
Need hop-by-hop routes over these circuits
Feasibility assessment: works fine!

Helios [SIGCOMM ’10]
Pods are still fat-trees
Requires design-time decision on stable vs. unstable traffic
Does not exploit multi-hop optical routes
Does not leverage WSS technology for variable capacity
Image from “Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers” – Farrington et al.
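
Sketch: topology step of the heuristic
The "Heuristic Approach" slide names weighted 4-matching over hot ToR-ToR connections, followed by a connectivity check. The slides do not say which matching algorithm is used, so the Python sketch below assumes a simple greedy pass (heaviest demand first, at most 4 links per ToR) as a stand-in, not the authors' method. The names greedy_b_matching, is_connected, and the demand values are hypothetical, and the "correct for connectivity" step is only detected here, not repaired.

```python
from collections import defaultdict, deque

MAX_DEGREE = 4  # Proteus-2560 builds 4-regular ToR topologies (from the slides)


def greedy_b_matching(demands, max_degree=MAX_DEGREE):
    """Pick ToR-ToR links heaviest-first, at most max_degree links per ToR.

    demands maps (tor_u, tor_v) -> traffic weight. This greedy pass is an
    assumed approximation of weighted 4-matching, not an exact solver.
    """
    degree = defaultdict(int)
    chosen_pairs = set()
    links = []
    for (u, v), _weight in sorted(demands.items(), key=lambda kv: -kv[1]):
        pair = frozenset((u, v))
        if u == v or pair in chosen_pairs:
            continue  # skip self-loops and pairs that are already linked
        if degree[u] < max_degree and degree[v] < max_degree:
            links.append((u, v))
            chosen_pairs.add(pair)
            degree[u] += 1
            degree[v] += 1
    return links


def is_connected(tors, links):
    """Breadth-first search: True if every ToR is reachable from the first."""
    tors = list(tors)
    if not tors:
        return True
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen = {tors[0]}
    queue = deque([tors[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen >= set(tors)


if __name__ == "__main__":
    # Hypothetical demands between four ToRs, with a degree cap of 2.
    tors = ["A", "B", "C", "D"]
    demands = {("A", "B"): 30, ("C", "D"): 20, ("A", "C"): 10, ("B", "D"): 5}
    links = greedy_b_matching(demands, max_degree=2)
    print(links, is_connected(tors, links))
```

A greedy pass keeps the sketch short and fast, but it can miss heavier matchings that an exact solver would find, and a disconnected result would still need the correction step the slide mentions.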