Generic and Automatic Address Configuration for Data Center Networks
Kai Chen (1), Chuanxiong Guo (2), Haitao Wu (2), Jing Yuan (3), Zhenqian Feng (4), Yan Chen (1), Songwu Lu (5), Wenfei Wu (6)
(1) Northwestern University, (2) Microsoft Research Asia, (3) Tsinghua, (4) NUDT, (5) UCLA, (6) BUAA
SIGCOMM 2010, New Delhi, India
1/25

Motivation
- Address autoconfiguration is desirable in networked systems
  - Manual configuration is error-prone: 50%-80% of network outages are due to manual configuration
  - DHCP handles layer-2 Ethernet autoconfiguration
- Address autoconfiguration in data centers (DC) has become a problem
  - Applications need locality information for computation
  - New DC designs encode topology information for routing
  - DHCP is not enough: it carries no such locality/topology information
2/25

Research Problem
- Given a new/generic DC, how to autoconfigure the addresses for all the devices in the network?
- DAC: data center address autoconfiguration
3/25

Outline
- Motivation
- Research Problem
- DAC
- Implementation and Experiments
- Simulations
- Conclusion
4/25

DAC Input
- Blueprint Graph (Gb)
  - A DC graph with logical IDs
  - A logical ID can be in any format
  - Available in advance and can be automatically generated
  - [Figure (a): Blueprint — each node has a logical ID, e.g., 10.0.0.3]
- Physical Topology Graph (Gp)
  - A DC graph with device IDs
  - A device ID can be a MAC address
  - Not available until the DC is built and the topology is collected
  - [Figure (b): Physical network topology — each device has a device ID, e.g., 00:19:B9:FA:88:E2]
5/25

DAC System Framework
- Malfunction Detection
- Physical Topology Collection
- Device-to-logical ID Mapping
- Logical ID Dissemination
6/25

Two Main Challenges
- Challenge 1: Device-to-logical ID Mapping
  - Assign a logical ID to each device, preserving the topological relationship between devices
- Challenge 2: Malfunction Detection
  - Detect the malfunctioning devices when the physical topology differs from the blueprint (NP-complete and even APX-hard)
7/25

Roadmap
- Malfunction Detection
- Physical Topology Collection
- Device-to-logical ID Mapping
- Logical ID Dissemination
8/25
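Before the mapping slides, a concrete picture of the two inputs from the DAC Input slide may help. This is an illustrative Python sketch, not part of the original deck: the dictionary names, the logical IDs, and all but one of the MAC addresses are made up, and both toy graphs are simply 4-node cycles.

# Hypothetical illustration (not from the talk): the two DAC inputs as plain
# adjacency dicts. Gb carries logical IDs (any format), Gp carries device IDs
# (e.g. MAC addresses) learned only after the DC is wired up.

# Blueprint graph Gb: logical ID -> neighboring logical IDs
blueprint = {
    "10.0.0.1": ["10.0.0.2", "10.0.1.1"],
    "10.0.0.2": ["10.0.0.1", "10.0.1.2"],
    "10.0.1.1": ["10.0.0.1", "10.0.1.2"],
    "10.0.1.2": ["10.0.0.2", "10.0.1.1"],
}

# Physical topology graph Gp: device ID (MAC) -> neighboring device IDs
physical = {
    "00:19:B9:FA:88:E2": ["00:19:B9:FA:88:E3", "00:19:B9:FA:88:E4"],
    "00:19:B9:FA:88:E3": ["00:19:B9:FA:88:E2", "00:19:B9:FA:88:E5"],
    "00:19:B9:FA:88:E4": ["00:19:B9:FA:88:E2", "00:19:B9:FA:88:E5"],
    "00:19:B9:FA:88:E5": ["00:19:B9:FA:88:E3", "00:19:B9:FA:88:E4"],
}

# The task of the following slides: find a one-to-one mapping between logical
# IDs and device IDs that preserves adjacency, i.e. an isomorphism of Gb and Gp.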
Device-to-logical ID Mapping
- How to preserve the topological relationship?
- Abstract DAC mapping into the Graph Isomorphism (GI) problem
  - The GI problem is hard: its complexity (P or NP-complete) is unknown
- Introduce O2: a one-to-one mapping for DAC
  - O2 Base Algorithm and O2 Optimization Algorithm
  - Adopt and improve techniques from graph theory
9/25

O2 Base Algorithm
- Gb: {l1 l2 l3 l4 l5 l6 l7 l8}   Gp: {d1 d2 d3 d4 d5 d6 d7 d8}
- Decomposition:
  Gb: {l1} {l2 l3 l4 l5 l6 l7 l8}   Gp: {d1} {d2 d3 d4 d5 d6 d7 d8}
- Refinement:
  Gb: {l1} {l5} {l2 l3 l4 l6 l7 l8}   Gp: {d1} {d2 d3 d5 d7} {d4 d6 d8}
  (the corresponding cell sizes no longer match, so mapping l1 to d1 fails; try another candidate)
10/25

O2 Base Algorithm
- Gb: {l1 l2 l3 l4 l5 l6 l7 l8}   Gp: {d1 d2 d3 d4 d5 d6 d7 d8}
- Decomposition:
  Gb: {l5} {l1 l2 l3 l4 l6 l7 l8}   Gp: {d1} {d2 d3 d4 d5 d6 d7 d8}
- Refinement:
  Gb: {l5} {l1 l2 l7 l8} {l3 l4 l6}   Gp: {d1} {d2 d3 d5 d7} {d4 d6 d8}
- Refinement:
  Gb: {l5} {l1 l2 l7 l8} {l6} {l3 l4}   Gp: {d1} {d2 d3 d5 d7} {d6} {d4 d8}
11/25

O2 Base Algorithm
- Refinement:
  Gb: {l5} {l6} {l1 l2} {l7 l8} {l3 l4}   Gp: {d1} {d6} {d2 d7} {d3 d5} {d4 d8}
- Decomposition:
  Gb: {l5} {l6} {l1} {l2} {l7 l8} {l3 l4}   Gp: {d1} {d6} {d2} {d7} {d3 d5} {d4 d8}
- Decomposition & Refinement:
  Gb: {l5} {l6} {l1} {l2} {l7} {l8} {l3} {l4}   Gp: {d1} {d6} {d2} {d7} {d3} {d5} {d4} {d8}
12/25

O2 Base Algorithm
- The O2 base algorithm is very slow because of 3 problems:
  - P1: Iterative splitting in Refinement: it tries to use each cell to split every other cell iteratively
    [Figure: Gp partitioned into cells π1, π2, π3, …, πn-1, πn]
  - P2: Iterative mapping in Decomposition: when the current mapping fails, it iteratively selects the next node as a candidate for mapping
  - P3: Random selection of mapping candidates: no explicit hint for how to select a candidate for mapping
13/25

O2 Optimization Algorithm
- R1: Selective Splitting (for Problem 1): a cell cannot split another cell that is disjoint with it
- R2: Candidate Filtering via Orbit (for Problem 2): if u in Gb cannot be mapped to v in Gp, then no node in the same orbit as u can be mapped to v either
- R3: Candidate Selection via SPLD (Shortest Path Length Distribution) (for Problem 3): two nodes u in Gb and v in Gp cannot be mapped to each other if they have different SPLDs
- Heuristics based on DC topology features:
  - Sparse => Selective Splitting (for Problem 1)
  - Symmetric => Candidate Filtering via Orbit (for Problem 2)
  - Asymmetric => Candidate Selection via SPLD (for Problem 3)
- We propose the last technique and adopt the first two from graph theory
14/25
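To make the two recurring building blocks more concrete, here is a minimal Python sketch, assuming both graphs are plain adjacency dicts: one Refinement split (a chosen cell splits the others by how many neighbors each node has inside it) and the SPLD computation used by R3. The function names split_by_cell and spld are illustrative; this is not the authors' implementation.

# Hypothetical sketch (not the DAC code) of one Refinement step and of SPLD.
from collections import Counter, deque

def split_by_cell(partition, splitter, adj):
    # Refine 'partition' (a list of cells, each a list of nodes) using the
    # cell 'splitter': nodes of a cell with different numbers of neighbors
    # inside 'splitter' end up in different sub-cells.
    refined = []
    splitter_set = set(splitter)
    for cell in partition:
        groups = {}
        for node in cell:
            k = sum(1 for nbr in adj[node] if nbr in splitter_set)
            groups.setdefault(k, []).append(node)
        # keep sub-cells in a deterministic order (by neighbor count)
        refined.extend(groups[k] for k in sorted(groups))
    return refined

def spld(node, adj):
    # Shortest Path Length Distribution of 'node': how many nodes sit at
    # distance 1, 2, 3, ... (computed by BFS).
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return Counter(d for n, d in dist.items() if n != node)

In this reading, if spld(u, gb_adj) != spld(v, gp_adj), then u can never be mapped to v, which both filters candidates and suggests which candidate to try first in Decomposition.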
Speed of O2 Mapping
[Figure: mapping time comparison; values shown: 8.9 seconds, 12.4 hours, 8.9 seconds]
15/25

Roadmap
- Malfunction Detection
- Physical Topology Collection
- Device-to-logical ID Mapping
- Logical ID Dissemination
16/25

Malfunction Detection
- Types of malfunctions: node failure, link failure, miswiring
- Effect of malfunctions: O2 cannot find a device-to-logical ID mapping
- Our goal: detect the malfunctioning devices
- Problem complexity — an ideal solution:
  1. Find the Maximum Common Subgraph (MCS) between Gb and Gp, say Gmcs
  2. Remove Gmcs from Gp => the rest are the malfunctions
  But MCS is NP-complete and even APX-hard
17/25

Practical Solution
- Observations:
  1. Most node/link failures and miswirings cause node degree changes
  2. Special, rare miswirings happen without any degree change
- Our idea:
  - Degree change case: exploit the degree regularity in DCs (common sense) — devices in a DC have regular degrees
  - No degree change case: probe sub-graphs derived from anchor points, and correlate the miswired devices using majority voting
    - Select anchor point pairs from the two graphs
    - Probe sub-graphs iteratively; stop when the k-hop subgraphs are isomorphic but the (k+1)-hop subgraphs are not, and increase the counters of the k- and (k+1)-hop nodes
    - Output the node counter list: a high counter means the node is likely to be miswired
  [Figure: around an anchor pair, the 1- through k-hop subgraphs are isomorphic while the (k+1)-hop subgraphs are non-isomorphic]
18/25

Simulations on Miswiring Detection
- Over data centers with tens of thousands of devices, 1.5% of the nodes used as anchor points suffice to identify all the hardest-to-detect miswirings
19/25

Roadmap
- Malfunction Detection
- Physical Topology Collection
- Device-to-logical ID Mapping
- Logical ID Dissemination
20/25

Basic DAC Protocols
- CBP: Communication Channel Building Protocol — top-down, from root to leaves
- PCP: Physical Topology Collection Protocol — bottom-up, from leaves to root
- LDP: Logical ID Dissemination Protocol — top-down, from root to leaves
- DAC manager:
  1. handles all the intelligence
  2. can be any server in the network
21/25

Implementation and Experiments
- Over a BCube(8,1) network with 64 servers:
  1. Communication Channel Building (CCB)
  2. Transition time
  3. Physical Topology Collection (TC)
  4. Device-to-logical ID Mapping
  5. Logical ID Dissemination (LD)
- Total time used: 275 milliseconds
22/25

Simulations
- Over large-scale data centers (times reported in milliseconds)
- 46 seconds for DCell(6,3) with 3.8+ million devices
23/25

Summary
- DAC: address autoconfiguration for generic data center networks, especially when addresses are topology-aware
- Graph isomorphism for address configuration: 275 ms for a 64-server BCube, and 46 s for a DCell with 3.8+ million devices
- Anchor point probing for malfunction detection: 1.5% of the nodes as anchor points identify all the hardest-to-detect miswirings
- DAC is a small step towards the more ambitious goal of auto-management of entire data centers
24/25

Q & A? Thanks!
25/25
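Backup (added in this write-up, not part of the original 25 slides): a rough, hypothetical Python sketch of the anchor-pair probing idea from the Practical Solution slide. Both graphs are assumed to be adjacency dicts, is_isomorphic is a stand-in for an O2-style isomorphism check and is passed in rather than implemented, and all function names are illustrative only.

# Hypothetical sketch (not the DAC code) of anchor-pair probing for miswirings
# that do not change node degrees.
from collections import Counter, deque

def hop_distances(graph, root, max_hops):
    # BFS distances from 'root', explored up to 'max_hops' hops.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def induced(graph, nodes):
    # Induced subgraph on 'nodes', in the same adjacency-dict form.
    return {u: [v for v in graph[u] if v in nodes] for u in nodes}

def probe(gb, gp, anchor_pairs, is_isomorphic, max_k=10):
    # For each anchor pair (a node of Gb, a node of Gp), grow k-hop subgraphs
    # around both anchors.  When the k-hop subgraphs are isomorphic but the
    # (k+1)-hop ones are not, increase the counters of the k- and (k+1)-hop
    # nodes of Gp.  Nodes with high counters across many anchor pairs are
    # reported first as the likely miswired devices (majority voting).
    counters = Counter()
    for lb, dp in anchor_pairs:
        db = hop_distances(gb, lb, max_k + 1)
        dg = hop_distances(gp, dp, max_k + 1)
        for k in range(1, max_k + 1):
            iso_k = is_isomorphic(induced(gb, {u for u, d in db.items() if d <= k}),
                                  induced(gp, {u for u, d in dg.items() if d <= k}))
            iso_k1 = is_isomorphic(induced(gb, {u for u, d in db.items() if d <= k + 1}),
                                   induced(gp, {u for u, d in dg.items() if d <= k + 1}))
            if iso_k and not iso_k1:
                counters.update(u for u, d in dg.items() if d in (k, k + 1))
                break
            if not iso_k:
                break
    return counters.most_common()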