Infiniband subnet management
• Discuss the Infiniband subnet management system
• Discuss fat trees and subnet management in an Infiniband network with a fat tree topology
• References
  – A. Bermudez, R. Casado, F. J. Quiles, T. M. Pinkston, J. Duato, "Evaluation of a Subnet Management Mechanism for Infiniband Networks", ICPP 2003.
  – A. Vishnu, A. R. Mamidala, H. Jin, D. K. Panda, "Performance Modeling of Subnet Management on Fat Tree Infiniband Networks using OpenSM", Workshop on System Management Tools on Large Scale Parallel Systems, held in conjunction with IPDPS 2005.
  – X. Lin, Y. Chung, T. Huang, "A Multiple LID Routing Scheme for Fat-Tree-Based Infiniband Networks", IPDPS 2004.
• Infiniband devices and entities related to subnet management
  – Devices: channel adapters (CAs), e.g. host channel adapters (HCAs); switches; routers
  – Subnet manager (SM): discovers, configures, activates and manages the subnet
  – A subnet management agent (SMA) in every device generates and responds to control packets (subnet management packets, SMPs) and configures local components for subnet management
  – The SM exchanges control packets with the SMAs through the subnet management interface (SMI).
• Subnet management packets (SMP)
  – 256 bytes of data
  – Use the unreliable datagram service on the management virtual lane (VL 15)
  – LID routed: each hop uses its lookup table for forwarding
    • Used after the subnet is set up, e.g. to check the status of an active port
  – Direct routed: the packet carries the output port for each intermediate hop.
    • Used for subnet discovery, before the subnet is set up
• Subnet management packets (SMP)
  – Define the operation to be performed by the SM
  – Get: get information about a CA, switch or port
  – Set: set an attribute of a port (e.g. its LID)
  – GetResp: the response to a Get or Set
  – Trap: inform the SM about the state of a local node
    • An SMA keeps sending a Trap message until it receives a TrapRepress packet.
    • Topology information can be obtained by a sweep and by periodic Traps.
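The difference between the two SMP routing modes above can be illustrated with a toy model. This is only a sketch: the three-node topology, the `LINKS` and `TABLES` dictionaries and the function names are hypothetical and do not reflect the actual Infiniband packet format. A direct-routed packet carries its own list of output ports, so it works before any LID is assigned; a LID-routed packet relies on forwarding tables that the SM must have configured first.

```python
# Toy model of the two SMP routing modes. All names and the topology are
# illustrative assumptions, not the Infiniband wire format.

# LINKS[node][port] -> neighbour reached through that output port
LINKS = {
    "SM":  {1: "sw1"},
    "sw1": {1: "SM", 2: "sw2", 3: "hostA"},
    "sw2": {1: "sw1", 2: "hostB"},
}

# State that exists only after the subnet is set up:
LID_OF = {"hostA": 3, "hostB": 4}          # LIDs assigned by the SM
TABLES = {                                  # per-node table: dest LID -> output port
    "SM":  {3: 1, 4: 1},
    "sw1": {3: 3, 4: 2},
    "sw2": {3: 1, 4: 2},
}


def direct_route(src, out_ports):
    """Follow the output ports carried in the packet, one per hop."""
    node = src
    for port in out_ports:
        node = LINKS[node][port]
    return node


def lid_route(src, dlid, tables, lid_of):
    """Forward hop by hop using only the destination LID and lookup tables."""
    node = src
    for _ in range(len(tables) + 1):        # bound the walk to catch loops
        if lid_of.get(node) == dlid:
            return node
        node = LINKS[node][tables[node][dlid]]
    raise RuntimeError("forwarding loop or missing table entry")


if __name__ == "__main__":
    print(direct_route("SM", [1, 2, 2]))         # hostB, no tables needed
    print(lid_route("SM", 4, TABLES, LID_OF))    # hostB, tables required
```

Note that `direct_route` never touches `TABLES` or `LID_OF`, which is why this mode suits discovery of an unconfigured subnet.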
• Subnet management phases:
  – Topology discovery: sending direct routed SMPs to every port and processing the responses.
  – Path computation: computing valid paths between each pair of end nodes
  – Path distribution: configuring the forwarding tables
• Subnet discovery
  – The SM starts by sending a direct routed Get SMP to its local node. Upon receiving the response, the SM sends SMPs with increasing depth (one more hop each time).
• Path computation:
  – Compute paths between all pairs of nodes
  – For irregular topologies:
    • Up/Down routing does not work directly
      – It needs both the incoming interface and the destination to choose an output port, but Infiniband forwards on the destination only
      – Potential solution:
        » find all possible paths
        » in each node, remove every path that takes an up link after a down link
        » find one output port for each destination
      – Why does this still work? Not clear to me.
      – Other solutions: destination renaming
  – Fat tree topology:
    • What is the best that can be achieved is also not clear.
• Path distribution:
  – Ordering issue: the network may be in an inconsistent state while the tables are only partially updated, which may result in deadlock during this period.
    • Traditional solution: allow no data packets for a period of time
    • Alternative: deadlock-free reconfiguration schemes.
• Fat Tree:
  – A way to build large-scale clusters
  – Routing in a complete fat tree: an up phase and a down phase (always contention free).
  – The complete fat tree has a scalability problem
    • The root has a very large nodal degree
  – How to build a fat tree from nodes that have a constant nodal degree?
  – m-port n-tree FT(m, n)
    • m is the number of ports per switch
    • n+1 is the height of the tree
    • The tree consists of 2*(m/2)^n processing nodes and (2n-1)*(m/2)^(n-1) switches.
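The size formulas for FT(m, n) above are easy to evaluate for concrete switch sizes. The following is a minimal sketch; the function name `ft_size` and its return layout are assumptions made for illustration.

```python
def ft_size(m, n):
    """Return (processing_nodes, switches, height) of an m-port n-tree FT(m, n).

    Uses the formulas stated above:
      processing nodes = 2 * (m/2)^n
      switches         = (2n - 1) * (m/2)^(n-1)
      height           = n + 1
    """
    if m < 2 or m % 2 != 0:
        raise ValueError("m must be an even number of ports, at least 2")
    if n < 1:
        raise ValueError("n must be at least 1")
    half = m // 2
    nodes = 2 * half ** n
    switches = (2 * n - 1) * half ** (n - 1)
    return nodes, switches, n + 1


if __name__ == "__main__":
    # The 4-port 3-tree: 16 processing nodes, 20 switches, height 4.
    print(ft_size(4, 3))
    # Larger switches grow the tree quickly at the same depth.
    for ports in (8, 16, 24):
        print(ports, ft_size(ports, 3))
```

For a fixed n, the node count grows as (m/2)^n while every switch keeps exactly m ports, which is the point of the construction.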
• In FT(m, n), a processing node is labeled P(p_0 … p_{n-1})
  – p_0 = 0..m-1; p_i (i != 0) = 0..m/2-1
• A switch is labeled SW<w_0 … w_{n-2}, l>
  – l = 0..n-1
  – When l = 0: w_i = 0..m/2-1
  – When l != 0: w_0 = 0..m-1; w_i (i != 0) = 0..m/2-1
• Figure 5: a 4-port 3-tree
• How is the tree connected?
  – Let SW<w, l>_k be the k-th port of SW<w, l>.
  – SW<w, l>_k and SW<w', l'>_k' are connected iff
    • l' = l + 1
    • w_0 … w_{n-3} = w_0' … w_{l-1}' w_{l+1}' … w_{n-2}'
    • k = w_l', k' = w_{n-2} + m/2
  – Question: which switches are connected to SW<001, l>? How are the ports connected?
• Fat tree properties:
  – Multiple routes between two nodes
    • Deterministic routing uses one path between two nodes: how to map each pair to a path?
  – What is a good mapping?
    • When the traffic pattern is unknown, common practice is to minimize the maximum load on a link.
  – Do we know how to do it?
    • Not clear, even when there is no restriction on the routing. It is likely that an optimal solution exists for a particular fat tree topology.
    • In Infiniband, destination-based routing puts some restrictions on which paths can be used.
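The labeling rules for processing nodes and switches can be made concrete by enumerating every label of a small tree. The sketch below does this under the digit ranges given in the labeling rules; the function names are hypothetical, labels are represented as Python tuples, and the code assumes n >= 2 so that a switch label has at least one digit. It does not build the links between switches.

```python
from itertools import product


def processing_nodes(m, n):
    """All labels P(p_0 ... p_{n-1}): p_0 in 0..m-1, other digits in 0..m/2-1."""
    half = m // 2
    digit_ranges = [range(m)] + [range(half)] * (n - 1)
    return list(product(*digit_ranges))


def switches(m, n):
    """All labels SW<w_0 ... w_{n-2}, l> as (w, l) pairs, for n >= 2."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    half = m // 2
    labels = []
    for level in range(n):
        # At level 0 every digit is in 0..m/2-1; otherwise w_0 is in 0..m-1.
        first = range(half) if level == 0 else range(m)
        digit_ranges = [first] + [range(half)] * (n - 2)
        labels.extend((w, level) for w in product(*digit_ranges))
    return labels


if __name__ == "__main__":
    m, n = 4, 3                       # the 4-port 3-tree
    nodes = processing_nodes(m, n)
    sws = switches(m, n)
    print(len(nodes), "processing nodes, first few:", nodes[:3])
    for level in range(n):
        count = sum(1 for _, l in sws if l == level)
        print("level", level, "has", count, "switches")
    print(len(sws), "switches in total")
```

For the 4-port 3-tree this yields 16 processing nodes and 4 + 8 + 8 = 20 switches, with the level-0 switches being the only level that has half as many labels as the others.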