Infiniband subnet management

• Discuss the Infiniband subnet management system
• Discuss fat trees and subnet management in an
Infiniband network with a fat tree topology.
• References
• A. Bermudez, R. Casado, F.J. Quiles, T. M. Pinkston, J. Duato, “Evaluation of
a Subnet Management Mechanism for Infiniband Networks”, ICPP 2003.
• A. Vishnu, A. R. Mamidala, H. Jin, D. K. Panda, “Performance Modeling of
Subnet Management on Fat Tree Infiniband Networks using OpenSM”,
Workshop on System Management Tools on Large Scale Parallel Systems,
held in conjunction with IPDPS 2005.
• X. Lin, Y. Chung, T. Huang, “A Multiple LID Routing Scheme for Fat-Tree-Based Infiniband Networks”, IPDPS 2004.
• Infiniband devices and entities related to
subnet management
• Devices: channel adapters (CAs), such as host channel
adapters (HCAs), switches, and routers
• Subnet manager (SM): discovers, configures,
activates, and manages the subnet
• A subnet management agent (SMA) in every device
generates and responds to control packets (subnet
management packets, SMPs) and configures local
components for subnet management
• The SM exchanges control packets with the SMAs through the subnet
management interface (SMI).
• Subnet management packets (SMP)
– 256 bytes of data
– Use the unreliable datagram service on the
management virtual lane (VL 15)
– LID-routed: switches forward them using their lookup (forwarding) tables
• Used after the subnet is set up, e.g., to check the status
of an active port
– Directed-routed: the packet carries the output
port for each intermediate hop
• Used for subnet discovery, before the subnet is set up
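To make the two SMP forwarding modes concrete, here is a minimal Python sketch on a hypothetical four-device subnet. The names LINKS, directed_route, and lid_route, as well as the device names and LID values, are illustrative assumptions, and real directed-route SMPs carry more state (for example a hop pointer and a return path) than this simplification shows. The point is that a directed route needs no forwarding tables, while a LID route works only once the tables have been filled in.

```python
# Hypothetical toy subnet: (device, output port) -> device at the other end of the link.
LINKS = {
    ("SM-HCA", 1): "SW1",
    ("SW1", 2): "SW2",
    ("SW1", 3): "HCA-A",
    ("SW2", 4): "HCA-B",
}


def directed_route(start, initial_path):
    """Directed route: the packet itself names the output port at every hop."""
    node = start
    for port in initial_path:
        node = LINKS[(node, port)]  # no forwarding table is consulted
    return node


def lid_route(start, dlid, lid_of, lft, max_hops=16):
    """LID route: each switch looks up the destination LID in its own table."""
    node = start
    for _ in range(max_hops):
        if lid_of.get(node) == dlid:
            return node
        node = LINKS[(node, lft[node][dlid])]
    raise RuntimeError("destination not reached: tables incomplete or looping")
```

For example, directed_route("SM-HCA", [1, 2, 4]) reaches HCA-B with no tables at all, whereas lid_route needs the per-switch table lft (here a hypothetical dict from switch to {DLID: output port}).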
• Subnet management packets (SMP)
– The method defines the operation to be performed by the SM:
» Get: get information about a CA, switch, or port
» Set: set an attribute of a port (e.g., its LID)
» GetResp: the response to a Get (or Set)
» Trap: inform the SM about the state of a local node
• An SMA keeps sending a Trap message until it receives
a TrapRepress packet.
• Topology information can be obtained by a sweep
and by periodic Traps.
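A toy model of the SMA side of these methods, as a hedged Python sketch: the class ToySMA, its attribute names, and the on_timer hook are hypothetical, and answering a Set with a GetResp is simply how this sketch behaves, not a complete model of the specification. It shows a Trap being repeated on every timer tick until a TrapRepress arrives.

```python
class ToySMA:
    """Hypothetical, much-simplified SMA holding a few port attributes."""

    def __init__(self):
        self.attrs = {"LID": 0, "PortState": "Down"}
        self.trap_outstanding = None  # Trap not yet acknowledged by the SM

    def handle(self, method, attr=None, value=None):
        """Process one SMP from the SM and return the reply (or None)."""
        if method == "Get":
            return ("GetResp", attr, self.attrs[attr])
        if method == "Set":
            self.attrs[attr] = value
            return ("GetResp", attr, self.attrs[attr])  # sketch: Set answered by GetResp
        if method == "TrapRepress":
            self.trap_outstanding = None  # the SM has seen the Trap: stop repeating it
            return None
        raise ValueError(f"unsupported method {method}")

    def local_event(self, attr, value):
        """A local state change that the SM must be told about."""
        self.attrs[attr] = value
        self.trap_outstanding = ("Trap", attr, value)

    def on_timer(self):
        """Called periodically: resend the pending Trap, if any."""
        return self.trap_outstanding
```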
• Subnet Management phases:
– Topology discovery: sending directed-routed SMPs
to every port and processing the responses.
– Path computation: computing valid paths
between each pair of end nodes
– Path distribution: configuring the
forwarding tables
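The path computation and path distribution phases can be sketched as producing one forwarding-table entry per (switch, destination). This Python sketch assumes plain minimum-hop routing with the lowest-numbered port as a tie-breaker, an illustrative choice rather than what any particular SM does; min_hop_lfts and the toy topology are hypothetical, and real tables are indexed by destination LID rather than by device name.

```python
from collections import deque


def min_hop_lfts(ports, switches, dests):
    """ports[node][port] = neighbour (every link listed in both directions).
    Returns lft[switch][dest] = output port on some minimum-hop path to dest."""
    lft = {sw: {} for sw in switches}
    for d in dests:
        # Path computation: BFS outward from the destination gives hop distances to d.
        dist = {d: 0}
        queue = deque([d])
        while queue:
            v = queue.popleft()
            for nb in ports[v].values():
                if nb not in dist:
                    dist[nb] = dist[v] + 1
                    queue.append(nb)
        # Table entry: the lowest-numbered port that moves one hop closer to d.
        for sw in switches:
            if sw in dist:
                lft[sw][d] = min(p for p, nb in ports[sw].items()
                                 if dist.get(nb) == dist[sw] - 1)
    return lft
```

The resulting per-switch tables are what the distribution phase then writes into the switches.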
• Subnet discovery
– The SM starts by sending a directed-routed Get SMP to its
local node. Upon receiving the response, the SM sends SMPs
with increasing depth (one more hop at a time).
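The discovery sweep can be sketched as a breadth-first search over directed routes of increasing length. In this Python sketch, TOPOLOGY, get_node_info, and discover are hypothetical stand-ins: device names play the role of the unique identifiers a real SM would read from the responses, and get_node_info just walks the toy topology instead of sending a packet.

```python
from collections import deque

# Hypothetical toy subnet: device -> {port: neighbouring device}.
TOPOLOGY = {
    "SM-HCA": {1: "SW1"},
    "SW1": {1: "SM-HCA", 2: "SW2", 3: "SW3"},
    "SW2": {1: "SW1", 2: "SW3", 3: "HCA-A"},
    "SW3": {1: "SW1", 2: "SW2", 3: "HCA-B"},
    "HCA-A": {1: "SW2"},
    "HCA-B": {1: "SW3"},
}


def get_node_info(path):
    """Stand-in for a directed-routed Get SMP: follow the ports and 'answer'."""
    node = "SM-HCA"
    for port in path:
        node = TOPOLOGY[node][port]
    return node, sorted(TOPOLOGY[node])  # identity and port numbers of the node


def discover():
    """Probe the local node first, then paths one hop longer at each depth."""
    found = {}
    queue = deque([()])  # the empty path addresses the SM's own node
    while queue:
        path = queue.popleft()
        node, node_ports = get_node_info(path)
        if node in found:
            continue  # already reached by a path that is no longer than this one
        found[node] = path
        for port in node_ports:
            queue.append(path + (port,))
    return found
```

Because the queue is processed in order of path length, each device is recorded with a shortest directed route from the SM.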
• Path computation:
– Compute paths between all pairs of nodes
– For irregular topologies:
• Up/Down routing does not work directly
– It needs both the incoming interface and the
destination, but Infiniband uses only the destination
– Potential solution:
» find all possible paths
» at each node, remove every path that takes an up link after a down link
» find one output port for each destination
– Why this still works is not clear to me.
– Other solutions: destination renaming
– Fat tree topology:
• What the best achievable routing is remains unclear as well.
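A minimal Python sketch of Up*/Down* path selection, assuming the usual rule: links are oriented by BFS level from a chosen root (ties broken by the smaller switch ID), and a legal path never takes an up link after a down link. The function names and the four-switch ring are hypothetical. The search runs over (switch, has gone down) states, which is exactly the extra information about the incoming link that destination-only forwarding cannot see.

```python
from collections import deque


def bfs_levels(adj, root):
    """Hop distance of every switch from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """u -> v is 'up' if v is closer to the root (ties broken by smaller ID)."""
    return (level[v], v) < (level[u], u)


def is_legal(path, level):
    """True if the path never takes an up link after a down link."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


def up_down_path(adj, level, src, dst):
    """Shortest legal path, found by BFS over (switch, gone_down) states."""
    start = (src, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        u, gone_down = state
        if u == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for v in adj[u]:
            up = is_up(u, v, level)
            if up and gone_down:
                continue  # forbidden down -> up transition
            nxt = (v, gone_down or not up)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

On the ring 0-1-2-3-0 rooted at switch 0, the two-hop route 1, 2, 3 goes down and then up, so the legal route from 1 to 3 is 1, 0, 3.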
• Path distribution:
– Ordering issue: the network may be in an
inconsistent state when partially updated, which
may result in deadlock during this period.
• Traditional solution: send no data packets for a period of
time while the tables are updated
• Deadlock-free reconfiguration schemes
• Fat Tree:
– A way to build large-scale clusters
• Fat Tree:
– Routing in a complete fat tree: an up phase followed by a
down phase (always contention-free).
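The up phase climbs to the lowest common ancestor and the down phase descends to the destination. A small Python sketch, assuming a heap-numbered complete binary tree (root 1, children of i are 2i and 2i+1) with source and destination leaves at the same depth; fat_tree_route is a hypothetical name, and the sketch shows only the shape of the path, not the link capacities that make a tree "fat".

```python
def fat_tree_route(src, dst):
    """Route between two same-depth leaves of a heap-numbered complete binary tree:
    the up phase to the lowest common ancestor, then the down phase to dst."""
    up, down = [src], [dst]
    a, b = src, dst
    while a != b:
        a, b = a // 2, b // 2  # both endpoints climb one level
        up.append(a)
        down.append(b)
    return up + down[-2::-1]  # down[-1] is the common ancestor, already in up
```

For leaves 4 and 7 of a depth-2 tree the route is 4, 2, 1, 3, 7; for the sibling leaves 4 and 5 it is just 4, 2, 5.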
• Fat Tree:
– The complete fat tree has a scalability
problem
• The root has a very large nodal degree
– How can a fat tree be built from nodes with a
constant nodal degree?
– m-port n-tree FT(m, n)
• m is the number of ports per switch
• n+1 is the height of the tree
• The tree consists of 2*(m/2)^n processing nodes and
(2n-1)*(m/2)^(n-1) switches.
– How is an m-port n-tree FT(m, n) connected?
• A processing node is labeled P(p_0 … p_{n-1}), where
– p_0 = 0..m-1 and p_i (i != 0) = 0..m/2-1
• A switch is labeled SW<w_0…w_{n-2}, l>, where
– l = 0..n-1
– when l = 0, w_i = 0..m/2-1
– when l != 0, w_0 = 0..m-1 and w_i (i != 0) = 0..m/2-1
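The labeling rules can be checked against the node and switch counts of FT(m, n) by enumerating every label. A minimal Python sketch, assuming m is even; node_labels, switch_labels, and check_counts are hypothetical names.

```python
from itertools import product


def node_labels(m, n):
    """P(p_0 ... p_{n-1}): p_0 in 0..m-1, every other p_i in 0..m/2-1."""
    half = m // 2
    return list(product(range(m), *[range(half)] * (n - 1)))


def switch_labels(m, n, l):
    """SW<w_0 ... w_{n-2}, l>: all w_i in 0..m/2-1 at level 0; else w_0 in 0..m-1."""
    half = m // 2
    if l == 0:
        ranges = [range(half)] * (n - 1)
    else:
        ranges = [range(m)] + [range(half)] * (n - 2)
    return list(product(*ranges))


def check_counts(m, n):
    """Compare enumerated label counts with 2*(m/2)^n and (2n-1)*(m/2)^(n-1)."""
    half = m // 2
    nodes = len(node_labels(m, n))
    switches = sum(len(switch_labels(m, n, l)) for l in range(n))
    assert nodes == 2 * half ** n
    assert switches == (2 * n - 1) * half ** (n - 1)
    return nodes, switches
```

For the 4-port 3-tree, check_counts(4, 3) gives 16 processing nodes and 20 switches: 4 at level 0 and 8 at each of levels 1 and 2.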
• Figure 5: a 4-port 3-tree
• How is the tree connected?
– Let SW<w, l>_k be the kth port of SW<w, l>.
– SW<w, l>_k and SW<w', l'>_k' are connected iff
• l' = l + 1
• w_0…w_{n-3} = w'_0…w'_{l-1}w'_{l+1}…w'_{n-2}
• k = w'_l, k' = w_{n-2} + m/2
– Question: which switches are connected to
SW<001, l>? How are the ports connected?
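The connection rule can be applied literally by testing every pair of switches on adjacent levels. This Python sketch (hypothetical function names, n >= 2, m even) builds the switch-to-switch links of FT(m, n) and lets one check that no port is used twice. Links from the last switch level to processing nodes are not covered by this rule and are left out.

```python
from itertools import product


def switch_labels(m, n, l):
    """Labels w_0 ... w_{n-2} of the switches at level l."""
    half = m // 2
    if l == 0:
        ranges = [range(half)] * (n - 1)
    else:
        ranges = [range(m)] + [range(half)] * (n - 2)
    return list(product(*ranges))


def link_ports(m, n, w, l, wp, lp):
    """(k, k') if port k of SW<w, l> connects to port k' of SW<wp, lp>, else None."""
    if lp != l + 1:
        return None
    # w_0 ... w_{n-3} must equal w' with its digit w'_l deleted
    if w[:n - 2] != wp[:l] + wp[l + 1:]:
        return None
    return wp[l], w[n - 2] + m // 2  # k = w'_l, k' = w_{n-2} + m/2


def build_links(m, n):
    """All switch-to-switch links as ((w, l, k), (w', l + 1, k')) tuples."""
    links = []
    for l in range(n - 1):
        for w in switch_labels(m, n, l):
            for wp in switch_labels(m, n, l + 1):
                ports = link_ports(m, n, w, l, wp, l + 1)
                if ports is not None:
                    links.append(((w, l, ports[0]), (wp, l + 1, ports[1])))
    return links


def neighbours(links, w, l):
    """(local port, far end) pairs for SW<w, l>, sorted by local port."""
    out = []
    for a, b in links:
        if a[:2] == (w, l):
            out.append((a[2], b))
        elif b[:2] == (w, l):
            out.append((b[2], a))
    return sorted(out)
```

In FT(4, 3) this yields 32 links: each root switch uses all 4 ports downward, each level-1 switch has 2 down and 2 up links, and each level-2 switch has 2 up links. The three-digit label SW<001, l> in the question corresponds to n = 4, so neighbours(build_links(m, 4), (0, 0, 1), l) lists its links for a chosen m and l.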
• Fat tree properties:
– Multiple routes between two nodes
• Deterministic routing uses one path between two nodes;
how should node pairs be mapped to paths?
– What is a good mapping?
• When the traffic pattern is unknown,
common practice is to minimize the maximum load
on a link.
– Do we know how to do it?
• Not clear even when there is no restriction on the
routing, although an optimal solution likely exists for
a particular fat tree topology.
• In Infiniband, destination-based routing restricts
which paths can be used.
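The min-max link load metric can be evaluated for a given destination-based mapping. This Python sketch assumes a two-level fat tree (leaf and spine switches), all-to-all traffic, and a route function of the destination only, mirroring destination-based forwarding; max_link_load and the example mappings are hypothetical. A modulo assignment in the spirit of d-mod-k spreads destinations over the spines, while sending everything through one spine doubles the maximum load in the example.

```python
from collections import Counter


def max_link_load(leaves, hosts_per_leaf, spines, route):
    """All-to-all traffic on a two-level fat tree with `spines` spine switches.
    route(dst) picks the spine used by every flow towards dst (destination-based),
    so it must return a value in 0..spines-1."""
    load = Counter()
    n_hosts = leaves * hosts_per_leaf
    for src in range(n_hosts):
        for dst in range(n_hosts):
            src_leaf, dst_leaf = src // hosts_per_leaf, dst // hosts_per_leaf
            if src_leaf == dst_leaf:
                continue  # the flow stays inside one leaf switch
            spine = route(dst)
            load[("up", src_leaf, spine)] += 1
            load[("down", spine, dst_leaf)] += 1
    return max(load.values())
```

With 4 leaves, 2 hosts per leaf, and 2 spines, each leaf sends 12 flows over its 2 uplinks, so 6 is the lowest possible maximum. The modulo mapping lambda d: d % 2 reaches it, while lambda d: 0 gives 12.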