Interconnect Networks Basics

Generic parallel/distributed system architecture
• On-chip interconnects (manycore processor)
• Off-chip interconnects (clusters of servers)

Interconnection network performance
• Latency: how much time elapses between when a send of 1 byte is issued and when the receive of the data completes?
  – Signal propagation delay + router queuing delay
• Bandwidth: how fast can a large amount of data (e.g. 1MB) be sent?
• Examples:
  – Ethernet
    • Bandwidth: 100Mbps, 1Gbps, 10Gbps, 100Gbps
    • Latency: 25us-100us (user level, single hop; try ping between linprog’s)
  – InfiniBand
    • Bandwidth: 20Gbps, 40Gbps, 54Gbps, 80Gbps, ……
    • Latency: 1-3us (user level, single hop)

Interconnection network performance
• Latency and bandwidth
  – Different levels
    • User level: the performance that users feel
    • Systems level, device level
    • Which level will have the highest bandwidth?
  – Example: 1Gbps Ethernet, 800Mbps at the system level, 650Mbps at the user level.
    • 1Gbps Ethernet, which level?
    • 0.115ms ping latency, which level?
  – A measurement trap: single pair vs. multiple pairs.

Network components
• Network interface (card)
  – Communication between a node and the network
• Link
  – Bundle of wires and fibers that carry signals
• Switches
  – Connect a fixed number of input channels to a fixed number of output channels.
  – In this community, switches may also have router functions.

Switch
• The crossbar can realize a communication from any input port to any output port.
• The simplest form is a dedicated computer with memory (e.g. a Linux router).
• Most expensive form: crossbar functionality
  – All permutations can be realized simultaneously.

[Figure: a 4x4 crossbar with inputs 1-4 and outputs 1-4, realizing (1, 2, 3, 4) -> (3, 1, 2, 4). The pattern (1, 2, 3, 4) -> (4, 3, 2, 2) cannot be realized in full; only (1, 2, 3, 4) -> (4, 3, 2, -) can.]

Permutation: (1, 2, 3, 4) -> (3, 1, 2, 4)
A communication pattern where each source appears once and each destination appears once.
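The permutation definition above can be checked mechanically. The following is a minimal sketch, not a model of any real switch: the function names `is_permutation` and `realizable` are made up for illustration, and the policy "the lowest-numbered input wins a contested output" is an assumption chosen so that the result matches the (4, 3, 2, -) pattern in the figure.

```python
def is_permutation(dests):
    """Return True if each output port 1..N is requested exactly once.

    dests[i] is the output port requested by input port i+1, so every
    source appears once by construction; only destinations need checking.
    """
    return sorted(dests) == list(range(1, len(dests) + 1))


def realizable(dests):
    """Return the part of the pattern a crossbar can set up at once.

    Assumed policy (hypothetical): the lowest-numbered input that requests
    an output gets it; later inputs requesting the same output get None.
    """
    taken = set()
    granted = []
    for d in dests:
        if d in taken:
            granted.append(None)  # output already in use: this input waits
        else:
            taken.add(d)
            granted.append(d)
    return granted


print(is_permutation([3, 1, 2, 4]))  # a valid permutation
print(is_permutation([4, 3, 2, 2]))  # output 2 is requested twice
print(realizable([4, 3, 2, 2]))      # input 4 loses the conflict
```

Under this assumed policy, the conflicting pattern (4, 3, 2, 2) reduces to (4, 3, 2, None), which corresponds to the "-" entry in the figure.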
The input registers send control signals indicating the pattern to the control, routing, and scheduling module; the control module computes and sets the dots (the cross points in the figure).

Switch example: 24-port 1Gbps Ethernet switch
• 24 input ports and 24 output ports
  – Each Ethernet jack has one input port and one output port.
• All 24 machines can send and receive simultaneously.
[Figure: machines, each with an Ethernet card, connected to a switch.]

Alternatives to crossbars
• A question: why buffers when we can always do permutation?
• An N x N crossbar has O(N^2) cross points (on/off switches).
  – Not scalable, expensive
• An alternative for low-end switches: bus and memory
  – When the bus and memory are fast enough, moving data between input and output ports is like a memory copy in a typical computer.

Bus and memory alternative to crossbar
• Realizing (1, 2, 3, 4) -> (4, 3, 2, 1)
  – Read from input port 1 to memory A
  – Read from input port 2 to memory B
  – Read from input port 3 to memory C
  – Read from input port 4 to memory D
  – Run forwarding logic (find out the output ports)
  – Write A to output port 4
  – Write B to output port 3
  – Write C to output port 2
  – Write D to output port 1

Bus and memory alternative to crossbar
• A typical northbridge bandwidth is a few GBps. Assume the bandwidth is 4GBps: how many ports can the northbridge support in a 100Mbps Ethernet switch?

Another alternative: multistage interconnection network
• Realize all permutations without controlling O(N^2) cross points.
  – Clos networks, Benes networks
• Each dot is a 2x2 switch with two states (0 and 1).
• How to realize 0000->0000, 0001->0001, 0010->1011?

Switch
• All approximate crossbars
  – High-end ones are equivalent to or close to crossbars: all permutations can happen simultaneously.
  – Low-end ones have limited total bandwidth (aggregate bandwidth).
• Example: high-end and low-end 24-port 1Gbps switches connecting 24 computers.
  – With one source/destination pair, the throughput will be about 800Mbps for both (no difference).
  – When 24 pairs send/receive at the same time
    • The high-end one will get 24*800Mbps
    • The low-end one will get a total of X Mbps, X < 24*800Mbps (X can sometimes be about 5*800Mbps)
  – Different pairs may also have different throughput depending on the scheduling algorithm.

Network level components
• Topology (what)
  – Physical interconnection structure of the network graph.
  – Physically limits the performance of the network.
• Routing algorithm (which)
  – Restricts the set of paths that messages can follow.
• Switching strategy (how)
  – How data in a message traverses a route (passing routers)
• Flow control mechanism (when)
  – When a message or portions of it traverse a route
  – What happens when traffic collides

Topology
• How the components are connected.
• Important properties
  – Diameter: maximum shortest-path distance between any two nodes in the network (hop count, or # of links).
  – Nodal degree: how many links connect to each node.
  – Bisection bandwidth: the smallest bandwidth between one half of the nodes and the other half, over all ways of splitting the network into two halves.
• A good topology: small diameter, small nodal degree, large bisection bandwidth.

Topology
• Regular topologies
  – Nodes are connected in some kind of pattern.
    • The graph has a structure.
  – Nodes are identified by coordinates.
  – Routing can usually be pre-determined from the coordinates of the nodes.
• Irregular topologies
  – Nodes are connected arbitrarily.
    • The graph does not have a structure, e.g. the Internet
    • More extensible than a regular topology.
  – Usually use variations of shortest path routing.

Example regular topology: complete binary tree
• Nodal degree = ?
• Diameter = ?
• Bisection bandwidth = ?

Example regular topology: ring topology
[Figure: ring with nodes 0, 1, 2, 3, 4]
• Nodal degree = ?
• Diameter = ?
• Bisection bandwidth = ?

Routing: deciding which path to take from a source to a destination
[Figure: ring with nodes 0, 1, 2, 3, 4]
• 0 to 1: 0->1 or 0->4->3->2->1
• Which path to use? This is a routing issue.
• Routing objective:
  – Minimize resources used
    • Shortest path routing
  – The load on all links is as balanced as possible (load balancing).
    • ???

Classification of routing schemes
[Figure: ring with nodes 0, 1, 2, 3, 4]
• 0 to 1: 0->1 or 0->4->3->2->1
• Deterministic vs. adaptive
  – Deterministic: always the same route
  – Adaptive: choose the route depending on traffic conditions
• Minimal routing: always use a shortest path
• Source routing: the source node supplies the path
• Destination routing: routing based on destination ID

Switching
• Communication data units:
  – Message
  – Packet
  – Flit
• How a packet passes a switch.
• Circuit switching: circuit setup, then all data pass through
• Packet switching: the whole packet is stored in a switch, and then forwarded to the next hop

Flow control
• Used between hops to make sure that when data is sent, there is buffer space available for the data.
• Sometimes built into the switching mechanism.
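To make the routing classification concrete, here is a small sketch of deterministic minimal routing on the 5-node ring used in the routing examples (0 to 1: 0->1 or 0->4->3->2->1). The helper names `ring_paths` and `minimal_route` are hypothetical, and the tie-breaking rule (prefer the direction of increasing node IDs) is an assumption; any fixed rule keeps the scheme deterministic.

```python
def ring_paths(src, dst, n):
    """Return both paths from src to dst on an n-node ring.

    The first path follows increasing node IDs (mod n), the second
    follows decreasing node IDs (mod n).
    """
    up = [src]
    while up[-1] != dst:
        up.append((up[-1] + 1) % n)
    down = [src]
    while down[-1] != dst:
        down.append((down[-1] - 1) % n)
    return up, down


def minimal_route(src, dst, n):
    """Deterministic minimal routing: always pick the shorter path.

    Assumed tie-break: on equal length, take the increasing-ID direction.
    """
    up, down = ring_paths(src, dst, n)
    return up if len(up) <= len(down) else down


print(ring_paths(0, 1, 5))     # the two candidate paths from the slide
print(minimal_route(0, 1, 5))  # the one-hop path
print(minimal_route(0, 3, 5))  # going through node 4 is shorter
```

An adaptive scheme would differ only in `minimal_route`: instead of a fixed rule, it would consult the current link load before choosing between the two candidate paths.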