University of Ottawa
ELG7187 - Scribing
Interconnection Networks on Chip: Topologies, Routing
John-Marc Desmarais
Student Number 3198863

Contents
Interconnection networks on chip: Topologies, Routing
  Lecture Review
  Definitions
  Performance Metrics
  Types of Topologies
  Topology Examples
Presentation - Autotuning of Network Parameters
  Introduction
  Adaptive Techniques
  Communication Architecture Tuners
  Analysis for Design of Communication Architectures
  Communication Partitioning
  Parameter Generation
  Performance Analysis
  Example Networks Using CAT
CONCLUSION
  Conclusion
  Future Work
References

Interconnection networks on chip: Topologies, Routing

Lecture Review

This lecture concentrated on the performance and performance metrics of interconnection networks on a chip. We discussed the difference between direct and indirect networks and showed several examples of each type. We also looked at the important metrics used for evaluating System on Chip (SoC) networks.

This review begins with basic definitions for the components of an interconnection network, then presents performance metrics that can be used to evaluate and compare interconnection networks. Several topologies are then described and analyzed using these metrics.

Definitions

Channel: A network channel is a conduit over which information traffic can flow. It is synonymous with connection.
Connection: A network connection is a conduit over which information traffic can flow. It is synonymous with channel.
Direct Network: A direct network is one in which each node contains a router.
Flit: A flit is the largest amount of network traffic that can be transmitted during one cycle.
Head Flit: The head flit is the first flit of a packet.
Indirect Network: An indirect network is one in which the nodes do not each contain a router. A single router may be responsible for sending traffic to any of several nodes.
Node: A network node is a source or a sink for information that flows on the network.
Network: A network is a combination of nodes, connections and routers.
Packet: A packet is a logical division of information that is to be transmitted from one node to another. It usually consists of several flits.
Router: A router is a device on the network which contains the routing tables and algorithms that describe how information traffic will flow through the connections.
Tail Flit: The tail flit is the last flit of a packet.
Topology: A topology is a formal definition of the layout of nodes and connections. It defines which nodes are connected to which routers and which routers are interconnected.

Performance Metrics

Degree: The degree is the number of links connected to a given node. It can be used to estimate the number of connection wires in the network. E.g., in a direct network, a 3x3 router will have a node degree of two, since one of its connections must connect to the core.
Minimum Hop Count: The minimum hop count is the minimum number of routers between a source and a destination. It can also be described as the number of separate connection lines that a packet needs to travel from source to destination. It can be used to estimate the network latency.
Network Diameter: The network diameter is the largest minimum hop count in the network. It is found by calculating the minimum hop count between every pair of nodes in the network and selecting the largest of these values. It can be used to estimate the worst-case network latency.
Average Minimum Hop Count: The average minimum hop count is calculated by taking all of the possible source/destination pairs in the network, counting the minimum number of hops between the two ends of each pair, and averaging all of these values.
Latency: The latency is the amount of time required for a packet to reach its destination through the network.
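The hop-count metrics defined above can be checked on a small example. The following is a minimal sketch, not taken from the lecture: it assumes a square two-dimensional grid, counts a hop as one link traversed, and averages over distinct source/destination pairs only. The function names (`build_grid`, `hop_counts`, `diameter_and_average`) are hypothetical.

```python
from collections import deque
from itertools import product


def build_grid(side, wrap):
    """Adjacency of a side x side 2D grid: wrap=True is a torus, wrap=False a mesh."""
    adj = {}
    for x, y in product(range(side), repeat=2):
        neighbours = set()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap:  # wrap-around links exist only in the torus
                nx, ny = nx % side, ny % side
            if 0 <= nx < side and 0 <= ny < side and (nx, ny) != (x, y):
                neighbours.add((nx, ny))
        adj[(x, y)] = neighbours
    return adj


def hop_counts(adj, source):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter_and_average(adj):
    """Network diameter and average minimum hop count over distinct pairs."""
    hops = [d for src in adj
            for dst, d in hop_counts(adj, src).items() if dst != src]
    return max(hops), sum(hops) / len(hops)


torus = build_grid(3, wrap=True)
mesh = build_grid(3, wrap=False)
print(diameter_and_average(torus))  # (2, 1.5)
print(diameter_and_average(mesh))   # (4, 2.0)
```

The degree of a node is simply `len(adj[node])`. Averaging over distinct pairs is one possible convention; an average that also counts pairs where the source equals the destination would give smaller values.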
Head Latency: The head latency is the amount of time required for the head flit to reach its destination node through the network.
Maximum Channel Load: The maximum channel load is the maximum amount of traffic that can flow between routers concurrently. When the network reaches its maximum channel load, it becomes saturated and cannot accept any more traffic.
Bandwidth: The network bandwidth is defined as the number of bits per second that can be injected into the network before it saturates. It can also be written as $\omega = \omega_c \cdot f_c$, where $\omega$ is the network bandwidth, $\omega_c$ is the bandwidth of channel c and $f_c$ is the frequency of injection into channel c.
Channel Load: The channel load is the amount of traffic being inserted into a channel, measured in bits/second.
Bisection Width: The bisection width is the number of connections that need to be cut in order to divide the network into two networks with an equal number of nodes (plus or minus one node). Uniform traffic implies that half of the traffic will pass through this bisection.

Types of Topologies

Switched: Bus and crossbar networks are switched networks. A bus network connects inputs to outputs using a single connection line for all nodes, whereas a crossbar network connects each input to each output using separate connection lines, one for each input/output pair. A crossbar network is also known as "fully connected".
Direct/Indirect: As previously mentioned, direct networks have a router on each node whereas indirect networks separate the nodes from the routers.
Static/Dynamic: A static network has fixed links between routers whereas a dynamic network can build these connections on the fly. A fully connected network (crossbar) is a static network, whereas a bus network connects two nodes on the bus only when these two nodes need to communicate and hence is dynamic.

Topology Examples

Torus: A torus network is a network in which each node is connected to a fixed number of other nodes determined by the dimensionality of the torus.
For example, a two-dimensional torus will have four connections away from a given node. In general, a node in a k-dimensional torus will connect to 2k other nodes. E.g., in a 3x3 two-dimensional torus, each node is connected to four other nodes. For any NxN two-dimensional torus, each node will be connected to four other nodes.

Degree: 2k for a k-dimensional torus.

Minimum Hop Count: From (1), the average minimum hop count is

$$H_{min} = \begin{cases} \dfrac{nk}{4}, & k \text{ is even} \\ n\left(\dfrac{k}{4} - \dfrac{1}{4k}\right), & k \text{ is odd} \end{cases}$$

This formula uses the k-ary n-cube convention, in which n is the number of dimensions and k is the number of nodes along each dimension. This is the reverse of the notation used in the rest of this section, where k is the dimensionality and n is the number of nodes. For the 3x3 torus (n = 2, k = 3) it gives 2(3/4 - 1/12) = 4/3, an average that includes the pairs in which the source and the destination are the same node.

Number of Connections: Each node has a degree of 2k, so the number of connections is $2nk$, where n is the number of nodes and k is the dimensionality of the torus. This count treats each direction of a link separately; if each bidirectional link is counted once, the total is $nk$.

Mesh: If the wrap-around connections are removed from a torus, a mesh network is created. A mesh network will have one or more fewer connections on border edge nodes and corner nodes. E.g., in a 3x3 two-dimensional mesh network, the center node is connected to four other nodes. The border edge nodes are connected to three other nodes, while the corner nodes are only connected to two other nodes.

Degree: At most 2k for a k-dimensional mesh. Interior nodes have degree 2k; edge and corner nodes have fewer connections.

Minimum Hop Count: From (1), the average minimum hop count is

$$H_{min} = \begin{cases} \dfrac{nk}{3}, & k \text{ is even} \\ n\left(\dfrac{k}{3} - \dfrac{1}{3k}\right), & k \text{ is odd} \end{cases}$$

As for the torus, this formula uses the k-ary n-cube convention: n is the number of dimensions and k is the number of nodes along each dimension. For the 3x3 mesh (n = 2, k = 3) it gives 2(1 - 1/9) = 16/9.

Number of Connections: Each interior node has a degree of 2k, and each node on an edge has at least one fewer connection. For a two-dimensional mesh, removing the wrap-around links leaves $2k\sqrt{n}$ fewer connections than the torus. For higher dimensionality, each of the k dimensions loses $n^{(k-1)/k}$ wrap-around links in each direction, which gives $2k \cdot n^{(k-1)/k}$ fewer connections than the torus. So the number of connections is $2nk - 2k \cdot n^{(k-1)/k}$, where n is the number of nodes and k is the dimensionality of the mesh. For k = 2 this is $2nk - 2k\sqrt{n}$.

Presentation - Autotuning of Network Parameters

Introduction

For the presentation portion of this paper the following questions were provided:

1. How can one perform monitoring of the network parameters, including temporal characteristics of the traffic and current parameters of the networking protocol?
2. How can one report the parameters in real-time?
3. Which dynamic parameters can be monitored (routing protocol, burstiness of the traffic or priority levels of the packets, ...)?
4. How can these parameters be used for autotuning? What can be tuned? Can this information help in dispatching threads?

The presentation concentrated on using Communication Architecture Tuners to monitor and update network flow protocols on the fly. Two examples of systems that use Communication Architecture Tuners to control network flow protocols are then given.

Adaptive Techniques

As interconnection networks decrease in size, unpredictable network congestion and link failures can occur. If a router continues to send data toward broken links, the network will block. Adaptive techniques help prevent this blocking.

Even after network design is complete, adaptive techniques can still be used to adapt the network packet priorities. This adaptation can route packets around dead links, can send higher priority packets through busy links and can allow protocol and priority changes on the fly.

Adaptive techniques do not change the underlying network topology, but can alter some network parameters, thereby changing the on-chip network behaviour.

Communication Architecture Tuners

A Communication Architecture Tuner (CAT) "provides the underlying communication architecture with an ability to adapt to runtime variations in the communication needs of system components". (2)

Figure 1 - Example CAT Network

The CAT is a mechanism which allows adaptation of network parameters based on monitored events. To use it most effectively, we need to know which dynamic parameters can be monitored. Lahiri et al. did their initial research based on monitoring the priority of packets through the network, but they also go on to claim that any network parameters can be monitored and used to dynamically alter the network flow.
Parameters mentioned include burst modes, burst sizes, endianness, split transactions, etc. (3). Therefore, any component on the interconnection network is a candidate for communication tuning.

Analysis for Design of Communication Architectures

There are three analysis methods for the design of interconnection networks: system simulation, static estimation and trace-based techniques.

System simulation is a method by which the network and traffic flows are modelled using a computer. These models are simulated, analyzed and then altered to determine the best traffic flow protocols and topologies for a given task and set of constraints. Simulation is not feasible for large systems and may be less accurate than the other methods.

Static estimation requires the use of static models of communication latencies or power requirements. It is based on the assumption that network flow scheduling can be performed statically.

Finally, trace-based techniques begin with a detailed simulation of the network; based on the observed flow patterns, the priority of any given packet between routers can then be altered.

The design space for interconnection networks can be explored in several directions. Network topology, communication protocols and path optimization are three areas for consideration.

As we optimize path size, and as components get faster and their associated networks get smaller, we also run into problems of network congestion and link failure. (2) Adaptive techniques can assist with these problems. If a link goes down, an adaptive network can route around the dead area until it is available again. Likewise, if there is a good deal of traffic between two nodes, adaptive networks can route around it or adjust priority levels so that packets can get through the congested channel.

The design flow for an interconnection network which can dynamically adapt to changes in network flow patterns is as follows. (3)

Figure 2 - Design Flow

Communication Partitioning

A communication partition consists of a subset of the communication traffic generated by a component. A component can generate packets ($t_x$), control flow events ($e_y$), communication requests ($c_z$) and possibly application-specific properties for the network.

A full network partition can be described by its first and last request (e.g., <t1•c3>,<t1•c5> can represent a packet t1 that requires three communication requests in order to pass through the network). The sub-partitions (or any request for network flow, e.g. a flit) need to be identified by an identifier and a communication request.

Parameter Generation

Parameters are generated by the CAT to dynamically affect the network flow. These parameters come from lookup tables (LUTs) containing specific precomputed values for each protocol. Lahiri et al. (3) use a priority heuristic to populate the LUTs, but any network parameters can be used to build heuristics for this purpose, such as burst size, burst mode, packet size, etc.

Once these LUTs are populated in the initial design, they can be updated dynamically based on monitored network parameters according to a given heuristic.

Performance Analysis

The CAT LUTs are then updated until no further performance increase is seen. Changes to the LUTs can continue as long as they yield better performance. Once there is no more improvement, there is no need to change the LUTs unless a change occurs in the monitored network parameters.

Example Networks Using CAT

LOTTERYBUS

LOTTERYBUS is a technique for dynamically tuning the network based on CAT. In LOTTERYBUS the arbiters in Figure 1 are replaced by randomizing arbiters. The LUTs provide a pre-set number of lottery tickets based on requester type and location in the network.
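As a rough illustration of ticket-based arbitration, in which each requester holds some number of tickets and the channel is granted to the holder of a randomly drawn ticket, the sketch below models one arbitration round in software. It is an assumption-laden toy rather than the LOTTERYBUS implementation: the function `lottery_arbiter`, the table `ticket_lut` and the component names are hypothetical, and a requester holding zero tickets is assumed never to win.

```python
import random


def lottery_arbiter(tickets, pending, rng=random):
    """Grant the channel to one pending requester.

    tickets maps requester name -> ticket count (a stand-in for the LUT).
    Each pending requester wins with probability proportional to its tickets.
    Returns None when no pending requester holds a ticket.
    """
    contenders = [r for r in pending if tickets.get(r, 0) > 0]
    if not contenders:
        return None
    weights = [tickets[r] for r in contenders]
    # random.choices performs a weighted draw, i.e. one ticket is pulled.
    return rng.choices(contenders, weights=weights, k=1)[0]


# Hypothetical ticket assignment: the CPU holds three times the DMA's tickets.
ticket_lut = {"cpu": 3, "dma": 1}
rng = random.Random(0)  # seeded only to make the demonstration repeatable
grants = [lottery_arbiter(ticket_lut, ["cpu", "dma"], rng) for _ in range(1000)]
print(grants.count("cpu"), grants.count("dma"))
```

Over many rounds the share of grants approaches the share of tickets, so updating the ticket counts in the table is enough to shift bandwidth between requesters without changing the arbiter itself.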
When a bus or channel is free, the arbiter randomly chooses one ticket from those that the LUT has provided and grants access to the channel to the packet whose ticket is drawn.

LOTTERYBUS is topology and protocol agnostic. The only change from a CAT system is in the channel arbiters.

LOTTERYBUS can monitor any of the network parameters that can be monitored by any CAT system. LUTs are still used to assign priorities to packets, but these priorities take the form of the number of "lottery tickets" provided to each packet. These LUTs can be set statically when designing the network but can also be updated dynamically based on monitored network parameters.

FLEXBUS

FLEXBUS is a technique for dynamically tuning the network based on CAT. In FLEXBUS, network traffic is continually monitored and, when network channels are not available, the system has the ability to route around the unavailable links.

FLEXBUS can be applied to any network topology and any communication protocol.

FLEXBUS continually monitors variations in network traffic. If a channel is congested or if a link failure occurs, components on the network can be dynamically routed to different busses.

CONCLUSION

Conclusion

We have seen that the CAT process can be used to configure the network both at design time and dynamically on the fly. Extending this idea, we have seen random and dynamic routing systems that can be used to reduce network latency.

In reference to our introduction questions, then:

1. How can one perform monitoring of the network parameters, including temporal characteristics of the traffic and current parameters of the networking protocol?
   One can use Communication Architecture Tuners (CAT) to monitor the network parameters and temporal characteristics of the network.
2. How can one report the parameters in real-time?
   The Communication Architecture Tuner can automatically update priority lookup tables to affect the routing of packets on the network.
   Updates to the lookup tables can happen in real time, and it would not be a stretch to output the tables as a report.
3. Which dynamic parameters can be monitored (routing protocol, burstiness of the traffic or priority levels of the packets, ...)?
   Any of the aforementioned network parameters can be monitored and used to update the flow priorities in the system.
4. How can these parameters be used for autotuning? What can be tuned? Can this information help in dispatching threads?
   Packet priority lookup tables are located in each router and can be tuned dynamically.

Future Work

Knowing that routing heuristics can be based on any network parameters, work can be done to codify the quality and utility of each network parameter for use in a CAT system.

The CAT process is based on reducing network latency while using priority-based routing. It may be interesting to see if it could be used to dynamically reduce the power requirements of an interconnection network on a chip.

References

(1) N. Enright Jerger, ECE 1749H: Interconnection Networks for Parallel Computer Architectures – Topology. http://www.eecg.toronto.edu/~enright/interconnects-topology.pdf. November 2010.
(2) K. Lahiri, S. Dey, and A. Raghunathan, "Design of Communication Architectures for High-Performance and Energy-Efficient System-on-Chips," book chapter in Multiprocessor Systems-on-Chips. Morgan Kaufmann, 2004.
(3) K. Lahiri, A. Raghunathan, G. Lakshminarayana, and S. Dey, "Design of high-performance system-on-chips using communication architecture tuners," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 5, pp. 620-636, May 2004.
(4) K. Lahiri, A. Raghunathan, and G. Lakshminarayana, "The LOTTERYBUS on-chip communication architecture," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 6, pp. 596-608, June 2006.
(5) K. Sekar, K. Lahiri, A. Raghunathan, and S. Dey, "FLEXBUS: a high-performance system-on-chip communication architecture with a dynamically configurable topology," Proceedings of the 42nd Design Automation Conference, pp. 571-574, 13-17 June 2005.