Dynamic Traffic Distribution among Hierarchy Levels in Hierarchical Networks-on-Chip
Ran Manevich, Israel Cidon, and Avinoam Kolodny
Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel
NOCS 2013, QNoC Research Group

Hierarchical un-clustered NoCs
Hierarchical Rings: S. Bourduas and Z. Zilic, "Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing," ASAP 2007.
PyraMesh: R. Manevich, I. Cidon, and A. Kolodny, "Handling global traffic in future CMP NoCs," SLIP 2012.

Routing in hierarchical NoCs
Phase 1: Ascend to the highest level (LMAX).
Phase 2: Travel on LMAX towards the destination.
Phase 3: Descend from LMAX and reach the destination.

Traffic distribution among hierarchy levels
The highest level, LMAX, defines how traffic is distributed among the hierarchy levels, so the rule for choosing LMAX is the packet distribution policy.

Highest level LMAX
LMAX is determined by the hop distance D that the packet would travel on the bottom-level mesh.
DTh_i: the distance threshold of level i. If D > DTh_i, the packet is directed to level i+1.
Example with DTh_1, DTh_2, DTh_3 = 6, 12, 20:

Bottom-mesh travel distance (D)    LMAX
D > 20                             4
12 < D ≤ 20                        3
6 < D ≤ 12                         2
D ≤ 6                              1

How should traffic be distributed among hierarchy levels? Shortest path?
Shortest path, light load: average latency (hierarchical) < average latency (flat).
Shortest path, heavy load: congestion! Average latency (hierarchical) >> average latency (flat).
Shortest path, but not for all? The upper levels are sparse!
Shortest path only for distant packets, heavy load: average latency (hierarchical) < average latency (flat).
Shortest path only for distant packets, light load.
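The LMAX selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' router logic: it assumes that D is the Manhattan hop distance on the bottom mesh (minimal, e.g. XY, routing) and that the thresholds DTh_i are strictly increasing. The function names `mesh_hop_distance` and `highest_level` are hypothetical.

```python
from typing import Sequence, Tuple


def mesh_hop_distance(src: Tuple[int, int], dst: Tuple[int, int]) -> int:
    """Hop distance D on the bottom mesh.

    Assumption: minimal (e.g. XY) routing, so D is the Manhattan distance.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def highest_level(d: int, dth: Sequence[int]) -> int:
    """Return LMAX for a packet whose bottom-mesh distance is d.

    dth[0], dth[1], ... hold DTh_1, DTh_2, ...; levels are numbered from 1
    (the bottom mesh). If d > DTh_i, the packet is directed to level i + 1.
    The thresholds are assumed to be strictly increasing.
    """
    lmax = 1
    for level, threshold in enumerate(dth, start=1):
        if d > threshold:
            lmax = level + 1
        else:
            break  # larger thresholds cannot be exceeded either
    return lmax


if __name__ == "__main__":
    DTH = [6, 12, 20]  # example thresholds from the slide
    for d in (3, 6, 7, 12, 13, 20, 21, 40):
        print(d, highest_level(d, DTH))
    # pairs printed: 3 1, 6 1, 7 2, 12 2, 13 3, 20 3, 21 4, 40 4
    d = mesh_hop_distance((0, 0), (5, 9))  # D = 14
    print(d, highest_level(d, DTH))        # 14 3
```

With increasing thresholds, LMAX is simply one plus the number of thresholds that D exceeds, which reproduces the example table row by row.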
Traffic distribution: static vs. dynamic
Static traffic distribution (STrD): the traffic distribution remains constant.
Dynamic traffic distribution (DTrD): the traffic distribution is adapted to the traffic conditions.

Dynamic traffic distribution: two modes
At light traffic loads: shortest-path distribution.
Under heavy loads: only distant packets are directed to the upper levels.

Example: 16x16 and 32x32 NoCs (distance thresholds DTh_i per mode)
Topology    Light-load mode    Heavy-load mode
16x16       [5, 8]             [11, 19]
32x32       [4, 10, 50]        [23, 42, 61]

Traffic locality model: bandwidth version of Rent's rule
$B = k \cdot G^{R}$
B: cluster external bandwidth.
k: average bandwidth per module.
G: number of modules in a cluster.
R: Rent's exponent, 0 < R < 1.
[Figure: example cluster with G = 16 modules and its external bandwidth B]
Greenfield et al., "Implications of Rent's Rule for NoC Design and Its Fault-Tolerance," NOCS 2007.

Feedback
The feedback is the average buffer occupancy at the bottleneck level among the upper levels:
$\text{Feedback} = \max_{2 \le i \le N_L} \text{AvgBufOcc}_i$
where AvgBufOcc_i is the average buffer occupancy of level i and N_L is the number of hierarchy levels.
[Figure: feedback vs. injection rate; 32x32, 4-level PyraMesh; Rentian traffic with R = 0.8]

DTrD control scheme
Switch between the two distribution modes using two feedback thresholds.

System architecture and implementation costs
Logic: feedback logic < 10K NAND gates; control logic < 1K gates; routing logic comparable to previous schemes.
Wires: 4-wire feedback links to < 10% of the routers; 1 broadcast control bit to all bottom-mesh routers.
Communication: 1 mode bit in head flits.

Simulation setup
Virtual channels per input port: 2
Input buffer size: 4 flits
Packet size: 8 flits
Simulation clock period: 2 ns
Hierarchical NoC sizes: 16x16, 32x32
Traffic patterns: Rentian (R = 0.6, 0.7, 0.8)
Simulator: HNOCS, a NoC simulation framework for OMNeT++ (http://hnocs.eew.technion.ac.il/); Yaniv Ben-Itzhak et al., NOCS 2011.

[Figure: average latency vs. injection rate, Rent's exponent 0.6 to 0.8]
[Figure: dynamic simulation, 32x32 NoC]

Conclusions
Static traffic distribution (STrD) in hierarchical NoCs can optimize performance under either light or heavy traffic loads, but not both at the same time.
Dynamic traffic distribution (DTrD) provides optimal performance under both light and heavy loads.
DTrD is lightweight, effective, and feasible in future systems with many thousands of modules.
DTrD is useful and desirable in any un-clustered hierarchical NoC.
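The DTrD control scheme summarized earlier (a feedback signal taken as the maximum average buffer occupancy over the upper levels, plus two feedback thresholds that switch between the light-load and heavy-load distribution modes) can be sketched as a small Python model. This is a hypothetical illustration, not the authors' hardware: the hysteresis reading of the two thresholds, the threshold values 0.3 and 0.6, occupancy expressed as a fraction of buffer capacity, and the names `feedback` and `DTrDController` are all assumptions. The 16x16 distance-threshold sets come from the example table, assuming the smaller set [5, 8] belongs to the light-load mode.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


def feedback(avg_occupancy: Dict[int, float]) -> float:
    """Feedback = max average buffer occupancy over upper levels 2..N_L.

    avg_occupancy maps level -> average buffer occupancy (assumed here to be
    a fraction of buffer capacity). Level 1, the bottom mesh, is ignored.
    """
    return max(occ for level, occ in avg_occupancy.items() if level >= 2)


@dataclass
class DTrDController:
    low: float                  # hypothetical lower feedback threshold
    high: float                 # hypothetical upper feedback threshold
    light_dth: Sequence[int]    # DTh_i used in the light-load mode
    heavy_dth: Sequence[int]    # DTh_i used in the heavy-load mode
    heavy_mode: bool = False    # broadcast mode bit (assumed: True = heavy)

    def __post_init__(self) -> None:
        if not self.low < self.high:
            raise ValueError("expected low < high")

    def update(self, fb: float) -> bool:
        """Apply one feedback sample and return the (possibly new) mode bit.

        Assumed hysteresis: enter heavy mode above `high`, return to light
        mode below `low`; between the two thresholds the mode is kept.
        """
        if not self.heavy_mode and fb > self.high:
            self.heavy_mode = True
        elif self.heavy_mode and fb < self.low:
            self.heavy_mode = False
        return self.heavy_mode

    def thresholds(self) -> Sequence[int]:
        """Distance thresholds that newly injected packets should use."""
        return self.heavy_dth if self.heavy_mode else self.light_dth


if __name__ == "__main__":
    # 16x16 threshold sets from the example table; feedback thresholds made up.
    ctrl = DTrDController(low=0.3, high=0.6, light_dth=[5, 8], heavy_dth=[11, 19])
    samples = [
        {2: 0.10, 3: 0.05},
        {2: 0.50, 3: 0.40},
        {2: 0.70, 3: 0.65},
        {2: 0.50, 3: 0.45},
        {2: 0.20, 3: 0.10},
    ]
    for occ in samples:
        fb = feedback(occ)
        ctrl.update(fb)
        print(fb, ctrl.heavy_mode, list(ctrl.thresholds()))
    # modes: light, light, heavy, heavy (kept by hysteresis), light
```

Using two thresholds instead of one keeps the mode from toggling on every small fluctuation of the feedback around a single switching point; the single mode bit then selects which threshold set the routing logic applies.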