Handling Global Traffic in Future CMP NoCs
Ran Manevich, Israel Cidon, and Avinoam Kolodny
QNoC Research Group
Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel
SLIP 2012

Bandwidth Version of Rent’s Rule
$B = k \cdot G^{R}$
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0<R<1.
Figure: a cluster of G = 16 modules.
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007

Rent’s Exponent Reflects Traffic Locality

CMP NoC Traffic Follows Rent’s Rule
Figure labels: 2D Mesh NoC; ~Average of CMP parallel programs*
* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008

2D Mesh – Packets Classification by Distance
For illustration purposes, packets are classified according to the distance (Dist) between source and destination.
Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2+K/8
Global – Dist ≥ 2+K/8
Figure panels: K=16 and K=8.

Fraction of global packets decreases in large systems
Figure: packet fractions per class, including Nearest Neighbor; Rent’s exponent (R) = 0.7.

Dominance of Global Packets in BW/Router and Light Load Latency
Nearest Neighbor traffic is dominant in small systems.*
In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.
* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010

Problem!!!
In large systems, global packets (a minority):
Consume most of the network’s BW.
Significantly increase average light load latency.

Solution - PyraMesh
Hierarchical 2D mesh.
Global packets are routed through higher hierarchy levels.
Overall hop count is reduced.
Average latency is reduced.
Average BW per router is reduced.
Figure: example Source-to-Dest. route through the upper level, replacing a 14-hop route in the base mesh.

PyraMesh - Architecture
K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
αi – Ratio between the sizes of levels i and i+1.
Ci – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
Example configurations (figures):
K = 8, NL = 2, NP = 1, αi = 4, Ci = 2
K = 8, NL = 3, NP = 1, αi = 2, Ci = 1
K = 8, NL = 2, NP = 4, αi = 4, Ci = 1

PyraMesh – Addressing and Routing
Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter:
$\mathrm{Address}_{i,(X,Y)} = \left(\frac{X}{at_i}, \frac{Y}{at_i}\right)$, rounded to that router, with $at_i = \prod_{m=1}^{i-1} \alpha_m$
Routing – XY.

PyraMesh – Packets Classification
Packets are distributed among the levels according to their travel distance (D) in the base mesh.
DThi – Distance threshold of level i. If D > DThi, the packet is directed to level i+1.
Example: DThi = 6, 12, 20

Travel Distance    Highest Level
D>20               4
12<D≤20            3
6<D≤12             2
D≤6                1 (Base Mesh)

PyraMesh – Optimization
Constraints: Area overhead, Wiring overhead.
Optimization objectives: Maximum bandwidth per router*, Average light-load latency*.
Each of these is a function of the design parameters: F(K, NL, NP, αi, Ci, DThi*, R*)

Optimization Results
Example of 16x16 System, R = 0.7
Packets distance thresholds:
Light load latency optimized PyraMesh: D>8, 5<D≤8, D≤5
Throughput optimized PyraMesh: D>18, 6<D≤18, D≤6

Light Load Latency Performance
BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
HNoC – Hybrid Network on Chip (Zarkesh-Ha et al.).

Throughput Results, R = 0.7

Our Contributions
Characterization of Rentian traffic in large NoCs.
The observation that global packets limit the scalability of large systems.
PyraMesh – A novel framework for hierarchical NoC design.

Conclusions
Global packets limit performance in large (future) CMP systems.
PyraMesh – A novel class of hierarchical 2D mesh topologies.
PyraMesh handles global traffic in future CMP NoCs.

Thank You!

Related Work
CMesh: J. Balfour and W. J. Dally. “Design tradeoffs for tiled on-chip networks”. International Conference on Supercomputing, 2006.
Hierarchical 2-Levels Mesh: Markus Winter, Steffen Prusseit and Gerhard P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.
Hierarchical Rings on a 2D Mesh: S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.
Long Range Links: U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 2006.
GigaNoC: C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors.” Euromicro DSD 2007.
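
Supplementary sketch (not part of the original slides): the two packet classification rules and the bandwidth version of Rent’s rule can be written as a few small Python functions. The function names are hypothetical, and the hop distance is assumed to be the Manhattan distance between base-mesh coordinates, which is what XY routing would give. The slides do not state this explicitly.

```python
from bisect import bisect_left


def hop_distance(src, dst):
    """Hop distance between two base-mesh nodes (assumption: Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def distance_class(dist, k):
    """Classify a packet in a K x K mesh as NN, local, or global by its distance."""
    if dist < 1:
        raise ValueError("source and destination must differ")
    if dist == 1:
        return "NN"
    # Local: 1 < Dist < 2 + K/8; Global: Dist >= 2 + K/8
    return "local" if dist < 2 + k / 8 else "global"


def highest_level(dist, thresholds):
    """Highest PyraMesh level used by a packet with travel distance `dist`.

    `thresholds` is the ascending list of DTh_i. Level 1 is the base mesh,
    and a packet with D > DTh_i is directed to level i+1.
    """
    return 1 + bisect_left(thresholds, dist)


def rent_bandwidth(k_avg, g, r):
    """Cluster external bandwidth B = k * G^R (bandwidth version of Rent's rule)."""
    return k_avg * g ** r


if __name__ == "__main__":
    d = hop_distance((0, 0), (7, 6))
    # With K = 16 and the example thresholds DTh_i = 6, 12, 20:
    print(d, distance_class(d, k=16), highest_level(d, [6, 12, 20]))
    # prints: 13 global 3
```

With the example thresholds 6, 12, 20, the function reproduces the Travel Distance / Highest Level table: D≤6 stays in the base mesh, and D>20 reaches level 4.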