Do We Need Wide Flits in Networks-On-Chip?
Junghee Lee, Chrysostomos Nicopoulos, Sung Joo Park, Madhavan Swaminathan and Jongman Kim
Presented by Junghee Lee

Introduction
• Increasing number of cores → Communication-centric → Packet-based Networks-on-Chip
• Unit
  – Packet: a meaningful unit of the upper-layer protocol
  – Flit: the smallest unit of flow control maintained by the NoC
• If a packet is larger than a flit, the packet is split into multiple flits
• The flit size usually matches the physical channel width

Motivation
What is the optimal flit size in Networks-on-Chip for general purpose computing?
[Figure: flit sizes used in existing designs]
  – Research papers: 64 or 128
  – Intel Single-Chip Cloud: 144
  – Tilera: 160
  – Intel Sandy Bridge: 256
  – Research papers: 256 or 512

Multifaceted Factors
• A first attempt at drawing a balanced conclusion
[Figure: factors related to Flit Size: Global Wires, Cost of Router, Latency, Workload, Throughput]

Assumed NoC Router Architecture
[Figure: router architecture, annotated with the parameters d, v, p, c]

Packet and Flit
[Figure: packet and flit format, showing Header and Payload fields]

Simulation Environment
Parameter                   Default Value
Simulator                   Simics + GEMS (Garnet)
Benchmark                   PARSEC
Number of processors        64
Operating system            Linux Fedora
L1 cache size               32 KB
L1 cache number of ways     4
L1 cache line size          64 B
L2 cache (shared)           16 MB, 16-way, 128-B line
MSHR size                   32 for I- and 32 for D-cache
Main memory                 2 GB SDRAM
Cache coherence protocol    MOESI directory
Topology                    2D mesh

Default NoC Parameters
Parameter                   Default Value
Number of virtual channels  3
Buffer depth                8 flits per virtual channel
Number of pipeline stages   4
Number of ports             5
Header overhead             16 bits

Key Questions
• Can we afford wide flits as technology scales?
• Is the cost of wide-flit routers justifiable?
• How much do wide flits contribute to overall performance?
• Do memory-intensive workloads need wide flits?
• Do we need wider flits as the number of processing elements increases?

#1) Global Wires
Can we afford wide flits as technology scales?
Item                        Unit         Value
Technology                  nm           65     45     32     22
Chip size*                  mm²          260    260    260    260
Transistors*                MTRs         1106   2212   4424   8848
Global wiring pitch*        nm           290    205    140    100
Power index*                W/GHz·cm²    1.6    1.8    2.2    2.7
Total chip power*           W            198    146    158    143
Normalized power portion                 1.00   1.53   1.66   2.28
* International Technology Roadmap for Semiconductors (ITRS) 2009 and 2011

Technology scaling does not allow for a direct widening of the flits, because the power portion of the global wires increases as technology scales.

#2) Cost of Router
Is the cost of wide-flit routers justifiable?
• Cost of buffers ∝ (Flit size) × (Buffer depth) × (Number of virtual channels)
• Cost of switch ∝ (Flit size)² × (Number of ports)²
[Figure: Cost versus Flit size, with Switch and Buffer components]
  – Flit size ×2: cost of router ×2.97
  – Flit size ×4: cost of router ×10.10
If the performance improvement does not compensate for the increase in cost, widening the flit is hard to justify.

#3) Latency
How much do wide flits contribute to overall performance?
• The network traffic usually consists of packets of different sizes
  – l_s: the size of the shortest packet
  – l_l: the size of the longest packet
[Figure: Latency versus Flit size, with l_s + h and l_l + h marked on the flit-size axis]
Suggested rule of thumb: Flit size = shortest packet size + header overhead

#4) Workload Characteristics
Do memory-intensive workloads need wide flits?
Application     Cache misses / Kcycle / node    Injected packets / Kcycle / node
Blackscholes    0.41                            2.21
Freqmine        0.28                            1.48
Streamcluster   0.48                            2.42
Vips            0.23                            1.27
X264            0.28                            1.54
Bodytrack       0.67                            3.56
Ferret          0.26                            1.43
Fluidanimate    0.24                            1.35
Swaptions       0.38                            2.04
• The injection rate of real applications is far less than the typical point of NoC saturation
  – Self-throttling effect [34]
• Up to 64 cores, we can keep the rule of thumb because of the low injection rate

#5) Throughput
Do we need wider flits as the number of processing elements increases?
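One way to look at this question is link utilization, sketched below under stated assumptions. The 16-bit header overhead is the default from the NoC parameters slide; the assumption that every flit carries its own header, the payload sizes (64 and 576 bits), and the function names are illustrative choices, not values taken from the slides.

```python
from math import ceil

HEADER_BITS = 16  # "Header overhead" from the Default NoC Parameters slide


def flits_needed(payload_bits: int, flit_bits: int, header_bits: int = HEADER_BITS) -> int:
    """Flits required for one packet, ASSUMING every flit carries its own header."""
    payload_per_flit = flit_bits - header_bits
    if payload_per_flit <= 0:
        raise ValueError("flit must be wider than its header")
    return ceil(payload_bits / payload_per_flit)


def link_utilization(payload_bits: int, flit_bits: int) -> float:
    """Fraction of transmitted bits that are payload (the rest is header or padding)."""
    return payload_bits / (flits_needed(payload_bits, flit_bits) * flit_bits)


# Hypothetical packet sizes, chosen so that shortest payload + header = 80 bits.
SHORT_PAYLOAD = 64
LONG_PAYLOAD = 576

for flit_bits in (80, 160):
    for name, payload in (("short", SHORT_PAYLOAD), ("long", LONG_PAYLOAD)):
        print(f"{flit_bits:3d}-bit flit, {name:5s} packet: "
              f"{flits_needed(payload, flit_bits)} flit(s), "
              f"utilization {link_utilization(payload, flit_bits):.2f}")
```

With these assumed sizes, a short packet still occupies one whole flit on the 160-bit link but fills only 40% of it, versus 80% on an 80-bit link. That unused width is the kind of waste the term fragmentation refers to.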
• Widening the flit is not a cost-effective way to increase throughput because of fragmentation
• If widening the physical channel is the only option for increasing the throughput, we suggest using physically separated networks
[Figure: Latency for one 80-bit network, one 160-bit network, and two 80-bit networks; extracted axis labels: Latency, Flit size]

Conclusions
• Can we afford wide flits as technology scales?
  – No, unless the power budget for the NoC increases
• Is the cost of wide-flit routers justifiable?
  – No, the cost increases sharply with the flit size
• How much do wide flits contribute to overall performance?
  – Only until the flit size reaches the shortest packet size
• Do memory-intensive workloads need wide flits?
  – No, because of the self-throttling effect
• Do we need wider flits as the number of processing elements increases?
  – No, because of fragmentation

Final Conclusion
• Suggested rule of thumb: Flit size = shortest packet size + header overhead
• This paper provides a comprehensive discussion of all key aspects pertaining to the NoC’s flit size
• This exploration could serve as a quick reference for designers/architects of general-purpose multi-core microprocessors who need to decide on an appropriate flit size for their design

Thank you!
Questions?

Contact info
Junghee Lee
junghee.lee@gatech.edu
Electrical and Computer Engineering
Georgia Institute of Technology
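
As a back-of-the-envelope check on the "#2) Cost of Router" slide, the sketch below assumes that buffer cost scales linearly and switch cost scales quadratically with flit size, with buffer depth, virtual channels, and port count held fixed. The function names are illustrative, and the reported factors are read from that slide's figure annotations as "flit size ×2 gives router cost ×2.97" and "flit size ×4 gives router cost ×10.10", which is an interpretation of the extracted text.

```python
def buffer_cost_factor(k: float) -> float:
    """Buffer cost multiplier when the flit is k times wider (linear assumption)."""
    return k


def switch_cost_factor(k: float) -> float:
    """Switch cost multiplier when the flit is k times wider (quadratic assumption)."""
    return k ** 2


# Router cost multipliers as read from the slide (interpretation, see lead-in).
REPORTED = {2: 2.97, 4: 10.10}

for k, reported in REPORTED.items():
    low, high = buffer_cost_factor(k), switch_cost_factor(k)
    inside = low < reported < high
    print(f"flit x{k}: buffers x{low}, switch x{high}, "
          f"reported router x{reported} (between bounds: {inside})")
```

A router built from both components should land between the linear and the quadratic bound, and both reported factors do, which fits the conclusion that cost rises faster than flit width.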