Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip
Zvika Guz, Isask’har Walter, Evgeny Bolotin, Israel Cidon, Ran Ginosar and Avinoam Kolodny
Technion, Israel Institute of Technology

Problem Essence
How much capacity [bits/sec] should be assigned to each link?
- All flows must meet delay requirements
- Minimize total resources
DATE’06 NoC Capacity Allocation

Outline
Wormhole based NoC
The problem of link capacity allocation
Solution:
- Wormhole delay model
- Capacity allocation algorithm
Design examples
Summary

Wormhole Switching
Suits on-chip interconnect
- Small number of buffers
- Low latency
Virtual Channels
- Interleaving packets on the same link
[Figure: IP1, IP2 and IP3 attached to the network through their interfaces]

NoC Design Flow
- Define inter-module traffic
- Place modules
- Allocate link capacities
- Verify QoS and cost
Too little capacity results in poor QoS
Too much capacity wastes power and area

Capacity Allocation Problem
Simulation takes too long: a simulation-based solution is not scalable
If no simulations are used:
- How to extract the flows’ delays?
- How to reassign capacity?
Our solution:
- Analytical model to forecast QoS
- Capacity allocation algorithm that exploits the model

Delay Analysis
Approximate per-flow latencies
Given:
- Network topology
- Link capacities
- Communication demands
[Figure: grid of routers with sources s1, s2 and destinations d1, d2]

Why Previous Models Do Not Apply?
Because they assume:
- Symmetrical communication demands
- No virtual channels
- Identical link capacities
Generally, they calculate the delay of an “average flow”
- A per-flow analysis is needed

Wormhole Delay Analysis
The delivery resembles a pipeline pass
Packet transmission can be divided into two separate phases:
- Path acquisition
- Packet delivery
We focus on the packet delivery phase

Packet Delivery Time
Packet delivery time is dominated by the slowest link
- Transmission rate
- Link sharing
[Figure: path from IP1 to IP2 through a low-capacity link, with IP3 also attached to the network]

Analysis Basics
Determine the flow’s effective bandwidth
- Per link
- Account for interleaving

Single Hop Flow, no Sharing
$t_j^i = \frac{1}{\frac{1}{l} C_j}$
- $t_j^i$: mean time to deliver a flit of flow $i$ over link $j$ [sec]
- $C_j$: capacity of link $j$ [bits/sec]
- $l$: flit length [bits/flit]

Single Hop Flow, with Sharing
$t_j^i = \frac{1}{\frac{1}{l} C_j - \Lambda_j^i}$
- $\Lambda_j^i$: bandwidth used by other flows on link $j$, i.e. the total flit injection rate of all flows sharing link $j$ except for flow $i$ [flits/sec]
- $t_j^i$, $C_j$, $l$: as in the case without sharing

The Convoy Effect
Consider inter-link dependencies
- Wormhole backpressure
- Traffic jams down the road
$\hat{t}_j^i = \frac{1}{\frac{1}{l} C_j - \Lambda_j^i}$
$t_j^i = \hat{t}_j^i + \sum_{k \mid k \in \pi_j^i} \frac{\Lambda_k^i \, l}{C_k} \cdot \frac{\hat{t}_k^i}{\mathrm{dist}^i(j,k)}$
- The sum accounts for all subsequent hops $k$ of flow $i$ after link $j$
- $\Lambda_k^i \, l / C_k$: link load
- $\hat{t}_k^i / \mathrm{dist}^i(j,k)$: basic delay weighted by distance

Total Packet Transmission Time
Weakest link dominates packet delivery time
$T^i = m^i \cdot \max\left(t_j^i \mid j \in \pi^i\right)$
- $m^i$: packet size [flits/packet]
- The max accounts for the weakest link
- $T^i$: mean packet
latency for flow $i$ [sec]

Capacity Allocation Algorithm
Greedy, iterative algorithm
For each src-dst pair:
- Use the delay model to identify the most sensitive link
- Increase its capacity
- Repeat until delay requirements are met

Capacity Allocation: Example #1
Uniform traffic with identical requirements
- Uniform allocation: 74.4 Gbit/sec
- Capacity allocation algorithm: 69 Gbit/sec
[Figure: link capacities before and after optimization]

Capacity Allocation: Example #2
A SoC-like system
- Heterogeneous traffic demands and delay requirements
- Uniform allocation: 41.8 Gbit/sec
- Capacity allocation algorithm: 28.7 Gbit/sec
[Figure: link capacities before and after optimization]

Summary
SoCs need non-uniform link capacities
- Capacity allocation
Wormhole delay analysis
- Heterogeneous link capacities
- Heterogeneous communication demands
- Multiple VCs
Greedy allocation algorithm
Design examples
- NoC cost considerably reduced
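
Sketch: delay model and greedy allocation
The delay model and the greedy loop from the earlier slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all names (`Flow`, `packet_delay`, `allocate`) are hypothetical, dist is taken to be the hop count between two links on the flow's path, every link starts 5% above its offered load, capacity grows by a fixed `step`, and "most sensitive link" is read as the link whose increment lowers the flow's delay the most.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    path: list[str]     # ordered link names, source to destination
    rate: float         # flit injection rate [flits/sec]
    packet_flits: int   # m^i, packet size [flits/packet]
    max_delay: float    # required bound on T^i [sec]

def other_load(flows, i, link):
    """Lambda_j^i: flit rate of all flows on `link` except flow i."""
    return sum(f.rate for n, f in enumerate(flows) if n != i and link in f.path)

def basic_time(flows, cap, flit_bits, i, link):
    """Basic per-link flit time: 1 / (C_j / l - Lambda_j^i)."""
    spare = cap[link] / flit_bits - other_load(flows, i, link)
    return float("inf") if spare <= 0 else 1.0 / spare

def flit_time(flows, cap, flit_bits, i, pos):
    """t_j^i: basic time plus the convoy term over subsequent hops."""
    path = flows[i].path
    t = basic_time(flows, cap, flit_bits, i, path[pos])
    # Assumption: dist^i(j, k) is the hop count from link j to link k.
    for hops, k in enumerate(path[pos + 1:], start=1):
        load = other_load(flows, i, k) * flit_bits / cap[k]
        if load > 0:
            t += load * basic_time(flows, cap, flit_bits, i, k) / hops
    return t

def packet_delay(flows, cap, flit_bits, i):
    """T^i = m^i * max over the path of t_j^i."""
    f = flows[i]
    worst = max(flit_time(flows, cap, flit_bits, i, p) for p in range(len(f.path)))
    return f.packet_flits * worst

def allocate(flows, flit_bits, step):
    """Greedy allocation: raise one link at a time until each flow meets its bound."""
    links = sorted({k for f in flows for k in f.path})
    # Assumption: start every link 5% above its offered load [bits/sec].
    cap = {k: 1.05 * flit_bits * sum(f.rate for f in flows if k in f.path)
           for k in links}
    for i, f in enumerate(flows):
        while packet_delay(flows, cap, flit_bits, i) > f.max_delay:
            def delay_if_raised(k):
                trial = dict(cap)
                trial[k] += step
                return packet_delay(flows, trial, flit_bits, i)
            # Raise the link whose increment helps this flow the most.
            cap[min(f.path, key=delay_if_raised)] += step
    return cap

# Hypothetical example: two flows sharing link "b", 16-bit flits.
flows = [
    Flow(path=["a", "b"], rate=1e6, packet_flits=8, max_delay=2e-6),
    Flow(path=["b", "c"], rate=2e6, packet_flits=8, max_delay=4e-6),
]
cap = allocate(flows, flit_bits=16, step=4e6)
for link in sorted(cap):
    print(link, f"{cap[link] / 1e6:.1f} Mbit/sec")
```

Raising a capacity never increases any flow's delay in this model, so a flow that already meets its bound keeps meeting it while later flows are processed.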

Questions?
QNoC Research Group

Backup

QNoC Architecture
- Grid topology
- Packet-switched
- Wormhole switching
- Fixed-path XY routing
- Heterogeneous link capacities
- Quality-of-Service
[Figure: grid of modules and routers, with a router and a link labeled]
E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, “QoS Architecture and Design Process for Cost-Effective Network on Chip”, Journal of Systems Architecture, 2004

Analysis Validation
Analytical model was validated using simulations
- Different link capacities
- Different communication demands
[Figure: Analysis and Simulation vs. Load; axes: Utilization, Normalized Delay]

Slack Elimination
[Figure: Packet Delay Slack; axes: Flow, Slack [%]]