Flattened Butterfly Topology for On-Chip Networks
John Kim, James Balfour, and William J. Dally
Presented by Jun Pang

Motivation & Goal
Most on-chip networks (e.g., 2D mesh) are low-radix
  Pros: simple routers and short wires
  Cons: long network diameter and poor energy efficiency (many hops)
High-radix networks
  Far fewer intermediate routers on each path
  Lower latency and lower power
Goal: show how on-chip networks can use high-radix routers to reduce latency and energy

On-chip Network
Wires are inexpensive, so bandwidth is plentiful, while buffers are expensive
Lower cost: shorter on-chip distances, and fewer channels and buffers
Concentration: several terminal nodes share one router
Latency: reduce hop count at the expense of higher serialization latency (Ts) to get a lower overall latency

On-chip Flattened Butterfly Topology
Fig. 3a: radix 10 (concentration factor 4; 3 channels in dimension 1; 3 channels in dimension 2)
At most 2 hops between any two routers
Longer wires -> deeper buffers
Routing: non-minimal global adaptive routing (UGAL)
  Path diversity improves load balance and performance
  Each packet is routed either minimally or non-minimally
  A non-minimal route consists of two minimal routes (source to an intermediate router, then intermediate router to destination)
  Direction-ordered routing prevents deadlock
  Only 2 VCs are needed

Bypass Channels & Microarchitecture
Goal: reduce the distance traveled by packets to reduce latency and energy
Two types of muxes
  Input muxes: choose between bypass inputs and direct inputs
  Output muxes: choose between direct outputs and bypass inputs
  The primary input has priority; if it is idle, the non-primary input is chosen
Yield arbiter to guarantee global fairness
Control packets to prevent starvation
Combination of minimal and non-minimal routing

Bypass Channels (cont.)
Switch architecture
  Minimal routing: simplified crossbar switch
  Non-minimal routing: more complex switch
  Non-minimal routing with bypass channels: less complex switch
Flow control & routing
  Buffers for non-primary inputs
  Separate buffers for the destinations of control packets
  UGAL modified to support bypass channels

Evaluation
Throughput: up to 50% higher than a concentrated mesh
Power: about 38% lower than a mesh
Latency: about 28% lower than a mesh

Scalability
Number of channels grows by a smaller factor than in a hypercube
Three ways to scale
  Concentration factor
  Dimension of the flattened butterfly
  Hybrid approach
Future technology will help with long wires
Increasing the number of VCs slightly reduces latency

Conclusion & Concerns
Flattened butterfly: an interesting idea
  Maximum distance between nodes = 2 hops
  Non-minimal routing to balance load
  Bypass channels to reduce latency
  Lower latency and power, and high throughput, compared to mesh
Concerns
  High channel count? (higher than mesh and torus)
  Low channel utilization? (due to the high channel count)
  Control complexity? (arbitration, control packets)
  Bypass channels: a good idea? (Why not just use non-minimal or minimal routing?)
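The latency trade-off on the "On-chip Network" slide (fewer hops, higher Ts) can be written with the standard zero-load latency model from interconnection-network texts. The symbols are our own labels, not the slides': H is the hop count, t_r the per-hop router delay, T_w the total wire delay, L the packet length, and b the channel bandwidth.

```latex
T_0 = H \, t_r + T_w + T_s, \qquad T_s = \frac{L}{b}
```

Under this model, the flattened butterfly lowers H (at most 2 router-to-router hops). Assuming the extra channels are paid for with narrower channels, b shrinks and T_s grows. The overall latency drops only when the saving in H t_r outweighs the increase in T_s.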
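The topology numbers on the "On-chip Flattened Butterfly Topology" slide (radix 10, at most 2 hops) follow from each router linking directly to every other router in its row and column. The following is a minimal sketch that assumes a 2D flattened butterfly with k = 4 routers per dimension, which is the value implied by "3 channels per dimension" in Fig. 3a. The function names are hypothetical. For comparison, it also computes the diameter of an 8x8 mesh with no concentration (64 nodes, like the 4x4 routers x 4 terminals case).

```python
from itertools import product


def router_radix(concentration: int, k: int, dims: int = 2) -> int:
    """Ports per router: terminal ports plus one link to each other router per dimension."""
    return concentration + dims * (k - 1)


def fb_hops(src, dst) -> int:
    """Minimal hops in a flattened butterfly: one hop per differing coordinate,
    because a router reaches any router sharing its other coordinates directly."""
    return sum(1 for a, b in zip(src, dst) if a != b)


def mesh_hops(src, dst) -> int:
    """Minimal hops in a mesh (for comparison): Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(src, dst))


def diameter(k: int, dims: int, hops) -> int:
    """Largest minimal hop count over all router pairs in a k^dims grid."""
    routers = list(product(range(k), repeat=dims))
    return max(hops(a, b) for a in routers for b in routers)


if __name__ == "__main__":
    print(router_radix(4, 4))         # 10, matching Fig. 3a (4 + 3 + 3)
    print(diameter(4, 2, fb_hops))    # 2
    print(diameter(8, 2, mesh_hops))  # 14
```

The mesh comparison shows why the slides call low-radix networks hop-heavy. The flattened butterfly's diameter stays at the number of dimensions, regardless of k.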
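The UGAL routing choice on the topology slide (route minimally or non-minimally) can be illustrated with the commonly described UGAL decision rule. Take the minimal path if (queue occupancy x hop count) is no worse than for a non-minimal path through an intermediate router; otherwise take the non-minimal path. This is a simplified sketch, not the paper's modified UGAL with bypass channels. It assumes local output-queue occupancy as the congestion estimate, a `queue` dict keyed by neighbor router, and a first hop taken in fixed dimension order. All names are hypothetical.

```python
import random


def min_hops(src, dst) -> int:
    """Flattened-butterfly minimal hops: one per differing coordinate."""
    return sum(1 for a, b in zip(src, dst) if a != b)


def first_hop(src, dst):
    """Next router on a minimal path, correcting the lowest differing dimension first."""
    for d, (a, b) in enumerate(zip(src, dst)):
        if a != b:
            nxt = list(src)
            nxt[d] = b
            return tuple(nxt)
    return src


def random_intermediate(k: int, dims: int, rng=random):
    """Uniformly random intermediate router for the non-minimal candidate."""
    return tuple(rng.randrange(k) for _ in range(dims))


def ugal_choose(src, dst, intermediate, queue) -> str:
    """Return 'minimal' or 'non-minimal'.

    queue maps a neighbor router (i.e., an output port) to its occupancy.
    A non-minimal packet would use one VC per phase (src->intermediate,
    intermediate->dst), consistent with the slide's 2 VCs.
    """
    if src == dst:
        return "minimal"
    h_min = min_hops(src, dst)
    h_nm = min_hops(src, intermediate) + min_hops(intermediate, dst)
    q_min = queue.get(first_hop(src, dst), 0)
    # If the intermediate is the source itself, the first phase is empty.
    nm_target = intermediate if intermediate != src else dst
    q_nm = queue.get(first_hop(src, nm_target), 0)
    return "minimal" if q_min * h_min <= q_nm * h_nm else "non-minimal"


if __name__ == "__main__":
    src, dst, mid = (0, 0), (0, 3), (2, 1)
    print(ugal_choose(src, dst, mid, {}))                       # minimal
    print(ugal_choose(src, dst, mid, {(0, 3): 10, (2, 0): 1}))  # non-minimal
```

In the second call, the minimal port is congested (10 x 1 hop = 10), while the non-minimal port is nearly empty (1 x 4 hops = 4). The rule therefore spreads load onto a longer path, which is the path-diversity benefit the slide mentions.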
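The mux rule on the "Bypass Channels & Microarchitecture" slide (the non-primary input is chosen only when the primary input is idle) is a fixed-priority selection. This minimal sketch simulates one mux output cycle by cycle. It assumes one flit per cycle, and its input names are hypothetical. It does not model which physical inputs are primary for input vs. output muxes. It also shows why the slides add control packets: a continuously busy primary input starves the non-primary one.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def run_mux(
    primary: Sequence[Optional[str]], non_primary: Sequence[str]
) -> Tuple[List[Optional[str]], List[str]]:
    """Simulate a fixed-priority 2-input mux for len(primary) cycles.

    primary[t] is the flit on the primary input at cycle t (None = idle).
    non_primary holds flits already waiting on the non-primary input.
    Returns the flit forwarded each cycle and the flits still waiting.
    """
    waiting = deque(non_primary)
    forwarded: List[Optional[str]] = []
    for flit in primary:
        if flit is not None:
            forwarded.append(flit)               # primary input always wins
        elif waiting:
            forwarded.append(waiting.popleft())  # primary idle: serve non-primary
        else:
            forwarded.append(None)               # both inputs idle
    return forwarded, list(waiting)


if __name__ == "__main__":
    print(run_mux(["p0", "p1", "p2"], ["b0"]))      # b0 never leaves: starvation
    print(run_mux(["p0", None, "p1"], ["b0", "b1"]))  # b0 uses the idle cycle
```

Fixed priority keeps the mux cheap, but it offers no fairness guarantee on its own. The slides rely on a separate mechanism (yield arbiter, control packets), which this sketch leaves out.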