Flattened Butterfly Topology for On-Chip Networks
John Kim, James Balfour, and William J. Dally
Presented by Jun Pang

Motivation & Goal
Most on-chip networks (2D mesh) are low-radix
  Pros: simple, short wires
  Cons: large network diameter and energy inefficiency (many hops)
High-radix networks
  Far fewer intermediate routers
  Lower latency and lower power
Goal: show how on-chip networks can use high-radix routers to reduce latency and energy

On-chip network
Plentiful bandwidth: wires are inexpensive, while buffers are expensive
Lower cost: from shorter distances
  By reducing the number of channels & buffers
  Concentration: several terminal nodes share resources (routers)
Latency:
  Reduce hop count at the expense of higher serialization latency (Ts↑) to get lower overall latency
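
The latency trade-off on this slide can be made concrete with the usual zero-load model, T = H * t_r + L / b, where the second term is the serialization latency Ts. The sketch below is only illustrative: the hop counts, router delay, packet size and channel widths are assumed numbers, not figures from the paper, and wire delay is ignored.

```python
def zero_load_latency(hops, router_delay, packet_bits, channel_bits):
    """Zero-load latency in cycles: header latency plus serialization latency."""
    header_latency = hops * router_delay              # T_h = H * t_r
    serialization = -(-packet_bits // channel_bits)   # T_s = ceil(L / b)
    return header_latency + serialization

# Hypothetical comparison: a mesh with wide channels but many hops versus
# a flattened butterfly with narrower channels but only 2 hops.
mesh = zero_load_latency(hops=6, router_delay=3, packet_bits=512, channel_bits=256)
fbfly = zero_load_latency(hops=2, router_delay=3, packet_bits=512, channel_bits=128)
print(mesh, fbfly)  # 20 10
```

With these assumed values, halving the channel width doubles Ts (2 to 4 cycles), but the saving in hop latency (18 to 6 cycles) more than compensates.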

On-chip Flattened Butterfly
Topology (Fig. 3a)
  Radix = 10 (concentration factor: 4; 3 links in d1; 3 links in d2)
  At most 2 hops between routers
  Longer wires -> deeper buffers
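
The radix and hop-count figures above follow from the structure of the topology: each router connects its local terminals plus every other router in each dimension. A minimal sketch, assuming the 4x4 router arrangement implied by "3 links in d1, 3 links in d2" (function and variable names are illustrative):

```python
from itertools import product

def router_radix(concentration, routers_per_dim, dims):
    """Ports per router: local terminals plus one link to every other
    router in each dimension."""
    return concentration + dims * (routers_per_dim - 1)

def min_hops(src, dst):
    """Minimal router-to-router hops: one hop per dimension whose
    coordinate differs, since each dimension is fully connected."""
    return sum(1 for s, d in zip(src, dst) if s != d)

# Assumed configuration: 4x4 routers, 4 terminals per router.
K, DIMS, C = 4, 2, 4
routers = list(product(range(K), repeat=DIMS))
radix = router_radix(C, K, DIMS)
diameter = max(min_hops(a, b) for a in routers for b in routers)
print(radix, diameter, len(routers) * C)  # 10 2 64
```
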
Non-minimal global adaptive routing (UGAL)
  Load balance & performance: exploits path diversity
  Routes each packet minimally or non-minimally
  Non-minimal: minimal dimension-ordered routing within each phase (prevents deadlock)
  Only 2 VCs needed
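
The minimal versus non-minimal choice can be sketched as follows. This is a simplified illustration of UGAL as it is commonly described (compare queue length times hop count for the minimal path and for a path through a random intermediate router), not the paper's implementation. The `queue_len` callback and the `rng` parameter are hypothetical names introduced here, and dimension-ordered routing is assumed within each phase.

```python
import random

def dor_path(src, dst):
    """Dimension-ordered minimal path: fix one coordinate at a time.
    Returns the routers visited after src."""
    path, cur = [], list(src)
    for dim in range(len(src)):
        if cur[dim] != dst[dim]:
            cur[dim] = dst[dim]
            path.append(tuple(cur))
    return path

def ugal_route(src, dst, queue_len, routers, rng=random):
    """Choose the minimal path unless its estimated delay
    (queue length * hops) exceeds that of a non-minimal path
    through a randomly chosen intermediate router."""
    minimal = dor_path(src, dst)
    if not minimal:
        return []
    mid = rng.choice([r for r in routers if r not in (src, dst)])
    nonminimal = dor_path(src, mid) + dor_path(mid, dst)
    q_min = queue_len(src, minimal[0])      # occupancy toward first minimal hop
    q_non = queue_len(src, nonminimal[0])   # occupancy toward first non-minimal hop
    if q_min * len(minimal) <= q_non * len(nonminimal):
        return minimal
    return nonminimal

# Usage with a toy congestion model: the direct link to (3, 0) is busy.
routers = [(x, y) for x in range(4) for y in range(4)]
busy = lambda here, nxt: 10 if nxt == (3, 0) else 1
print(ugal_route((0, 0), (3, 0), busy, routers, random.Random(0)))
```
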

Bypass Channels & Microarchitecture
Goal: reduce the distance traveled by packets to reduce latency and energy
Two types of muxes
  Input muxes: select bypass inputs or direct inputs
  Output muxes: select direct outputs or bypass inputs
Yield arbiter to guarantee global fairness
  If the primary input is idle, the non-primary input is chosen
  Control packet: prevents starvation
Combination of minimal and non-minimal routing
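
The mux arbitration rule above can be illustrated with a toy model. The priority rule (primary first, non-primary only when the primary is idle) comes from the slide. How the control packet is triggered is not stated there, so the wait-count threshold `starvation_limit` below is purely an assumption for illustration, as are all names.

```python
def select_input(primary_valid, non_primary_valid, non_primary_wait,
                 starvation_limit=8):
    """Return (choice, send_control_packet).

    choice is 'primary', 'non_primary', or None when both inputs are idle.
    send_control_packet is True when the non-primary input has waited at
    least starvation_limit cycles behind the primary input (assumed policy).
    """
    if primary_valid:
        # Primary input always wins the mux when it has a flit.
        starving = non_primary_valid and non_primary_wait >= starvation_limit
        return "primary", starving
    if non_primary_valid:
        # Primary is idle, so the non-primary input may use the channel.
        return "non_primary", False
    return None, False

print(select_input(True, True, 2))    # ('primary', False)
print(select_input(False, True, 2))   # ('non_primary', False)
print(select_input(True, True, 8))    # ('primary', True)
```
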

Bypass Channels (continued)
Switch architecture
  Minimal: simplified crossbar switch
  Non-minimal: more complexity
  Non-minimal with bypass channels: less complexity
Flow control & routing
  Buffers for non-primary inputs
  Separate buffers for destination of control packets
  UGAL modified to support bypass channels

Evaluation
Throughput: up to 50% increase compared to concentrated mesh
Power: about 38% reduction compared to mesh
Latency: about 28% reduction compared to mesh

Scalability
Channel count grows by a smaller factor than in a hypercube
Three ways to scale
  Concentration factor
  Dimension of the flattened butterfly
  Hybrid approach
Future technology helps long wires
Increasing the number of VCs slightly reduces latency
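
The scaling options can be explored by enumerating configurations that reach a given node count. This is a rough sketch under one assumption: every dimension has the same number of routers, so nodes = concentration * routers_per_dim ** dims and radix = concentration + dims * (routers_per_dim - 1). The function name and the search limits are hypothetical.

```python
def fbfly_configs(nodes, max_dims=3, max_concentration=16):
    """List (concentration, routers_per_dim, dims, radix) tuples with
    concentration * routers_per_dim ** dims == nodes."""
    configs = []
    for dims in range(1, max_dims + 1):
        for c in range(1, max_concentration + 1):
            if nodes % c:
                continue
            routers = nodes // c
            k = round(routers ** (1.0 / dims))
            # Check neighbours of the rounded root to guard against float error.
            for cand in (k - 1, k, k + 1):
                if cand >= 2 and cand ** dims == routers:
                    configs.append((c, cand, dims, c + dims * (cand - 1)))
    return configs

for cfg in fbfly_configs(256, max_dims=3):
    print(cfg)
```

For 64 nodes the list includes (4, 4, 2, 10), the radix-10 configuration from the topology slide. For 256 nodes, raising the concentration gives (16, 4, 2, 22), while adding a dimension gives (4, 4, 3, 13).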

Conclusion & Concerns
Flattened butterfly: an interesting idea
  Maximum distance between nodes = 2
  Non-minimal routing to balance load
  Bypass channels to reduce latency
  Lower latency and power, and higher throughput, compared to mesh
Concerns:
  High channel count? (higher than mesh & torus)
  Low channel utilization? (due to the high channel count)
  Control complexity? (arbitration, control packets)
  Bypass channels: a good idea? (Why not just use non-minimal or minimal routing?)