FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers
Gwangsun Kim, John Kim (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)
Sungjoo Yoo (Dept. of Electronic and Electrical Engineering, Pohang University of Science and Technology)

Motivation
• The on-chip network is becoming more critical.
• Buffer size has a huge impact on performance.
• Buffers take a large portion of router power.
• However, not all of the buffers are fully utilized, even at a high load.
[Figure: average latency (cycles) vs. injection rate (flits/node/cycle) for buffer sizes of 2, 4, 8, 16 and 32.]
[Figure: average utilization of buffers (%) vs. injection rate (flits/node/cycle).]
[Figure: router power breakdown [Kumar et al., ICCD'07]: input buffer 46%, crossbar switch 35%, clock 16%, allocator 3%.]
→ Use power gating and turn off unused entries!

Our Approach
Dynamically adjust the active window size.
• Active window: the set of ON (active) entries of a buffer.
[Figure: at a low traffic load, the active window is small and the remaining entries are OFF; at a high traffic load, it grows to hold more flits (F).]

Issue 1: Flow Control
The availability of buffer entries must be communicated to the upstream router.
Case 1: Increase the active window size using an early credit.
[Figure: Routers 0, 1 and 2 exchanging flits and credits; Router 0 tracks its credits (CR) for Router 1, whose buffer has ON and OFF entries.]
When?
• There is an incoming flit.
• There is an OFF buffer entry.
• There is congestion in both the upstream and the local router.

Issue 1: Flow Control (cont'd)
Case 2: Decrease the active window size by withholding a credit.
[Figure: Routers 0, 1 and 2 exchanging flits and credits; Router 0 tracks its credits (CR) for Router 1.]
When?
• There is an outgoing flit.
• There are more than the minimum number of ON entries.

Issue 2: Circular Queue Problem
When utilization is low, each incoming flit turns on an entry.
→ Each activation of an entry incurs power overhead!
Problematic circular buffer
• Each flit activates an entry: flits 0 to 4 each land in a different entry, which is turned ON while the others are OFF. → Large power overhead.
Ideal buffer management
• The same entry is reused. → No power overhead.
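The two flow-control cases of Issue 1 reduce to simple per-flit rules. The Python sketch below is a minimal model under stated assumptions, not the authors' implementation: `FlexiVC` and its methods are hypothetical names, credit and link latency are ignored, the three Case 1 conditions are assumed to be required together, and `min_on` is an assumed parameter for the minimum number of ON entries. It tracks entry counts only, not entry positions, so it does not capture the circular queue problem.

```python
from dataclasses import dataclass


@dataclass
class FlexiVC:
    """One VC buffer whose active window is resized through credits.

    Hypothetical model: link and credit latency are ignored, so the upstream
    router's credit count always equals (ON entries - occupied entries).
    """
    depth: int                 # physical entries (e.g. VC buffer depth 8)
    min_on: int                # assumed minimum active-window size
    on: int                    # current active-window size (ON entries)
    occupied: int = 0          # ON entries currently holding a flit
    upstream_credits: int = 0  # credits the upstream router holds for this VC

    def __post_init__(self) -> None:
        # Every free ON entry is advertised upstream as one credit.
        self.upstream_credits = self.on - self.occupied

    def receive_flit(self, upstream_congested: bool, local_congested: bool) -> bool:
        """Accept a flit; return True if an early credit was sent (Case 1)."""
        assert self.upstream_credits > 0, "upstream sent a flit without a credit"
        self.upstream_credits -= 1      # the incoming flit consumed a credit
        self.occupied += 1
        # Case 1: incoming flit + an OFF entry + congestion upstream and locally
        # (assumed to be required together).
        if self.on < self.depth and upstream_congested and local_congested:
            self.on += 1                # turn on one OFF entry ...
            self.upstream_credits += 1  # ... and advertise it right away
            return True
        return False

    def send_flit(self) -> bool:
        """Forward a flit; return True if its credit was withheld (Case 2)."""
        assert self.occupied > 0, "no flit to send"
        self.occupied -= 1
        # Case 2: outgoing flit + more than the minimum number of ON entries.
        if self.on > self.min_on:
            self.on -= 1                # turn the freed entry off; no credit sent
            return True
        self.upstream_credits += 1      # normal credit return
        return False


vc = FlexiVC(depth=8, min_on=2, on=2)
print(vc.receive_flit(upstream_congested=True, local_congested=True))  # True
print(vc.on, vc.upstream_credits)                                       # 3 2
print(vc.send_flit())                                                   # True
print(vc.on, vc.upstream_credits)                                       # 2 2
```

In this model, shrinking the window needs no extra signal: a withheld credit is simply never returned, so the upstream router sees one fewer free entry. Likewise, an early credit is an ordinary credit sent before any flit has left.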
Split Queue
• A buffer is separated into two regions.
• Use the primary region only, as long as possible.
  - Adjust the active window size dynamically.
  - Operate like a circular queue.
[Figure: unified mode. Flits 0 to 2 occupy ON entries of the primary region; the secondary region is OFF and not used.]

Split Queue (cont'd)
• The queue cannot stay in unified mode indefinitely: the next flit's place in the primary region may not be available, yet there are unused entries. → Switch to split mode.
• When the primary region is empty, switch back to unified mode.
[Figure: unified mode vs. split queue mode. In split mode, flits are read out from the primary region and written to the secondary region until the primary region is empty.]

Summary of Evaluation
Simulator: cycle-accurate OCN simulator (Booksim)
Power measurement: Orion 2.0

Parameter          Value
Topology           8x8 2D mesh
# of VCs           4
VC buffer depth    8
Technology node    32nm
Clock frequency    1.5GHz
Vdd                1.0 V

[Figure: Performance. Average latency (cycles) vs. injection rate (flits/node/cycle) for the baseline and FlexiBuffer (SQ).]
[Figure: Power consumption. Total router power (W) vs. injection rate (flits/node/cycle) for the baseline and FlexiBuffer (SQ); savings of 39% at low load and 13% at high load.]

Conclusions
• There is a huge power-saving opportunity with fine-grained power gating when buffers are large.
• We proposed a modified credit-based flow control.
• We proposed the split queue to minimize activation power overhead.
• Our simulation results show that, with minimal performance loss, FlexiBuffer + SQ can save
  - 39% of router power at low traffic load
  - 13% of router power at high traffic load

Thank you! Questions?
For more discussion, please come to my poster!
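The Python sketch below contrasts the activation overhead of a plain circular queue (Issue 2) with the split queue. It is one hypothetical reading of the slides, not the authors' design: the class names, the baseline's gating policy, the `head == 0` test for growing the primary region in place, the one-entry minimum window, and the role swap (the secondary region becomes the new primary once the primary drains) are all assumptions. `activations` counts OFF-to-ON transitions, and each character of the operation string is one write (`W`) or read (`R`).

```python
class CircularQueue:
    """Baseline: one circular queue over all `depth` entries.

    Assumed gating policy: an entry is ON while it holds a flit or is the
    next write slot (tail). `activations` counts OFF->ON transitions.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.on = [False] * depth
        self.on[0] = True                    # entry 0 is the first write slot
        self.head = self.tail = self.count = 0
        self.activations = 0

    def write(self) -> None:
        assert self.count < self.depth, "buffer full"
        self.count += 1
        self.tail = (self.tail + 1) % self.depth
        if not self.on[self.tail]:           # power up the next write slot
            self.on[self.tail] = True
            self.activations += 1

    def read(self) -> None:
        assert self.count > 0, "buffer empty"
        if self.head != self.tail:           # gate off the freed entry
            self.on[self.head] = False
        self.head = (self.head + 1) % self.depth
        self.count -= 1


class SplitQueue:
    """Hypothetical split queue with a minimum active window of one entry.

    Unified mode: ON entries [base, base + size) form a circular primary
    region. Full and not wrapped (head == 0): grow in place. Full and
    wrapped: split mode, writing to a secondary region right after the
    primary while reads drain the primary. Once the primary is empty it is
    gated off and the secondary becomes the new primary (assumed role swap).
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.on = [False] * depth
        self.on[0] = True                    # initial window: entry 0 only
        self.base, self.size, self.head, self.count = 0, 1, 0, 0
        self.split, self.sec_count = False, 0
        self.activations = 0

    def _set(self, offset: int, value: bool) -> None:
        """Power an entry on/off by its offset from the primary base."""
        idx = (self.base + offset) % self.depth
        if value and not self.on[idx]:
            self.activations += 1
        self.on[idx] = value

    def write(self) -> None:
        if self.split:                       # secondary entries follow the primary
            assert self.size + self.sec_count < self.depth, "buffer full"
            self._set(self.size + self.sec_count, True)
            self.sec_count += 1
        elif self.count < self.size:         # reuse an ON entry: no activation
            self.count += 1
        else:
            assert self.size < self.depth, "buffer full"
            self._set(self.size, True)       # next physical entry after the region
            if self.head == 0:               # full, not wrapped: grow in place
                self.size += 1
                self.count += 1
            else:                            # full and wrapped: enter split mode
                self.split, self.sec_count = True, 1

    def read(self) -> None:
        assert self.count > 0, "buffer empty"
        self.head = (self.head + 1) % self.size
        self.count -= 1
        if self.split:
            if self.count == 0:              # primary drained: swap the regions
                for off in range(self.size):
                    self._set(off, False)
                self.base = (self.base + self.size) % self.depth
                self.size = self.count = self.sec_count
                self.head, self.sec_count, self.split = 0, 0, False
            return
        if self.count == 0:
            self.head = 0
        if self.size > 1 and self.head + self.count < self.size:
            self._set(self.size - 1, False)  # top entry is free: shrink by one
            self.size -= 1


def run(queue, ops: str) -> int:
    """Apply 'W' (write) / 'R' (read) operations; return the activation count."""
    for op in ops:
        if op == "W":
            queue.write()
        else:
            queue.read()
    return queue.activations


low_load = "WR" * 10       # one flit in the buffer at a time
bursty = "WWWRWWRRRR"      # a short burst that wraps the primary region
print(run(CircularQueue(8), low_load), run(SplitQueue(8), low_load))  # 10 0
print(run(CircularQueue(8), bursty), run(SplitQueue(8), bursty))      # 5 3
```

Under the alternating low-load pattern, the split queue keeps reusing a single ON entry while the circular queue powers up a new entry for every flit. In the bursty pattern, the split queue's activations come only from growing the window twice and from one write into the secondary region.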