PPT - Microarch.org

NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers
Lizhong Chen, Timothy M. Pinkston
University of Southern California
Node-Router Decoupling
Problems in Applying Power-gating to Routers:
• Intensified BET limitation
- Intermittent packet arrivals break long idle periods into fragments
- For PARSEC, 61% of all idle periods are shorter than the BET
• Cumulative wakeup latency in multi-hop NoCs
- Worse for larger networks
• Disconnection problem
- Idle period is upper bounded by local node’s traffic
- Disconnected network
Advantages of NoRD: Solving All Three Problems:
• Mitigate BET limitation: use bypass paths instead of waking up routers
• Hide wakeup latency: use bypass paths while routers are waking up
• Eliminate disconnection: all nodes are always connected by bypass ring
[Figure: power-gating overhead energy; x-axis labels include blackscholes, bodytrack]
Simulation Platform:
• Platform: Simics + GEMS (Garnet + Orion 2.0)
• Workloads: PARSEC 2.0 + Synthetic traffic
Schemes Under Comparison:
• No power-gating (No_PG)
• Conventional power-gating (Conv_PG)
- Apply power-gating technique conventionally to routers
• Optimized conventional power-gating (Conv_PG_OPT)
- Conv_PG + early wakeup (hide some wakeup latency)
• Node-router decoupling (NoRD)
[Figure residue: per-benchmark bar charts; x-axis labels dedup, ferret, fluidanimate, raytrace, swaptions, vips, x264, AVG; series No_PG, Conv_PG, Conv_PG_OPT, NoRD]
Performance:
• Average packet latency penalty
- Conv_PG: 63.8%, Conv_PG_OPT: 41.5%, NoRD: 15.2%
• Execution time penalty
- Conv_PG: 11.7%, Conv_PG_OPT: 8.1%, NoRD: 3.9%
Evaluation Methodology
[Figure: execution time (normalized to No_PG); x-axis label canneal; series No_PG, Conv_PG, Conv_PG_OPT, NoRD]
Two Concerns:
• Breakeven-time (BET): the minimum number of gated-off idle
cycles to offset power-gating energy overhead (~10 cycles for router)
• Wakeup latency: around 10 to 15 cycles for router
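The breakeven constraint can be illustrated with a small calculation. This is a minimal sketch, assuming a linear model in which each gated-off cycle saves one unit of static energy and each power-gating event costs BET units of overhead. The function name and the sample idle periods are hypothetical and are not measured data.

```python
def net_gating_savings(idle_periods, bet=10):
    """Return (net_savings, fraction_below_bet) for idle-period lengths in cycles.

    Assumed model: gating a router for t cycles saves t units of static
    energy and costs `bet` units of overhead, so the net gain is t - bet.
    """
    if not idle_periods:
        return 0, 0.0
    net = sum(t - bet for t in idle_periods)
    below = sum(1 for t in idle_periods if t < bet)
    return net, below / len(idle_periods)


# Hypothetical idle periods (cycles): many short fragments, a few long ones.
periods = [3, 4, 6, 8, 12, 40]
net, frac_below = net_gating_savings(periods)
```

Under this model, every idle period shorter than the BET contributes a net loss, which is why fragmented idle time hurts conventional power-gating.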
Power-gating Challenges
The left figure shows the classification of routers:
• The red ones are performance-centric routers
• The blue ones are power-centric routers
[Figure: power breakdown of a canonical router at 45nm and 1.0V; surviving label: Clock_static 4%]
[Figure residue: x-axis 65nm, 45nm and 32nm, each at 1.2V, 1.1V and 1.0V; legend: router static power, router dynamic power, link static power, link dynamic power; series No_PG, Conv_PG, Conv_PG_OPT, NoRD]
Overall NoC Energy Savings:
• Conv_PG: 9.1%, Conv_PG_OPT: 9.4%, NoRD: 20.6%
• Static energy savings vs. dynamic energy losses
Increasing NoRD Efficiency:
• Routers have different impact on performance based on their location
• Classify routers into a performance-centric class and a power-centric class
• Wake up early a few performance-critical routers to improve
performance by adding “shortcuts” in routing
• Wake up late the rest (majority) of the routers to save more static power
by allowing those routers to stay in gated-off state for a longer time
• Use an off-line program based on Floyd-Warshall all-pair shortest path
algorithm to classify routers in this work; further exploration can be
done for future work
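The poster names Floyd-Warshall but does not describe the off-line classifier, so the following is only a sketch of one way such a program could work, not the authors' method. It assumes a mesh whose bypass ring is a serpentine Hamiltonian cycle, assumes a mesh link is usable only when the routers at both ends are powered on, and ranks each router by how much the average hop count grows when that router alone is gated off. All function names, the ring layout, and the ranking rule are hypothetical.

```python
from itertools import product

INF = float("inf")


def floyd_warshall(n, edges):
    """All-pairs shortest hop counts over directed unit-weight edges."""
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v in edges:
        dist[u][v] = 1
    for k in range(n):
        for i in range(n):
            if dist[i][k] == INF:
                continue
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def average_hops(n, edges):
    dist = floyd_warshall(n, edges)
    total = sum(dist[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def ring_order(rows, cols):
    """Assumed bypass ring: serpentine cycle over mesh links (rows must be even)."""
    order = [(0, c) for c in range(cols)]
    for r in range(1, rows):
        cs = range(cols - 1, 0, -1) if r % 2 == 1 else range(1, cols)
        order += [(r, c) for c in cs]
    order += [(r, 0) for r in range(rows - 1, 0, -1)]
    return [r * cols + c for r, c in order]


def usable_edges(rows, cols, powered_on):
    """Ring edges are always usable; mesh links need both end routers on."""
    ring = ring_order(rows, cols)
    edges = {(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))}
    for r, c in product(range(rows), range(cols)):
        u = r * cols + c
        for dr, dc in ((0, 1), (1, 0)):
            rr, cc = r + dr, c + dc
            if rr < rows and cc < cols:
                v = rr * cols + cc
                if u in powered_on and v in powered_on:
                    edges.update({(u, v), (v, u)})
    return edges


def classify_routers(rows, cols, k):
    """Return (performance_centric, power_centric) using leave-one-out impact."""
    n = rows * cols
    all_on = set(range(n))
    base = average_hops(n, usable_edges(rows, cols, all_on))
    impact = {
        r: average_hops(n, usable_edges(rows, cols, all_on - {r})) - base
        for r in range(n)
    }
    ranked = sorted(range(n), key=lambda r: (-impact[r], r))
    return set(ranked[:k]), set(ranked[k:])


performance_centric, power_centric = classify_routers(4, 4, k=4)
```

Because the ring is always available, gating off any single router never disconnects the network in this model, so every impact value is finite.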
[Figure residue: router power breakdown labels Xbar_static 5%, SA_static 2%]
Routing:
• Based on Duato’s Protocol
- Escape resources are comprised of escape VCs of the bypass ring
formed by (Bypass Inport, Bypass Outport) pairs
- Other VCs are adaptive resources
• Packets on adaptive VCs
- First routed minimally
- If not possible, detoured by one; may still be routed on adaptive VCs
- If misrouted hops reach the threshold, forced to enter escape VCs
• Packets on escape VCs
- Confined to bypass ring until destination
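The routing rules above can be sketched as a per-hop decision function. This is an illustrative sketch only: the `Packet` fields, the function name, the return labels, and the default threshold of 2 are assumptions, and the poster does not state the threshold or the exact point at which it is checked.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    on_escape: bool = False   # True once the packet has entered an escape VC
    misroutes: int = 0        # number of non-minimal (detour) hops taken so far


def select_vc_class(pkt, minimal_port_available, misroute_threshold=2):
    """Pick the VC class for the next hop and update the packet state.

    Assumed policy: escape packets stay on the bypass ring; a packet whose
    misroute count has reached the threshold is forced onto escape VCs;
    otherwise it goes minimally if it can and detours if it cannot.
    """
    if pkt.on_escape or pkt.misroutes >= misroute_threshold:
        pkt.on_escape = True
        return "escape"
    if minimal_port_available:
        return "adaptive-minimal"
    pkt.misroutes += 1
    return "adaptive-detour"


# Example: a packet that is blocked twice and then forced onto the ring.
pkt = Packet()
trace = [
    select_vc_class(pkt, True),
    select_vc_class(pkt, False),
    select_vc_class(pkt, False),
    select_vc_class(pkt, True),
]
```

Once `on_escape` is set it is never cleared, which mirrors the rule that packets on escape VCs are confined to the bypass ring until they reach their destination.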
[Figure residue: router power breakdown labels Dynamic 62%, VA_static 7%]
Power-gating Overhead Reduction:
• NoRD reduces power-gating overhead by over 80%
[Figure residue: router power breakdown label Buffer_static 21%]
Router/NI-level:
• Two bypass paths and control logic are added
• Router is power-gated off when its datapath is empty
• Router is turned on when the wakeup metric exceeds a threshold
- VC request rate at the local NI
• Low implementation cost (3.1% of router area)
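The wakeup rule above can be sketched as a sliding-window counter at the NI. This is a minimal sketch under stated assumptions: the poster says only that the metric is the VC request rate at the local NI, so the window length, the threshold value, and the class and method names here are hypothetical.

```python
from collections import deque


class WakeupMonitor:
    """Tracks VC requests at the local NI over a sliding window of cycles."""

    def __init__(self, window=10, threshold=3):
        self.samples = deque(maxlen=window)  # oldest sample drops out automatically
        self.threshold = threshold

    def tick(self, vc_requests):
        """Record this cycle's VC request count; return True if the router should wake."""
        self.samples.append(vc_requests)
        return sum(self.samples) > self.threshold


# Example: sparse requests do not wake the router, a burst does.
monitor = WakeupMonitor(window=4, threshold=2)
decisions = [monitor.tick(n) for n in [1, 0, 1, 0, 1, 2]]
```

While the monitor stays below the threshold, packets keep using the bypass paths, so occasional traffic does not force a wakeup.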
Chip-level:
• A bypass ring connecting all nodes
• Receiving: add a bypass path from Bypass Inport to the NI ejection
• Sending: add a bypass path from the NI injection to Bypass Outport
• Forwarding: packets bypass a gated-off router by using the above two
bypass paths together
[Figure: static power percentage]
• Breaks the node-router dependence via decoupling bypass paths
• Issue of high NoC power consumption
• The increasing static power of on-chip routers
Static Energy Savings:
• Conv_PG: 51.2%, Conv_PG_OPT : 47.0%, NoRD: 62.9%
• Relative improvement of NoRD: 23.9% and 29.9%
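One plausible reading of the relative improvement figures is that they compare the static energy that remains under each scheme, rather than subtracting the savings percentages. The sketch below works through that arithmetic. The function name is hypothetical and the interpretation is an assumption. With the rounded inputs shown on the poster, it gives roughly 24.0% and 30.0%, close to the reported 23.9% and 29.9%.

```python
def relative_improvement(baseline_savings, nord_savings):
    """Extra reduction of NoRD relative to the energy the baseline still consumes."""
    remaining_baseline = 1.0 - baseline_savings
    remaining_nord = 1.0 - nord_savings
    return 1.0 - remaining_nord / remaining_baseline


vs_conv_pg = relative_improvement(0.512, 0.629)      # about 0.240
vs_conv_pg_opt = relative_improvement(0.470, 0.629)  # about 0.300
```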
[Figure: breakdown of power (normalized to No_PG)]
NoC Power Consumption
Basic Idea:
[Figure: average packet latency (cycles)]
While power-gating is a promising technique to mitigate the increasing
static power of a chip, a fundamental requirement is for the idle periods
to be sufficiently long to compensate for the power-gating and
performance overhead. On-chip routers are potentially good targets for
power optimizations, but few works have explored effective ways of
power-gating them due to the intrinsic dependence between the node and
router – any packet (sent, received or forwarded) must wake up the router
before being transferred, thus breaking the potentially long idle period
into fragmented intervals. Simulation shows that directly applying
conventional power-gating techniques would cause frequent state-transitions and significant energy and performance overhead. In this
work, we propose NoRD (Node-Router Decoupling), a novel power-aware on-chip network approach that provides for power-gating bypass
to decouple the node’s ability for transferring packets from the powered-on/off status of the associated router, thereby maximizing the length of
router idle periods. Full system evaluation using PARSEC benchmarks
shows that the proposed approach can substantially reduce the number
of state-transitions, completely hide wakeup latency from the critical
path of packet transport and eliminate node-network disconnection
problems. Compared to an optimized conventional power-gating
technique applied to on-chip routers, NoRD can further reduce the
router static energy by 29.9% and improve the average packet latency by
26.3%, with only 3% additional area overhead.
Results
[Figure: static energy (normalized to No_PG)]
Abstract
{lizhongc, tpink}@usc.edu
Acknowledgements
We thank the anonymous reviewers for their helpful comments and
suggestions. We especially acknowledge the efforts of Yuho Jin in
creating Simics checkpoints prior to this work. We also thank Li-Shiuan Peh’s research group for their assistance with Orion 2.0. This
research was supported, in part, by the National Science Foundation
(NSF), grant CCF-0946388.