Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory, The University of Michigan
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003

Advanced Computer Architecture Lab The University of Michigan Razor DVS Dan Ernst – 12/3/2003

Dynamic Voltage Scaling and Design Uncertainty
• DVS - Adapting voltage/frequency to meet performance demands of workload
  – Lower processor voltage during periods of low utilization
  – Lower Voltage is a Good Thing™ for power
• Minimum voltage is limited by Safety Margins
  – Error-free operation must be guaranteed!
• Technology trends are Maximizing the Minimums
  – Process and temperature variation
  – Capacitive and inductive noise
[Figure: Intra-die variations in ILD thickness]
• Key Observation: worst-case conditions are also highly improbable
  – Significant gain for circuits optimized for common case
  – Efficient mechanisms needed to tolerate infrequent worst-case scenarios

Shaving Voltage Margins with Razor
• Goal: reduce voltage margins with in-situ error detection and correction for delay failures
[Figure: Percentage Errors (0 to 60) vs. Supply Voltage (0.8 to 2.0), marking the Traditional DVS, Zero margin, and Sub-critical operating points]
• Proposed Approach:
  – Remove safety margins and tolerate occasional errors
  – Tune processor voltage based on error rate
  – Purposely run below critical voltage
    • Data-dependent latency margins
• Trade-off: voltage power savings vs. overhead of correction
  – Analogous to wireless power modulation

Razor Timing Error Detection
[Figure: main FFs clocked by clk on either side of a logic stage (MEM), with a shadow latch clocked by clk_del; example data values shown]
• Second sample of logic value used to validate earlier sample
• Key design issues:
  – Maintaining pipeline forward progress
  – Short path impact on shadow-latch
  – Power overhead of error detection and correction
  – Meta-stable results in main flip-flop
  – Recovering pipeline state after errors

Razor Short Path Constraint
[Figure: same circuit as the previous slide, annotated with the Hold Constraint (~1/2 cycle) imposed by clk_del]

Centralized Razor Pipeline Error Recovery
[Figure: pipeline PC, IF, ID, EX, MEM, WB (reg/mem) with Razor FFs between stages; per-stage error signals and a shared recover signal; inst1 to inst6 over cycles 0 to 6]
• One-cycle penalty for timing failure
• Global synchronization may be difficult for fast, complex designs

Distributed Razor Pipeline Error Recovery
[Figure: pipeline PC, IF, ID, EX, MEM (read-only), WB (reg/mem) with Razor FFs and a Stabilizer FF; per-stage error, bubble, recover, and flushID signals with Flush Control; inst1 to inst8 over cycles 0 to 9]
• Multiple-cycle penalty for timing failure
• Scalable design since all recovery communication is local
• Builds on existing branch / data speculation recovery framework

Error-Rate Studies – Hardware Measurement

Error Rate Studies – Empirical Results
[Figure: Error rate (log scale, 0.0000001% to 100%) vs. Supply Voltage (1.14 V to 1.78 V) for an 18x18-bit Multiplier Block at 90 MHz and 27 C, random inputs. Annotations: Environmental-margin @ 1.69 V; Zero-margin @ 1.54 V; 22% saving; 35% energy savings with 1.3% error; once every 20 seconds!]

Error Rate Studies – SPICE-Level Simulations
• Based on SPICE-level simulations of a Kogge-Stone adder
[Figure: Error rate (log scale, 0.00% to 100%) vs. Supply Voltage (0.6 V to 2 V) for a Kogge-Stone Adder at 870 MHz and 27 C. Curves: random, bzip, ammp. Annotation: 200 mV]

Razor I - Prototype Razor Implementation
• 4 stage 64-bit Alpha pipeline:
  – 200MHz expected operation in 0.18 µm technology, 1.8V, ~500mW
  – Tunable via software from 50-200MHz, 1.1-1.8V
  – Razor applied to combinational logic
• Razor overhead:
  – Total of 192 Razor flip-flops out of 2408 total (~8%)
  – Error-free power overhead: ~3%
[Figure: die layout, 3 mm by 3.3 mm: I-Cache, D-Cache, Register File, IF, ID, EX, MEM, WB]

Effects of Razor DVS
[Figure: Energy and Pipeline Throughput (IPC) vs. Decreasing Supply Voltage. Curves: Total Energy, Etotal = Eproc + Erecovery; Energy of Processor Operations, Eproc; Energy of Pipeline Recovery, Erecovery; Energy of Processor w/o Razor Support. The Optimal Etotal point is marked]
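
The shape of the Effects of Razor DVS curves can be reproduced with a small voltage sweep. The Python sketch below is only an illustration under stated assumptions: the relation Etotal = Eproc + Erecovery, the 1.8 V nominal supply, the 1.54 V zero-margin point, and the 18x recovery cost are taken from the slides, while the exponential error model and its constants (`steepness`, `scale`) are hypothetical, and treating one recovery as 18 times the energy of an ordinary operation is a simplification.

```python
import math

V_NOMINAL = 1.8       # nominal supply of the prototype (from the slides)
V_ZERO_MARGIN = 1.54  # zero-margin point borrowed from the multiplier study
RECOVERY_COST = 18.0  # assumed: one recovery costs 18x a normal operation


def error_rate(vdd, steepness=25.0, scale=1e-4):
    """Hypothetical error model: zero at or above zero margin, exponential below."""
    if vdd >= V_ZERO_MARGIN:
        return 0.0
    return min(1.0, scale * (math.exp(steepness * (V_ZERO_MARGIN - vdd)) - 1.0))


def total_energy(vdd):
    """Relative Etotal = Eproc + Erecovery per operation (1.0 at V_NOMINAL)."""
    e_proc = (vdd / V_NOMINAL) ** 2  # dynamic energy scales with Vdd squared
    e_recovery = error_rate(vdd) * RECOVERY_COST * e_proc
    return e_proc + e_recovery


def find_optimal_voltage(v_min_mv=1000, v_max_mv=1800, step_mv=10):
    """Sweep the supply in millivolt steps; return (vdd, Etotal) at the minimum."""
    candidates = [mv / 1000.0 for mv in range(v_min_mv, v_max_mv + 1, step_mv)]
    best = min(candidates, key=total_energy)
    return best, total_energy(best)


if __name__ == "__main__":
    v_opt, e_opt = find_optimal_voltage()
    print(f"optimal Vdd = {v_opt:.2f} V, relative Etotal = {e_opt:.3f}, "
          f"error rate = {error_rate(v_opt):.2%}")
```

Under this toy model the energy minimum falls below the zero-margin voltage, because a small error rate costs less than the margin it removes. Where exactly the minimum lands depends entirely on the assumed constants.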

EX-Stage Analysis – Optimal Voltage Sweep (BZIP)
[Figure: Relative IPC and Energy (curves: Rel Energy, Rel Performance) vs. Voltage (0.6 V to 1.8 V). Annotation: 0.31% Error Rate, 58% Energy Savings]
• Recovery cost includes energy to recover entire pipeline (18x an add)

EX-Stage Analysis – Optimal Voltage Sweep (GCC)
[Figure: Relative IPC and Energy (curves: Rel Energy, Rel Performance) vs. Voltage (0.6 V to 1.8 V). Annotation: 1.62% Error Rate, 24% Energy Savings]

Simulation Analysis – Energy-Optimal Voltage
[Figure: Total Energy and IPC as Percentage of Baseline (zero-margin) for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]

Simulation Analysis – Razor DVS Execution (GCC)
[Figure: Supply Voltage (0 V to 2 V) and Error Rate (0% to 40%) vs. Time]

Simulation Analysis – Razor DVS Performance
[Figure: Total Energy, DVS Energy, IPC, and DVS IPC as Percentage of Baseline (zero-margin) for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]

Conclusions
• In-situ detection/correction of timing errors
  – Eliminate process, temperature, and safety margins
  – Tune processor voltage based on error rate
  – Purposely run below critical voltage to capture data-dependent latency margins
• Implemented with architecture/circuit support
  – Double-sampling metastability-tolerant Razor flip-flops validate logic results
  – Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
• Trade-off: supply voltage power savings vs. overhead of correction
  – Running with error is good!
[Figures: Razor FF schematic (Main Flip-Flop, Shadow Latch on clk_del, comparator, Error_L, Error); distributed recovery pipeline with error, bubble, recover, and flushID signals]

Future Directions
• Research opportunities
  – Razor for caches/memory and control logic
  – Voltage control algorithms, especially per-stage tuning
  – Typical-case energy optimized designs (instead of worst-case latency optimized)
  – Turnkey application of Razor technology
• Prototype design, fabrication, evaluation
  – Razor I – Q4 2003 – Razor-ized combinational logic, global tuning
  – Razor II – Q3 2004 – Razor-ized caches and control logic, per-stage tuning
• Other applications
  – Single-event upset (SEU) protection using Razor error detection/re-execution
  – Over-clocking for performance improvement (large gains among hobbyists)

Questions?

Back-up Slides

Other Approaches to Dynamic Voltage Scaling
• Traditional DVS
  – Valid voltage / delay combinations “blessed” at design time
  – Approach leaves a significant amount of energy “on the table”
  – Temperature, process, data, and safety margins placed on voltage
• Other approaches miss some margins
  – Slack detector – automatic tuning
    • ARM’s Intelligent Energy Manager (IEM)
    • Processor voltage automatically tuned to external ambient conditions
    • Inverter chain designed to track most restrictive critical path, margin still required
[Figure: chip floorplan: Mem Control, Data cache, Floating point and graphics, Ex Unit, Control Unit, L2 Cache, I/O Unit, Cache control, L2 tags]

Razor Flip-Flop Implementation
[Figure: Razor FF between two logic stages (L1, L2): input D through a 0/1 mux into the Main Flip-Flop (clk, output Q), a Shadow Latch on clk_del, and a comparator producing Error_L and Error]
• Compare latched data with shadow-latch on delayed clock
• Upon failure: place data from shadow-latch in main latch
  – Ensure shadow latch always correct using conservative design techniques
  – Correct value in shadow latch guarantees forward progress
• Recover pipeline using microarchitectural recovery mechanism

Razor Flip-Flop Circuit
[Figure: transistor-level circuit with D, Q, clk, clk_b, the Shadow Latch on clk_del / clk_del_b, the Meta-stability detector (Inv_n, Inv_p), and Error_L]

Overcoming Short Path Constraints
• Delayed clock imposes a short-path constraint
    Min. Path Delay > tdelay + thold
  – Razor necessary only for latches on slow paths
  – Pad fast path for latches with mixed path delays
  – Trade-off between DVS headroom and short path constraints
[Figure: clock and clock_del waveforms showing the intended path, the short path, tdelay, thold, and the min. path delay; a plain ff on Short Paths and a Razor_ff on Long Paths, with extra delay padding]

Hardware Measurement Setup
[Figure: 48-bit LFSRs supply 18-bit operands to 18x18 multipliers (36-bit products) in Slow Pipeline A and Slow Pipeline B, clocked at clk/2, and in the Fast Pipeline, clocked at clk with a stabilize stage; a != comparison drives a 40-bit Error Counter]

Simulation Methodology
• Challenge: instruction latency depends on circuit evaluation latency
  – May vary with changes in stage inputs, stage logic, voltage, temperature…
• Dynamic timing simulation combines architectural/circuit simulation
• Initial implementation utilized a hand-generated EX-stage circuit model
  – Effort ongoing to automate extraction/decomposition/integration into SimpleScalar

Supply Voltage Control System
[Figure: feedback loop: Pipeline error signals are combined into Esample; Ediff = Eref - Esample feeds the Voltage Control Function, which drives the Voltage Regulator that sets Vdd; a reset input is shown]
• Current design utilizes a very simple proportional control function
  – Control algorithm implemented in software

Pipeline Recovery
[Figure: timing diagram of instructions through IF, ID, EX, MEM, WB with clk, clk_d, ID.d, EX.d, MEM.d, and error, contrasting the No Error and Error cases; on an error the instruction is redone in MEM]

Voltage Scaling under Dynamic Workloads
• Adapt frequency/voltage to performance demands of workload
  – Software controlled processor speed
  – Lower processor voltage during periods of low operating frequency
[Figure: Vdd, Freq, and Utilization vs. Time]
• Quadratic reduction in dynamic power and energy
• Super-quadratic reduction in leakage

Simulation Flow
• Automatic creation of very detailed power/delay C-models
[Figure: flow boxes: High-level HDL Specification; Circuit Extraction with Parasitics; Variable Voltage SDF generation; Power/Delay C-model; Architecture Specification; Voltage Control Algorithm; SimpleScalar + DTA; Detailed Power/Delay Analysis]

Simulation Methodology
• Dynamic timing simulation combines architectural/circuit simulation
  – Contrast to static timing simulation which is only concerned with critical path
  – SimpleScalar/Alpha architectural-level simulation
  – Gate-level simulation of per-stage logic blocks
    • Logic block model describes cells, local and global interconnect
    • Cells characterized with SPICE at varied slew/cap-load/voltage
    • Each cycle, circuit simulator evaluates delay of each stage's logic block

Simulation Analysis – Razor DVS Execution (Gap)
[Figure: Supply Voltage (0.6 V to 2 V) and Error Rate (0% to 30%) vs. Time]
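
Voltage and error-rate traces like the ones in the Razor DVS Execution plots come from the feedback loop on the Supply Voltage Control System slide, where Ediff = Eref - Esample drives a simple proportional control function implemented in software. The Python sketch below shows one way such a loop could look. It is not the authors' controller: the gain `kp`, the reference error rate `e_ref`, and the exponential error model standing in for the pipeline are all hypothetical, and only the 1.1 V to 1.8 V tuning range is taken from the slides.

```python
import math

V_MIN, V_MAX = 1.1, 1.8  # software-tunable range quoted for Razor I


def sampled_error_rate(vdd, v_zero_margin=1.54, steepness=25.0, scale=1e-4):
    """Hypothetical stand-in for the pipeline: the error rate is zero above
    zero margin and grows exponentially as the supply drops below it."""
    excess = math.exp(steepness * (v_zero_margin - vdd)) - 1.0
    return min(1.0, scale * max(0.0, excess))


def control_step(vdd, e_sample, e_ref=0.01, kp=0.5):
    """One proportional update. Ediff = Eref - Esample is positive when the
    pipeline has error headroom, so the supply is lowered; it is negative
    when errors are too frequent, so the supply is raised."""
    e_diff = e_ref - e_sample
    vdd_next = vdd - kp * e_diff
    return min(V_MAX, max(V_MIN, vdd_next))  # stay inside the regulator range


def run_controller(steps=400, vdd=V_MAX):
    """Run the loop and return the (vdd, error_rate) trace."""
    trace = []
    for _ in range(steps):
        e_sample = sampled_error_rate(vdd)
        trace.append((vdd, e_sample))
        vdd = control_step(vdd, e_sample)
    return trace


if __name__ == "__main__":
    for i, (vdd, err) in enumerate(run_controller()):
        if i % 50 == 0:
            print(f"step {i:3d}: Vdd = {vdd:.3f} V, error rate = {err:.3%}")
```

With these assumed constants the supply ramps down from 1.8 V and settles where the modeled error rate matches the reference. A real workload changes the error-rate curve over time, which is why the plotted voltage keeps moving.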

Razor Demo

More Details on Meta-Stability
• Sub-critical operation invites meta-stability
  – Meta-stability detector itself can become meta-stable
  – Double-latch error signal to obtain sufficiently small probability
[Figure: Razor flip-flop (D, Q, clk, clk_b, clk_del, clk_del_b) with pos and neg detector outputs feeding a Dynamic Or / Latch; signals: error, fail, restore, bubble, flush]
  – Flush entire pipe
  – No forward progress
  – Reduce frequency

Short Path Failure
[Figure: timing diagram of inst1 and inst2 through IF, ID, EX, MEM, WB with clk, clk_d, ID.d, EX.d, MEM.d (values I1, I2); a Short Path raises error]
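
A short-path failure of this kind is what the constraint Min. Path Delay > tdelay + thold from the Overcoming Short Path Constraints slide is meant to rule out. The Python sketch below applies that rule per flip-flop: Razor is used only where the longest path is slow, and fast paths into a Razor flip-flop are padded until the constraint holds. Everything numeric here is an assumption for illustration, including the `slow_path_ns` threshold, the `guard_ns` margin that keeps the inequality strict, and the example delays (2.5 ns is half a cycle at 200 MHz).

```python
from dataclasses import dataclass


@dataclass
class FlopPaths:
    """Shortest and longest combinational path delay into one flip-flop."""
    name: str
    min_delay_ns: float
    max_delay_ns: float


def plan_flop(flop, t_delay_ns=2.5, t_hold_ns=0.5,
              slow_path_ns=4.0, guard_ns=0.25):
    """Return (use_razor, padding_ns) for one flip-flop.

    Razor is applied only when the longest path is slow enough to fail
    under voltage scaling (slow_path_ns is a hypothetical threshold).
    A Razor flip-flop must satisfy min path delay > t_delay + t_hold,
    so shorter paths are padded up to that bound plus a guard margin.
    """
    if flop.max_delay_ns < slow_path_ns:
        return (False, 0.0)  # fast paths only: a plain flip-flop suffices
    required = t_delay_ns + t_hold_ns
    if flop.min_delay_ns > required:
        return (True, 0.0)   # constraint already met, no padding needed
    return (True, required + guard_ns - flop.min_delay_ns)


if __name__ == "__main__":
    # Hypothetical flip-flops; names and delays are made up for the example.
    flops = [
        FlopPaths("id_ctl", 0.5, 2.0),    # short paths only
        FlopPaths("ex_alu", 1.0, 4.8),    # mixed short and long paths
        FlopPaths("mem_addr", 3.5, 4.5),  # long paths only
    ]
    for flop in flops:
        use_razor, padding = plan_flop(flop)
        kind = "Razor_ff" if use_razor else "ff"
        print(f"{flop.name}: {kind}, pad short paths by {padding} ns")
```

A larger tdelay gives the shadow latch more time to catch late data, which is the DVS headroom, but it also raises the padding needed on mixed-delay flip-flops. That is the trade-off the slide names.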