Razor: Dynamic Voltage Scaling Based on
Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory
The University of Michigan
Dan Ernst, Nam Sung Kim,
Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham,
and Conrad Ziesler
Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003
Advanced Computer Architecture Lab
The University of Michigan
Razor DVS
Dan Ernst – 12/3/2003
Dynamic Voltage Scaling and Design Uncertainty
• DVS: adapting voltage/frequency to meet the performance demands of the workload
– Lower processor voltage during periods of low utilization
– Lower voltage is a Good Thing™ for power
• Minimum voltage is limited by safety margins
– Error-free operation must be guaranteed!
• Technology trends are maximizing the minimums
– Process and temperature variation
– Capacitive and inductive noise
[Figure: intra-die variations in ILD thickness]
• Key observation: worst-case conditions are also highly improbable
– Significant gain for circuits optimized for the common case
– Efficient mechanisms needed to tolerate infrequent worst-case scenarios
Shaving Voltage Margins with Razor
• Goal: reduce voltage margins with in-situ error detection and correction for delay failures
[Figure: percentage errors (0 to 60) vs. supply voltage (0.8 to 2.0), with regions labeled Traditional DVS, Zero margin, and Sub-critical]
• Proposed approach:
– Remove safety margins and tolerate occasional errors
– Tune processor voltage based on error rate
– Purposely run below critical voltage
• Data-dependent latency margins
• Trade-off: voltage power savings vs. overhead of correction
– Analogous to wireless power modulation
Razor Timing Error Detection
[Figure: main flip-flops (clk), shadow latch (clk_del), and MEM, with example data values]
• Second sample of logic value used to validate earlier sample
• Key design issues:
– Maintaining pipeline forward progress
– Short path impact on shadow-latch
– Power overhead of error detection and correction
– Meta-stable results in main flip-flop
– Recovering pipeline state after errors
Razor Short Path Constraint
[Figure: main flip-flops (clk), shadow latch (clk_del), and MEM, with example data values, annotated with a hold constraint of about 1/2 cycle]
Centralized Razor Pipeline Error Recovery
[Figure: pipeline stages PC, IF, ID, EX, MEM, and WB (reg/mem) separated by Razor FFs, with error and recover signals; animation over cycles 0 to 6 with inst1 to inst6]
• One-cycle penalty for timing failure
• Global synchronization may be difficult for fast, complex designs
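The one-cycle penalty of centralized recovery can be illustrated with a small cycle-level sketch. This is not the authors' simulator: the function name `run_centralized`, the five-stage depth, and the idea of listing error cycles by number are assumptions made for illustration. On a cycle flagged as a timing error, the global recover signal holds every stage for that cycle.

```python
def run_centralized(num_insts, error_cycles=(), depth=5):
    """Cycles needed to complete num_insts instructions (hypothetical model).

    error_cycles lists the cycle numbers (1-based) on which some Razor FF
    flags a timing error. In this sketch a flagged cycle stalls the whole
    pipeline for exactly one cycle, whichever stage raised the error.
    """
    errors = set(error_cycles)
    stages = [None] * depth      # stages[0] is the first stage, stages[-1] the last
    next_inst = 0
    completed = 0
    cycle = 0
    while completed < num_insts:
        cycle += 1
        if cycle in errors:
            continue             # global recover: nothing advances this cycle
        entering = next_inst if next_inst < num_insts else None
        if entering is not None:
            next_inst += 1
        stages = [entering] + stages[:-1]   # every instruction moves one stage
        if stages[-1] is not None:
            completed += 1       # instruction finishes in the last stage
    return cycle
```

In this model six instructions take 10 cycles without errors, and each flagged cycle adds exactly one more, regardless of which stage raised the error.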
Distributed Razor Pipeline Error Recovery
[Figure: pipeline stages PC, IF, ID, EX, MEM (read-only), and WB (reg/mem) separated by Razor FFs, with a Stabilizer FF and a Flush Control block; signals error, recover, bubble, and flushID; animation over cycles 0 to 9 with inst1 to inst8]
• Multiple-cycle penalty for timing failure
• Scalable design since all recovery communication is local
• Builds on existing branch / data speculation recovery framework
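The cost of a one-cycle versus a multiple-cycle penalty can be compared with a back-of-the-envelope throughput model. This is a sketch under stated assumptions, not a result from the slides: the function name `relative_throughput`, the base rate of one instruction per cycle, the independence of errors, and the five-cycle flush penalty are all hypothetical.

```python
def relative_throughput(error_rate, penalty_cycles):
    """Throughput relative to error-free operation (hypothetical model).

    Assumes a base rate of one instruction per cycle and that each timing
    error independently adds penalty_cycles stall cycles.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a fraction between 0 and 1")
    return 1.0 / (1.0 + error_rate * penalty_cycles)


# Example: the same 1% error rate with a one-cycle penalty and with an
# assumed five-cycle penalty for a flush-based recovery.
one_cycle = relative_throughput(0.01, 1)
five_cycle = relative_throughput(0.01, 5)
```

At low error rates both penalties cost little throughput, which is why a longer but purely local recovery path can be an acceptable trade.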
Error-Rate Studies – Hardware Measurement
Error Rate Studies – Empirical Results
[Figure: 18x18-bit multiplier block at 90 MHz and 27 °C, random inputs; error rate (log scale, 100% down to 0.0000001%) vs. supply voltage (1.78 V down to 1.14 V)]
• Environmental-margin @ 1.69 V
• Zero-margin @ 1.54 V
• Plot annotations: 22% saving; 35% energy savings with 1.3% error; once every 20 seconds!
Error Rate Studies – SPICE-Level Simulations
• Based on SPICE-level simulations of a Kogge-Stone adder
[Figure: Kogge-Stone adder at 870 MHz and 27 °C; error rate (log scale, 100% down to below 0.01%) vs. supply voltage (2.0 down to 0.6) for random, bzip, and ammp inputs; annotation: 200 mV]
Razor I - Prototype Razor Implementation
• 4-stage 64-bit Alpha pipeline:
– 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
– Tunable via software from 50-200 MHz, 1.1-1.8 V
– Razor applied to combinational logic
• Razor overhead:
– Total of 192 Razor flip-flops out of 2408 total (9%)
– Error-free power overhead: ~3%
[Figure: die floorplan with dimensions 3.3 mm and 3 mm, showing I-Cache, D-Cache, Register File, and the IF, ID, EX, MEM, and WB stages]
Effects of Razor DVS
[Figure: pipeline throughput (IPC) and energy vs. decreasing supply voltage. Curves: energy of processor operations, Eproc; energy of pipeline recovery, Erecovery; total energy, Etotal = Eproc + Erecovery, with the optimal Etotal marked; and energy of processor w/o Razor support]
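The relation Etotal = Eproc + Erecovery implies an energy-optimal voltage: lowering the supply reduces Eproc but raises the error rate and therefore Erecovery. The sketch below searches for that optimum on a voltage grid. Every model in it is an assumption for illustration: the exponential `error_rate` curve, the 1.5 V zero-margin point, the quadratic Eproc, and a recovery cost of 18 operations per error are placeholders, not measured data.

```python
import math


def error_rate(v, v_zero_margin=1.5, steepness=20.0):
    """Hypothetical error model: no errors at or above v_zero_margin,
    exponentially more errors as the supply drops below it."""
    if v >= v_zero_margin:
        return 0.0
    return min(1.0, 1e-6 * math.exp(steepness * (v_zero_margin - v)))


def total_energy(v, v_nominal=1.8, recovery_cost=18.0):
    """Etotal = Eproc + Erecovery, normalized to Eproc at v_nominal."""
    e_proc = (v / v_nominal) ** 2                      # assumed quadratic in voltage
    e_recovery = error_rate(v) * recovery_cost * e_proc
    return e_proc + e_recovery


def optimal_voltage(v_min=0.6, v_max=1.8, step=0.01):
    """Grid search for the supply voltage that minimizes total energy."""
    n = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n + 1)]
    return min(grid, key=total_energy)


v_opt = optimal_voltage()
```

With these placeholder curves the optimum falls well below the zero-margin voltage, at a point where the error rate is small but nonzero.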
EX-Stage Analysis – Optimal Voltage Sweep
[Figure: BZIP. Relative IPC and energy (Rel Performance, Rel Energy) vs. supply voltage swept from 0.6 V to 1.8 V]
• Recovery cost includes energy to recover entire pipeline (18x an add)
• 0.31% error rate, 58% energy savings
EX-Stage Analysis – Optimal Voltage Sweep
[Figure: GCC. Relative IPC and energy (Rel Performance, Rel Energy) vs. supply voltage swept from 0.6 V to 1.8 V]
• 1.62% error rate, 24% energy savings
Simulation Analysis – Energy-Optimal Voltage
[Figure: bar chart of Total Energy and IPC as a percentage of baseline (zero-margin), y-axis 0 to 120, for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]
Simulation Analysis – Razor DVS Execution
[Figure: GCC. Supply voltage (0 to 2) and error rate (0% to 40%) vs. time]
Simulation Analysis – Razor DVS Performance
[Figure: bar chart of Total Energy, DVS Energy, IPC, and DVS IPC as a percentage of baseline (zero-margin), y-axis 0 to 120, for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Average]
Conclusions
• In-situ detection/correction of timing errors
– Eliminate process, temperature, and safety margins
– Tune processor voltage based on error rate
– Purposely run below critical voltage to capture data-dependent latency margins
• Implemented with architecture/circuit support
– Double-sampling metastability-tolerant Razor flip-flops validate logic results
– Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
• Trade-off: supply voltage power savings vs. overhead of correction
– Running with error is good!
[Figures: Razor flip-flop (main flip-flop, shadow latch on clk_del, comparator, Error_L) and the distributed recovery pipeline (Razor FFs, Stabilizer FF, Flush Control; error, recover, bubble, and flushID signals)]
Future Directions
• Research opportunities
– Razor for caches/memory and control logic
– Voltage control algorithms, especially per-stage tuning
– Typical-case energy-optimized designs (instead of worst-case latency-optimized)
– Turnkey application of Razor technology
• Prototype design, fabrication, evaluation
– Razor I – Q4 2003 – Razor-ized combinational logic, global tuning
– Razor II – Q3 2004 – Razor-ized caches and control logic, per-stage tuning
• Other applications
– Single-event upset (SEU) protection using Razor error detection/re-execution
– Over-clocking for performance improvement (large gains among hobbyists)
Questions
Back-up Slides
Other Approaches to Dynamic Voltage Scaling
• Traditional DVS
– Valid voltage / delay combinations “blessed” at design time
– Approach leaves a significant amount of energy “on the table”
– Temperature, process, data, and safety margins placed on voltage
• Other approaches miss some margins
– Slack detector – automatic tuning
• ARM’s Intelligent Energy Manager (IEM)
• Processor voltage automatically tuned to external ambient conditions
• Inverter chain designed to track most restrictive critical path, margin still required
[Figure: processor floorplan showing memory control, data cache, floating point and graphics, execution unit, control unit, I/O unit, cache control, L2 tags, and L2 cache]
Razor Flip-Flop Implementation
[Figure: Razor FF between logic stages L1 and L2. A 0/1 multiplexer feeds the main flip-flop (clk, D, Q); a shadow latch (clk_del) and a comparator produce Error_L and Error]
• Compare latched data with shadow-latch on delayed clock
• Upon failure: place data from shadow-latch in main latch
– Ensure shadow latch always correct using conservative design techniques
– Correct value in shadow latch guarantees forward progress
• Recover pipeline using microarchitectural recovery mechanism
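The double-sampling behavior of the Razor flip-flop can be sketched as a behavioral model. It is a simplification under stated assumptions: the names `razor_ff_cycle` and `step_input` are hypothetical, the input is an ideal step that changes once at a given arrival time, and metastability and short-path effects are not modeled.

```python
def step_input(old, new, arrival):
    """Hypothetical input waveform: holds `old` until `arrival`, then `new`."""
    return lambda t: new if t >= arrival else old


def razor_ff_cycle(data_at, period, t_delay):
    """One clock cycle of a behavioral Razor flip-flop model.

    data_at : function mapping time to the value on the D input
    period  : time of the clk edge that samples the main flip-flop
    t_delay : extra delay of clk_del, which samples the shadow latch
    Returns (q, error): the value held after correction and the error flag.
    """
    main = data_at(period)                # speculative sample on clk
    shadow = data_at(period + t_delay)    # validating sample on clk_del
    error = main != shadow                # comparator output
    q = shadow if error else main         # on error, restore from the shadow latch
    return q, error


on_time = razor_ff_cycle(step_input(0, 1, arrival=0.8), period=1.0, t_delay=0.5)
late = razor_ff_cycle(step_input(0, 1, arrival=1.2), period=1.0, t_delay=0.5)
too_late = razor_ff_cycle(step_input(0, 1, arrival=1.6), period=1.0, t_delay=0.5)
```

The `too_late` case returns the stale value with no error flagged, which shows why the shadow latch must always be correct by conservative design: data that misses the delayed clock as well goes undetected.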
Razor Flip-Flop Circuit
[Figure: transistor-level Razor flip-flop with main flip-flop (clk, clk_b, D, Q), shadow latch (clk_del, clk_del_b), meta-stability detector (Inv_n, Inv_p), and Error_L output]
Overcoming Short Path Constraints
• Delayed clock imposes a short-path constraint
– Min. path delay > tdelay + thold
– Razor necessary only for latches on slow paths
– Pad fast path for latches with mixed path delays
– Trade-off between DVS headroom and short path constraints
[Figure: timing diagram of clock and clock_del showing tdelay, thold, the intended path, the short path, and the min. path delay; circuit with long paths into a Razor_ff and short paths padded with extra delay into an ff]
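The short-path constraint, min. path delay > tdelay + thold, can be checked per flip-flop as in the sketch below. The function names and the example delays (in arbitrary time units) are hypothetical; only the inequality itself comes from the slide.

```python
def violates_short_path(min_path_delay, t_delay, t_hold):
    """True if the shortest path into a Razor flip-flop breaks the
    constraint min_path_delay > t_delay + t_hold."""
    return min_path_delay <= t_delay + t_hold


def padding_shortfall(min_path_delay, t_delay, t_hold):
    """Delay the shortest path is missing to reach t_delay + t_hold.

    The constraint is strict, so the padding actually inserted must be
    slightly larger than this value.
    """
    return max(0.0, t_delay + t_hold - min_path_delay)


# Example with made-up numbers: a half-cycle clock delay on a 1.0-unit clock.
fast_path_bad = violates_short_path(min_path_delay=0.3, t_delay=0.5, t_hold=0.05)
slow_path_ok = violates_short_path(min_path_delay=0.6, t_delay=0.5, t_hold=0.05)
shortfall = padding_shortfall(min_path_delay=0.3, t_delay=0.5, t_hold=0.05)
```

A larger tdelay gives more room for late-arriving data but raises the padding requirement on fast paths, which is the trade-off between DVS headroom and short-path constraints.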
Hardware Measurement Setup
[Figure: 48-bit LFSRs supplying 18-bit operands to 18x18 multipliers in a fast pipeline (clk) and in slow pipelines A and B (clk/2); the 36-bit products pass a stabilize stage and are compared (!=), with mismatches counted in a 40-bit error counter]
Simulation Methodology
• Challenge: instruction latency depends on circuit evaluation latency
– May vary with changes in stage inputs, stage logic, voltage, temperature…
• Dynamic timing simulation combines architectural/circuit simulation
• Initial implementation utilized a hand-generated EX-stage circuit model
– Effort ongoing to automate extraction/decomposition/integration into SimpleScalar
Supply Voltage Control System
[Figure: feedback loop. Pipeline error signals are combined into Esample; Ediff = Eref - Esample drives the voltage control function, which sets the voltage regulator output Vdd supplied to the pipeline; a reset input is shown]
• Current design utilizes a very simple proportional control function
– Control algorithm implemented in software
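A proportional control function built on Ediff = Eref - Esample might look like the sketch below. It is an illustration, not the authors' software: the function name `control_step`, the gain, and the update rule are assumptions, and the 1.1 V to 1.8 V clamp is borrowed from the tunable range quoted for the Razor I prototype.

```python
def control_step(v_dd, e_sample, e_ref, gain, v_min=1.1, v_max=1.8):
    """One update of a hypothetical proportional supply-voltage controller.

    v_dd     : current supply voltage in volts
    e_sample : error rate measured over the last sampling interval
    e_ref    : target error rate
    gain     : volts of adjustment per unit of error-rate difference
    """
    e_diff = e_ref - e_sample          # positive when errors are below target
    v_next = v_dd - gain * e_diff      # too few errors: lower voltage; too many: raise it
    return min(v_max, max(v_min, v_next))


# Example: target a 1% error rate starting from 1.5 V.
lowered = control_step(1.5, e_sample=0.0, e_ref=0.01, gain=1.0)
raised = control_step(1.5, e_sample=0.05, e_ref=0.01, gain=1.0)
```

Clamping keeps the request inside the range the regulator is assumed to support, so a burst of errors cannot push the command above the nominal supply.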
Pipeline Recovery
[Figure: timing diagram of pipeline recovery. Instructions flow through IF, ID, EX, MEM, and WB; signals clk, clk_d, ID.d, EX.d, MEM.d, and error; labels: No Error, Error, Redo instruction in MEM]
Voltage Scaling under Dynamic Workloads
• Adapt frequency/voltage to performance demands of workload
– Software-controlled processor speed
– Lower processor voltage during periods of low operating frequency
[Figure: Vdd, frequency, and utilization vs. time]
• Quadratic reduction in dynamic power and energy
• Super-quadratic reduction in leakage
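The quadratic dependence can be made concrete with the standard CMOS dynamic power relation P = C·V²·f. The sketch below is a minimal illustration; the function names, the capacitance value, and the example voltages are assumptions, and leakage is not modeled.

```python
def dynamic_power(c_eff, v_dd, freq):
    """Dynamic power of a CMOS circuit: P = C * V^2 * f.

    c_eff : effective switched capacitance in farads
    v_dd  : supply voltage in volts
    freq  : clock frequency in hertz
    """
    return c_eff * v_dd ** 2 * freq


def relative_dynamic_energy(v_dd, v_ref):
    """Dynamic energy per operation at v_dd relative to v_ref (C cancels out)."""
    return (v_dd / v_ref) ** 2


# Example: halving the supply from 1.8 V to 0.9 V.
energy_ratio = relative_dynamic_energy(0.9, 1.8)
power_full = dynamic_power(1e-9, 1.8, 200e6)   # 1 nF is a made-up capacitance
power_half_v = dynamic_power(1e-9, 0.9, 200e6)
```

Halving the voltage at a fixed frequency cuts dynamic power and energy per operation to one quarter in this model; lowering the frequency along with it reduces power further but not energy per operation.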
Simulation Flow
• Automatic creation of very detailed power/delay C-models
[Figure: simulation flow with blocks: high-level HDL specification of the pipeline (PC, IF, ID, EX, MEM, WB with FFs), circuit extraction with parasitics, variable-voltage SDF generation, power/delay C-model, architecture specification, voltage control algorithm, SimpleScalar + DTA, and detailed power/delay analysis]
Simulation Methodology
• Dynamic timing simulation combines architectural/circuit simulation
– Contrast to static timing simulation, which is only concerned with the critical path
– SimpleScalar/Alpha architectural-level simulation
– Gate-level simulation of per-stage logic blocks
• Logic block model describes cells, local and global interconnect
• Cells characterized with SPICE at varied slew/cap-load/voltage
• Each cycle, circuit simulator evaluates delay of each stage's logic block
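The idea that stage delay depends on the data each cycle can be illustrated with a toy adder model. This is a deliberate simplification and not the gate-level simulator described above: it uses a ripple-carry view of addition rather than the Kogge-Stone adder studied in the slides, and the names and delay constants (`t_base`, `t_per_bit`) are hypothetical.

```python
def longest_carry_chain(a, b, width=8):
    """Longest run of bit positions a single carry travels through in a + b."""
    longest = run = 0
    carry = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:                 # generate: a new carry starts at this bit
            carry, run = 1, 1
        elif (x ^ y) and carry:   # propagate: the incoming carry ripples on
            run += 1
        else:                     # no carry leaves this bit
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def stage_delay(a, b, t_base=0.2, t_per_bit=0.1, width=8):
    """Toy data-dependent delay: a fixed part plus a per-bit carry cost."""
    return t_base + t_per_bit * longest_carry_chain(a, b, width)


def has_timing_error(a, b, period, **delay_args):
    """True if the operands' delay exceeds the clock period."""
    return stage_delay(a, b, **delay_args) > period


worst_case = has_timing_error(0xFF, 0x01, period=0.6)   # carry ripples through all 8 bits
typical = has_timing_error(0x01, 0x02, period=0.6)      # no carry at all
```

A static timing analysis would size the clock for the 0xFF + 0x01 case; evaluating the delay per cycle shows that most operand pairs finish far earlier, which is the margin dynamic timing simulation is meant to expose.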
Simulation Analysis – Razor DVS Execution
[Figure: Gap. Supply voltage (0.6 to 2) and error rate (0% to 30%) vs. time]
Razor Demo
More Details on Meta-Stability
• Sub-critical operation invites meta-stability
– Meta-stability detector itself can become meta-stable
– Double-latch the error signal to obtain a sufficiently small probability
[Figure: Razor flip-flop circuit (clk, clk_b, clk_del, clk_del_b, D, Q) with pos and neg detector outputs and a dynamic OR / latch; signals error, fail, restore, bubble, and flush]
– Flush entire pipe
– No forward progress
– Reduce frequency
Short Path Failure
[Figure: timing diagram of a short path failure. inst1 and inst2 flow through IF, ID, EX, MEM, and WB; signals clk, clk_d, ID.d, EX.d, and MEM.d carry I1 and I2, with the short path and the resulting error marked]