Circuit-Level Timing Speculation

Circuit-Level Timing Speculation:
The Razor Latch
Developed by Trevor Mudge’s group
at the University of Michigan,
2003
We’ve Already Encountered
Speculation in ECE 568
• Branch prediction
– When a branch is encountered, guess whether it is taken
or not
• If the guess is correct, we have gained time
• If the guess is incorrect, we must undo any incorrectly executed
instructions and move on
• Multi-word cache lines
– When a cache miss is encountered, we bring in the entire
cache line, not just the word we’re looking for
• If the access pattern shows spatial locality, we are prefetching
other words that the program will soon ask for, thereby saving
time.
• If the speculation is too aggressive (i.e., the cache lines are too
long), we’ll fetch many words uselessly.
Speculation (contd.)
• Value Prediction
– (Not covered in this course)
– Idea is to predict what the value of a variable will
be and use the predicted value.
• If the predicted value was right, we gain some time; if it
was wrong, we did some useless execution. If this
execution changed processor state, these changes will
have to be undone.
• Not used in practice (to my knowledge): mainly an
academic exercise so far.
Speculating on Time
• The pipeline clock cycle is the time by which each stage is
guaranteed to complete its assigned operation
• This time is a function of
– Actual hardware parameters: Gate and wire delays vary within
the same die, from one die to another, and from one wafer to
another.
– Data involved in the computation:
• Example: Ripple-carry adder. Worst-case execution time is the time it
takes to ripple the carry through from carry-in of the least significant,
to the carry-out of the most significant, stage. Actual execution times
may vary considerably.
• Requiring the worst-case delays to be accounted for often
forces designers to be overly conservative in setting the
clock rates
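The data dependence of the ripple-carry adder can be illustrated with a minimal sketch. It assumes a simple generate/propagate/kill model with carry-in 0; the function name `longest_carry_chain` and the `width` parameter are illustrative and not from the course material. It counts the longest run of stages a single carry travels through, which is a rough proxy for the actual delay of one addition.

```python
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of consecutive stages that one carry ripples through
    when adding a and b (carry-in assumed to be 0)."""
    longest = 0
    current = 0  # length of the carry chain that is currently alive
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Generate: this stage produces a carry on its own.
            current = 1
        elif x ^ y:
            # Propagate: an incoming carry (if there is one) moves on.
            current = current + 1 if current else 0
        else:
            # Kill: any incoming carry stops here.
            current = 0
        longest = max(longest, current)
    return longest


# Worst case: the carry generated at bit 0 ripples through all 32 stages.
worst = longest_carry_chain(0x00000001, 0xFFFFFFFF)
# Typical case: the carry dies after two stages.
short = longest_carry_chain(3, 1, width=8)
```

Under this model the worst-case operands exercise the full 32-stage chain, while most operand pairs settle after only a few stages, which is the slack that timing speculation tries to exploit.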
Timing Speculation: Basic Idea
• Suppose F is the frequency at which the pipeline
is guaranteed to function correctly
• Run the pipeline at a somewhat higher rate, f.
– Much of the time, this clock period, t_p=1/f, will be
sufficient for all pipeline stages, and we’ll gain in
execution speed
– Some of the time, we may need more time:
• Need to discover when this is the case
• When this is the case, provide additional time by allowing
the pipeline stages additional cycles to complete their
operation
Implementation
• Recall that pipeline stages are separated by latches
• Duplicate each pipeline latch by introducing a shadow latch
• Consider any stage of the pipeline. Suppose it starts some activity at time 0.
– At time t_p=1/f, latch the output of that stage into the regular pipeline latch.
– At time T_p=1/F, latch the output of the stage into the shadow latch.
– Compare the results of the regular and shadow latches
– If they agree,
• do nothing: running at a higher speed has paid off
– If they don’t agree,
• Use the result of the shadow latch as the correct one
• Squelch the computation that the following stage began on the basis of
the incorrect regular latch results
• Restart the computation in the following stage using the correct results, as
stored in the shadow latch
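The latch-and-compare steps can be sketched behaviorally as follows. This is a simplified model, not the circuit from the paper: it assumes the stage output holds a stale value until the stage's actual delay has elapsed and then switches to the correct value. The names `razor_stage`, `StageResult`, `t_fast` (standing for t_p) and `t_safe` (standing for T_p) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    value: int   # value the following stage should end up using
    error: bool  # True if regular and shadow latches disagree


def razor_stage(correct: int, stale: int, delay: float,
                t_fast: float, t_safe: float) -> StageResult:
    """Model one stage evaluation that starts at time 0.

    t_fast plays the role of t_p = 1/f (regular latch sample time),
    t_safe plays the role of T_p = 1/F (shadow latch sample time).
    """
    if delay > t_safe:
        # Assumption of the scheme: F is a guaranteed-safe frequency.
        raise ValueError("stage delay exceeds the guaranteed period T_p")
    # Regular latch samples at t_fast: it sees the correct value only
    # if the stage has already settled by then.
    regular = correct if delay <= t_fast else stale
    # Shadow latch samples at t_safe, by which time the output is settled.
    shadow = correct
    # On a mismatch the shadow value is taken as the correct one.
    return StageResult(value=shadow, error=(regular != shadow))


on_time = razor_stage(correct=7, stale=3, delay=0.8, t_fast=1.0, t_safe=1.25)
late = razor_stage(correct=7, stale=3, delay=1.1, t_fast=1.0, t_safe=1.25)
```

In this sketch `late.error` is set and the caller would squelch and restart the following stage with `late.value`. A late stage whose stale and correct outputs happen to coincide raises no error, since the comparison only sees the latched values.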
Unless otherwise stated, all figures are from Ernst, et al., MICRO-36, 2003.
Issues to Consider
• How aggressive should we be?
– If f is too high, a large fraction of the results will
require correction with the shadow latch and we’ll
actually lose time
– If f is too low, the clock will be unnecessarily
slow and we won’t gain much
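The trade-off can be made concrete with a back-of-the-envelope model. It assumes that each operation costs one cycle plus a fixed recovery penalty whenever a timing error occurs, and that errors occur independently with probability `error_rate`. The function names and the one-cycle default penalty are assumptions for illustration, not figures from the paper.

```python
def speculative_speedup(f: float, F: float, error_rate: float,
                        penalty_cycles: float = 1.0) -> float:
    """Throughput at speculative clock f relative to the safe clock F."""
    # Average cycles per operation, including recovery cycles.
    cycles_per_op = 1.0 + error_rate * penalty_cycles
    return (f / F) / cycles_per_op


def break_even_error_rate(f: float, F: float,
                          penalty_cycles: float = 1.0) -> float:
    """Error rate at which running at f is no faster than running at F."""
    return (f / F - 1.0) / penalty_cycles


gain = speculative_speedup(f=1.2, F=1.0, error_rate=0.02)
limit = break_even_error_rate(f=1.2, F=1.0)
```

With a 20% faster clock and a one-cycle penalty, this model says speculation stops paying off once about one operation in five needs correction. In practice the error rate itself rises with f, so the best operating point has to be found empirically.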
Issues to Consider (contd.)
• What about F?
– Upper bound of F is given by the worst-case path (for
the worst-case inputs): T_p=1/F must be at least the worst-case delay
– What happens if F is too small? [This is one of the few
instances in design when being too conservative at
one level affects correctness of functioning!]
• F may be so small that the results of the next computation
propagate through the stage and arrive at the shadow latch
– We’d then be comparing the results of two different operations!
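The two-sided constraint on the shadow latch sampling time can be written as a small check. This sketch assumes the simplified timing model used on the Implementation slide: the stage starts at time 0, the next operation enters the stage at t_p, and so the earliest that new results can reach the shadow latch is t_p plus the shortest path delay. The names `d_max` and `d_min` (longest and shortest path delays through the stage) are introduced here for illustration.

```python
def shadow_sampling_ok(t_p: float, T_p: float,
                       d_max: float, d_min: float) -> bool:
    """Check that the shadow latch captures the right operation's result."""
    # T_p must be long enough for the slowest path to settle ...
    settled = T_p >= d_max
    # ... but short enough that the next operation's results, which
    # start propagating at t_p, have not yet arrived via the shortest path.
    not_overrun = T_p < t_p + d_min
    return settled and not_overrun


ok = shadow_sampling_ok(t_p=1.0, T_p=1.25, d_max=1.2, d_min=0.4)
too_slow = shadow_sampling_ok(t_p=1.0, T_p=1.5, d_max=1.2, d_min=0.4)
```

Here `too_slow` is False: T_p is safely above the worst-case delay, yet the shadow latch would sample after the next operation's fastest results arrive.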
Metastability
• If the input data is not stable when the clock
transition happens, the output of the latch may
float at a voltage that is in neither the 0 nor in the
1 logic ranges
– Duration of metastable state is not bounded
– Different gates may interpret such indeterminate
voltages differently (in terms of logic values)
• Cannot reduce the probability of metastability to
zero: all we can do is to keep it sufficiently low for
all practical purposes
Recovery Technique 1:
Global Clock Gating
• If any stage detects a timing problem
– Stall the entire pipeline for one clock cycle.
– Use this additional clock cycle to recompute using
the correct shadow-latch values
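The cost of global clock gating can be counted with a minimal sketch. It assumes that a single stall cycle is enough for every stage that flagged an error in a given cycle to recover, since the whole pipeline is stalled together. The function name and the input format are hypothetical.

```python
from typing import Sequence


def cycles_with_global_gating(errors_per_cycle: Sequence[int]) -> int:
    """Total cycles needed, given how many stages flagged a timing
    error in each useful cycle."""
    useful = len(errors_per_cycle)
    # One stall cycle per cycle with at least one error, regardless of
    # how many stages erred: all of them recompute during the same stall.
    stalls = sum(1 for n in errors_per_cycle if n > 0)
    return useful + stalls


# Five useful cycles; errors in the third cycle (two stages) and the fifth.
total = cycles_with_global_gating([0, 0, 2, 0, 1])
```

In this example two stall cycles are added to the five useful ones. The simplicity comes at the price of stalling stages that had no timing problem of their own.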
Recovery Technique 2:
Counterflow Pipelining
• When a mismatch (between regular and shadow
latch contents) is detected:
– Assert a bubble signal, to specify that the erring
pipeline slot is now to be considered a bubble.
– In the subsequent cycle, inject the shadow latch value
into the next stage, allowing the errant operation to
continue with the correct values
– Trigger a flush train, traveling backwards from the
errant stage, flushing operations at each stage it visits
(Question: Is this flush operation necessary?? Can we
do something else to avoid it?)
Power Consumption
Using a Processor to Fry an Egg
From: www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
Power Density
From: Hsu and Feng, “A Power-Aware Real-Time System…”, 2005
Power Implications: Dynamic Power
From Krishna & Lee: IEEE Trans. Computers, 2003.
Static Power
• Even when there is no switching, transistors
leak current
• Leakage power is a strongly increasing
function of temperature and supply voltage; it
falls steeply (roughly exponentially, for subthreshold
leakage) as the threshold voltage is raised.
Subthreshold leakage vs temperature
From: Do, et al: Tech Report 2007-06, Dept of CSE, Chalmers Instt of Tech
Leakage Current vs Vdd
From Do et al., op cit.
Voltage Control for
Razor Latch System