CS 7810 Lecture 13 Energy Reduction

CS 7810 Lecture 13

Pipeline Gating: Speculation Control for Energy Reduction

S. Manne, A. Klauser, D. Grunwald

Proceedings of ISCA-25

June 1998

Cost of Speculation

Mispredict rates (%): 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7

Pipeline Gating

• Low-confidence branches throttle instruction fetch until they are resolved

• Pipeline gating usually lasts for fewer than five cycles
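The gating rule above can be sketched as a counter of unresolved low-confidence branches. This is a minimal Python model under stated assumptions, not the paper's hardware: the class name `GatingController`, its method names, and the `threshold` parameter are hypothetical and introduced only for illustration.

```python
class GatingController:
    """Tracks in-flight low-confidence branches and decides when to gate fetch.

    Hypothetical model: names and the threshold parameter are illustrative.
    """

    def __init__(self, threshold=1):
        # Number of unresolved low-confidence branches that triggers gating
        # (assumed to be a tunable parameter).
        self.threshold = threshold
        self.unresolved_low_conf = 0

    def on_branch_fetched(self, low_confidence):
        # A newly fetched branch flagged low-confidence joins the in-flight count.
        if low_confidence:
            self.unresolved_low_conf += 1

    def on_branch_resolved(self, low_confidence):
        # Resolving a low-confidence branch releases its hold on fetch.
        if low_confidence and self.unresolved_low_conf > 0:
            self.unresolved_low_conf -= 1

    def fetch_gated(self):
        # Fetch stalls while enough low-confidence branches remain unresolved.
        return self.unresolved_low_conf >= self.threshold
```

With `threshold=1`, a single low-confidence branch stalls fetch until it resolves. A larger threshold gates less often.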

Metrics

• SPEC (specificity): fraction of all mispredicted branches detected as low-confidence by the confidence estimator (coverage)

• PVN (predictive value of a negative test): probability that a branch flagged as low-confidence is actually mispredicted (accuracy)
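Both metrics follow directly from per-branch outcomes. The sketch below computes them from a list of `(low_confidence, mispredicted)` pairs. The function name `spec_and_pvn` and the sample counts are hypothetical, chosen only to illustrate the definitions.

```python
def spec_and_pvn(branches):
    """Compute (SPEC, PVN) from (low_confidence, mispredicted) boolean pairs.

    Returns None for a metric whose denominator is zero.
    """
    branches = list(branches)
    low_and_wrong = sum(1 for low, wrong in branches if low and wrong)
    mispredicted = sum(1 for _, wrong in branches if wrong)
    low_conf = sum(1 for low, _ in branches if low)
    # SPEC: share of mispredicted branches that were flagged low-confidence.
    spec = low_and_wrong / mispredicted if mispredicted else None
    # PVN: share of low-confidence branches that were actually mispredicted.
    pvn = low_and_wrong / low_conf if low_conf else None
    return spec, pvn


# Illustrative (made-up) counts: 3 flagged and wrong, 6 flagged but right,
# 1 missed mispredict, 10 confident and right.
sample = ([(True, True)] * 3 + [(True, False)] * 6 +
          [(False, True)] * 1 + [(False, False)] * 10)
spec, pvn = spec_and_pvn(sample)
```

In this made-up sample SPEC is 3/4 while PVN is only 3/9: an estimator can catch most mispredicts and still flag many correctly predicted branches.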

Confidence Estimators

• Perfect: to gauge potential benefits

• Static: branches that have low prediction rates

• JRS: if a branch has yielded N successive correct predictions, it has high confidence

• Saturating counters: an unbiased counter value, or disagreement between two predictors, implies low confidence

• Distance: mispredicts are clustered, hence the first 4 branches after a mispredict have low confidence
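Two of the estimators above are simple enough to sketch in a few lines. This is a simplified model, not the hardware evaluated in the paper: the class names, the table size, and indexing the JRS table by `pc % table_size` are all assumptions made for illustration.

```python
class JRSEstimator:
    """High confidence once a branch has N successive correct predictions."""

    def __init__(self, n, table_size=1024):
        self.n = n
        # One resetting counter per table entry; the table size and the
        # simple PC-modulo indexing are assumptions of this sketch.
        self.table = [0] * table_size

    def _index(self, pc):
        return pc % len(self.table)

    def is_high_confidence(self, pc):
        return self.table[self._index(pc)] >= self.n

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.table[i] = min(self.table[i] + 1, self.n)  # saturate at N
        else:
            self.table[i] = 0  # any mispredict resets the streak


class DistanceEstimator:
    """Low confidence for the first `window` branches after a mispredict."""

    def __init__(self, window=4):
        self.window = window
        # Assume no recent mispredict at start-up.
        self.branches_since_mispredict = window

    def is_high_confidence(self):
        return self.branches_since_mispredict >= self.window

    def update(self, prediction_correct):
        if prediction_correct:
            self.branches_since_mispredict = min(
                self.branches_since_mispredict + 1, self.window)
        else:
            self.branches_since_mispredict = 0
```

The distance estimator needs only one global counter, while the JRS-style estimator needs a table of per-branch counters.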

SPEC and PVN

SPEC (coverage): % of mispredicted branches flagged by the low-confidence estimator

PVN (accuracy): % of low-confidence branches that are branch mispredicts

• It is easier to achieve a high SPEC value than PVN

• A high PVN value can be achieved by using N low-confidence branches to invoke gating – if PVN is 30%, re-defining low-confidence as two low-confidence branches increases PVN to 51%
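The 51% figure is consistent with treating the two low-confidence branches as independent, each mispredicted with probability equal to the PVN. That independence is an assumption of this worked example; the slide does not state it. Gating on N low-confidence branches then bets that at least one of them is wrong:

```latex
P(\text{at least one of } N \text{ mispredicted}) = 1 - (1 - \mathrm{PVN})^{N}
\qquad
N = 2,\ \mathrm{PVN} = 0.30:\quad 1 - 0.70^{2} = 1 - 0.49 = 0.51
```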

Perfect

Gating Results

Results

• Can gating improve performance? – only if cache pollution is significant

• Less than 1% performance loss and up to 38% reduction in extra work

• Energy consumption could go up: some work is independent of the number of executed instructions (e.g., clock distribution), so increased execution time can increase energy

• Pipeline gating should reduce power consumption

Results

CS 7810 Lecture 13

Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power

S. Kaxiras, Z. Hu, M. Martonosi

Proceedings of ISCA-28

July 2001

Leakage Power Trends

• Circuit delay ∝ 1/(V – V_th)

• Leakage ∝ num transistors (incr), supply voltage (decr); exponential in low thresh. voltage (incr)

• L1 and L2 caches are the biggest contributors (high transistor budgets)

Vdd-Gating

• Leakage can be reduced by gating off the supply voltage to the circuit

• When applied to a cache, the contents of the SRAM cell are lost

• Cache decay: apply Vdd-gating when you do not care about cache contents

Lifetime of a Cache Line

Overheads

• Hardware to determine when to decay

• Introduces additional cache misses

• Normalized cache leakage power =
    active ratio (fraction of cache that is powered on)
    + (counter overhead : leak) x activity
    + (L2 access energy : leak) x num-misses

• Increased execution time (< 0.7%)

• L2 access/leakage ratio is ~9
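The normalized leakage expression above is a three-term sum, sketched here as a function. The function and parameter names are hypothetical, and the sample inputs (other than the L2 access/leakage ratio of about 9) are made up. The sketch also assumes `activity` and `extra_misses` are already expressed in units that make the two ratios dimensionless multipliers (for example, per cycle), which the slide does not spell out.

```python
def normalized_leakage(active_ratio, counter_to_leak, activity,
                       l2_to_leak, extra_misses):
    """Normalized cache leakage power as the sum of three terms.

    active_ratio    : fraction of the cache that is powered on
    counter_to_leak : counter overhead relative to leakage
    activity        : counter activity (assumed per-cycle units)
    l2_to_leak      : L2 access energy relative to leakage (~9 on the slide)
    extra_misses    : decay-induced misses (assumed per-cycle units)
    """
    return (active_ratio
            + counter_to_leak * activity
            + l2_to_leak * extra_misses)


# Made-up inputs: 40% of the cache powered, tiny counter overhead,
# and one decay-induced miss per 100 cycles.
example = normalized_leakage(0.4, 0.001, 0.5, 9, 0.01)
```

A value below 1.0 means decay saves leakage energy overall. With a ratio of about 9, each extra miss is costly, so the miss term can quickly cancel the savings from a low active ratio.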

Skier’s Dilemma

New skis: $400 Ski rentals: $20

Heuristic: Buy skis after rental cost = purchase price

Ski trips:   5     10    15    20    25    50
Optimal:     $100  $200  $300  $400  $400  $400
Heuristic:   $100  $200  $300  $800  $800  $800

Likewise, decay a cache line when the cost of an additional miss equals leakage dissipated so far
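The skier's table can be reproduced in a few lines. This sketch uses the slide's prices; the function names are hypothetical. It charges the purchase at the break-even trip, which is what yields the slide's $800 at 20 trips.

```python
PURCHASE = 400  # price of new skis (from the slide)
RENTAL = 20     # price of one rental (from the slide)


def optimal_cost(trips):
    # With hindsight: rent every time or buy up front, whichever is cheaper.
    return min(trips * RENTAL, PURCHASE)


def heuristic_cost(trips):
    # Rent until cumulative rental cost equals the purchase price, then buy.
    break_even = PURCHASE // RENTAL  # 20 trips
    if trips < break_even:
        return trips * RENTAL
    return break_even * RENTAL + PURCHASE


for trips in (5, 10, 15, 20, 25, 50):
    print(trips, optimal_cost(trips), heuristic_cost(trips))
```

In every row the heuristic pays at most twice the optimal cost, without knowing the number of trips in advance. The decay policy relies on the same bound when it waits until accumulated leakage matches the cost of one extra miss.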

Tracking Dead Time

• Each line has a 2-bit counter that gets reset on every access and gets incremented every 2500 cycles through a global signal (negligible overhead)

• After 10,000 clock cycles, the counter reaches the max value and triggers a decay

• Adaptive decay: Start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
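The counter scheme above can be modeled for a single cache line. This is a simplified sketch under stated assumptions: it treats the fourth global tick since the last access as the decay trigger (4 x 2500 = 10,000 cycles) and does not model how the hardware encodes that in two bits. All names are hypothetical, and the floor of 1 in `adapt_period` is an added assumption.

```python
TICK_CYCLES = 2500    # global tick period (from the slide)
TICKS_TO_DECAY = 4    # 4 ticks x 2500 cycles = 10,000 cycles (from the slide)


class DecayLine:
    """One cache line with a per-line decay counter (simplified model)."""

    def __init__(self):
        self.ticks = 0        # global ticks seen since the last access
        self.powered = True   # assume the line starts valid and powered

    def access(self):
        """Touch the line; return False if it had decayed (an extra miss)."""
        was_powered = self.powered
        self.powered = True
        self.ticks = 0
        return was_powered

    def global_tick(self):
        """Advance the counter; gate the line off at the decay threshold."""
        if self.powered:
            self.ticks += 1
            if self.ticks >= TICKS_TO_DECAY:
                self.powered = False


def simulate(access_cycles, total_cycles):
    """Return (extra_misses, powered_cycles) for one line."""
    line = DecayLine()
    accesses = set(access_cycles)
    extra_misses = 0
    powered_cycles = 0
    for cycle in range(1, total_cycles + 1):
        if cycle % TICK_CYCLES == 0:
            line.global_tick()
        if cycle in accesses and not line.access():
            extra_misses += 1
        if line.powered:
            powered_cycles += 1
    return extra_misses, powered_cycles


def adapt_period(period, quick_miss):
    """Adaptive decay: double the period after a quick miss, else halve it."""
    return period * 2 if quick_miss else max(1, period // 2)
```

For example, `simulate([100, 3000, 21000], 25000)` has the line decay at cycle 12,500 and take one extra miss at cycle 21,000. Because ticks are global, the line decays 9,500 cycles after its last access at cycle 3,000, slightly under the nominal 10,000.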

Results

Overheads

Other Results

• The L2 cache is equally suitable for decay techniques: lifetimes are scaled by a factor of 10, and an extra miss also costs a lot more

• For their experiments, there is little interference from multiprogramming

• Some instructions can easily be identified as last touches to a cache block – potential for early cache decay

• Can this apply to the branch predictor or the register file?
