
Runahead Execution
A review of “Improving Data Cache Performance by Preexecuting Instructions Under a Cache Miss”
Ming Lu
Oct 31, 2006
1
Outline

Why

How

Conclusions

Problems
2
Why?
The Memory Latency Bottleneck
Computer Architecture: A Quantitative Approach, Third Edition. Hennessy & Patterson
3
Solutions:

Cache

A safe place for hiding or storing things
-Webster’s New World Dictionary of the American Language (1976)

Reduce average memory latency by caching data in a small,
fast RAM

Data Pre-fetching

Parallelism
4
A New Problem Arises

Cache misses are the main cause of processor stalls in modern superscalars; L2 misses in particular can each take hundreds of cycles to complete.
5
Runahead: A Solution for Cache Misses
Runahead history
Author            Year        Achievement
Dundas, Mudge     1997        In-order scalar runahead
Mutlu, Patt       2003        Out-of-order superscalar runahead
Akkary, Rajwar    2003        Checkpoint processing
Ceze, Torrellas   2005-2006   Checkpointing and value prediction
6
How?
Initiated on an instruction or data cache miss
Restart at the initiating instruction once the miss is serviced
Adapted from Dundas
7
Hardware Support Required for Runahead

We need to be able to compute load/store addresses, branch conditions, and jump targets
Must be able to speculatively update registers during runahead
Register set contents must be checkpointed
  Shadow each RF RAM cell; these shadow cells form the BRF
  Copy RF to BRF when entering runahead
  Copy BRF to RF when resuming normal operation
Pre-processed stores cannot modify the contents of memory
Fetch logic must save the PC of the runahead-initiating instruction
RF : Register File BRF : Backup Register File
Adapted from Dundas
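
A minimal C sketch of the checkpointing support above, assuming a 32-entry register file; the struct and the names rf_checkpoint/rf_restore are illustrative, not from the paper, and real hardware shadows each RF RAM cell rather than copying arrays.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define NUM_REGS 32   /* illustrative register-file size */

typedef struct {
    uint32_t value[NUM_REGS];
    bool     inv[NUM_REGS];   /* per-register runahead INV bit */
} RegFile;

static RegFile rf;   /* architectural register file (RF) */
static RegFile brf;  /* backup register file (BRF)       */

/* Copy RF to BRF when entering runahead. */
void rf_checkpoint(void) {
    memcpy(&brf, &rf, sizeof rf);
}

/* Copy BRF back to RF when resuming normal operation and clear every INV bit. */
void rf_restore(void) {
    memcpy(&rf, &brf, sizeof brf);
    memset(rf.inv, 0, sizeof rf.inv);
}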
8
Entering and Exiting Runahead

Entering runahead
  Save the contents of the RF in the BRF
  Save the PC of the runahead-initiating instruction
  Restart instruction fetch at the first instruction in the next sequential line if runahead is initiated on an instruction cache miss
Exiting runahead
  Set all of the RF and L1 data cache runahead-valid bits to the VALID state
  Restore the RF from the BRF
  Restart instruction fetch at the PC of the instruction that initiated runahead
Adapted from Dundas
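
A minimal C sketch of this entry/exit sequence, assuming 4-byte instructions, 32-byte cache lines, and flat arrays for the register-file and L1 INV bits; all names and sizes are illustrative assumptions, not the paper's structures.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define NUM_REGS     32
#define DCACHE_WORDS 1024
#define LINE_BYTES   32

static uint32_t rf[NUM_REGS], brf[NUM_REGS]; /* register file and its backup */
static bool rf_inv[NUM_REGS];                /* per-register INV bits        */
static bool dcache_inv[DCACHE_WORDS];        /* per-word L1 data INV bits    */
static uint32_t saved_pc;                    /* PC of the runahead-initiating instruction */

/* Enter runahead: checkpoint the RF and remember the initiating PC.
 * Returns the PC at which pre-processing continues: on an I-cache miss,
 * the first instruction of the next sequential line; on a D-cache miss
 * (an assumption of this sketch) the instruction after the missing load. */
uint32_t enter_runahead(uint32_t pc, bool icache_miss) {
    memcpy(brf, rf, sizeof rf);
    saved_pc = pc;
    if (icache_miss)
        return (pc / LINE_BYTES + 1) * LINE_BYTES;
    return pc + 4;
}

/* Exit runahead: set every RF and L1 INV bit back to VALID, restore the RF
 * from the BRF, and restart fetch at the instruction that initiated runahead. */
uint32_t exit_runahead(void) {
    memset(rf_inv, 0, sizeof rf_inv);
    memset(dcache_inv, 0, sizeof dcache_inv);
    memcpy(rf, brf, sizeof brf);
    return saved_pc;
}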
9
Instructions

Register-to-register
  Mark their destination register INV if any of their source registers are INV
  Can replace an INV value in their destination register if all sources are valid
Load
  Mark their destination register INV if:
    the base register used to form the effective address is marked INV, or
    a cache miss occurs, or
    the target word in the L1 data cache is marked INV due to a preceding store
  Can replace an INV value in their destination register if none of the above apply
Adapted from Dundas
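
A minimal C sketch of these INV-propagation rules; the register arrays and the stubbed dcache_probe are illustrative stand-ins for the real register file and L1 data cache, not the paper's hardware.

#include <stdint.h>
#include <stdbool.h>

#define NUM_REGS 32

static uint32_t rf[NUM_REGS];
static bool     rf_inv[NUM_REGS];

/* Stub L1 data-cache probe, only to keep the sketch self-contained: pretend
 * every access hits, returns zero, and finds the word VALID.  It reports a
 * hit/miss flag plus whether the word was marked INV by a runahead store. */
static bool dcache_probe(uint32_t addr, uint32_t *value, bool *word_inv) {
    (void)addr;
    *value = 0;
    *word_inv = false;
    return true;   /* true = hit */
}

/* Register-to-register: destination becomes INV if any source is INV;
 * otherwise the result replaces whatever was there, even a previous INV value. */
void runahead_alu(int dst, int src1, int src2,
                  uint32_t (*op)(uint32_t, uint32_t)) {
    if (rf_inv[src1] || rf_inv[src2]) {
        rf_inv[dst] = true;
    } else {
        rf[dst]     = op(rf[src1], rf[src2]);
        rf_inv[dst] = false;
    }
}

/* Load: destination becomes INV if the base register is INV, the access
 * misses, or the target word was marked INV by a preceding runahead store. */
void runahead_load(int dst, int base, int32_t offset) {
    uint32_t value;
    bool word_inv;
    if (rf_inv[base] ||
        !dcache_probe(rf[base] + (uint32_t)offset, &value, &word_inv) ||
        word_inv) {
        rf_inv[dst] = true;
    } else {
        rf[dst]     = value;
        rf_inv[dst] = false;
    }
}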
10
Instructions (cont.)

Store
  Pre-processed stores do not modify the contents of memory
  Stores mark their destination L1 data cache word INV if:
    the base register used to form the effective address is not INV, and
    a cache miss does not occur
  Values are only INV with respect to subsequent loads during the same runahead episode
Conditional branch
  Branches are resolved normally if their operands are valid
  If a branch condition is marked INV, then the outcome is determined via branch prediction
  If an indirect branch target register is marked INV, then the pipeline stalls until normal operation resumes
Adapted from Dundas
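
A minimal C sketch of the store and conditional-branch rules above; the per-word INV array, the stubbed cache-hit test, and predict_taken are illustrative stand-ins, not the paper's hardware.

#include <stdint.h>
#include <stdbool.h>

#define NUM_REGS     32
#define DCACHE_WORDS 1024

static uint32_t rf[NUM_REGS];
static bool     rf_inv[NUM_REGS];
static bool     dcache_word_inv[DCACHE_WORDS]; /* INV only w.r.t. later runahead loads */

/* Stubs that keep the sketch self-contained; a real model would consult the
 * L1 tags and a real branch predictor. */
static bool     dcache_hit(uint32_t addr)  { (void)addr; return true; }
static unsigned dcache_word(uint32_t addr) { return (addr / 4) % DCACHE_WORDS; }
static bool     predict_taken(uint32_t pc) { (void)pc; return false; }

/* Store: never writes memory during runahead.  If the base register is valid
 * and the access hits, the target word is marked INV so that later runahead
 * loads in the same episode do not read a stale value. */
void runahead_store(int base, int32_t offset) {
    uint32_t addr = rf[base] + (uint32_t)offset;
    if (!rf_inv[base] && dcache_hit(addr))
        dcache_word_inv[dcache_word(addr)] = true;
}

/* Conditional branch: resolved normally when the condition register is valid
 * (here: branch if nonzero); otherwise fall back to the branch predictor. */
bool runahead_branch(uint32_t pc, int cond_reg) {
    if (!rf_inv[cond_reg])
        return rf[cond_reg] != 0;
    return predict_taken(pc);
}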
11
Instructions (cont.)

Jump register indirect
  Assume that the return stack contains the address of the next instruction
Adapted from Dundas
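
A small C sketch of this rule, assuming a hypothetical 16-entry return-address stack: during runahead the jump target comes from the top of the stack rather than from a possibly-INV register.

#include <stdint.h>

#define RAS_DEPTH 16

static uint32_t ras[RAS_DEPTH];   /* return-address stack */
static int      ras_top = -1;

/* Call: push the address of the instruction after the call. */
void ras_push(uint32_t return_pc) {
    if (ras_top + 1 < RAS_DEPTH)
        ras[++ras_top] = return_pc;
}

/* jr during runahead: assume the return stack holds the next instruction's
 * address; fall through sequentially if the stack happens to be empty. */
uint32_t runahead_jump_indirect(uint32_t fallthrough_pc) {
    if (ras_top >= 0)
        return ras[ras_top--];
    return fallthrough_pc;
}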
12
Two Runahead Branch Policies
When a pre-executed conditional branch or jump depends on an invalid register:
  Conservative: halt runahead until the miss is ready
  Aggressive: keep going, assuming that the branch predictor or subroutine call return stack is accurate enough to resolve the branch or jump
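
A minimal C sketch of the policy choice, with illustrative enum and function names that are not from the paper.

#include <stdbool.h>

typedef enum { POLICY_CONSERVATIVE, POLICY_AGGRESSIVE } RunaheadBranchPolicy;

typedef enum {
    RESOLVE_NORMALLY,        /* operands valid: resolve as usual               */
    STALL_UNTIL_MISS_READY,  /* conservative: halt runahead until the miss     */
    FOLLOW_PREDICTION        /* aggressive: trust the predictor / return stack */
} BranchAction;

/* Decide how to handle a pre-executed branch or jump during runahead. */
BranchAction runahead_branch_action(bool operand_inv, RunaheadBranchPolicy policy) {
    if (!operand_inv)
        return RESOLVE_NORMALLY;
    return (policy == POLICY_CONSERVATIVE) ? STALL_UNTIL_MISS_READY
                                           : FOLLOW_PREDICTION;
}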
13
An Example
[Figure: worked runahead example; IRV = Invalid Register Vector, 0 = invalid, 1 = valid]
14
Benefit


Early execution of memory operations that are potential cache misses
Re-execution of these instructions will most probably be cache hits
It allows further instructions to be executed
But these instructions are executed again after exiting runahead mode
15
Conclusions

Pre-process instructions while cache misses are serviced
  Don't stall for instructions that are dependent upon invalid or missing data
  Loads and stores that miss in the cache can become data prefetches
  Instruction cache misses become instruction prefetches
  Conditional branch outcomes are saved for use during normal operation
All pre-processed instruction results are discarded
  Only interested in generating prefetches and branch outcomes
Runahead is a form of very aggressive, yet inexpensive, speculation
Adapted from Dundas
16
Problems

Increases the number of executed instructions

Pre-executed instructions consume energy

What if a runahead episode is too short to generate useful prefetches?
17
Reference
[1] J. Dundas and T. Mudge. Improving data cache performance by pre-executing instructions under a cache miss. In ICS-11, 1997.
[2] J. D. Dundas. Improving Processor Performance by Dynamically Pre-Processing the Instruction Stream. PhD thesis, Univ. of Michigan, 1998.
[3] O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In HPCA-9, pages 129-140, 2003.
[4] H. Akkary, R. Rajwar, and S. T. Srinivasan. Checkpoint processing and recovery: Towards scalable large instruction window processors. In MICRO-36, pages 423-434, 2003.
[5] L. Ceze, K. Strauss, J. Tuck, J. Renau, and J. Torrellas. CAVA: Hiding L2 misses with checkpoint-assisted value prediction. In IEEE Computer Architecture Letters, 2006.
18
Thank You & Questions?
19