Trace Caches

Michele Co
CS 451

Motivation

High-performance superscalar processors
  High instruction throughput
  Exploit ILP
    – Wider dispatch and issue paths
  Execution units designed for high parallelism
    – Many functional units
    – Large issue buffers
    – Many physical registers

Fetch bandwidth becomes the performance bottleneck

Fetch Performance Limiters

Cache hit rate

Branch prediction accuracy

Branch throughput
  Need to predict more than one branch per cycle

Non-contiguous instruction alignment

Fetch unit latency

Problems with Traditional Instruction Caches

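Store instructions in static (compiled) order
  Work well for sequential code with little branching, or code with large basic blocks
  Fetch ends at each taken branch, so code with small basic blocks and frequent taken branches supplies few instructions per cycle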
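The effect of taken branches on a conventional fetch unit can be illustrated with a small model in which fetch delivers consecutive instructions until it reaches the fetch width or passes a taken branch. This is a sketch under assumptions: the 8-wide fetch width and the branch spacing are illustrative, and the model ignores cache-line boundaries and cache misses.

```python
from typing import List


def fetch_cycles(taken: List[bool], width: int = 8) -> List[int]:
    """Return how many instructions a conventional fetch unit delivers per cycle.

    taken[i] is True if the i-th dynamic instruction is a taken branch.
    Fetch stops after a taken branch because its target usually lies
    elsewhere in the instruction cache.
    """
    per_cycle = []
    i = 0
    while i < len(taken):
        count = 0
        while i < len(taken) and count < width:
            count += 1
            i += 1
            if taken[i - 1]:
                break  # fetch block ends at the taken branch
        per_cycle.append(count)
    return per_cycle


# Assumed dynamic stream: every fifth instruction is a taken branch.
stream = [(k % 5 == 4) for k in range(40)]
cycles = fetch_cycles(stream, width=8)
print(cycles)                     # [5, 5, 5, 5, 5, 5, 5, 5]
print(len(stream) / len(cycles))  # 5.0
```

With a taken branch every fifth instruction, the 8-wide path averages five instructions per cycle; the three unused slots per cycle are lost to taken branches, not to cache misses.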
Suggested Solutions

Multiple branch target address prediction

  Branch address cache (1993, Yeh, Marr, Patt)
    – Provides quick access to multiple target addresses
    – Disadvantages
      • Complex alignment network, additional latency

Suggested Solutions (cont’d)

Collapsing buffer

  Multiple accesses to the BTB (1995, Conte, Mills, Menezes, Patel)
    – Allows fetching nonadjacent cache lines
    – Disadvantages
      • Bank conflicts
      • Poor scalability for interblock branches
      • Significant logic added before and after the instruction cache
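The collapsing buffer's data movement can be sketched as compaction over a two-line window: the instructions that a taken intrablock branch jumps over are marked invalid, and the survivors are packed into one fetch packet. This is a minimal sketch, not Conte et al.'s circuit; the function and parameter names are hypothetical, and it assumes a single predicted-taken branch whose target lies forward within the two fetched lines.

```python
from typing import List, Optional


def collapse(line_a: List[str], line_b: List[str], start: int,
             branch_pos: Optional[int], target_pos: Optional[int],
             width: int) -> List[str]:
    """Pack the valid instructions of a two-line window into one fetch packet.

    Positions index the concatenated window line_a + line_b (assumption).
    start:      offset of the fetch address within line_a
    branch_pos: position of a predicted-taken branch, or None
    target_pos: position of its target, assumed to lie forward in the window
    """
    window = line_a + line_b
    valid = [pos >= start for pos in range(len(window))]
    if branch_pos is not None and target_pos is not None:
        # Squash the instructions jumped over by the taken branch.
        for pos in range(branch_pos + 1, target_pos):
            valid[pos] = False
    packet = [ins for ins, ok in zip(window, valid) if ok]
    return packet[:width]  # the fetch packet is limited to the fetch width
```

For example, with four-instruction lines, a fetch starting at offset 1 and a taken branch at position 2 targeting position 5 yields a1, a2, b1, b2, b3: the skipped a3 and b0 are collapsed out.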

Fill unit

  Caches RISC-like instructions derived from the CISC instruction stream (1988, Melvin, Shebanow, Patt)

Problems with Prior Approaches

Need to generate pointers for all noncontiguous instruction blocks BEFORE fetching can begin
  Extra stages, additional latency
  Complex alignment network necessary

Multiple simultaneous accesses to the instruction cache
  Multiporting is expensive

Sequencing
  Additional stages, additional latency

Potential Solution – Trace Cache

Rotenberg, Bennett, Smith (1996)

Advantages
  Caches dynamic instruction sequences
    – Fetches past multiple branches
  No additional fetch unit latency

Disadvantages
  Redundant instruction storage
    – Between trace cache and instruction cache
    – Within trace cache

Trace Cache Details

Trace
  Sequence of instructions, potentially containing branches and their targets
  Terminates on branches with an indeterminate number of targets
    – Returns, indirect jumps, traps
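Trace construction can be sketched as a single pass over the retired instruction stream that closes the current trace when it reaches a length limit, a branch limit, or an instruction with an indeterminate number of targets. The 16-instruction and 3-branch limits, the `Instr` and `Trace` records, and their field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_INSTRS = 16   # assumed maximum trace length (illustrative)
MAX_BRANCHES = 3  # assumed maximum conditional branches per trace (illustrative)


@dataclass
class Instr:
    pc: int
    kind: str = "alu"    # "alu", "cond", or "indirect" (return, indirect jump, trap)
    taken: bool = False  # outcome; only meaningful when kind == "cond"


@dataclass
class Trace:
    start: int
    outcomes: List[bool] = field(default_factory=list)
    pcs: List[int] = field(default_factory=list)

    def ident(self) -> Tuple[int, Tuple[bool, ...]]:
        # Identify the trace by its start address plus its branch outcomes.
        return (self.start, tuple(self.outcomes))


def build_traces(stream: List[Instr]) -> List[Trace]:
    """Cut a retired instruction stream into traces."""
    traces: List[Trace] = []
    cur: Optional[Trace] = None
    for ins in stream:
        if cur is None:
            cur = Trace(start=ins.pc)
        cur.pcs.append(ins.pc)
        if ins.kind == "cond":
            cur.outcomes.append(ins.taken)
        # End the trace at an instruction with an indeterminate number of
        # targets, or when the length or branch limit is reached.
        if (ins.kind == "indirect"
                or len(cur.pcs) == MAX_INSTRS
                or len(cur.outcomes) == MAX_BRANCHES):
            traces.append(cur)
            cur = None
    if cur is not None:
        traces.append(cur)  # unfinished trailing trace
    return traces
```

Each resulting trace is identified by its start address plus the outcomes of its conditional branches, so the same start address can yield several distinct traces.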

Trace identifier
  Start address + branch outcomes

Trace cache line
  Valid bit
  Tag
  Branch flags
  Branch mask
  Trace fall-through address
  Trace target address
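A trace cache hit requires both the tag and the predicted branch path to agree with the stored line. The sketch below models the trace cache line fields listed above together with the hit and next-fetch-address logic. The encoding of the branch mask as a branch count plus an "ends in a branch" bit, the class and function names, and the list-of-predictions interface are assumptions for illustration; hardware would compare all flags in parallel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TraceLine:
    valid: bool
    tag: int                  # start address of the trace
    branch_flags: List[bool]  # path taken after each branch that has a successor in the trace
    num_branches: int         # branch mask (assumed encoding): branches in the trace ...
    ends_in_branch: bool      # ... and whether the last instruction is a branch
    fall_through: int         # next fetch address if the final branch is not taken
    target: int               # next fetch address if the final branch is taken
    instrs: List[str]


def lookup(line: TraceLine, fetch_addr: int,
           predictions: List[bool]) -> Optional[Tuple[List[str], int]]:
    """Return (instructions, next fetch address) on a hit, otherwise None."""
    if not line.valid or line.tag != fetch_addr:
        return None
    if len(predictions) < line.num_branches:
        return None  # not enough predictions to check the whole path
    n_flags = len(line.branch_flags)
    if predictions[:n_flags] != line.branch_flags:
        return None  # predicted path leaves the stored trace
    if line.ends_in_branch:
        # The final branch has no flag; its prediction picks the next address.
        last_taken = predictions[line.num_branches - 1]
        next_addr = line.target if last_taken else line.fall_through
    else:
        next_addr = line.fall_through
    return line.instrs, next_addr
```

Because the prediction for the final branch only selects between the fall-through and target addresses, a trace that ends in a branch can hit regardless of which way that branch is predicted.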
Next Trace Prediction (NTP)

History register
Correlating table
  Complex history indexing
Secondary table
  Indexed by most recently committed trace ID
Index generating function
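A next trace predictor with these parts can be sketched in a few dozen lines: a history register of recently committed trace IDs, a correlating table indexed by a hash of that history, and a secondary table indexed by the most recently committed trace ID. The table sizes, history depth, the XOR-and-shift index function, and the 2-bit counter policy for choosing between the tables are all assumptions; this is not the published index generating function.

```python
from collections import deque
from typing import Deque, List, Optional


class NextTracePredictor:
    """Sketch of a next trace predictor with correlating and secondary tables.

    Assumed (not from the source): table sizes, history depth, the
    XOR-and-shift index function, and the 2-bit counter policy.
    """

    def __init__(self, corr_bits: int = 10, sec_bits: int = 8, depth: int = 4):
        self.corr_size = 1 << corr_bits
        self.sec_size = 1 << sec_bits
        self.history: Deque[int] = deque(maxlen=depth)  # recent committed trace IDs
        # Each entry is [predicted next trace ID or None, 2-bit counter].
        self.corr: List[list] = [[None, 0] for _ in range(self.corr_size)]
        self.sec: List[list] = [[None, 0] for _ in range(self.sec_size)]

    def _corr_index(self) -> int:
        # Hypothetical index function: older IDs are shifted further before XOR.
        idx = 0
        for age, tid in enumerate(reversed(self.history)):
            idx ^= tid << age
        return idx % self.corr_size

    def _sec_index(self) -> int:
        # Secondary table: indexed by the most recently committed trace ID only.
        return (self.history[-1] if self.history else 0) % self.sec_size

    def predict(self) -> Optional[int]:
        c_id, c_ctr = self.corr[self._corr_index()]
        s_id, _ = self.sec[self._sec_index()]
        if c_id is not None and c_ctr >= 2:
            return c_id  # confident correlating-table prediction
        return s_id if s_id is not None else c_id

    @staticmethod
    def _train(entry: list, actual: int) -> None:
        if entry[0] == actual:
            entry[1] = min(3, entry[1] + 1)
        elif entry[1] > 0:
            entry[1] -= 1
        else:
            entry[0], entry[1] = actual, 1  # replace a low-confidence entry

    def update(self, actual: int) -> None:
        """Train both tables with the committed trace ID, then extend the history."""
        self._train(self.corr[self._corr_index()], actual)
        self._train(self.sec[self._sec_index()], actual)
        self.history.append(actual)
```

In this sketch the secondary table supplies predictions while the correlating-table entries for a new path are still untrained.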
NTP Index Generation

Return History Stack

Trace Cache vs. Existing Techniques

Trace Cache Optimizations

Performance
  Partial matching [Friendly, Patel, Patt (1997)]
  Inactive issue [Friendly, Patel, Patt (1997)]
  Trace preconstruction [Jacobson, Smith (2000)]

Power
  Sequential access trace cache [Hu et al. (2002)]
  Dynamic direction prediction-based trace cache [Hu et al. (2003)]
  Micro-operation cache [Solomon et al. (2003)]
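Partial matching and inactive issue can be sketched together: given a trace's stored branch outcomes and the current predictions, the instructions up to and including the first disagreeing branch are supplied as usual, and with inactive issue the remainder is also issued but marked inactive. The function and argument names are hypothetical, and the sketch assumes the positions of the branches within the trace are known.

```python
from typing import List, Tuple


def partial_match(instrs: List[str], is_branch: List[bool],
                  trace_outcomes: List[bool],
                  predictions: List[bool]) -> Tuple[List[str], List[str]]:
    """Split a trace into (active, inactive) instructions.

    Instructions up to and including the first branch whose stored outcome
    disagrees with its prediction are supplied (partial matching). The rest
    would be issued but marked inactive (inactive issue), so they can be
    activated if that prediction later proves wrong.
    """
    b = 0  # index of the next conditional branch within the trace
    for pos, br in enumerate(is_branch):
        if br:
            # A missing prediction is treated like a disagreement (assumption).
            if b >= len(predictions) or predictions[b] != trace_outcomes[b]:
                return instrs[:pos + 1], instrs[pos + 1:]
            b += 1
    return instrs, []
```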
Trace Processors

Trace Processor Architecture

  Processing elements (PEs)
    – Trace-sized instruction buffer
    – Multiple dedicated functional units
    – Local register file
    – Copy of global register file
  Use hierarchy to distribute execution resources

Addresses superscalar processor issues

  Complexity
    – Simplified multiple branch prediction (next trace prediction)
    – Elimination of local dependence checking (local register file)
    – Decentralized instruction issue and result bypass logic
  Architectural limitations
    – Reduced bandwidth pressure on global register file (local register files)
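The role of the local register file can be illustrated by classifying the registers a trace touches. A register read after being written earlier in the same trace can be supplied inside the PE; a register read before any write in the trace is a live-in that must come from the global register file. The tuple encoding of instructions is hypothetical, and treating every written register as a live-out is a conservative simplification, since a real design may know that some values are dead.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: (destination register or None, source registers)
Instr = Tuple[Optional[str], List[str]]


def classify(trace: List[Instr]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (live_ins, locally_supplied, live_outs) for one trace."""
    live_ins: Set[str] = set()
    local: Set[str] = set()
    written: Set[str] = set()
    for dest, srcs in trace:
        for reg in srcs:
            if reg in written:
                local.add(reg)     # produced earlier in this trace: local register file
            else:
                live_ins.add(reg)  # produced by an earlier trace: global register file
        if dest is not None:
            written.add(dest)
    # Conservative: every register written by the trace may be needed later.
    live_outs = set(written)
    return live_ins, local, live_outs
```

Only the live-ins and live-outs generate global register file traffic, which is the bandwidth reduction the slide attributes to local register files.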
Trace Processor

Trace Cache Variations

Block-based trace cache (BBTC)
  Black, Rychlik, Shen (1999)
  Less storage capacity needed
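Why a block-based design can need less storage is easiest to see in a toy model: each basic block is stored once in a block cache under a block ID assigned through a rename table, and a trace is recorded only as a list of block IDs, so blocks shared by several traces are not duplicated. This is a minimal sketch under assumptions: the class and method names are hypothetical, and the trace table here is keyed by a (start address, branch outcomes) trace identifier for simplicity, whereas the BBTC's trace table is itself a prediction structure.

```python
from typing import Dict, List, Optional, Tuple

TraceId = Tuple[int, Tuple[bool, ...]]  # (start address, branch outcomes), assumed key
Block = Tuple[int, List[str]]           # (block start address, instructions)


class BlockBasedTraceCache:
    """Toy model: blocks stored once, traces stored as block-ID sequences."""

    def __init__(self) -> None:
        self.rename: Dict[int, int] = {}                 # block address -> block ID
        self.block_cache: Dict[int, List[str]] = {}      # block ID -> instructions
        self.trace_table: Dict[TraceId, List[int]] = {}  # trace -> block IDs

    def _block_id(self, block: Block) -> int:
        addr, instrs = block
        if addr not in self.rename:
            bid = len(self.rename)
            self.rename[addr] = bid
            self.block_cache[bid] = list(instrs)  # each block is stored only once
        return self.rename[addr]

    def add_trace(self, trace_id: TraceId, blocks: List[Block]) -> None:
        self.trace_table[trace_id] = [self._block_id(b) for b in blocks]

    def fetch(self, trace_id: TraceId) -> Optional[List[str]]:
        ids = self.trace_table.get(trace_id)
        if ids is None:
            return None
        out: List[str] = []
        for bid in ids:
            out.extend(self.block_cache[bid])  # assemble the trace from its blocks
        return out

    def instruction_slots(self) -> int:
        return sum(len(instrs) for instrs in self.block_cache.values())


A = (0x100, ["a0", "a1", "a2", "a3"])
B = (0x200, ["b0", "b1"])
C = (0x300, ["c0", "c1", "c2"])
cache = BlockBasedTraceCache()
cache.add_trace((0x100, (False,)), [A, B])
cache.add_trace((0x100, (True,)), [A, C])
print(cache.instruction_slots())  # 9, versus 13 if each trace kept its own copy
```

Two traces that share block A occupy 9 instruction slots (4 + 2 + 3) instead of 13 (6 + 7), at the cost of reassembling each trace from its blocks at fetch time.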
Trace Table: BBTC Trace Prediction

Block Cache

Rename Table

BBTC Optimization

Completion-time multiple branch prediction (Rakvic et al., 2000)
  Improvement over trace table predictions

Tree-based Multiple Branch Prediction

Tree-PHT

Tree-PHT Update

Trace Cache Variations (cont’d)

Software trace cache
  Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
  Profile-directed code reordering to maximize sequentiality
    – Convert taken branches to not-taken
    – Move unused basic blocks out of the execution path
    – Inline frequent basic blocks
    – Map the most popular traces to a reserved area of the i-cache
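The reordering idea can be sketched as greedy, profile-guided chaining: starting from a seed block, repeatedly place the most frequently executed successor right after the current block, so the common path becomes fall-through (not-taken) code, and push blocks never reached by the hot chain to the end. This is a simplified single-seed sketch; the threshold value and function name are assumptions, and the published software trace cache algorithm is more elaborate.

```python
from typing import Dict, List, Optional, Tuple


def reorder(seed: str, edges: Dict[Tuple[str, str], int],
            all_blocks: List[str], threshold: float = 0.4) -> List[str]:
    """Greedy profile-guided layout starting from one seed block.

    edges maps (from_block, to_block) to a profiled execution count.
    threshold (assumed value) is the minimum fraction of a block's
    outgoing executions that its hottest successor must account for.
    """
    layout: List[str] = []
    placed = set()
    cur: Optional[str] = seed
    while cur is not None:
        layout.append(cur)
        placed.add(cur)
        succs = {dst: n for (src, dst), n in edges.items() if src == cur}
        total = sum(succs.values())
        cur = None
        if total > 0:
            best = max(succs, key=succs.get)
            # Place the hot successor next so that it becomes the fall-through path.
            if best not in placed and succs[best] / total >= threshold:
                cur = best
    # Blocks not on the hot chain are moved out of the execution path.
    return layout + [b for b in all_blocks if b not in placed]


profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
print(reorder("A", profile, ["A", "B", "C", "D"]))  # ['A', 'B', 'D', 'C']
```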