Feedback Directed Prefetching

Santhosh Srinath
Onur Mutlu
Hyesoon Kim
Yale N. Patt

Problem

Prefetching can significantly improve performance
  When prefetches are accurate
  And timely
However, prefetching can also significantly degrade performance
  Due to memory bandwidth impact
  Pollution of the cache

Solution

Feedback Directed Prefetching is a comprehensive mechanism which reduces the negative effects of prefetching as well as improves the positive effects

HPCA-13

Outline

Background and Motivation
Feedback Directed Prefetching (FDP)
  Metrics and How to collect
  How to adapt
    Prefetcher Aggressiveness
    Cache Insertion Policy for Prefetches
Results

Background (Prefetcher Aggressiveness)

[Figure: an access stream (X, X+1) and the predicted stream ahead of it (labels 1, 2, 3, P, Pmax), illustrating Prefetch Distance and Prefetch Degree for the Very Conservative, Middle-of-the-Road, and Very Aggressive configurations]

Background (Prefetcher Aggressiveness)

Very Aggressive
  Well ahead of the load access stream
  Hides memory access latency better
  More speculative
Very Conservative
  Closer to the load access stream
  Might not hide memory access latency completely
  Reduces potential for cache pollution and bandwidth contention

Motivation

[Figure: Instructions per Cycle (0.0 to 5.0) for No Prefetching, Very Conservative, Middle-of-the-Road, and Very Aggressive prefetching on bzip2, gap, mcf, parser, vortex, vpr, ammp, applu, art, equake, facerec, galgel, mesa, mgrid, sixtrack, swim, wupwise, and gmean; chart annotations: 48%, 29%]

Very Aggressive improves average performance by 84%
However, it can also significantly reduce performance on some benchmarks

Feedback Directed Prefetching

Comprehensive mechanism which takes into account:
  Prefetcher Accuracy
  Prefetcher Lateness
  Prefetcher-caused Cache Pollution
Adapts:
  Prefetcher Aggressiveness
  Cache Insertion Policy for Prefetches

Metrics

Prefetch Accuracy
Prefetch Lateness
Prefetcher-caused Cache Pollution

Prefetch Accuracy

$$\text{Prefetcher Accuracy} = \frac{\text{Number of Useful Prefetches}}{\text{Number of Prefetches Sent to Memory}}$$

Useful prefetches are those referenced by demand requests while they are in the L2 cache

Prefetch Accuracy

[Figure: percentage IPC change over No Prefetching (y-axis, -100% to 400%) plotted against Prefetcher Accuracy (x-axis, 0 to 1)]

Low accuracy: more likely that prefetching can reduce performance

Prefetch Accuracy

$$\text{Prefetcher Accuracy} = \frac{\texttt{used\_total}}{\texttt{pref\_total}}$$

Implementation
  pref-bit added to each L2 tag-store entry
  Tracked using two counters: pref_total, used_total

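The counter scheme above can be sketched in software. This is only an illustrative model, not the hardware design: the class name, the method names, and the use of a dictionary to stand in for the per-entry pref-bit are assumptions made for the example.

```python
class AccuracyTracker:
    """Toy model of the pref_total / used_total counters (names from the slide)."""

    def __init__(self):
        self.pref_total = 0  # prefetches sent to memory
        self.used_total = 0  # prefetched blocks later referenced by a demand request
        self.pref_bit = {}   # block address -> pref-bit (stands in for the L2 tag store)

    def prefetch_sent(self, addr):
        """A prefetch is sent to memory and its block is later filled into L2."""
        self.pref_total += 1
        self.pref_bit[addr] = True

    def demand_access(self, addr):
        """A demand request references a block in L2."""
        if self.pref_bit.get(addr):
            # First demand reference to a prefetched block: count it once,
            # then clear the bit so repeated hits are not double counted.
            self.used_total += 1
            self.pref_bit[addr] = False

    def accuracy(self):
        return self.used_total / self.pref_total if self.pref_total else 0.0


tracker = AccuracyTracker()
for block in (0x100, 0x140, 0x180, 0x1C0):
    tracker.prefetch_sent(block)
tracker.demand_access(0x100)
tracker.demand_access(0x100)  # second hit on the same block is not counted again
tracker.demand_access(0x140)
print(tracker.accuracy())
```

Clearing the bit on first use is what keeps used_total from exceeding pref_total in this model.
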
Prefetch Lateness

$$\text{Prefetch Lateness} = \frac{\text{Number of Late Prefetches}}{\text{Number of Useful Prefetches}} = \frac{\texttt{late\_total}}{\texttt{used\_total}}$$

Measure of how timely prefetches are
Used to determine if increasing the aggressiveness helps

Implementation
  pref-bit added to each L2 MSHR entry
  New counter: late_total

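A minimal sketch of how late_total might be maintained, assuming a prefetch counts as late when a demand request arrives for a block whose prefetch is still outstanding in the MSHR. The class and method names are hypothetical, and treating a late prefetch as also useful (so that lateness stays a fraction of used_total) is an assumption of this model.

```python
class LatenessTracker:
    """Toy model of the late_total counter and the MSHR pref-bit."""

    def __init__(self):
        self.mshr_pref_bit = {}  # in-flight block address -> pref-bit
        self.late_total = 0
        self.used_total = 0

    def prefetch_issued(self, addr):
        """Allocate an MSHR entry for a prefetch and set its pref-bit."""
        self.mshr_pref_bit[addr] = True

    def fill(self, addr):
        """The block returns from memory; the MSHR entry is released."""
        self.mshr_pref_bit.pop(addr, None)

    def demand_hits_mshr(self, addr):
        """A demand request finds the block still in flight."""
        if self.mshr_pref_bit.get(addr):
            self.late_total += 1
            self.used_total += 1             # assumption: late prefetches are also useful
            self.mshr_pref_bit[addr] = False  # count each prefetch at most once

    def timely_use(self):
        """A demand request hits a prefetched block already in L2."""
        self.used_total += 1

    def lateness(self):
        return self.late_total / self.used_total if self.used_total else 0.0


lt = LatenessTracker()
lt.prefetch_issued(0x200)
lt.demand_hits_mshr(0x200)  # demand arrives before the prefetch completes
lt.fill(0x200)
for _ in range(3):
    lt.timely_use()
print(lt.lateness())
```
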
Prefetcher-caused Cache Pollution

$$\text{Prefetcher-caused Cache Pollution} = \frac{\text{Number of Demand Misses caused by the Prefetcher}}{\text{Number of Demand Misses}}$$

Measure of the disturbance caused by prefetched data in the cache
Used to determine if the prefetcher is evicting useful data from the cache

Prefetcher-caused Cache Pollution (2)

$$\text{Prefetcher-caused Cache Pollution} = \frac{\texttt{pollution\_total}}{\texttt{demand\_total}}$$

Hardware Implementation
  Insight: this does not need to be exact
  Track pollution using a Pollution Filter
    Based on the Bloom filter concept
    Bit set when a prefetch evicts a block brought in by a demand miss
    Bit reset when a prefetch is serviced
  Two counters: pollution_total, demand_total

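The pollution filter can be modeled as a bit vector indexed by a hash of the block address. The sketch below is an approximation under stated assumptions: the hash function is a placeholder chosen for the example, the 4096-entry size is borrowed from the Hardware Cost slide, and checking the filter on every demand miss is an assumed reading of how pollution_total is incremented.

```python
class PollutionFilter:
    """Toy model of a Bloom-filter-style pollution filter (one bit per entry)."""

    def __init__(self, entries=4096):
        self.bits = [False] * entries
        self.pollution_total = 0  # demand misses attributed to prefetcher evictions
        self.demand_total = 0     # all demand misses

    def _index(self, block_addr):
        # Hypothetical hash: fold upper address bits onto lower ones.
        return (block_addr ^ (block_addr >> 12)) % len(self.bits)

    def prefetch_evicts_demand_block(self, evicted_addr):
        """A prefetch fill evicts a block that was brought in by a demand miss."""
        self.bits[self._index(evicted_addr)] = True

    def prefetch_serviced(self, prefetch_addr):
        """A prefetch for this block is serviced, so the block is back in the cache."""
        self.bits[self._index(prefetch_addr)] = False

    def demand_miss(self, addr):
        """On a demand miss, a set bit suggests the prefetcher caused the miss."""
        self.demand_total += 1
        if self.bits[self._index(addr)]:
            self.pollution_total += 1

    def pollution(self):
        return self.pollution_total / self.demand_total if self.demand_total else 0.0


pf = PollutionFilter()
pf.prefetch_evicts_demand_block(100)
pf.demand_miss(100)  # likely caused by the earlier prefetch eviction
pf.demand_miss(200)  # unrelated miss
print(pf.pollution())
```

Because different addresses can map to the same bit, the result is approximate, which matches the slide's point that the measurement does not need to be exact.
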
How to adapt?
Prefetcher Aggressiveness

Dynamic Configuration Counter
  Current Aggressiveness

Counter  Aggressiveness      Distance  Degree
1        Very Conservative   4         1
2        Conservative        8         1
3        Middle-of-the-Road  16        2
4        Aggressive          32        4
5        Very Aggressive     64        4

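The table maps directly onto a small lookup. In this sketch the distance and degree values are taken from the slide, while the function names and the choice to clamp the counter at 1 and 5 are assumptions for illustration.

```python
# counter value -> (aggressiveness name, prefetch distance, prefetch degree)
CONFIGS = {
    1: ("Very Conservative", 4, 1),
    2: ("Conservative", 8, 1),
    3: ("Middle-of-the-Road", 16, 2),
    4: ("Aggressive", 32, 4),
    5: ("Very Aggressive", 64, 4),
}


def update_counter(counter, delta):
    """Apply +1 / 0 / -1 and clamp to the valid range (assumed saturating)."""
    return min(max(counter + delta, 1), 5)


def current_config(counter):
    """Return the (name, distance, degree) selected by the counter."""
    return CONFIGS[counter]


counter = 3
counter = update_counter(counter, +1)
print(current_config(counter))
```
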
How to adapt?
Prefetcher Aggressiveness (2)

For the current phase, based on static thresholds, classify:
  Accuracy
  Lateness
  Cache pollution caused by prefetches

High Accuracy
  Late: Increase
  Not-Late and Polluting: Decrease
Med Accuracy
  Not-Poll and Late: Increase
  Polluting: Decrease
Low Accuracy
  Not-Poll and Not-Late: No Change
  Otherwise: Decrease

Goals of the updates:
  Improve timeliness
  Reduce cache pollution
  Reduce memory bandwidth usage and cache pollution

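One way to read the decision rules on this slide is as a function from the three classifications to a counter update. The sketch below is a reconstruction, not the authors' exact table: combinations the slide does not spell out are treated as No Change, and the accuracy thresholds are illustrative placeholders rather than values taken from the slides.

```python
def classify_accuracy(accuracy, low=0.4, high=0.75):
    """Bucket accuracy with static thresholds (placeholder values, assumed)."""
    if accuracy >= high:
        return "high"
    if accuracy >= low:
        return "med"
    return "low"


def aggressiveness_update(accuracy_class, late, polluting):
    """Return +1 (Increase), -1 (Decrease), or 0 (No Change)."""
    if accuracy_class == "high":
        if late:
            return +1                   # improve timeliness
        return -1 if polluting else 0   # reduce cache pollution
    if accuracy_class == "med":
        if polluting:
            return -1                   # reduce cache pollution
        return +1 if late else 0        # improve timeliness
    if accuracy_class == "low":
        if not late and not polluting:
            return 0
        return -1                       # reduce bandwidth usage and pollution
    raise ValueError(f"unknown accuracy class: {accuracy_class}")


print(aggressiveness_update(classify_accuracy(0.9), late=True, polluting=False))
```
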
How to Adapt?
Cache Insertion Policy for Prefetches

Why adapt?
  Reduce the potential for cache pollution
Classify cache pollution based on static thresholds:
  Low: insert at MID (n/2) position
    E.g., for a 16-way cache, MID = 8 in the LRU stack
  Medium: insert at LRU-4 (n/4) position
    E.g., for a 16-way cache, LRU-4 = 4 in the LRU stack
  High: insert at LRU position

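The insertion rule can be written as a short function of the pollution class and the associativity n. This is a sketch under one assumed convention: positions are counted from the LRU end of the stack, with 0 denoting the LRU position itself. The function name and the string labels are hypothetical.

```python
def insertion_position(pollution_level, ways):
    """Return the LRU-stack position (0 = LRU end, assumed) for a prefetched block."""
    if pollution_level == "low":
        return ways // 2   # MID
    if pollution_level == "medium":
        return ways // 4   # LRU-4
    if pollution_level == "high":
        return 0           # LRU
    raise ValueError(f"unknown pollution level: {pollution_level}")


for level in ("low", "medium", "high"):
    print(level, insertion_position(level, ways=16))
```

For a 16-way cache this gives 8 for MID and 4 for LRU-4, matching the examples on the slide.
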
Evaluation Methodology

Execution-driven Alpha simulator
Aggressive out-of-order superscalar processor
1 MB, 16-way, 10-cycle unified L2 cache
500-cycle minimum main memory latency
Detailed memory model

Prefetchers Modeled:
  Stream Prefetcher tracking 64 different streams
  Global History Buffer Prefetcher (in paper)
  PC-based Stride Prefetcher (in paper)

Results: Adjusting Only Aggressiveness

4.7% higher average IPC than the Very Aggressive configuration
Most of the performance losses have been eliminated

Results: Adjusting Only Cache Insertion Policy

[Figure: Instructions per Cycle (0.0 to 5.0) with the Very Aggressive Prefetcher for No Prefetching, LRU, LRU-4, MID, MRU, and Dynamic Insertion]

5.1% better than inserting prefetches in MRU position
1.9% better than inserting prefetches in LRU-4 position

Results: Putting it all together (FDP)
11% 13%


6.5% IPC improvement over Very Aggressive configuration
Performance losses converted to performance gains!
HPCA-13
Feedback Directed Prefetching
24
Bandwidth Impact

BPKI: Memory Bus Accesses per 1000 retired Instructions
  Includes effects of L2 demand misses as well as pollution-induced misses and prefetches

        No Pref.  Very Cons  Mid    Very Aggr  FDP
IPC     0.85      1.21       1.47   1.57       1.67
BPKI    8.56      9.34       10.60  13.38      10.88

FDP vs. Mid: 13.6% higher performance with similar bandwidth usage
FDP vs. Very Aggr: 6.5% higher performance and 18.7% less bandwidth

FDP significantly improves bandwidth efficiency

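The callout percentages can be checked against the IPC and BPKI rows of the table. The snippet below recomputes them from the rounded table entries; the dictionary and function names are made up for the example. Recomputed this way, the IPC gain over Very Aggr comes out near 6.4%, slightly below the 6.5% on the slide, presumably because the table values are rounded.

```python
ipc = {"No Pref.": 0.85, "Very Cons": 1.21, "Mid": 1.47, "Very Aggr": 1.57, "FDP": 1.67}
bpki = {"No Pref.": 8.56, "Very Cons": 9.34, "Mid": 10.60, "Very Aggr": 13.38, "FDP": 10.88}


def pct_change(new, old):
    """Relative change of new over old, in percent."""
    return (new - old) / old * 100


for baseline in ("Mid", "Very Aggr"):
    d_ipc = pct_change(ipc["FDP"], ipc[baseline])
    d_bpki = pct_change(bpki["FDP"], bpki[baseline])
    print(f"FDP vs {baseline}: IPC {d_ipc:+.1f}%, BPKI {d_bpki:+.1f}%")
```
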
Hardware Cost

Structure                Size                  Cost
pref-bits for L2 cache   16384 blocks          16384 bits
Pollution Filter         4096 entries * 1 bit  4096 bits
16-bit counters          11 counters           176 bits
pref-bits for MSHR       128 entries           128 bits
Total hardware cost                            20784 bits = 2.54 KB

Percentage area overhead compared to baseline 1MB L2 cache: 2.5KB/1024KB = 0.24%
NOT on the critical path

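The totals follow from simple arithmetic on the per-structure costs. The short check below reproduces them; the variable names are chosen for the example, and one bit per pref-bit entry is assumed, as the name suggests.

```python
cost_bits = {
    "pref-bits for L2 cache": 16384 * 1,  # one bit per L2 block
    "Pollution Filter": 4096 * 1,         # 4096 entries, 1 bit each
    "16-bit counters": 11 * 16,           # 11 counters of 16 bits
    "pref-bits for MSHR": 128 * 1,        # one bit per MSHR entry
}

total_bits = sum(cost_bits.values())
total_kb = total_bits / 8 / 1024       # bits -> bytes -> KB
overhead_pct = total_kb / 1024 * 100   # relative to a 1 MB (1024 KB) L2 cache

print(total_bits, round(total_kb, 2), round(overhead_pct, 2))
```

The overhead computed from the unrounded total is close to 0.25%; the slide's 0.24% follows from using the rounded 2.5 KB figure.
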
Outline

Background and Motivation
Feedback Directed Prefetching
  Metrics and collecting this information in Hardware
  How to adapt
Results
Conclusions

Contributions

Comprehensive and low-cost feedback mechanism for hardware prefetchers
Uses:
  Prefetcher Accuracy
  Prefetcher Lateness
  Prefetcher-caused Cache Pollution
Adapts:
  Aggressiveness
  Cache Insertion Policy for prefetches
6.5% higher performance and 18.7% less bandwidth compared to Very Aggressive Prefetching
Eliminates negative impact of prefetching
Applicable to any data prefetch algorithm

Questions?
Backups
FDP vs Prefetch Cache

Prefetch caches eliminate prefetcher-induced cache pollution
However, prefetches are now limited to the size of the prefetch cache
FDP has 5.3% higher performance than Very Aggr. + 32KB prefetch cache
FDP is within 2% of Very Aggr. + 64KB prefetch cache
Memory bandwidth of FDP is 16% less than with the 32KB prefetch cache and 9% less than with the 64KB prefetch cache

Performance on Other Prefetch Algorithms

Global History Buffer Prefetcher
  20.8% less memory bandwidth than very aggressive with similar performance
  9.9% better performance than middle-of-the-road with similar bandwidth usage
PC-based Stride Prefetcher
  4% better performance than the very aggressive
  24% reduction in bandwidth usage

Backup slide titles (figures not preserved):
  IPC Performance
  Dynamic Prefetcher Accuracy
  Prefetch Lateness
  Pollution Filter
  Thresholds
  Prefetches Sent
  Distribution of dynamic aggressiveness level
  Distribution of insertion position of prefetched blocks
  Effect of FDP on memory bandwidth consumption
  Performance of Prefetch cache vs FDP
  Bandwidth consumption of prefetch cache vs. FDP
  Effect of FDP on GHB
  Effect of FDP on GHB (Bandwidth)
  Effect of varying L2 size and memory latency
  IPC on other benchmarks
  BPKI on other benchmarks