Towards Complexity-Effective Intelligent

advertisement
Efficiently Prefetching
Complex Address Patterns
Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian
University of Utah
Chris Wilkerson, Zeshan Chishti, Seth Pugsley
*Intel Labs
Variable Length Delta Prefetcher
1
Prefetchers
Confirmation Based Prefetchers
• Issue predictions after a few deltas
• High Accuracy
• Short Streams Lose out
Immediate Prefetchers
•
•
•
Aggressive
Low Accuracy
Waste DRAM bandwidth and
cache capacity
Accurate
Fast
Variable Length Delta Prefetcher
2
Spatial Correlation
• Learn Access (Delta) Patterns
• Apply patterns when similar conditions re-occur.
• Eg: PC, physical address, delta patterns
Delta Patterns
• Regular Delta Patterns. Eg: ( +1, +1, +1)…, (+2, +2, +2, +2)…
• Irregular Delta Patterns. Eg: ( +1, +2, +3 )…
Variable Length Delta Prefetcher
3
Long Repeatable Streams of Irregular Deltas
Delta patterns for milc
Page Num: 479218
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7……..
Variable Length Delta Prefetcher
4
Long Repeatable Streams of Irregular Deltas
Deltas
: 1,
9, -8, 1,
8,
1, -8,
1, 1,
7,
-1, -5,…..
Cache Line: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7……
Stream 1 : A+1, A+2, A+ 3, A+4, A+5, A+6, A+7
Stream2: A+10, A+11, A+12, A+13
Confirmation
Prefetches
Stride Prefetcher
Coverage: 5/11
SandBox Prefetcher
Coverage: 9/11
Neither are perfectly timely!
Variable Length Delta Prefetcher
5
Variable Length Delta Prefetcher
Variable Length Delta Prefetcher
6
$ Access
Delta Prediction
Tables
Offset Prediction
Tables
Predicted
Delta/Offset
Last Level $$
Core 1
Per Page
Delta History
Tables
Core 8
$ Access
Per Page
Delta History
Tables
Delta Prediction
Tables
Offset Prediction
Tables
Predicted
Delta/Offset
Structure of VLDP
Variable Length Delta Prefetcher
7
Delta History Table
 Tracks delta within a page
for (i=0;i<BIGNUM; i++)
{
a[i]=b[i]+c[i];
}
Delta = Last Address- Current Address
 a, b, c can each belong to different pages
 So Deltas between pages is meaningless
Variable Length Delta Prefetcher
8
Delta History Table
Page
Num.
Last
Add.
Last 4
Deltas
Last
Predictor
Num. Times
Used
Variable Length Delta Prefetcher
Last Four Prefetched
Offsets
9
Delta Prediction Tables
Highest Priority (t=3)
Lowest Priority (t=1)
Delta(1)
Pred.
Accuracy
8b
8b
2b
Deltas (3)
8b 8b
8b
Pred. Accuracy
8b
2b
64 Rows
per Table
…
Match?
Match?
MUX
Predicted Delta
Variable Length Delta Prefetcher
10
Offset Prediction Table
First Page
Offset
Pred.
Offset
Accuracy
7b
7b
2b
OPT is used only to predict the second access to a page
Variable Length Delta Prefetcher
11
Need for Multiple Tables
Repeating Delta Pattern- (1, 2, 3, 5, 2, 4)…
50%
Accuracy
Table 1
Delta
Pred.
1
2
2
3
3
5
5
2
Table 2
Delta
Pred.
1,2
3
2,3
5
3,5
2
5,2
4
Search for Delta pattern match starts from right most table
Variable Length Delta Prefetcher
12
Looking farther than one Delta ahead
Repeating Delta Pattern- (1, 2, 3), (1, 2, 3)…….
Current Delta
Delta
1
2
Pred.
2
3
Delta
1,2
2,3
Pred.
3
1
3
-
1
-
3,1
-,-
2
-
Variable Length Delta Prefetcher
Degree 1 Prediction
13
Looking farther than one Delta ahead
Repeating Delta Pattern- 1, 2, 3, 1, 2, 3…….
Current Delta
Deg 1 Prediction
Delta
1
Pred.
2
Delta
1,2
Pred.
3
2
3
-
3
1
-
2,3
3,1
-,-
1
2
-
Degree 2 Prediction
Degree 1 Prediction
Use Recursive lookup to look farther than one Delta
Variable Length Delta Prefetcher
14
Case Study: Streaming Workloads
Repeating Delta Pattern- 1, 1, 1, 1, 1…
Table 1
Delta
Pred.
1
1
-
Table 2
Delta
Pred.
-,-,-,-,-
Patterns learned from one page is applied to another
Variable Length Delta Prefetcher
15
Updating the Delta History Tables
Evict Not Recently Used
If Page not
present,
replace
Page
Num.
Last
Add.
Last 4
Deltas
Last
Predictor
Num.
Used
Last 4
Prefetches
Page
Num.
Last
Add.
Last 4
Deltas
Last
Predictor
Num.
Used
Last 4
Prefetches
LLC
Access
If Page
present, add
Delta
Variable Length Delta Prefetcher
16
Updating the Prediction Tables
Last
Predictor
Last 3
Deltas
Last
Add.
Page
Num.
B, C, D
Can the current state predict
Latest Delta?
E
Latest Delta
Delta
D
Pred.
F?
Delta
C,D
Pred.
E?
Table 1
-
Table 2
-
-
-
-
-
Delta Pred.
B,C,D
E?
Table 3
-
Variable Length Delta Prefetcher
If Prediction is Correct
Increment Accuracy
If Prediction of Wrong
Decrement Accuracy
If Accuracy==0
Update + Promote Prediction
If Prediction is Missing
Seed T1 with prediction
17
Populating the Prediction Tables
Delta
1
Pattern
Missing
Pred.
A
Table 1
NRU
Delta
1,1
Table 1
Wrong
Pred.
B
-,Table 2
-,-,NRU
Table 2
Wrong
Delta Pred.
1,1,1
C
Table 3
NRU
If mis-predict, a longer Delta history might be needed
Variable Length Delta Prefetcher
18
Evaluation Methodology
• Simics + USIMM
•
•
•
•
8 RISC cores, UltraSPARC III ISA
3.2 GHz, 4-wide OoO, 128-entry RoB
32 KB I&D L1 caches, 4 cycles
8 MB shared (1MB per core) L2 cache, 10 cycles
• DRAM Specifications
• 2Channels, 2 Ranks per Channel, 8 Banks per Rank
• 800MHz DDR3 DRAM
• SPEC 2006, NPB, and Cloudsuite
• Mix1- milc, astar, lbm, libq; Mix2- xalancbmk, lbm, zeusmp, milc;
Variable Length Delta Prefetcher
19
VLDP Configuration
•
•
•
•
•
Per-Core VLDP
1 Offset Prediction Table, 64 entry
3 Delta Prediction Tables, 64 entries each
16 entry Delta History Table
Only Delta Prediction Tables 2,3 contribute to multi degree prefetch
Offset Prediction Table
128 B
Delta History Table
222 B
Delta Prediction Table
648 B
Total
998 B/Core
Variable Length Delta Prefetcher
20
Speedup
Performance Improvement (Vs No PC)
2,0
1,8
1,6
1,4
1,2
1,0
0,8
FDP
SBP
AMPM
VLDP
VLDP is
6% better than AMPM
9% better than SBP
17% better than FDP
Variable Length Delta Prefetcher
21
Speedup
Performance Improvement (Vs PC)
2,0
1,8
1,6
1,4
1,2
1,0
0,8
SMS
GHB_PC_DC
VLDP
VLDP is
7.1% better than GHB
7.6% better than SMS
Variable Length Delta Prefetcher
22
Coverage
Coverage
120%
100%
80%
60%
40%
20%
0%
FDP
NPB
SMS
CloudSuite
FDP 16%
SMS 55%
SBP 40%
SBP
GHB_PC_DC
AMPM
Spec2006 Spec2006-Mix
VLDP
GM
GHB 33%
AMPM 49%
VLDP 61%
Variable Length Delta Prefetcher
23
Speedup
Sensitivity to table size
1,03
1,02
1,01
1,00
0,99
0,98
2% increase in performance when DPT size is increased
Variable Length Delta Prefetcher
24
Sensitivity number of Delta Prediction Tables
1,5
Speedup
DRAM Accesses
1,4
1,3
1,2
1,1
1
1DPT_NoOPT
1DPT+OPT
2DPT+OPT
3DPT+OPT
4DPT+OPT
3DPT improves efficiency despite a modest 1% performance improvement by
reducing DRAM requests by 3%
Variable Length Delta Prefetcher
25
Conclusions
• OPT Issues predictions without confirmation
• DPT recognizes Irregular Delta Patterns
• Long delta patterns provide high accuracy
• Less than 1KB per core overhead
• 6% better performance
Variable Length Delta Prefetcher
26
Thank You
Variable Length Delta Prefetcher
27
Download