TAGE-SC-L Branch Predictors
André Seznec
INRIA/IRISA
The TAGE-SC-L branch predictor
Sorry, nothing really new...
• TAGE, JILP 2006
  • Considered the state-of-the-art global history predictor
• Can be augmented with small adjunct predictors
  • Loop predictor: CBP-2 (2006)
  • Statistical Corrector + Loop Predictor:
    global history, CBP-3 (2011); local history, MICRO 2011
Optimized all parameters
• Number, size, width of the tables
• Types of the histories for the statistical components
All that for decreasing the number of mispredictions by 3%!!
[Figure: TAGE-SC-L block diagram. Labels: PC + global history; (Main) TAGE predictor; prediction + confidence; Statistical Corrector with global, local and skeleton histories; loop predictor.]
TAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series:
$L(0) = 0$
$L(i) = a^{i-1} L(1)$
Example: {0, 2, 4, 8, 16, 32, 64, 128}
• Captures correlation on very long histories
• Most of the storage is for short histories!!
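As a minimal sketch of the series above, the function below computes L(0) = 0 and L(i) = a^(i-1) L(1). The function name, the parameter names and the rounding to the nearest integer are assumptions made for illustration; the slides only give the formula and the example set.

```python
def geometric_history_lengths(num_tagged: int, l1: int, ratio: float) -> list[int]:
    """Return [L(0), L(1), ..., L(num_tagged)] with L(0) = 0 and
    L(i) = ratio**(i-1) * l1, rounded to the nearest integer (assumption)."""
    lengths = [0]  # L(0) = 0: the tagless base predictor uses no history
    for i in range(1, num_tagged + 1):
        lengths.append(int(ratio ** (i - 1) * l1 + 0.5))
    return lengths


# With L(1) = 2 and a = 2 this reproduces the example set on the slide.
print(geometric_history_lengths(7, 2, 2.0))  # [0, 2, 4, 8, 16, 32, 64, 128]
```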
TAGE: tagged tables, prediction by the longest-history matching entry
[Figure: TAGE organization. Labels: tagless base predictor indexed with pc; tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, tag and u; tag comparators (=?); final prediction.]
[Figure: lookup example with one tag comparison per table. Labels: Miss; Hit (Pred); Hit (Altpred).]
Prediction computation
• General case:
  • Longest matching component provides the prediction
• Special case:
  • Many mispredictions on newly allocated entries: weak Ctr
  • On many applications, Altpred is more accurate than Pred
  • Property dynamically monitored through 4-bit counters
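The selection rule above can be sketched as follows. This is an illustrative simplification, not the championship code: counters are modeled as signed 3-bit values (taken when >= 0, weak when 0 or -1), the 4-bit monitor is reduced to a single signed counter named `use_alt_on_weak`, and all function names are hypothetical.

```python
from typing import Optional, Sequence


def taken(ctr: int) -> bool:
    """Direction of a signed prediction counter (assumed encoding)."""
    return ctr >= 0


def is_weak(ctr: int) -> bool:
    """Weak states of a signed counter: weakly taken (0) or weakly not taken (-1)."""
    return ctr in (-1, 0)


def tage_predict(tagged_ctrs: Sequence[Optional[int]],
                 base_taken: bool,
                 use_alt_on_weak: int) -> bool:
    """tagged_ctrs[i] is the counter of the matching entry in tagged table i
    (ordered from shortest to longest history), or None on a tag miss."""
    hits = [c for c in tagged_ctrs if c is not None]
    if not hits:
        return base_taken                      # no tagged hit: base predictor
    pred = taken(hits[-1])                     # longest matching component
    altpred = taken(hits[-2]) if len(hits) >= 2 else base_taken
    # Special case: a weak provider is overridden when the monitor says
    # Altpred has recently been the more accurate of the two.
    if is_weak(hits[-1]) and use_alt_on_weak >= 0:
        return altpred
    return pred
```

For example, `tage_predict([None, 3, None, -1], True, 2)` returns the Altpred direction (taken), because the longest-history hit is weak and the monitor is non-negative.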
A tagged table entry
• Ctr: 3-bit prediction counter
• U: 2-bit useful counter
  • Was the entry recently useful?
• Tag: partial tag
Entry layout: U | Tag | Ctr
Allocate entries on mispredictions
• Allocate entries in longer history length tables
  • On tables with U unset
• Set Ctr to Weak and U to 0
• Limited storage budget:
  • Allocate 2 entries for 256Kbits
  • Allocate 1 or 2 for 32Kbits
• Unlimited storage budget:
  • Multiple entries allocated in different tables
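A minimal sketch of this allocation policy is given below, under stated assumptions: the candidate entry of each tagged table has already been selected by the index hash, "weak" means counter value 0 or -1 in a signed encoding, and the names `TaggedEntry` and `allocate_on_mispredict` are hypothetical. Details of the real predictor such as randomized table skipping are not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TaggedEntry:
    tag: int = 0
    ctr: int = 0  # signed 3-bit counter; 0 = weakly taken, -1 = weakly not taken
    u: int = 0    # useful counter; 0 means "unset"


def allocate_on_mispredict(candidates: List[TaggedEntry],
                           new_tags: Sequence[int],
                           provider: int,
                           outcome: bool,
                           max_alloc: int) -> List[int]:
    """Allocate up to max_alloc entries in tables with a longer history than
    the provider (index -1 = base predictor). Returns the allocated tables."""
    allocated: List[int] = []
    for i in range(provider + 1, len(candidates)):
        if len(allocated) == max_alloc:
            break
        entry = candidates[i]
        if entry.u == 0:                       # only replace entries with U unset
            entry.tag = new_tags[i]
            entry.ctr = 0 if outcome else -1   # weak counter in the outcome direction
            entry.u = 0
            allocated.append(i)
    return allocated
```

With `max_alloc=2` this corresponds to the 256Kbits setting on the slide; a larger value approximates the unlimited-budget behavior of allocating in several tables.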
Managing the (U)seful counter
• Increment when the entry avoids a misprediction
  • (Pred = taken) & (Alt ≠ taken), where taken is the actual branch outcome
• 256K: Global decrement if « difficult » to allocate
• 32K: Probabilistic decrement on conflict
• Unlimited: don't care
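The sketch below illustrates the increment rule and one possible reading of the "global decrement if difficult to allocate" policy. The tick counter, its threshold of 8 and all names are assumptions for illustration; the slides do not specify how "difficult" is measured.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u: int, pred: bool, altpred: bool, outcome: bool) -> int:
    """Increment U only when the provider was right and Altpred was wrong,
    i.e. when the entry actually avoided a misprediction."""
    if pred == outcome and altpred != outcome:
        return min(u + 1, U_MAX)
    return u


class GlobalAging:
    """Hypothetical monitor: counts failed allocations against successful ones
    and signals a global decrement when failures dominate."""

    def __init__(self, threshold: int = 8):
        self.threshold = threshold
        self.tick = 0

    def record_allocation(self, succeeded: bool) -> bool:
        self.tick = max(self.tick + (-1 if succeeded else 1), 0)
        if self.tick >= self.threshold:
            self.tick = 0
            return True   # caller should decrement all U counters
        return False


def decrement_all(useful: list[int]) -> list[int]:
    """Global decrement of every useful counter, saturating at 0."""
    return [max(u - 1, 0) for u in useful]
```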
Adjunct predictors
• TAGE tracks strong correlation with the global branch history
• Small adjunct predictors to capture some missed correlation:
  • Loop predictor
  • Statistical Corrector
The loop predictor
• Predicts loops with a constant number of iterations:
  • 16/32 entries
  • Less than 5 bytes per entry
  • Captures loops with long bodies and/or irregular internal branches
S: 1.2 %   M: 1 %   U: 0.4 %
Good tradeoff for the Championship
Implementation: not that great
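To make the idea concrete, here is a minimal single-entry sketch of a loop predictor for a branch that is taken a constant number of times and then not taken once. The field names, the confidence threshold and the assumption that "taken" means "stay in the loop" are illustrative choices; a real entry would also carry a tag, an age and a direction bit.

```python
from dataclasses import dataclass
from typing import Optional

CONF_MAX = 3  # assumed: predict only after this many confirming loop executions


@dataclass
class LoopEntry:
    trip_count: int = 0   # taken iterations seen in the last completed execution
    current: int = 0      # taken iterations seen in the current execution
    confidence: int = 0

    def predict(self) -> Optional[bool]:
        """Return the predicted direction, or None while not confident."""
        if self.confidence < CONF_MAX:
            return None
        return self.current < self.trip_count  # taken while iterations remain

    def update(self, taken: bool) -> None:
        if taken:
            self.current += 1
            if self.current > self.trip_count:
                self.confidence = 0            # loop ran longer than learned
        else:
            if self.current == self.trip_count:
                self.confidence = min(self.confidence + 1, CONF_MAX)
            else:
                self.trip_count = self.current  # learn the new iteration count
                self.confidence = 0
            self.current = 0
```

After four executions of a loop whose branch follows taken, taken, taken, not taken, the entry is confident and predicts the exit on the fourth occurrence.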
The Statistical Corrector predictor
• Branches with poor correlation with global history:
  • Sometimes better predicted by a single wide PC-indexed counter than by TAGE
• More generally, track cases such that:
  • « In this case (PC, history, prediction), TAGE is likely (>50 %) to mispredict »
Small predictor: very limited budget for the SC predictor
• Just track the statistically PC-biased branches
  • « TAGE predicts this direction on this branch, but in most cases this was wrong »
• The corrector filter: a small partially tagged associative table
• 1.5 % misprediction reduction: much simpler than a loop predictor
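The idea of tracking "TAGE predicts this direction on this branch, but in most cases this was wrong" can be sketched as below. This is only an illustration: a Python dictionary stands in for the small partially tagged associative table, and the class name, counter width and revert threshold are assumptions.

```python
class CorrectorFilter:
    """Per (PC, TAGE prediction) signed counter: positive when TAGE has been
    right, negative when it has been wrong. Strongly negative entries revert
    the TAGE prediction."""

    def __init__(self, ctr_max: int = 31, revert_below: int = -8):
        self.ctr_max = ctr_max
        self.revert_below = revert_below
        self.table: dict[tuple[int, bool], int] = {}  # stand-in for the tagged table

    def predict(self, pc: int, tage_pred: bool) -> bool:
        if self.table.get((pc, tage_pred), 0) <= self.revert_below:
            return not tage_pred
        return tage_pred

    def update(self, pc: int, tage_pred: bool, outcome: bool) -> None:
        key = (pc, tage_pred)
        delta = 1 if tage_pred == outcome else -1
        value = self.table.get(key, 0) + delta
        # Saturate in the signed range [-(ctr_max + 1), ctr_max].
        self.table[key] = max(-self.ctr_max - 1, min(self.ctr_max, value))
```

A wide counter is used so that a prediction is reverted only after TAGE has been wrong clearly more often than right for that (PC, prediction) pair.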
Medium predictor
« Statistically » correlated branches:
• Not strongly correlated with the global history, but exhibit a bias
• Better predicted by averaging than by tags (neural vs. tags)
Branches correlated with local history, but with an irregular global history pattern (on other branches):
• TAGE does not learn the pattern
MultiGehl Statistical Corrector Predictor
[Figure: block diagram. Labels: TAGE (inputs PC, H) producing prediction + ctr value; GEHL-like Stat. Corr. (inputs PC, H + LH, local hist.); adder (+); Pred.]
Why does it work
• The bias table indexed with PC + TAGE output:
  • Correct (most of the time) → high counter value
    • Dominates, not many updates
  • Wrong → other counters can be trained
    • Correlation (if it exists) can be captured
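The argument above can be illustrated with a toy GEHL-like corrector: one bias table indexed with (PC, TAGE output) plus one global-history and one local-history table, summed and updated only on a misprediction or a low-magnitude sum. Everything here is a simplifying assumption (dictionaries instead of hashed tables, plain signed counters, ties resolved in favor of TAGE, a threshold of 6); it is not the championship configuration.

```python
from collections import defaultdict


class MiniStatCorrector:
    def __init__(self, threshold: int = 6, ctr_min: int = -16, ctr_max: int = 15):
        self.threshold = threshold
        self.ctr_min = ctr_min
        self.ctr_max = ctr_max
        self.bias = defaultdict(int)        # indexed with (PC, TAGE output)
        self.global_tbl = defaultdict(int)  # indexed with (PC, global history)
        self.local_tbl = defaultdict(int)   # indexed with (PC, local history)

    def _counters(self, pc, tage_pred, ghist, lhist):
        return [(self.bias, (pc, tage_pred)),
                (self.global_tbl, (pc, ghist)),
                (self.local_tbl, (pc, lhist))]

    def _sum(self, pc, tage_pred, ghist, lhist) -> int:
        return sum(tbl[key] for tbl, key in self._counters(pc, tage_pred, ghist, lhist))

    def predict(self, pc, tage_pred, ghist, lhist) -> bool:
        s = self._sum(pc, tage_pred, ghist, lhist)
        return tage_pred if s == 0 else s > 0   # tie: follow TAGE (assumption)

    def update(self, pc, tage_pred, ghist, lhist, outcome: bool) -> None:
        s = self._sum(pc, tage_pred, ghist, lhist)
        final = tage_pred if s == 0 else s > 0
        # GEHL-style rule: train only on a misprediction or a weak sum.
        if final != outcome or abs(s) < self.threshold:
            step = 1 if outcome else -1
            for tbl, key in self._counters(pc, tage_pred, ghist, lhist):
                tbl[key] = max(self.ctr_min, min(self.ctr_max, tbl[key] + step))
```

When TAGE is right, the sum quickly reaches the threshold and updates stop, so the tables are left undisturbed. When TAGE is wrong but the outcome follows the local history, the bias and global counters hover near zero while the local-history counters grow and take over the prediction.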
MultiGehl Statistical Corrector Predictor for the Championship
+ RAS-associated history
+ 2 different local histories
+ simple chooser
6.8 % misprediction reduction
[Figure: block diagram. Labels: TAGE; prediction + ctr value; Stat. Corr.; H; PC; local hist.]
« Realistic » 256 Kbits TAGE-SC-L
« Only »
• 12 equal-size TAGE tables
• + 4-table SC (local hist., global hist.)
• + loop predictor
• No history tuning
Only 2.8 % extra mispredictions
SC for Unlimited predictor
• GEHL-based SC predictor:
  • Use any form of history information
  • Very long global
  • Multiple local
  • « Skeleton » global history
    • Ignore some branches
• Recycle old ideas from the MAC-RHSP predictor (2004)
SC for unlimited predictor
• 460 predictor tables + 10 chooser tables
  • Globally about 20 % fewer mispredictions than TAGE alone
• If one removes only:
  • The bias: 1.6 % for a single table
  • All global history components: 3.7 %
  • All local history components: 3.9 %
  • The chooser: 3.2 %
Conclusion
• TAGE-SC-L fits (nearly) all storage sizes
  • 32Kbits ≈ 64Kbits CBP1 champion on CBP1 traces
  • 256Kbits ≈ 512Kbits CBP3 champion on CBP4 traces
• Unlimited predictor:
  • poTAGE-SC does better