slides

1 TAGE-SC-L Branch Predictors
André Seznec, INRIA/IRISA

2 The TAGE-SC-L branch predictor
Sorry, nothing really new...
• TAGE, JILP 2006
  - Considered as state-of-the-art global history predictor
• Can be augmented with small adjunct predictors
  - Loop predictor: CBP-2 (2006)
  - Statistical Corrector + Loop Predictor, global history: CBP-3 (2011)
  - Local history: Micro 2011

3 Optimized all parameters
• Number, size, width of the tables
• Types of the histories for the statistical components
All that for decreasing the misprediction number by 3 %!!

4 [Figure: block diagram. Labels: PC + global history; (Main) TAGE predictor; prediction + confidence; global, local, skeleton histories; Stat. Cor.; loop predictor]

5 TAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series:
  L(0) = 0,  L(i) = a^{i-1} L(1)
• Capture correlation on very long histories
• Example: {0, 2, 4, 8, 16, 32, 64, 128}
• Most of the storage for short history!!

6 TAGE: tagged, and prediction by the longest history matching entry
[Figure: a tagless base predictor indexed with pc, and tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, tag, u; a tag comparison (=?) per table selects the prediction]

7 [Figure: example lookup. Labels: Miss; Hit, Pred; Hit, Altpred]

8 Prediction computation
• General case:
  - Longest matching component provides the prediction
• Special case:
  - Many mispredictions on newly allocated entries: weak Ctr
  - On many applications, Altpred is more accurate than Pred
  - Property dynamically monitored through 4-bit counters

9 A tagged table entry
[Entry layout: U | Tag | Ctr]
• Ctr: 3-bit prediction counter
• U: 2-bit counter
  - Was the entry recently useful?
• Tag: partial tag

10 Allocate entries on mispredictions
• Allocate entries in longer history length tables
  - On tables with U unset
• Set Ctr to Weak and U to 0
• Limited storage budget:
  - Allocate 2 entries for 256 Kbits
  - Allocate 1 or 2 for 32 Kbits
• Unlimited storage budget:
  - Multiple entries allocated in different tables

11 Managing the (U)seful counter
• Increment when it avoids a misprediction
  - (Pred = taken) & (Alt ≠ taken)
• 256K: global decrement if « difficult » to allocate
• 32K: probabilistic decrement when conflict
• Unlimited: don't care

12 Adjunct predictors
• TAGE tracks strong correlation with the global branch history
• Small adjunct predictors to capture some missed correlation:
  - Loop predictor
  - Statistical Corrector

13 The loop predictor
• Predict loops with a constant number of iterations:
  - 16/32 entries
  - Less than 5 bytes per entry
  - Capture loops with long bodies and/or irregular internal branches
• S: 1.2 %, M: 1 %, U: 0.4 %
  - Good tradeoff for the Championship
• Implementation: not that great

14 The Statistical Corrector predictor
• Branches with poor correlation with global history:
  - Sometimes better predicted by a single wide PC-indexed counter than by TAGE
• More generally, track cases such that:
  - « In this case (PC, history, prediction), TAGE is likely (>50 %) to mispredict »

15 Small predictor: very limited budget for the SC predictor
• Just track the statistically PC-biased branches
  - « TAGE predicts this direction on this branch, but in most cases this was wrong »
• The corrector filter: a small partially tagged associative table
• 1.5 % misprediction reduction: much simpler than a loop predictor

16 Medium predictor
• « Statistically » correlated branches:
  - Not strongly correlated with the global history, but exhibit a bias
  - Better predicted by averaging than by tags (neural vs. tags)
• Branches correlated with local history, but irregular global history pattern (on other branches)
  - TAGE does not learn the pattern

17 MultiGehl Statistical Corrector Predictor
[Figure: block diagram. Labels: TAGE; H; LH; PC; Pred; Gehl-like Stat. Corr.; prediction + ctr value; local hist.]

18 Why does it work
• The bias table indexed with PC + TAGE output:
  - Correct (most of the time): high counter value, dominates, not many updates
  - Wrong: other counters can be trained, correlation (if it exists) can be captured

19 MultiGehl Statistical Corrector Predictor for the Championship
• + RAS associated history
• + 2 different local histories
• + simple chooser
• 6.8 % misprediction reduction
[Figure: block diagram. Labels: TAGE; prediction + ctr value; Stat. Corr.; H; PC; local hist.]

20 « Realistic » 256 Kbits TAGE-SC-L
« Only »
• 12 equal size TAGE tables +
• (local hist., global hist.) 4-table SC
• + loop predictor
• No history tuning
Only 2.8 % extra mispredictions

21 SC for unlimited predictor
• GEHL-based SC predictor:
  - Use any form of history information
  - Very long global
  - Multiple local
  - « Skeleton » global history: ignore some branches
• Recycle old ideas from the MAC-RHSP predictor (2004)

22 SC for unlimited predictor
• 460 predictor tables + 10 chooser tables
  - Globally about 20 % fewer mispredictions than TAGE alone
• If one removes only:
  - The bias: 1.6 % for a single table
  - All global history components: 3.7 %
  - All local history components: 3.9 %
  - The chooser: 3.2 %

23 Conclusion
• TAGE-SC-L fits (nearly) all storage sizes
  - 32 Kbits ≈ 64 Kbits CBP1 champion on CBP1 traces
  - 256 Kbits ≈ 512 Kbits CBP3 champion on CBP4 traces
• Unlimited predictor:
  - poTAGE-SC does better
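
Illustration (not from the slides): the geometric history lengths and the prediction computation described for TAGE can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the championship code. The names `geometric_history_lengths`, `TaggedEntry` and `tage_predict` are hypothetical, the 3-bit counter is assumed to be signed (taken when non-negative), table indexing and tag hashing are left out, and the 4-bit monitoring counters are reduced to a single boolean flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def geometric_history_lengths(num_tagged: int, first: int, ratio: float) -> List[int]:
    """Return [L(0), ..., L(num_tagged)] with L(0) = 0 (tagless base table)
    and L(i) = ratio**(i-1) * first for i >= 1."""
    return [0] + [round(first * ratio ** (i - 1)) for i in range(1, num_tagged + 1)]


@dataclass
class TaggedEntry:
    tag: int    # partial tag
    ctr: int    # 3-bit prediction counter; ASSUMED signed (-4..3), taken when >= 0
    u: int = 0  # 2-bit useful counter


def tage_predict(
    base_taken: bool,
    hits: List[Optional[TaggedEntry]],
    use_alt_on_weak: bool,
) -> Tuple[bool, bool]:
    """Return (final prediction, alternate prediction).

    `hits` holds one slot per tagged table, ordered from shortest to longest
    history: the entry whose tag matched, or None on a miss.
    `use_alt_on_weak` is a simplified stand-in for the dynamic monitor that
    decides whether Altpred is trusted over a weak (newly allocated) entry.
    """
    matching = [entry for entry in hits if entry is not None]
    if not matching:
        # No tagged table hits: the tagless base predictor decides.
        return base_taken, base_taken

    provider = matching[-1]  # longest matching history
    pred = provider.ctr >= 0
    # Altpred: next-longest hit, or the base predictor if there is none.
    alt = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken

    # Special case: a weak counter suggests a newly allocated entry.
    if use_alt_on_weak and provider.ctr in (-1, 0):
        return alt, alt
    return pred, alt


if __name__ == "__main__":
    print(geometric_history_lengths(7, first=2, ratio=2))
    hits = [TaggedEntry(tag=1, ctr=-4), TaggedEntry(tag=2, ctr=0)]
    print(tage_predict(True, hits, use_alt_on_weak=True))
```

With `first=2` and `ratio=2` the helper reproduces the example series {0, 2, 4, 8, 16, 32, 64, 128}. In the second call the longest hit has a weak counter, so the sketch falls back to the alternate prediction.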
