15-740 Computer Architecture, Fall 2011, Class Project
Analyzing Branch Mispredictions
Bernardo Toninho, Ligia Nistor and Filipe Militão

Introduction
Branch prediction is a fundamental component of modern pipelined architectures: it keeps the pipeline full in the presence of changes in control flow. System performance is highly sensitive to predictor accuracy, since mispredictions require flushing the entire pipeline!

Branch Prediction in a Nutshell
Filter a program's global branch behavior to exploit branch correlations (e.g. common history, path).
Train a prediction mechanism (e.g. 2-bit counter, perceptron) with the filtered behavior stream.

Analyzed Predictors
GSHARE: global history, combined with the PC, trains a table of 2-bit counters.
LOCAL: local history of a branch, combined with the PC, trains a table of 2-bit counters.
TAGE: variable-length global history, combined with the PC, indexes into several cache-like prediction tables.

Questions
How do indexing collisions affect predictions?
Are there problematic classes of branches?
Are the captured correlations useful?
Are mispredictions isolated or clustered?

Contributions
• In-depth analysis of "state of the practice" predictors:
  - Global history, 2-bit counter (GShare)
  - Local history, 2-bit counter (PAg-like, Local)
• New analysis methods to measure:
  - Correlation (decomposing into correlation sets)
  - Training stream characteristics (2-bit counter adequacy)
• New predictor to reduce clustered mispredictions

[1] M. Evers et al., An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work, 1998.
[2] T.-Y. Yeh and Y. Patt, Two-Level Adaptive Training Branch Prediction, 1991.
[3] S. McFarling, Combining Branch Predictors, 1993.
[4] A. Seznec, The L-TAGE Predictor, 2007.

Analysis and Results

Correlation Analysis
Determine the sets of correlated predictions. Correlations are linked to collisions in the table index.
We decomposed GShare into:
• Base: (BHR ⊕ PC) % TABLE_SIZE
• Unlimited: BHR ⊕ PC (no TABLE_SIZE ambiguity)
• No collisions: (BHR, PC) (no XOR ambiguity)
• Path history: (path, BHR, PC) (no path ambiguity)
We decomposed Local into:
• Limited: (BHR[PC % TABLE_SIZE] ⊕ PC) % TABLE_SIZE
• Unlimited: BHR[PC] ⊕ PC (no TABLE_SIZE ambiguity)
• No collisions: (BHR[PC], PC) (no XOR ambiguity)
BHR = Branch History Register
[Tables: for MM01 (10K predictions), misprediction rate, average correlation-set size and number of correlation sets for the base, no collision, unlimited, path and best mix variants of GShare and Local.]

Clustered Mispredictions
Definition: a misprediction cluster occurs when the predictor is wrong in consecutive guesses.
How large and frequent are misprediction clusters?
How can we use this to improve prediction rates?
• Keep track of past consecutive mispredictions.
• "Flip" the prediction when a cluster is found.
Problem: when to stop flipping?
• ideal/optimal: without making errors
• worst case: after causing a misprediction
Implementations (invert after two consecutive mispredictions; see the sketch below):
• gshare+: stop when the first error is made
• gshare++: track the average cluster size and stop after average + standard deviation flips, or after a mistake
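The following is a minimal sketch, not the project's actual code, of the gshare+ inversion heuristic described above: it wraps a generic base predictor, inverts its prediction once two consecutive mispredictions are seen, and stops inverting as soon as the inversion itself causes an error. The names base_predict, base_update and FLIP_AFTER are illustrative, and the cluster state is kept globally for brevity (a real implementation would presumably track it per branch or per table entry).

    #include <stdbool.h>
    #include <stdint.h>

    #define FLIP_AFTER 2                 /* invert after 2 consecutive misses */

    static unsigned consec_misses = 0;   /* length of the current base-miss run */
    static bool     flipping      = false;

    extern bool base_predict(uint64_t pc);            /* assumed base predictor */
    extern void base_update(uint64_t pc, bool taken);

    /* Final prediction: the base prediction, inverted while inside a cluster. */
    bool predict(uint64_t pc, bool *base_out)
    {
        bool base = base_predict(pc);
        *base_out = base;                /* remembered for update() below */
        return flipping ? !base : base;
    }

    /* Called at branch resolution with the base prediction made earlier. */
    void update(uint64_t pc, bool base_pred, bool taken)
    {
        bool final = flipping ? !base_pred : base_pred;

        /* gshare+ rule: the first error made while inverting ends the flip. */
        if (flipping && final != taken)
            flipping = false;

        if (base_pred != taken) {        /* base predictor missed */
            if (++consec_misses >= FLIP_AFTER)
                flipping = true;         /* cluster detected: start inverting */
        } else {                         /* base predictor was right */
            consec_misses = 0;
            flipping = false;            /* the cluster, if any, is over */
        }

        base_update(pc, taken);          /* the base predictor always trains */
    }

gshare++ differs only in the stopping rule: it would also keep a running average and standard deviation of observed cluster sizes and stop inverting after (average + standard deviation) flips, or after a mistake, whichever comes first.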
Misprediction Rates
[Charts: misprediction rates for the correlation set and training set decompositions (base, no collision, unlimited, path).]

Training Stream Analysis
A 1-bit counter is a last-value predictor; an N-bit counter is a last-value predictor with a slack of N (see the counter sketch after the conclusions):
• tolerates alternating sequences
• exploits repeating values
Analyzed stream characteristics:
• length of consecutive repeating values (bursts)
• transition (noise) length between bursts
• burst value switches
• warm-up value sensitivity (how long until the initial value is irrelevant?)

Branch Classification
Classify branches according to their behavior: static, single execution, alternating, loops, block patterns, complex patterns, others.

Conclusions
Simple methods to exploit misprediction clusters yield small gains; the cluster distribution causes a high inversion penalty.
Performance gains from decreasing indexing collisions are not significant overall.
Irregular/random patterns are generally the main cause of mispredictions, despite being a small class of branches.
Training Stream/Correlation Analysis:
• We observed some instances of destructive interference.
• Speculative predictions due to deep pipelines and superscalar execution cause errors that could have been easily predicted if the updates were committed sooner.
• Initial values affect predictions only for a short time.
• The 2-bit counter adequately models most patterns.
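To make the "last-value predictor with slack" view from the Training Stream Analysis concrete, here is a small, self-contained sketch of a standard 2-bit saturating counter (the counter used by GShare and Local); the example outcome stream is made up for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned char ctr2_t;        /* 2-bit counter, values 0..3 */

    static bool ctr2_predict(ctr2_t c)   { return c >= 2; }   /* 2,3 => taken */

    static void ctr2_train(ctr2_t *c, bool taken)
    {
        if (taken)  { if (*c < 3) (*c)++; }   /* saturate at 3 (strongly taken) */
        else        { if (*c > 0) (*c)--; }   /* saturate at 0 (strongly not-taken) */
    }

    int main(void)
    {
        /* A burst of taken outcomes with one noisy not-taken in the middle.
         * The cold counter costs two warm-up mispredictions, and the noisy
         * outcome itself is mispredicted, but it does not flip the counter,
         * so the rest of the burst is still predicted correctly. */
        bool stream[] = { 1, 1, 1, 0, 1, 1, 1, 1 };
        ctr2_t c = 0;                    /* cold start: strongly not-taken */
        for (unsigned i = 0; i < sizeof stream / sizeof stream[0]; i++) {
            bool pred = ctr2_predict(c);
            printf("outcome=%d pred=%d %s\n", stream[i], pred,
                   pred == stream[i] ? "hit" : "miss");
            ctr2_train(&c, stream[i]);
        }
        return 0;
    }

This is what the burst, noise and warm-up characteristics above measure: how long the repeating runs are, how much noise sits between them, and how quickly the effect of the initial counter value disappears.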