Exploring Branch Prediction
Aditya Akella, Shuchi Chawla, Jia-Yu Pan

A Framework for Branch Prediction: Dividing Streams
• First level: per-branch substream
• Finer decomposition
  – Branch pattern history
  – Global pattern history
  – Global path history

Example:
    loop1: addq r1, r2, r3
           addq r2, r4, r5
    loop2: subq r2, r3, r5
           beq  r5, loop2
           bne  r5, loop1
[Figure annotations: "Executes twice"; pattern histories 1011… and 1110…; path history (b1,1)(b2,1)(b2,1)…]

Path vs. Pattern
• As expected, path history provides better correlation

Predictors
• Static vs. dynamic
• Biased stream: majority prediction
• Adapt to changing bias
• Damping: 2-bit (or wider) predictor
• Is more better?
Static is better on a large number of branches, but dynamic far outperforms static on the rest. Adaptive schemes should exploit this.

Implementation Issues: Limited Space
• Many branches map to the same stream: aliasing
• Mostly a destructive effect
• More aggressive schemes may not necessarily be better
[Figure: forming the stream identifier]
• GAs: 6-bit branch identifier concatenated with 6-bit history -> 12-bit stream identifier -> truncated to 1024 streams
• gshare: 12-bit branch identifier combined with 12-bit history (XOR, as in standard gshare) -> 12-bit stream identifier -> truncated to 1024 streams

Cross-procedure Correlation
• Some static prediction schemes “forget” history on returning from a procedure call
    Fun_call: if (x > 0) return(1); else return(0);
    Main:     a = fun_call();
              if (a > 0) goto…
The direction taken by the branch inside the procedure completely specifies the direction taken by the branch outside.

Branch Prediction Using Data Values
• Traditional approach
  – Local history information
  – Global path and history information
  – Reducing table interference (better indexing)
• These use PCs and branch outcomes as input
  – They do not contain all the information, which leads to mispredictions
• How can one improve the prediction of such branches? Use data values!
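As a concrete reference point for the traditional PC-and-history schemes above, the GAs and gshare indexing from the "Limited Space" slide can be sketched with a table of 2-bit damped counters. This is only an illustrative sketch, not the authors' simulator: the names `gas_index`, `gshare_index`, and `TwoBitTable` are hypothetical, the use of XOR for gshare follows the standard gshare design rather than anything stated on the slide, and keeping the low 10 bits when truncating to 1024 streams is an assumption.

```python
TABLE_BITS = 10                  # 1024 streams, as on the slide
TABLE_SIZE = 1 << TABLE_BITS
HISTORY_MASK = 0xFFF             # keep 12 bits of global history


def gas_index(pc: int, history: int) -> int:
    """GAs-style: concatenate 6 branch-identifier bits with 6 history bits."""
    stream = ((pc & 0x3F) << 6) | (history & 0x3F)   # 12-bit stream identifier
    return stream & (TABLE_SIZE - 1)                 # assumption: keep low 10 bits


def gshare_index(pc: int, history: int) -> int:
    """gshare-style: XOR 12 branch-identifier bits with 12 history bits."""
    stream = (pc & 0xFFF) ^ (history & 0xFFF)        # 12-bit stream identifier
    return stream & (TABLE_SIZE - 1)                 # assumption: keep low 10 bits


class TwoBitTable:
    """Table of 2-bit saturating counters shared by all streams."""

    def __init__(self, index_fn):
        self.index_fn = index_fn
        self.counters = [1] * TABLE_SIZE   # assumption: start weakly not-taken
        self.history = 0                   # global outcome history

    def predict(self, pc: int) -> bool:
        # Predict taken when the counter is in its upper half (2 or 3).
        return self.counters[self.index_fn(pc, self.history)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index_fn(pc, self.history)
        c = self.counters[i]
        # Saturating increment/decrement provides the damping.
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & HISTORY_MASK
```

Two branches whose index functions return the same value share one counter, which is the aliasing effect the slide describes; swapping `gas_index` for `gshare_index` changes which branches collide, not whether collisions occur.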
How to Use Data Values
• Speculative branch execution
• Using data values directly
• Second scheme was chosen
  – Lower-latency prediction
  – Some branches can use combined predictions
  – Solution avoids data value prediction

Design Problems… (1)
• Large number of data values to store…
  – Solution: store their difference
  – Branch Difference Predictor (BDP)
• Backing Predictor predicts most cases
• REP predicts the small remaining subset
  – Fringe cases with exceptional outcomes
• VHT provides ‘tagcheck’

Design Problems… (2)
• Delay in updating data values
  – Out-of-order execution, pipeline latencies
• BDC: most recently committed branch difference (indexed by branch PC)
• BCT: count of the outstanding instances
  – Staleness of the data value
• Operation
  – Branch fetch: increment counter
  – Branch commit: reset counter and update difference cache
• REP replacement policy
  – Replace least successful entries
  – Randomly ignore a chunk of mispredictions

Search for a Good Predictor Design
Paper: “A Language for Describing Predictors and its Application to Automatic Synthesis”

Designing a Predictor
• Components of a predictor
  – Counters and history
  – More?
• Formulate as a search problem
  – How?

Search the Design Space
• Representation
  – Feedback loop model + primitives
  – Primitive: P[w,d](I;U)
  – Expressions => parse tree
[Figure: feedback loop model of P[w,d](I;U), a table of width w and depth d with input index (I), feedback/update (U), and prediction output (P)]

Examples
• Onebit[d](PC;T) = P[1,d](PC; T)
• Counter[n,d](I;T) = P[n,d](I; if T then P+1 else P-1)
• Twobit[d](PC;T) = MSB(Counter[2,d](PC;T))

Genetic Programming
• Process
  – Population, evaluation, selection, and populating the next generation
• Operations on parse trees
  – Replication, crossover, mutation, encapsulation

Discussion
• Comparably good prediction accuracy
• Complex but new components
• Rediscovers counters and local/global histories
• Predictors for other purposes, e.g. indirect jump target prediction

End
• Comments?
• Thank you.
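The predictor-description notation from the Examples slide can be made concrete with a small sketch. It models P[w,d](I;U) as a table of d entries, each w bits wide, read through an index I and rewritten through an update expression U that may refer to the current output P. Everything here is a hypothetical illustration rather than the language's actual semantics: the class `P`, the helpers `onebit`, `counter`, `msb`, and `TwoBit`, the zero-initialized table, and in particular the saturation of stored values to the w-bit range are all assumptions.

```python
from typing import Callable


class P:
    """Hypothetical model of P[w,d](I;U): d entries, each w bits wide."""

    def __init__(self, w: int, d: int, update: Callable[[int, bool], int]):
        self.w = w
        self.d = d
        self.update_fn = update        # U: (current output P, outcome T) -> new value
        self.table = [0] * d           # assumption: entries start at zero

    def predict(self, index: int) -> int:
        # I selects an entry; the entry's value is the prediction output P.
        return self.table[index % self.d]

    def update(self, index: int, taken: bool) -> None:
        slot = index % self.d
        value = self.update_fn(self.table[slot], taken)
        # Assumption: values saturate at the limits of the w-bit range.
        self.table[slot] = max(0, min((1 << self.w) - 1, value))


def onebit(d: int) -> P:
    """Onebit[d](PC;T) = P[1,d](PC; T): remember the last outcome."""
    return P(1, d, lambda p, t: int(t))


def counter(n: int, d: int) -> P:
    """Counter[n,d](I;T) = P[n,d](I; if T then P+1 else P-1)."""
    return P(n, d, lambda p, t: p + 1 if t else p - 1)


def msb(value: int, width: int) -> int:
    """Most significant bit of a width-bit value."""
    return (value >> (width - 1)) & 1


class TwoBit:
    """Twobit[d](PC;T) = MSB(Counter[2,d](PC;T))."""

    def __init__(self, d: int):
        self.counter = counter(2, d)

    def predict(self, pc: int) -> int:
        return msb(self.counter.predict(pc), 2)

    def update(self, pc: int, taken: bool) -> None:
        self.counter.update(pc, taken)
```

For example, a `TwoBit(16)` predictor needs two taken outcomes at the same PC before its prediction flips from 0 to 1, which is the damping behavior of a 2-bit counter. Because each definition is just a composition of smaller expressions, it maps naturally onto a parse tree that genetic-programming operations such as crossover and mutation can rewrite.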