Branch Classification: a New Mechanism for Improving Branch Predictor Performance

Po-Yung Chang, Eric Hao, Tse-Yu Yeh*, Yale Patt

Department of Electrical Engineering and Computer Science
The University of Michigan
Ann Arbor, Michigan 48109-2122

*Intel Corporation
Santa Clara, CA 95051

Abstract

There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Speculative execution seems to be one solution of choice to the branch problem, but speculative work is discarded if a branch is mispredicted. Therefore, we need a very accurate branch predictor; 95% accuracy is not good enough. This paper proposes branch classification to improve the accuracy of branch predictors. Branch classification allows an individual branch instruction to be associated with the branch predictor best suited to predict its direction. Using this approach, a hybrid branch predictor can be constructed such that each component branch predictor predicts those branches for which it is best suited. This paper suggests one classification scheme, analyzes several branch prediction algorithms, and proposes a hybrid branch predictor that achieves a higher prediction accuracy than any branch predictor previously reported in the literature.

Keywords: branch classification, branch prediction, speculative execution, superscalar

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
MICRO 27, 11/94, San Jose, CA, USA. (c) 1994 ACM 0-89791-707-3/94/0011..$3.50

1 Introduction

Branches can significantly reduce the performance of pipelined processors if they interrupt the steady supply of instructions to the pipeline [4]. Branch prediction minimizes pipeline stalls by predicting the direction of a branch so that instruction fetching can continue along the predicted path. With speculative execution, the instructions beyond a predicted branch are executed; if the branch is mispredicted, all speculative work must be thrown away. An accurate branch predictor is therefore a very important component of a high-performance superscalar microprocessor.

If we ignore cache misses and bus conflicts, the number of cycles wasted per cycle due to branch mispredictions is

$$T = C \times ((1 - p) \times r \times ipc)$$

where $T$ denotes the number of cycles wasted per cycle, $C$ denotes the branch misprediction penalty in cycles, $p$ denotes the branch prediction accuracy, $r$ denotes the ratio of the number of branches over the total number of instructions, and $ipc$ denotes the average number of instructions executed per cycle. For example, with $C = 5$ and $r \times ipc = 0.9$, keeping $T$ to less than 10% requires a prediction accuracy of greater than 97.7%.

We introduce branch classification as a technique that can help construct more accurate branch predictors. Using branch classification, we analyze several branch predictors and propose hybrid branch predictors that achieve higher accuracy than any branch predictors previously reported.

This paper is organized as follows: Section 2 presents the concept of branch classification. Section 3 describes a classification model, analyzes several previously proposed schemes, proposes some new hybrid schemes, and presents simulation results. Section 4 provides concluding remarks.
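The penalty model in the Introduction can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes the 10% figure is the bound placed on the fraction of cycles wasted.

```python
def wasted_cycles_per_cycle(C, p, r_ipc):
    """Cycles lost to mispredictions per cycle: C * ((1 - p) * r * ipc)."""
    return C * (1.0 - p) * r_ipc


def required_accuracy(C, r_ipc, budget):
    """Prediction accuracy at which the loss equals `budget` (assumed bound)."""
    return 1.0 - budget / (C * r_ipc)


if __name__ == "__main__":
    # Values taken from the text: C = 5 cycles, r * ipc = 0.9.
    p_min = required_accuracy(C=5, r_ipc=0.9, budget=0.10)
    print(f"accuracy needed for a 10% loss: {p_min:.4f}")
    print(f"loss at 95% accuracy: {wasted_cycles_per_cycle(5, 0.95, 0.9):.3f}")
```

Under these assumptions a 95% accurate predictor loses more than twice the 10% budget, which is why the paper treats 95% as not good enough.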
2 Branch Classification

A branch classification partitions a program's branches into sets or branch classes. A good classification scheme partitions branches possessing similar dynamic behavior into the same class; thus, once we understand the dynamic behavior of a class of branches, we can optimize the handling of this class. Classification can be done statically and/or dynamically.

For example, the compiler can try to eliminate the hard-to-predict branches, or the hardware can give these branches special handling (e.g. execute both paths of the branch). Branch classification can also be used to maximize the prediction accuracy obtained from the prediction hardware: accuracy is increased by associating each branch class with the predictor most suitable for that class. For a given hardware budget, we can dedicate more resources to the branches that are more difficult to predict and use a simple predictor to handle the more predictable branches.

McFarling [6] combines branch predictors by using counters to select, at run-time, the predictor to use for each branch. In this paper, we introduce several hybrid branch predictors based on branch classification. Our hybrid predictors use not only run-time information but also compile-time information to associate each branch class with the predictor suitable for that class.

Previous Work

Many branch prediction algorithms have been proposed [2, 3, 5, 9]. The simplest static schemes predict that all branches are always taken (as in Stanford MIPS-X [1]) or always not-taken (as in Motorola MC88000 [8]); accuracies of about 65% and 35% have been reported for these kinds of prediction. Static schemes can also use program profiles to predict the direction each branch takes most frequently.

Dynamic branch prediction algorithms use hardware to record information at run-time and predict the future behavior of a branch from its history. Counter-based predictors, in which branch target buffers keep a 2-bit up-down counter for each branch [5, 9], achieve prediction accuracies of 85%-90%. Higher accuracy, 90%-95%, can be attained by keeping track of two levels of branch history [10, 11, 12]. To improve accuracy even further, McFarling proposed a hybrid predictor [6]; his technique combines two predictors and uses 2-bit up-down counters to select between them.

3 Experiments

3.1 Classification

In our experiment, conditional branches are classified statically, based on their dynamic taken-rates collected by profiling. The branches are partitioned into the six classes shown in Table 2, where Pr(br) denotes the taken-rate of a branch. We will refer to SC1, SC2, SC5, and SC6 branches as mostly-one-direction branches and to SC3 and SC4 branches as mixed-direction branches.

Table 2: Static Classes

Class   Description
SC1     0% <= Pr(br) <= 5%
SC2     5% < Pr(br) <= 10%
SC3     10% < Pr(br) <= 50%
SC4     50% < Pr(br) < 90%
SC5     90% <= Pr(br) < 95%
SC6     95% <= Pr(br) <= 100%

3.2 Simulation Method

The results in this paper were gathered from instrumented executable files. The compiler replaces the conditional branches with calls to functions that simulate the branch predictors. Using the branch conditions, these functions calculate the predictions, compare them with the actual branch directions, update the state of the various predictors, and collect the prediction rate on a per-branch basis. Because the instrumented program always executes down the correct path, the behavior of the program is not changed by the predictors being simulated. This method does not model real prediction hardware in every detail; however, it is accurate enough to determine whether the proposed schemes outperform previously known predictors, and it allows us to fine-tune the design.

3.3 Benchmarks

The benchmarks are the six integer codes from the SPECint92 suite: compress, eqntott, espresso, gcc, sc, and lisp. Table 1 shows the training and testing data sets for each of these benchmarks. Because some of the benchmarks are not provided with different data sets, some of the training data sets are not from the SPECint92 suite.

Table 1: Training and Testing Data Sets of Benchmarks

Benchmark      Training Data     Testing Data
008.espresso   cps               bca
022.li         nine queens       deriv.cl (1)
023.eqntott    bool. eq. (2)     int-pri-3.eqn
026.compress   gcc source        [illegible]
072.sc         loada2            loada1
085.gcc        jump.i            stmt.i

(1) Common Lisp version of a symbolic derivative benchmark written by Vaughan Pratt.
(2) 227 boolean equations with 37 different variables.

3.4 Simulation Results

In this section, we will first show the advantages of branch classification. We will then present the performance of several hybrid branch predictors that combine the advantages of single-scheme predictors.

In our experiment, three kinds of single-scheme predictors are simulated: the 2-bit up-down counter (2bC), the profile guided predictor (PG), and the Two-Level Branch Predictor. Three implementations of the Two-Level Branch Predictor are studied. They are the Per-address Two-Level Branch Predictor using a set of pattern history tables (PAs), the Global Two-Level Branch Predictor using a set of pattern history tables (GAs), and a modified GAs scheme (gshare [6]) that exclusive-ORs the global branch history with the branch address to select the appropriate pattern history table entry.
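To make the gshare description above concrete, here is a minimal sketch of a single-table gshare predictor with 2-bit up-down counters. It is not the authors' simulator: the class name, the initial counter value, and the use of the low-order address bits are assumptions chosen for illustration.

```python
class Gshare:
    """Global history XOR branch address indexes one table of 2-bit counters."""

    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.history = 0                        # global branch history register
        self.pht = [2] * (1 << history_bits)    # counters start weakly taken (assumed)

    def _index(self, pc):
        # Exclusive-OR of history and low-order address bits (assumed bit choice).
        return (self.history ^ pc) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2   # True means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    g = Gshare(history_bits=4)
    for _ in range(4):
        g.update(0x40, False)
    print(g.predict(0x40))
```

A branch that is never taken keeps the history at all 0s, so it keeps hitting the same counter and that counter saturates quickly; this is the behaviour the analysis in Section 3.4.1 relies on.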
3.4.1 Advantages of Branch Classification

In our study, branches with similar taken-rates are grouped together. Because dynamic branches in different static classes have different behavior, the optimal branch prediction scheme for each of these classes may be different. In this section, we analyze the performance of the branch prediction schemes on each static class to show the benefits of branch classification.

● Analysis

Figures 1, 2, and 3 show the prediction accuracy of PAs, GAs, and gshare on the mostly not-taken branches (SC1). Figures 4, 5, and 6 show the prediction accuracy of these schemes on the mixed-direction branches whose taken-rates are between 10% and 50% (SC3). Each curve shows the average prediction accuracy over the integer benchmarks at a fixed implementation cost. As the history register length increases by one, the number of entries in each pattern history table (PHT) doubles, so at a fixed cost the number of PHTs decreases.

Mostly One-direction Branches. As shown in these figures, schemes with short branch histories are effective in predicting the mostly not-taken branches (SC1). Little history is required to capture the behavior of these branches. Consider, e.g., a branch with a repeating pattern of ninety-nine 0s followed by a 1 (a 1% taken-rate): the prediction accuracy remains high even if the odd occurrence of the taken direction is mispredicted. With a longer history register, each occurrence of the 1 remains in the history for more executions of the branch and causes more PHT entries to be accessed, so it takes more time to warm up the PHTs; a short history means a faster warm-up time. Furthermore, at a fixed implementation cost, a longer history means fewer PHTs; thus, more branches are mapped to the same PHT and more interference between branches occurs. As a result, the prediction accuracy on mostly-one-direction branches decreases with a longer history.

In the global schemes (GAs and gshare), if the branches are mostly not-taken, the history register will consist mostly of 0s, and these branches will tend to access the same PHT entries. The mostly-one-direction branches are thus effectively grouped together, which possibly reduces the conflicts between these branches and the mixed-direction branches.

The performance of the schemes on the mostly taken branches (SC5, SC6) is similar to that on the SC1 branches. The performance on the SC2 branches is also similar.

Mixed-direction Branches. Unlike the mostly-one-direction branches, the mixed-direction branches are predicted more effectively by schemes with long branch histories. With a longer history, we can distinguish between more branch execution states and capture more of the dynamic characteristics of the branch. In addition, with a longer global history, we can see more occurrences of correlated branches. Thus, the prediction accuracy on these branches increases with the length of the history register. The performance of GAs, PAs, and gshare on the SC4 branches is similar to that mentioned above for SC3.

● Weight of Branch Classes

Let us define the dynamic weight of a branch class as

$$\text{weight} = \frac{\text{No. of dynamic branches belonging to that class}}{\text{total number of dynamic branches}}$$

Figure 1: Prediction Accuracy of PAs on the Mostly Not-taken Branches (Static Class 1: 0 <= Pr(br) <= .05; curves for PAs at 35KB, 10KB, and 4KB; x-axis: Branch History Length)
Figure 2: Prediction Accuracy of GAs on the Mostly Not-taken Branches (Static Class 1; curves for GAs at 32KB, 8KB, and 2KB)
Figure 3: Prediction Accuracy of gshare on the Mostly Not-taken Branches (Static Class 1; curves for gshare at 32KB, 8KB, and 2KB)
Figure 4: Prediction Accuracy of PAs on the Mixed-direction Branches (Static Class 3: .10 < Pr(br) <= .50)
Figure 5: Prediction Accuracy of GAs on the Mixed-direction Branches (Static Class 3)
Figure 6: Prediction Accuracy of gshare on the Mixed-direction Branches (Static Class 3)
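The static classes of Table 2 and the dynamic weight defined above can be sketched as follows. This is an illustration only: the profile format (branch address mapped to a pair of taken and executed counts) and the function names are assumptions, not the authors' tooling.

```python
from collections import Counter


def static_class(taken, executed):
    """Map a profiled taken-rate onto the static classes SC1..SC6 of Table 2."""
    rate = taken / executed
    if rate <= 0.05:
        return "SC1"
    if rate <= 0.10:
        return "SC2"
    if rate <= 0.50:
        return "SC3"
    if rate < 0.90:
        return "SC4"
    if rate < 0.95:
        return "SC5"
    return "SC6"


def dynamic_weights(profile):
    """Fraction of all dynamic branches that falls into each static class.

    `profile` maps a branch address to (times taken, times executed).
    """
    total = sum(executed for _, executed in profile.values())
    per_class = Counter()
    for taken, executed in profile.values():
        per_class[static_class(taken, executed)] += executed
    return {name: count / total for name, count in per_class.items()}


if __name__ == "__main__":
    # Hypothetical profile of three static branches.
    profile = {0x10: (1, 100), 0x20: (30, 100), 0x30: (99, 200)}
    print(dynamic_weights(profile))
```

Note that the weight counts dynamic executions, not static branch sites, so a single hot branch can dominate its class.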
Figure 7 shows the dynamic weight of each static class. Approximately 50% of all dynamic branches are mostly-one-direction branches; the other 50% are mixed-direction branches. Thus, the performance of a branch predictor is dependent on both its prediction accuracy on the mostly-one-direction branches and its prediction accuracy on the mixed-direction branches.

Figure 7: Percentage of dynamic branches in each static class

We have shown that the optimal configuration for the mostly-one-direction branches is different from that for the mixed-direction branches. Thus, single-scheme predictors cannot be configured optimally for both types of branches. Figures 8, 9, and 10 show the performance of PAs, GAs, and gshare with the branch history length ranging from 1 to 18. Each curve shows the average prediction accuracy of a predictor at a fixed hardware cost. The hardware cost of these schemes is estimated using the following equations [12]:

$$PAs(k, b, p) = (b \times k) + (p \times 2^k \times 2) \quad \text{(bits)}$$
$$GAs(k, p) = k + (p \times 2^k \times 2) \quad \text{(bits)}$$
$$gshare(k, 1) = k + (2^k \times 2) \quad \text{(bits)}$$

where $k$ is the history register length, $b$ is the number of entries in the branch history table, $p$ is the number of pattern history tables (PHTs), and $2^k$ is the number of entries in each PHT.

Figure 8: Per-address schemes (PAs) with different branch history length (k)
Figure 9: Global schemes (GAs) with different branch history length (k)
Figure 9 shows the prediction accuracy of GAs. The left-most point of each curve indicates the configuration with the shortest history and the highest number of PHTs, and the right-most point the configuration with the longest history; e.g. on the 32 K-byte curve the left-most point is GAs(1, 2^16) and the right-most point is GAs(17, 1).
Figure 10: gshare with different branch history length (k)

Let PA(x) denote the prediction accuracy of scheme x. With a fixed number of PHTs, the prediction accuracy increases as the history register length increases; e.g. PA(GAs(5, 4)) < ... < PA(GAs(11, 4)) < PA(GAs(13, 4)) < PA(GAs(15, 4)) and PA(PAs(13, 4)) < PA(PAs(15, 4)). With a fixed history register length, the prediction accuracy increases as the number of PHTs increases; e.g. PA(GAs(15, 2^10)) < PA(GAs(15, 2^12)). Our results match those presented in [12].

Figures 1, 2, 4, and 5 show that the configuration of GAs and PAs that is optimal for the mostly-one-direction branches is sub-optimal for the mixed-direction branches. Without branch classification, the gshare scheme is also sub-optimal, as shown in Figures 3 and 6.
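The cost equations for the single-scheme predictors translate directly into code. The sketch below uses hypothetical function names and assumes a K-byte is 1024 bytes; it checks the two end points of the 32 K-byte GAs curve mentioned in the text.

```python
def pas_cost_bits(k, b, p):
    """PAs: b history registers of k bits plus p PHTs of 2**k two-bit counters."""
    return b * k + p * (2 ** k) * 2


def gas_cost_bits(k, p):
    """GAs: one k-bit global history register plus p PHTs of 2**k two-bit counters."""
    return k + p * (2 ** k) * 2


def gshare_cost_bits(k):
    """gshare: the GAs cost with a single PHT."""
    return gas_cost_bits(k, 1)


def to_kbytes(bits):
    """Convert bits to K-bytes, assuming 1 K-byte = 1024 bytes."""
    return bits / 8 / 1024


if __name__ == "__main__":
    for k, p in [(17, 1), (1, 2 ** 16)]:
        print(f"GAs({k}, {p}): {to_kbytes(gas_cost_bits(k, p)):.3f} K-bytes")
```

Both configurations cost essentially the same number of bits, which illustrates the trade at a fixed budget: every extra history bit halves the number of PHTs.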
3.4.2 Combining the Advantages of Different Predictors

In this section, we introduce hybrid branch predictors which combine the advantages of different predictors. We will first show a hybrid branch predictor which statically selects a predictor for each branch. We then introduce a hybrid branch predictor which selects a predictor for each branch dynamically. Finally, we summarize the hybrid schemes.

3.4.2.1 Static Predictor Selection

● GAs with Multiple Branch History Length

We have shown that GAs and gshare are not designed to optimize the prediction accuracy on both the mostly-one-direction and the mixed-direction branches. To increase the accuracy on both types of branches, we propose a new scheme, called GAs.mhl, which uses multiple history lengths: a short global history for the mostly-one-direction branches and a long global history for the mixed-direction branches. Figure 11 shows the structure of GAs with Multiple History Length. As in the gshare scheme [6], GAs.mhl exclusive-ORs the global history bits with the branch address to select the appropriate PHT entry. The choice of history length for each branch is made statically: profile information is used to identify the mostly-one-direction branches. Because some branches executed during the testing run may not be executed during the profiling run, there may be no profile information for these branches; these branches are predicted using a long history.

Figure 11: Structure of GAs with Multiple History Length

Figure 12 compares the prediction accuracy of GAs.mhl with that of GAs and gshare. GAs.mhl outperforms both GAs and gshare at each hardware cost. Figure 13 shows the performance of the 1K-byte GAs, GAs.mhl, and gshare predictors on each static class. The most significant difference between GAs.mhl and gshare is on the SC1 and SC6 branches. By using a short history, GAs.mhl is able to utilize the PHTs effectively to predict the mostly-one-direction branches; its prediction accuracies on SC1 and SC6 are .015 and .0098 higher than those of gshare, respectively. For the mixed-direction branches, the slight improvement in accuracy is due to the fact that the mostly-one-direction branches are now hashed into fewer PHT entries, which results in fewer pattern history conflicts between the mostly-one-direction branches and the mixed-direction branches.

Figure 12: Performance of GAs with Multiple Branch History Length (prediction accuracy vs. predictor size in bytes; curves for GAs.mhl, gshare, and GAs)
Figure 13: Performance of 1K-byte Predictors on each static class

● Combination of static and dynamic predictors

The static predictors can accurately predict the mostly-one-direction branches, and the dynamic predictors can then be optimized for the mixed-direction branches. In our experiment, the PG+gshare scheme uses the profile guided predictor to statically predict the SC1 and SC6 branches and uses the gshare predictor to dynamically predict the other branches. If a branch is not executed during the training run, then the dynamic predictor is designated for predicting this branch during the testing run.

Figure 14 shows that PG+gshare outperforms the single-scheme predictors and GAs.mhl. Figures 15 and 16 show the prediction accuracy of PG+gshare and GAs.mhl on the SC1 and SC3 branches respectively. For the mostly-one-direction branches, PG+gshare outperforms GAs.mhl, especially at lower costs, because the profile guided predictor can accurately predict the SC1 and SC6 branches: these branches are mostly taken or mostly not-taken. In addition, PG+gshare achieves slightly higher accuracy than GAs.mhl on the mixed-direction branches. Because the gshare predictor in the PG+gshare scheme predicts only the branches outside SC1 and SC6, the PHT contention between the mostly-one-direction branches and the mixed-direction branches no longer exists. Also, by dealing only with these branches, correlated branch histories are more likely to remain in the history register. Thus, the static hybrid can significantly reduce PHT conflicts.

Figure 14: Performance of the Static Hybrid Predictors (prediction accuracy vs. predictor size in bytes)
Figure 15: Performance of PG+gshare and GAs.mhl on mostly not-taken branches (Static Class 1: 0 <= Pr(br) <= .05)
Figure 16: Performance of PG+gshare and GAs.mhl on mixed-direction branches (Static Class 3)
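The static selection used by PG+gshare can be sketched as a thin wrapper around any dynamic predictor. Everything named here is hypothetical: `LastOutcome` is a trivial stand-in for gshare so that the example is self-contained, and the profile format (address mapped to taken and executed counts) is assumed.

```python
class LastOutcome:
    """Stand-in dynamic predictor (not gshare): repeats each branch's last outcome."""

    def __init__(self):
        self.last = {}

    def predict(self, pc):
        return self.last.get(pc, True)

    def update(self, pc, taken):
        self.last[pc] = taken


def fixed_predictions(training_profile, low=0.05, high=0.95):
    """Profile-guided directions for SC1 (<= 5% taken) and SC6 (>= 95% taken)."""
    fixed = {}
    for pc, (taken, executed) in training_profile.items():
        rate = taken / executed
        if rate <= low:
            fixed[pc] = False      # SC1: always predict not-taken
        elif rate >= high:
            fixed[pc] = True       # SC6: always predict taken
    return fixed


class StaticHybrid:
    """Statically predicted branches bypass the dynamic predictor entirely."""

    def __init__(self, fixed, dynamic):
        self.fixed = fixed
        self.dynamic = dynamic

    def predict(self, pc):
        if pc in self.fixed:
            return self.fixed[pc]
        # Includes branches never seen in the training run.
        return self.dynamic.predict(pc)

    def update(self, pc, taken):
        if pc not in self.fixed:
            self.dynamic.update(pc, taken)


if __name__ == "__main__":
    profile = {0x10: (1, 100), 0x20: (40, 100), 0x30: (99, 100)}
    hybrid = StaticHybrid(fixed_predictions(profile), LastOutcome())
    print(hybrid.predict(0x10), hybrid.predict(0x30))
```

Keeping the statically predicted branches out of `update` mirrors the argument in the text: the dynamic predictor's tables and history are then occupied only by the branches it actually has to predict.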
3.4.2.2 Dynamic Predictor Selection

Up to this point, we have selected the predictor for each branch statically. Another method of combining the advantages of different types of predictors is to select the optimal predictor dynamically. In this section, we first present a hybrid predictor that selects the optimal predictor dynamically [6] and compare its performance with that of the static hybrid predictor. We then propose a new hybrid predictor that uses both statically and dynamically selected predictors to further improve the prediction accuracy.

● Hybrid Predictor with Dynamic Selection

One method of dynamically selecting the optimal predictor is to use 2-bit saturating up-down counters (i.e. 2bC) to keep track of which predictor is doing better [6]. Specifically, let BD denote the actual branch direction, P1 denote the predicted direction from predictor 1, and P2 denote the predicted direction from predictor 2. The counters can be incremented or decremented based on the rule shown in Table 3.

Table 3: Updating rule for the selection counters

BD  P1  P2   Action
0   0   0    no change
0   0   1    decrement counter
0   1   0    increment counter
0   1   1    no change
1   0   0    no change
1   0   1    increment counter
1   1   0    decrement counter
1   1   1    no change

In our study, we associate a counter with each entry in the fully associative branch address cache (BAC). When a branch is fetched, the corresponding counter found in the BAC is used to determine which predictor to use, as shown in Figure 17.

Figure 17: Structure of Hybrid Predictor with Dynamic Selection

We simulated two hybrid predictors with dynamic selection: the combination of the 2-bit up-down counter with gshare (2bC/gshare) and the combination of PAs with gshare (PAs/gshare). Figure 18 compares the performance of these hybrid predictors with that of the PG+gshare scheme, which uses static selection. The hardware costs are estimated using the following equations:

$$2bC/gshare(k, p, a) = (a \times 2) + (a \times 2) + k + (p \times 2^k \times 2)$$
$$PAs/gshare(k, p, a) = (a \times 2) + (b \times k) + (p \times 2^k \times 2) + k + (p \times 2^k \times 2)$$

where $a$ is the number of entries in the branch address cache, $k$ is the history register length, $b$ is the number of entries in the branch history table, and $p$ is the number of PHTs. That is, the cost of the hybrid predictor is determined by summing the costs of the two single-scheme predictors and the cost of the counters used to select the optimal predictor. For PAs/gshare, we only considered configurations in which the PAs and gshare predictors are of approximately the same size; a combined predictor is then about twice the size of either of the two single-scheme predictors.

Figure 18: Prediction Accuracy of Hybrid Branch Predictors (prediction accuracy vs. predictor size in bytes; curves for PG+gshare, 2bC/gshare, and PAs/gshare)

For the SPEC integer benchmarks, the PG+gshare scheme outperforms the PAs/gshare scheme with implementation costs below 16K bytes. On the other hand, with an implementation larger than 16K bytes, the PAs/gshare predictor is able to outperform PG+gshare.

The 2bC/gshare scheme is not able to outperform PG+gshare. In this study, a 1K-entry BAC was used, so the size of the 2bC portion of the 2bC/gshare predictor remains fixed; increasing the cost of the 2bC/gshare predictor increases only the size of the gshare portion. Figure 19 shows the fraction of the predictions that are made by gshare in the 2bC/gshare scheme. With a larger gshare, the gshare prediction accuracy increases and, thus, more branches are predicted by gshare; at 256 bytes only a small fraction of the predictions are made by gshare, whereas at the larger sizes the majority are. The benefit of combining the two predictors therefore diminishes as the predictor size increases.

Figure 19: Fraction of gshare usage in the 2bC/gshare scheme (one curve per static class SC1-SC6 and one overall; x-axis: predictor size in bytes)
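The selection-counter rule of Table 3 is small enough to write out in full. In the sketch below the function names are hypothetical, and the convention that a counter value of 2 or 3 selects predictor 2 is an assumption; the table itself only fixes when the counter moves.

```python
def update_selector(counter, bd, p1, p2):
    """Apply Table 3 to a 2-bit saturating selection counter (values 0..3).

    bd is the actual direction; p1 and p2 are the two predicted directions.
    """
    if p1 == p2:
        return counter                  # both right or both wrong: no change
    if p2 == bd:
        return min(3, counter + 1)      # only predictor 2 was right: increment
    return max(0, counter - 1)          # only predictor 1 was right: decrement


def select(counter, p1, p2):
    """Pick a prediction; treating 2 and 3 as 'use predictor 2' is assumed."""
    return p2 if counter >= 2 else p1


if __name__ == "__main__":
    counter = 1
    # (BD, P1, P2) triples: predictor 2 is right twice in a row.
    for bd, p1, p2 in [(1, 0, 1), (0, 1, 0)]:
        counter = update_selector(counter, bd, p1, p2)
    print(counter, select(counter, p1=0, p2=1))
```

Because the counter moves only when the two predictors disagree, it measures relative rather than absolute accuracy, which is all the selector needs.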
Figure 19 also shows how the predictions on the mostly-one-direction branches are divided between 2bC and gshare. While the 2bC predictor can predict the mostly-one-direction branches well, it is outperformed by PG on these branches, since the profile guided predictions on the mostly-one-direction branches are accurate. Thus, 2bC/gshare is outperformed by PG+gshare.

● New Hybrid Branch Predictor

In this section, we propose a new hybrid branch predictor, PG+PAs/gshare, that exploits both compile-time and run-time information for branch prediction. The scheme uses the profile guided predictor for the mostly-one-direction branches and the PAs/gshare predictor, with dynamic selection, for the mixed-direction branches. It takes advantage of both the static selection of PG+gshare and the dynamic selection of PAs/gshare.

Figure 20 shows the performance of PG+PAs/gshare. For predictors smaller than 4K bytes, the PG+gshare scheme provides the best performance. On the other hand, for predictors larger than 4K bytes, PG+PAs/gshare outperforms all other predictors. For example, at a fixed cost of 32K bytes, PG+PAs/gshare achieves a prediction accuracy of 96.4% for the SPEC integer benchmarks as compared to 95.7% for PAs/gshare. For gcc, the benchmark which contains the most branches, PG+PAs/gshare achieves 96.91%, as compared to 96.47% for the best previously known predictor.

Figure 20: Performance of PG+PAs/gshare (prediction accuracy vs. predictor size in bytes; curves for PG+PAs/gshare, PG+gshare, PAs/gshare, and gshare)

3.4.2.3 Summary of Hybrid Predictors

We examined many different hybrid branch prediction schemes (see Table 4). In this report, we present the most successful ones. Table 5 lists several prediction schemes that were omitted.

Table 4: Summary of Hybrid Branch Prediction Schemes

Scheme          Predictor for mostly-one-direction branches   Predictor for mixed-direction branches
GAs.mhl         GAs with a short Branch History                GAs with a long Branch History
PG+gshare       PG                                             gshare
2bC/gshare      2bC or gshare (selected dynamically)           2bC or gshare (selected dynamically)
PAs/gshare      PAs or gshare (selected dynamically)           PAs or gshare (selected dynamically)
PG+PAs/gshare   PG                                             PAs or gshare (selected dynamically)

Table 5: Omitted Hybrid Branch Prediction Schemes

Scheme          Predictor for mostly-one-direction branches   Predictor for mixed-direction branches
PG+PAs          PG                                             PAs
PG+GAs          PG                                             GAs
2bC/GAs         2bC or GAs (selected dynamically)              2bC or GAs (selected dynamically)
PAs/GAs         PAs or GAs (selected dynamically)              PAs or GAs (selected dynamically)
PG+PAs/GAs      PG                                             PAs or GAs (selected dynamically)

4 Conclusion

We have introduced branch classification as a means for analyzing branch predictors and for combining the advantages of different predictors. We proposed a static branch classification model based on the taken-rates gathered during profiling. With this model, we showed that using a short history for the mostly-one-direction branches and a long history for the mixed-direction branches improves the performance of the global Two-Level Branch Predictors.

We then proposed a hybrid predictor that combines the advantages of static and dynamic predictors: a profile-guided predictor for the mostly-one-direction branches and a dynamic predictor for the mixed-direction branches. With branch classification, the dynamic predictor hardware can be dedicated to the mixed-direction branches and optimized for them, so the dynamic predictor can predict these branches more accurately. With a fixed implementation cost of 32K bytes, PG+gshare achieved a prediction accuracy of 96.0% as compared to 95.2% achieved by gshare, reducing the miss rate by 16.7%.

Furthermore, the selection of a predictor for each branch can be done statically and/or dynamically, so both compile-time and run-time information can be used to assist the selection. With a fixed implementation cost of 32K bytes, PG+PAs/gshare, the combination of a profile-guided predictor with the PAs/gshare hybrid, achieved a prediction accuracy of 96.91% on gcc, a branch intensive benchmark, as compared to 96.47% achieved by the best previously known predictor, reducing the miss rate by 12.5%.

In summary, we have shown that branch classification will allow the construction of predictors that are more effective than currently known predictors, achieving higher prediction accuracy.
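The miss-rate reductions quoted in the Conclusion follow from the accuracy pairs by simple arithmetic. The helper below is a hypothetical name used only to show the calculation.

```python
def miss_rate_reduction(old_accuracy, new_accuracy):
    """Relative drop in the misprediction rate when accuracy improves."""
    old_miss = 1.0 - old_accuracy
    new_miss = 1.0 - new_accuracy
    return (old_miss - new_miss) / old_miss


if __name__ == "__main__":
    # Accuracy pairs quoted in the text for 32K-byte predictors.
    print(f"gcc, 96.47% -> 96.91%: {miss_rate_reduction(0.9647, 0.9691):.1%}")
    print(f"gshare 95.2% -> PG+gshare 96.0%: {miss_rate_reduction(0.952, 0.960):.1%}")
```

The gain looks small in accuracy terms (under half a percentage point on gcc) but is a sizable fraction of the remaining mispredictions, which is the quantity that enters the penalty model of the Introduction.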
5 Acknowledgments

This paper is one result of our ongoing research in high performance computer implementation at the University of Michigan. The support of our industrial partners: Intel, Motorola, Hewlett-Packard, AT&T/GIS, and Scientific and Engineering Software is greatly appreciated. In addition, we wish to gratefully acknowledge the other members of our HPS research group for the stimulating environment they provide, in particular, Carlos Fuentes for his comments and suggestions on this work. We would also like to thank the reviewers for their helpful suggestions.

References

[1] P. Chow and M. Horowitz, "Architecture tradeoffs in the design of MIPS-X," Proceedings of the 14th Annual International Symposium on Computer Architecture, June 1987.

[2] J.A. DeRosa and H. Levy, "An Evaluation of Branch Architectures," Proceedings of the 14th International Symposium on Computer Architecture, June 1987.

[3] J. Emer and D. Clark, "A Characterization of Processor Performance in the VAX-11/780," Proceedings of the 11th Annual Symposium on Computer Architecture, June 1984.

[4] P.M. Kogge, The Architecture of Pipelined Computers, pp.237-243, McGraw-Hill, 1981.

[5] J.K.F. Lee and A.J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer, pp.6-22, January 1984.

[6] S. McFarling, "Combining Branch Predictors," WRL Technical Note TN-36, Digital Equipment Corporation, June 1993.

[7] S. McFarling and J. Hennessy, "Reducing the cost of branches," Proceedings of the 13th International Symposium on Computer Architecture, pp.396-404, 1986.

[8] C. Melear, "The design of the 88000 RISC family," IEEE MICRO, pp.26-38, April 1989.

[9] J.E. Smith, "A Study of Branch Prediction Strategies," Proceedings of the 8th International Symposium on Computer Architecture, pp.135-148, May 1981.

[10] T.-Y. Yeh and Y.N. Patt, "Two-level Adaptive Branch Prediction," Proceedings of the 24th ACM/IEEE International Symposium on Microarchitecture, pp.51-61, November 1991.

[11] T.-Y. Yeh and Y.N. Patt, "Alternative Implementations of Two-level Adaptive Branch Prediction," Proceedings of the 19th Annual International Symposium on Computer Architecture, pp.124-135, May 1992.

[12] T.-Y. Yeh and Y.N. Patt, "A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History," Proceedings of the 20th Annual International Symposium on Computer Architecture, pp.257-266, May 1993.