EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE
A Survey on BRANCH PREDICTION METHODOLOGY
Prof. Dr. Angkul Kongmunvattana
Electrical and Computer Engineering Department, University of Minnesota
By *Baris Mustafa Kazar and ~Resit Sendag
Authors are with the Electrical and Computer Engineering Department of the University of Minnesota
*bkazar@ece.umn.edu ; ~rsgt@ece.umn.edu
05/05/2000

Table of Contents:
Table of Figures
1. Introduction
2. Static Branch Prediction Schemes: A Brief View
3. Dynamic Branch Prediction
 3.1 One-Bit Scheme: BHT
 3.2 To Provide Target Instructions Quickly: BTBs
 3.3 Bimodal Branch Prediction (Two-Bit Prediction Scheme)
 3.4 Two-Level Branch Prediction
  3.4.1 Global History Schemes: GAg, GAs, GAp
  3.4.2 Per-Address History Schemes: PAg, PAs, PAp
  3.4.3 Per-Set Address Schemes: SAg, SAs, SAp
  3.4.4 Results
 3.5 Correlation Branch Prediction
 3.6 More on Global Branch Prediction: Gselect and Gshare
 3.7 Hybrid Branch Predictors
  3.7.1 Where Did the Idea Come From?
  3.7.2 Branch Classification
  3.7.3 An Alternative Selection Mechanism
 3.8 Reducing PHT Interference
  3.8.1 Bi-Mode Branch Predictor
  3.8.2 The Agree Predictor
  3.8.3 The Skewed Branch Predictor
  3.8.4 The YAGS Branch Prediction Scheme
4. Concluding Remarks
References

Table of Figures
Figure 1 Global History Schemes
Figure 2 Per-Address History Schemes
Figure 3 Per-Set History Schemes
Figure 4 2-Level Predictor Selection Mechanism
Figure 5 The Bi-Mode Predictor
Figure 6 The Agree Predictor
Figure 7 The Skewed Branch Predictor
Figure 8 The YAGS Branch Prediction Scheme

1. Introduction
This paper surveys branch prediction methodology. We focus on dynamic, hardware-based prediction schemes, in which the hardware rearranges instruction execution to reduce stalls, rather than on compile-time schemes, which require static branch prediction. Compiler techniques, i.e., static scheduling, are used to schedule instructions so as to separate dependent instructions and minimize the number of actual hazards and resultant stalls. Dynamic scheduling, on the other hand, offers several advantages: it can handle cases in which dependences are unknown at compile time (e.g., because they may involve a memory reference), and it simplifies the compiler. Perhaps most importantly, it also allows code that was compiled with one pipeline in mind to run efficiently on a different pipeline. These advantages, however, come at the cost of a significant increase in hardware complexity.

2. Static Branch Prediction Schemes
There are two methods one can use to statically predict branches: examining program behavior (branch direction, etc.) and using profile information collected from earlier runs of the program (branch behaviors are often bimodally distributed; that is, an individual branch is often highly biased toward taken or untaken). Some schemes are Always Not Taken, Always Taken, Opcode Based (certain classes of branch instructions tend to branch more in one direction than the other), and Backward Taken and Forward Not Taken (Smith [1]).
Taking the direction of the branches into consideration, the last scheme is fairly effective in loop-bound programs, because it mispredicts only once over all iterations of a loop. It does not work well on programs with irregular branches.

3. Dynamic Branch Prediction Schemes
These are hardware-based schemes. The prediction changes if a branch changes its behavior while the program is running.

3.1 One-Bit Scheme
The simplest scheme is the branch prediction buffer, or BHT, which stands for Branch History Table. A branch prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not. The shortcoming is that even if a branch is almost always taken, it will likely be predicted incorrectly twice, rather than once, when it is not taken.

3.2 To Provide Target Instructions Quickly: BTBs
The branch performance problem can be divided into two sub-problems. First, a prediction of the branch direction is needed. Second, for taken branches, the instructions from the branch target must be available for execution with minimal delay. One way to provide the target instructions quickly is to use a Branch Target Buffer (BTB) (Lee and Smith, 1984). The BTB uses 2-bit saturating up-down counters to collect history information, which is then used to make predictions. The BTB is a special instruction cache designed to store the target instructions. In this paper, we focus on branch directions.

3.3 Two-Bit Prediction Scheme (Smith, 1981)
Another name for the two-bit prediction scheme is bimodal branch prediction. It is known that most branches are either usually taken or usually not taken. Bimodal branch prediction takes advantage of this bimodal distribution of branch behavior and attempts to distinguish usually taken from usually not taken branches.
It makes a prediction based on the direction the branch went the last few times it was executed. Smith (1981) proposed a branch prediction scheme that uses a table of 2-bit saturating up-down counters to keep track of the direction a branch is more likely to take. These counters are never decremented past 0 nor incremented past 3. Each branch is mapped via its address to a counter. The branch is predicted taken if the most significant bit of the associated counter is set; otherwise, it is predicted not taken. The counters are updated based on the branch outcomes: when a branch is taken, the 2-bit value of the associated saturating counter is incremented by one; otherwise, it is decremented by one.

3.4 Two-Level Branch Predictors
The two-bit schemes use only the recent behavior of a single branch to predict the future behavior of that branch. It may be possible to improve prediction accuracy if we also look at the recent behavior of other branches, rather than just the branch we are trying to predict. Such predictors are called correlating predictors or two-level predictors. Two-level adaptive branch prediction was first described by Yeh and Patt (1991). The first-level branch execution history is the history of the last k branches encountered. The second-level pattern history is the branch behavior for the last j occurrences of the specific pattern of these k branches. A prediction is based on the branch behavior for the last j occurrences of the current branch history pattern. Since the history register has k bits, at most 2^k different patterns can appear in the history register. Each of these 2^k patterns has a corresponding entry in what is called the Pattern History Table (PHT). The first-level and second-level pattern histories are collected at run time, eliminating the disadvantage inherent in Lee and Smith's method, namely, the difference in branch behavior between the profiling data set and the run-time data sets.
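The mechanism just described, a k-bit history register indexing a table of 2^k two-bit saturating up-down counters, can be sketched as follows. This is a minimal illustration in the GAg style (single global history register); the class and variable names are illustrative, not taken from the papers.

```python
class TwoLevelGAg:
    """Sketch of a two-level adaptive predictor with one global history
    register and one global pattern history table of 2-bit counters."""

    def __init__(self, k):
        self.k = k
        self.ghr = 0                  # k-bit global history register
        self.pht = [2] * (1 << k)    # 2^k counters, start weakly taken

    def predict(self):
        # Predict taken if the counter's most significant bit is set.
        return self.pht[self.ghr] >= 2

    def update(self, taken):
        ctr = self.pht[self.ghr]
        # Saturating update: never below 0, never above 3.
        self.pht[self.ghr] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        # Shift the new outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.k) - 1)
```

For example, on a strictly alternating taken/not-taken branch, a 2-bit history uniquely identifies the next outcome, so the predictor converges to perfect prediction after a short warm-up, something the bimodal scheme alone cannot do.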
In Lee and Smith (1984) there is a static training scheme, which uses statistics collected from a pre-run of the program, together with a history pattern consisting of the last n run-time execution results of the branch, to make a prediction. The main disadvantage of this scheme is that the program has to be run first to accumulate the statistics, and the same statistics may not be applicable to different data sets.

The two-level adaptive training scheme was simulated against several other dynamic and static branch prediction schemes, such as Lee and Smith's static training schemes, branch target buffer designs, always taken, and backward taken/forward not taken. It was shown that the proposed scheme has an average prediction accuracy of 97 percent on nine of ten SPEC benchmarks, compared to less than 93 percent for the other schemes; that is, a 3 percent misprediction rate versus 7 percent for the other schemes.

Variations in Two-Level Adaptive Schemes (Yeh and Patt, 1993): GAg, GAs, GAp, PAg, PAs, PAp, SAg, SAs, SAp. They classify nine variations of two-level adaptive branch prediction into three classes, according to the way the first-level branch history is collected. The abbreviations for these nine cases are as follows:
GAg - Global Adaptive Branch Prediction using one global pattern history table.
GAs - Global Adaptive Branch Prediction using per-set pattern history tables.
GAp - Global Adaptive Branch Prediction using per-address pattern history tables.
PAg - Per-address Adaptive Branch Prediction using one global pattern history table.
PAs - Per-address Adaptive Branch Prediction using per-set pattern history tables.
PAp - Per-address Adaptive Branch Prediction using per-address pattern history tables.
SAg - Per-set Adaptive Branch Prediction using one global pattern history table.
SAs - Per-set Adaptive Branch Prediction using per-set pattern history tables.
SAp - Per-set Adaptive Branch Prediction using per-address pattern history tables.
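The nine variants differ only in (a) which first-level history register is read and (b) which pattern history table that history indexes. As a hedged illustration, here is a sketch of the GAs lookup, where a single global history register indexes one of several per-set PHTs; the function name, the table layout, and the use of low branch-address bits as the set index are assumptions made for illustration only.

```python
def gas_predict(pht_banks, ghr, branch_addr, set_bits):
    """GAs-style lookup sketch: pick a PHT bank by branch-address set
    bits, then index that bank with the global history register."""
    bank = pht_banks[branch_addr & ((1 << set_bits) - 1)]
    # MSB of the 2-bit counter gives the prediction (>= 2 means taken).
    return bank[ghr] >= 2
```

GAg is the degenerate case with a single bank; GAp is the other extreme, with one bank per branch address.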
3.4.1 Global History Schemes
In the global history schemes, shown in Figure 1 (also called correlation branch prediction by Pan et al., 1991), the first-level branch history is the actual last k branches encountered; therefore, only a single global history register (GHR) is used. The global history register is updated with the results of all branches. Since the global history register is used for all branch predictions, not only the history of a branch itself but also the history of other branches influences its prediction.

Figure 1. Global History Schemes

3.4.2 Per-Address History Schemes (Local)
In the per-address history schemes, shown in Figure 2, the first-level branch history refers to the last k occurrences of the same branch instruction; therefore, one history register is associated with each static conditional branch, to distinguish the branch history information of each branch. Such a history register is called a per-address branch history register (PBHR). The per-address history registers are contained in a per-address branch history table (PBHT), in which each entry is indexed by the static branch instruction address. In these schemes, only the execution history of the branch itself has an effect on its prediction; therefore, the branch prediction is independent of other branches' execution history.

Figure 2. Per-Address History Schemes

3.4.3 Per-Set History Schemes
In the per-set history schemes, shown in Figure 3, the first-level branch history means the last k occurrences of the branch instructions from the same subset; therefore, one history register is associated with a set of conditional branches. Such a history register is called a per-set history register. The set attribute of a branch can be determined by the branch opcode, a branch class assigned by a compiler, or the branch address.
Since the per-set history register is updated with history possibly from all the branches in the same set, the other branches in the same set influence the prediction of a branch.

Figure 3. Per-Set History Schemes

3.4.4 Results
Yeh and Patt characterized the global, per-address, and per-set history schemes and compared them with respect to branch prediction performance and cost-effectiveness. Global history schemes perform better than the other schemes on integer programs, but require higher implementation costs to be effective overall. Per-address history schemes perform better than the other schemes on floating-point programs and require lower implementation costs to be effective overall. Per-set history schemes have performance similar to global history schemes on integer programs and similar to per-address history schemes on floating-point programs; to be effective, however, they require even higher implementation costs than global history schemes, due to the separate pattern history tables for each set. As a rule of thumb, PAs is the most cost-effective among the low-cost schemes.

3.5 Correlation Branch Prediction (Pan, So, and Rahmeh, 1992)
It is similar to GAp in Yeh and Patt (1991), in that the prediction of a branch depends on the history of other branches, and to GAs, where the addresses that contain branch instructions are partitioned into subsets, each of which shares the same second-level pattern table.

3.6 More on Global Branch Prediction: Gselect (Pan, So, and Rahmeh, 1992) and Gshare (McFarling, 1993)
At this point we have seen both local branch prediction, which considers the history of each branch independently and takes advantage of repetitive patterns, and global branch prediction, which uses the combined history of all recent branches in making a prediction.
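The local (per-address) organization described above can be sketched in the PAg style: each static branch keeps its own k-bit history register, and all of those registers index one shared PHT of 2-bit counters. The table sizes, names, and the direct-mapped indexing by low address bits are illustrative assumptions.

```python
class PAgPredictor:
    """Sketch of a PAg-style local predictor: per-address history
    registers (the PBHT) feeding a single shared pattern history table."""

    def __init__(self, k, bht_entries=1024):
        self.k = k
        self.bht = [0] * bht_entries   # per-address history registers
        self.pht = [2] * (1 << k)      # shared PHT of 2-bit counters
        self.mask = bht_entries - 1    # bht_entries must be a power of two

    def predict(self, branch_addr):
        hist = self.bht[branch_addr & self.mask]
        return self.pht[hist] >= 2

    def update(self, branch_addr, taken):
        idx = branch_addr & self.mask
        hist = self.bht[idx]
        ctr = self.pht[hist]
        self.pht[hist] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        self.bht[idx] = ((hist << 1) | int(taken)) & ((1 << self.k) - 1)
```

Because each branch reads only its own history register, one branch's prediction is insensitive to the order in which other branches execute, which is exactly the property the per-address schemes trade away relative to the global schemes.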
The bimodal technique (2-bit counters) works well when each branch is strongly biased in a particular direction. The local technique works well for branches with simple repetitive patterns. The global technique works particularly well when the directions taken by sequentially executed branches are highly correlated. It has been shown that the bimodal predictor performs well at small predictor sizes, while the local history predictor is better at larger sizes. It is worth looking at Gselect and Gshare in a little more detail.

Gselect: Global Predictor with Index Selection
Global history information is less efficient at identifying the current branch than simply using the branch address. This suggests that a more efficient prediction might be made using both the branch address and the global history. Such a scheme was proposed by Pan, So, and Rahmeh (1992): the counter table is indexed by a concatenation of global history and branch address bits. This global scheme with selected address bits is called Gselect. It performs better than either bimodal or global prediction alone.

Gshare: Global Predictor with Index Sharing
Global history information only weakly identifies the current branch, which suggests that there is a lot of redundancy in the counter index used by Gselect. The observation that usage of the PHT entries is not uniform when indexed by concatenations of the global history and branch address led to the idea of using the exclusive-OR function instead of concatenation, to use the entries in the PHT more evenly.

3.7 Hybrid Branch Predictors
3.7.1 Where Did the Idea Come From? Combining Branch Predictors (McFarling, 1993)
Realizing that different prediction schemes have different advantages, McFarling suggested combining those advantages in a new branch predictor. The combined predictor contains two predictors (bimodal and Gshare) and an additional counter array that serves to select the better predictor to use (a 2-bit saturating up-down counter per entry).
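A sketch of such a combined predictor follows, assuming illustrative table sizes and names: a bimodal table indexed by address bits, a gshare table indexed by the XOR of global history and address (rather than Gselect's concatenation), and a chooser array of 2-bit counters that is trained toward whichever component was correct when the two disagree.

```python
class CombinedPredictor:
    """Sketch of a McFarling-style combined (bimodal + gshare) predictor."""

    def __init__(self, bits=10):
        n = 1 << bits
        self.mask = n - 1
        self.bimodal = [2] * n     # 2-bit counters, indexed by address
        self.gshare = [2] * n      # 2-bit counters, indexed by addr ^ history
        self.chooser = [2] * n     # >= 2 means "trust gshare"
        self.ghr = 0

    def _indices(self, addr):
        a = addr & self.mask
        return a, (addr ^ self.ghr) & self.mask  # gshare: XOR, not concat

    def predict(self, addr):
        a, g = self._indices(addr)
        table, idx = (self.gshare, g) if self.chooser[a] >= 2 else (self.bimodal, a)
        return table[idx] >= 2

    def update(self, addr, taken):
        a, g = self._indices(addr)
        b_ok = (self.bimodal[a] >= 2) == taken
        g_ok = (self.gshare[g] >= 2) == taken
        # Train the chooser only when exactly one component was correct.
        if g_ok != b_ok:
            c = self.chooser[a]
            self.chooser[a] = min(3, c + 1) if g_ok else max(0, c - 1)
        for tbl, i in ((self.bimodal, a), (self.gshare, g)):
            tbl[i] = min(3, tbl[i] + 1) if taken else max(0, tbl[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Updating the chooser only on disagreement is the standard design choice: when both components are right or both are wrong, the outcome carries no information about which one to prefer.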
It was shown that the combined predictor, which keeps track of which predictor is more accurate for the current branch, always performs better than either predictor alone. Combined predictors using local and global branch information reach a prediction accuracy of 98.1%, compared to 97.1% for the previously most accurate known scheme (Yeh and Patt's 2-level branch predictor, 1991). This idea is extremely useful as machine designers attempt to exploit ILP and mispredicted branches become a critical performance bottleneck.

3.7.2 Branch Classification (Chang, Hao, Yeh, and Patt, 1994)
Branch classification partitions a program's branches into sets, or branch classes. Once the dynamic behavior of a class of branches is understood, that class can be optimized for. Associating each branch class with the most suitable predictor for that class increases prediction accuracy.

The paper gives a new method for combining branch predictors. Branches are partitioned into different branch classes based not only on run-time information but also on compile-time information. The proposed hybrid branch predictor then associates each branch class with the most suitable predictor for that class. The hybrid predictor design introduced in the paper uses both dynamic and static predictor selection to further improve prediction accuracy. It was shown that the idea of branch classification allows the construction of predictors that are more effective than previously known predictors.

3.7.3 An Alternative Selection Mechanism
The selection mechanism is an important factor in increasing the accuracy of hybrid predictors (introducing the 2-level Branch Predictor Selector, BPS). As we know at this point, by using known single-scheme predictors and selection mechanisms, the most effective hybrid predictor implementation can be identified. Chang, Hao, and Patt (1995) introduced a new selection mechanism, called the 2-level selector, which further improves the performance of the hybrid branch predictor.
It is now well known that the two-level branch predictor improves prediction accuracy over previously known single-level branch predictors. The concepts embodied in the two-level predictor can also be applied to the hybrid branch predictor's selection mechanism. Figure 4 shows the structure of the 2-level predictor selection mechanism.

Figure 4. 2-Level Predictor Selection Mechanism

A Branch History Register (BHR) holds the outcomes of the last m branches encountered, where m is the length of the BHR. This first level of branch history represents the state of branch execution when a branch is encountered. No extra hardware is required to maintain the first level of history if one of the component predictors already contains this information; that is, if a component predictor maintains a BHR, the 2-level BPS mechanism does not need to maintain another copy and simply uses the component predictor's BHR. The Branch Predictor Selection Table (BPST) records which predictor was most frequently correct for the times this branch occurred with the associated branch history. This second level of history keeps track of the more accurate predictor for branches at different branch execution states. When a branch is fetched, its instruction address and the current branch history are used to hash into the BPST. The associated counter is then used to select the appropriate prediction. By using the branch history to distinguish more execution states, the 2-level predictor selection scheme can select the appropriate prediction more accurately. The performance of the 2-level BPS was compared to that of previously proposed selection mechanisms. Although the 2-level BPS mechanism provides a performance increase over the 2-bit counter BPS mechanism of McFarling (1993), more effective BPS mechanisms are still required for the hybrid branch predictor to achieve its full potential performance.
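The selection step described above can be sketched as follows. The XOR of address and history as the hash into the BPST, and the table size, are illustrative assumptions, not necessarily the hash used in the paper.

```python
def bps_select(bpst, branch_addr, bhr, pred_a, pred_b):
    """2-level BPS sketch: hash address and branch history into the BPST
    and use the counter there to pick between two component predictions.
    len(bpst) is assumed to be a power of two."""
    idx = (branch_addr ^ bhr) & (len(bpst) - 1)
    return pred_b if bpst[idx] >= 2 else pred_a

def bps_update(bpst, branch_addr, bhr, a_correct, b_correct):
    """Train the selector toward whichever component was correct alone."""
    idx = (branch_addr ^ bhr) & (len(bpst) - 1)
    if b_correct and not a_correct:
        bpst[idx] = min(3, bpst[idx] + 1)
    elif a_correct and not b_correct:
        bpst[idx] = max(0, bpst[idx] - 1)
```

Because the BPST index includes the history, the same static branch can be steered to different component predictors in different execution states, which is exactly what the flat 2-bit chooser of McFarling (1993) cannot do.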
Target Cache (Chang, Hao, and Patt, 1997)
In this paper, a new mechanism for predicting the targets of indirect jumps, the target cache, is proposed. The target cache uses a concept central to the 2-level branch predictor: branch history is used to distinguish different dynamic occurrences of each indirect branch. The only difference between the target cache and the pattern history table of the 2-level branch predictor is that the target cache's storage structure records branch targets, while the 2-level branch predictor's pattern history table records branch directions. By using branch history to distinguish different dynamic occurrences of indirect branches, the target cache was able to reduce the total execution time of the perl and gcc benchmarks by 8.92% and 4.27%, respectively.

3.8 Reducing Pattern History Table Interference
Since the introduction of gshare (1993) and the observation, studied by several authors, that aliasing in the PHT is a major factor in reducing prediction accuracy, several schemes have been proposed to reduce aliasing in the PHT. The main problem reducing the prediction rate in the global schemes is aliasing between two indices, each typically formed from history and address bits, that map to the same entry in the Pattern History Table (PHT). Since the information stored in a PHT entry is either "taken" or "not taken", two aliased indices whose corresponding information is the same will not cause mispredictions; this situation is called neutral aliasing. On the other hand, two aliased indices with different information may interfere with each other and cause a misprediction; this situation is called destructive aliasing.

3.8.1 The Bi-Mode Branch Predictor (Lee, Chen, and Mudge, 1997)
The bi-mode predictor, shown in Figure 5, tries to replace destructive aliasing with neutral aliasing in a different manner. It splits the PHT into even parts.
One of the parts is the choice PHT, which is just a bimodal predictor (an array of 2-bit saturating counters) with a slight change in the updating procedure. The other two parts are direction PHTs: one is a "taken" direction PHT and the other is a "not taken" direction PHT. The direction PHTs are indexed by the branch address XORed with the global history. When a branch is encountered, its address points to a choice PHT entry, which in turn chooses between the "taken" direction PHT and the "not taken" direction PHT. The prediction of the direction PHT chosen by the choice PHT serves as the overall prediction. Only the chosen direction PHT is updated. The choice PHT is normally updated too, but not if it gives a prediction contradicting the branch outcome while the chosen direction PHT gives the correct prediction. As a result of this scheme, branches that are biased to be taken have their predictions in the "taken" direction PHT, and branches that are biased not to be taken have their predictions in the "not taken" direction PHT. Thus, at any given time, most of the information stored in the "taken" direction PHT entries is "taken", so any aliasing there is more likely not to be destructive; the same phenomenon happens in the "not taken" direction PHT. The choice PHT serves to dynamically determine the branches' biases. In contrast to the agree predictor, if the bias is incorrectly chosen the first time the branch is introduced to the BTB, it is not bound to stay that way while the branch is in the BTB and pollute the direction PHTs. However, the choice PHT takes a third of all PHT resources just to dynamically determine the bias. It also does not solve the aliasing problem between instances of a branch that do not agree with the bias and instances that do.

Figure 5. The Bi-Mode Branch Predictor

3.8.2 The Agree Predictor (Sprangle, Chappell, Alsup, and Patt, 1997)
The agree predictor, shown in Figure 6, assigns a biasing bit to each branch in the Branch Target Buffer (BTB), according to the branch's direction just before it is written into the BTB. The PHT information is then changed from "taken" or "not taken" to "agree" or "disagree" with the prediction of the biasing bit. The idea behind the agree predictor is that most branches are highly biased to be either taken or not taken, and the hope is that the first time a branch is introduced into the BTB it will exhibit its biased behavior. If this is the case, most entries in the PHT will be "agree", so if aliasing does occur it is more likely to be neutral aliasing, which does not cause a misprediction. It was one of the first two-level schemes to take advantage of branches' biased behavior to reduce destructive aliasing by replacing it with neutral aliasing, and it reduces destructive aliasing considerably. However, there is no guarantee that the first time a branch is introduced to the BTB its behavior will correspond to its bias. When such cases occur, the biasing bit stays the same until the branch is replaced in the BTB by a different branch; meanwhile, it pollutes the PHT with "disagree" information. There is still aliasing between instances of a branch that do not comply with the bias and instances that do. Furthermore, when a branch is not in the BTB, no prediction is available.

Figure 6. The Agree Predictor

3.8.3 The Skewed Branch Predictor (Michaud, Seznec, and Uhlig, 1997)
The skewed branch predictor, shown in Figure 7, is based on the observation that most aliasing occurs not because the PHT is too small, but because of a lack of associativity in the PHT (the major contributor to aliasing is conflict aliasing, not capacity aliasing). The best way to deal with conflict aliasing would be to make the PHT set-associative, but this requires tags and is not cost-effective.
Instead, the skewed predictor emulates associativity using a special skewing function.

Figure 7. The Skewed Branch Predictor

The skewed branch predictor splits the PHT into three even banks and hashes each index to a 2-bit saturating counter in each bank, using a different hashing function per bank (f1, f2, and f3). The prediction is made by a majority vote among the three banks. If the prediction is wrong, all three banks are updated; if it is correct, only the banks that made the correct prediction are updated (partial updating). The skewing functions should have inter-bank dispersion: if a branch is aliased in one bank, it should not be aliased in the other two banks, so that the majority vote still produces an unaliased prediction. The reasoning behind partial updating is that if a bank gives a misprediction while the other two give correct predictions, the mispredicting bank probably holds information belonging to a different branch; to maintain the accuracy of that other branch, this bank is not updated. The skewed branch predictor tries to eliminate all aliasing instances, and therefore all destructive aliasing. Unlike the other methods, it tries to eliminate destructive aliasing between branch instances that obey the bias and those that do not. To achieve this, however, the skewed predictor stores each branch outcome in two or three banks. This redundancy of 1/3 to 2/3 of the PHT size creates some capacity aliasing, but eliminates much more conflict aliasing, resulting in a lower misprediction rate. However, it is slow to warm up on context switches.

3.8.4 The YAGS Branch Prediction Scheme (Eden and Mudge, 1998)
The motivation behind YAGS, shown in Figure 8, is the observation that for each branch we need to store its bias and only the instances in which it does not agree with that bias.
If we employ a bimodal predictor to store the bias, as the choice predictor does in the bi-mode scheme, then all we need to store in the direction PHTs are the instances in which the branch does not comply with its bias. This reduces the amount of information stored in the direction PHTs, and therefore the direction PHTs can be smaller than the choice PHT. To identify those instances in the direction PHTs, small tags (6-8 bits) are added to each entry, turning them into direction caches. These tags store the least significant bits of the branch address, and they virtually eliminate aliasing between two consecutive branches. When a branch occurs in the instruction stream, the choice PHT is accessed. If the choice PHT indicates "taken", …

Figure 8. The YAGS Branch Prediction Scheme

4. Summary and Conclusion
In this paper, branch prediction methodology has been surveyed. Two-level branch prediction is the most important step in the development of the topic. Hybrid predictors, which combine the advantages of single-scheme predictors, are the most effective branch predictors. The selection of component predictors in hybrid predictors requires a careful study of branch behavior and depends to a great extent on the programs. Branch classification is a promising method for hybrid predictors. Using the 2-level BPS gives better performance than the 2-bit BPS. The Bi-Mode and Agree predictors, which split the PHT into two branch streams, have done a good job of reducing aliasing in global schemes. The YAGS scheme further reduces aliasing by combining the strong points of the previous schemes. Research is still going on to further reduce the penalty of predicting branches and branch target addresses.

References
1) The Bi-Mode Branch Predictor; Chih-Chieh Lee, I-Cheng K. Chen, and Trevor N. Mudge; Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture, December 1-3, 1997, Research Triangle Park, NC, Pages 4-13.
2) Branch Prediction for Free; Thomas Ball and James R. Larus; Proceedings of the Conference on Programming Language Design and Implementation, June 21-25, 1993, Albuquerque, NM, Pages 300-313.
3) The Effects of Predicated Execution on Branch Prediction; Gary Scott Tyson; Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994, Page 196.
4) Fast and Accurate Instruction Fetch and Branch Prediction; B. Calder and D. Grunwald; Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994, Pages 2-11.
5) Characterizing the Impact of Predicated Execution on Branch Prediction; Scott A. Mahlke, Richard E. Hank, Roger A. Bringmann, John C. Gyllenhaal, David M. Gallagher, and Wen-mei W. Hwu; Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994, Pages 217-227.
6) The Effect of Speculatively Updating Branch History on Branch Prediction Accuracy, Revisited; Eric Hao, Po-Yung Chang, and Yale N. Patt; Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994, Pages 228-232.
7) Guarded Execution and Branch Prediction in Dynamic ILP Processors; D. N. Pnevmatikatos and G. S. Sohi; Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994, Pages 120-129.
8) Improving the Accuracy of Static Branch Prediction Using Branch Correlation; Cliff Young and Michael D. Smith; Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994, Pages 232-241.
9) A Comparative Analysis of Schemes for Correlated Branch Prediction; Cliff Young, Nicolas Gloy, and Michael D. Smith; Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995, Pages 276-286.
10) The Impact of Unresolved Branches on Branch Prediction Scheme Performance; A. R. Talcott, W. Yamamoto, M. J. Serrano, R. C. Wood, and M. Nemirovsky; Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994, Pages 12-21.
11) The Role of Adaptivity in Two-Level Adaptive Branch Prediction; Stuart Sechrest, Chih-Chieh Lee, and Trevor Mudge; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 264-269.
12) Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in the Presence of Context Switches; Marius Evers, Po-Yung Chang, and Yale N. Patt; Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996, Pages 3-13.
13) An Analysis of Dynamic Branch Prediction Schemes on System Workloads; Nicolas Gloy, Cliff Young, J. Bradley Chen, and Michael D. Smith; Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996, Pages 12-21.
14) Compiler Synthesized Dynamic Branch Prediction; Scott Mahlke and Balas Natarajan; Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996, Pages 153-164.
15) Dynamic History-Length Fitting: A Third Level of Adaptivity for Branch Prediction; Toni Juan, Sanji Sanjeevan, and Juan J. Navarro; Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, Pages 155-166.
16) Alternative Implementations of Two-Level Adaptive Branch Prediction; Tse-Yu Yeh and Yale N. Patt; 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998, Pages 451-461.
17) Variable Length Path Branch Prediction; Jared Stark, Marius Evers, and Yale N. Patt; Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, Pages 170-179.
18) Alternative Implementations of Hybrid Branch Predictors; Po-Yung Chang, Eric Hao, and Yale N. Patt; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 252-257.
19) Correlation and Aliasing in Dynamic Branch Predictors; Stuart Sechrest, Chih-Chieh Lee, and Trevor Mudge; Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996, Pages 22-32.
20) Dynamic Path-Based Branch Correlation; Ravi Nair; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 15-23.
21) Predicting Conditional Branch Directions from Previous Runs of a Program; Joseph A. Fisher and Stefan M. Freudenberger; Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, Pages 85-95.
22) An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work; Marius Evers, Sanjay J. Patel, Robert S. Chappell, and Yale N. Patt; Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, Pages 52-61.
23) A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992, Pages 129-139.
24) Implementation and Analysis of Path History in Dynamic Branch Prediction Schemes; S. Reches and S. Weiss; Proceedings of the 1997 International Conference on Supercomputing, 1997, Pages 285-292.
25) Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation; Shien-Tai Pan, Kimming So, and Joseph T. Rahmeh; Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, Pages 76-84.
26) Branch History Table Indexing to Prevent Pipeline Bubbles in Wide-Issue Superscalar Processors; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993, Pages 164-175.
27) The Cascaded Predictor: Economical and Adaptive Branch Target Prediction; Karel Driesen and Urs Hölzle; Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, 1998, Pages 249-258.
28) Branch Classification: A New Mechanism for Improving Branch Predictor Performance; Po-Yung Chang, Eric Hao, Tse-Yu Yeh, and Yale Patt; Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994, Pages 22-31.
29) A Comparison of Dynamic Branch Predictors That Use Two Levels of Branch History; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993, Pages 257-266.
30) The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference; Eric Sprangle, Robert S. Chappell, Mitch Alsup, and Yale N. Patt; Proceedings of the 24th International Symposium on Computer Architecture, 1997, Pages 284-291.
31) Target Prediction for Indirect Jumps; Po-Yung Chang, Eric Hao, and Yale N. Patt; Proceedings of the 24th International Symposium on Computer Architecture, Denver, June 1997.
32) Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference; Po-Yung Chang, Marius Evers, and Yale N. Patt; International Conference on Parallel Architectures and Compilation Techniques, Boston, MA, October 1996.
33) A Comprehensive Instruction Fetch Mechanism Design for a Processor Supporting Speculative Execution; T.-Y. Yeh and Y. Patt; The 25th ACM/IEEE International Symposium on Microarchitecture, December 1992.
34) Assigning Confidence to Conditional Branch Predictions; Erik Jacobsen, Eric Rotenberg, and J. E. Smith; Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, December 2-4, 1996, Paris, France, Page 142.
36) The YAGS Branch Prediction Scheme; A. N. Eden and T.
Mudge, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, Pages 69-77, November 30 - December 2, 1998, Dallas,TX,USA 22 37) Performance Issues in Correlated Branch Prediction Schemes; Nicolas Gloy, Michael D. Smith and Cliff Young; Proceedings of the 28th Annual International Symposium on Microarchitecture , 1995, Pages 3 - 14. 38) Increasing The Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache; Tse-Yu Yeh, Deborah T. Marr and Yale N. Patt; Proceedings of the 1993 International Conference on Supercomputing , 1993, Pages 67 - 76. 39) Two-Level Adaptive Training BranchPrediction; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 24th Annual International Symposium on Microarchitecture , 1991, Pages 51 - 61 40) The effects of predicated execution on branch prediction, Gary Scott Tyson, Proceedings of the 27th Annual International Symposium on Microarchitecture, Page 196, November 30 - December 2, 1994, San Jose,CA,USA 41)PaperCopy The Effects of Mispredicted-Path Execution on Branch PredictionStructures, S. Jourdan, T.H. Hsing, J. Stark and Y.N. Patt ECE, Univ. of Michigan, Ann Arbor. 42)PaperCopy Multiple Branch and BlockPrediction; Steven Wallace and Nader Bagherzadeh , Proceedings of the Third International Symposium on High Performance Computer Architecture, February 1-5, 1997, San Antonio, Texas. 43)PaperCopy: Brach Transition Rate: A New Metric for Improved Branch ClassificationAnalysis; Michael Haungs, Phil Sallee, and Matthew Farrens Dept. of CS, University of California, Davis. 44)NotFoundyet A third Level of Adaptivity for Branch Prediction, Technical Report UPC-DAC1998-4, T. Juan, S. Sanjeevan, and J.J. Navarro, Computer Architecture Department, UPC, BARCELONA, March 1998. 45)PaperCopy: Combining branch predictors, Scott McFarling ,TN 36,DEC-WRL,June1993. 46)NotFoundyet Reducing the branch penalty in pipelined processors, David J. Lilja, IEEE Computer, pages:47-55,July 1988. 
47)NotFoundyet Branch Prediction Strategies and Branch Traget Buffer Design, J.K.F. Lee, A.J. Smith ,IEEE Computer, 17,1, Jan, 1984, pp. 6-22 48)NotFoundyet Correlation Based Branch Prediction, Technical Report, S.T. Pan, K. So, J.T. Rahmeh, UTCERC-TR-JTR91-01, University of Texas at Austion, August,1991. 49)NotFoundyet A Study of Branch Prediction Strategies, J.E. Smith, Proceedings of the 8th Annual International Symposium on Computer Architecture, June,1981,pp. 135-147. 23 50)PaperCopy Optimal 2-bit Branch Predictors, R. Nair, IEEE Transactions on Computers, vol. 44, no. 5, pp. 698-702, May 1995. 51)NotFoundyet The Influence of Branch Prediction Table Interference on Branch Prediction Scheme Performance, A. R. Talcott, M. Nemirovsky, and R. C. Wood, Proceedings of the 1995 ACM/IEEE Conference on Parallel Architectures and Compilation Techniques, 1995. 52)NotFoundyet Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy, M. -D. Tarlescu, K. B. Theobald, and G. R. Gao ,Proceedings of the 1997 international IEEE Conference on Computer Design, pp. 82-87, 1997. 24