
EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE
A Survey on
BRANCH PREDICTION METHODOLOGY
Prof. Dr. Angkul Kongmunvattana
Electrical and Computer Engineering Department
University of Minnesota
By
Baris Mustafa Kazar and Resit Sendag
The authors are with the Electrical and Computer Engineering Department of the University of Minnesota
bkazar@ece.umn.edu ; rsgt@ece.umn.edu
05/05/2000
Table of Contents
Table of Figures
1. Introduction
2. Static Branch Prediction Schemes: A Brief View
3. Dynamic Branch Prediction
   3.1 One-Bit Scheme: BHT
   3.2 To Provide Target Instructions Quickly: BTBs
   3.3 Bimodal Branch Prediction (Two-Bit Prediction Scheme)
   3.4 Two-Level Branch Prediction
      3.4.1 Global History Schemes: GAg, GAs, GAp
      3.4.2 Per-Address History Schemes: PAg, PAs, PAp
      3.4.3 Per-Set Address Schemes: SAg, SAs, SAp
      3.4.4 Results
   3.5 Correlation Branch Prediction
   3.6 More on Global Branch Prediction: Gselect and Gshare
   3.7 Hybrid Branch Predictors
      3.7.1 Where Did the Idea Come From?
      3.7.2 Branch Classification
      3.7.3 An Alternative Selection Mechanism
   3.8 Reducing PHT Interference
      3.8.1 Bi-Mode Branch Predictor
      3.8.2 The Agree Predictor
      3.8.3 The Skewed Branch Predictor
      3.8.4 The YAGS Branch Prediction Scheme
4. Concluding Remarks
References
Table of Figures
Figure 1 Global History Schemes
Figure 2 Per-Address History Schemes
Figure 3 Per-Set History Schemes
Figure 4 2-Level Predictor Selection Mechanism
Figure 5 The Bi-Mode Predictor
Figure 6 The Agree Predictor
Figure 7 The Skewed Branch Predictor
Figure 8 The YAGS Branch Prediction Scheme
1. Introduction
This paper surveys branch prediction methodology. We focus on dynamic, hardware-based prediction schemes, in which the hardware rearranges the instruction execution to reduce stalls, rather than on compile-time schemes, which require static branch prediction. Compiler techniques, i.e., static scheduling, are used to schedule the instructions so as to separate dependent instructions and minimize the number of actual hazards and resultant stalls. On the other hand, dynamic scheduling offers several advantages: it enables handling some cases in which dependences are unknown at compile time (e.g., because they may involve a memory reference), and it simplifies the compiler. Perhaps most importantly, it also allows code that was compiled with one pipeline in mind to run efficiently on a different pipeline. However, these advantages come at the cost of a significant increase in hardware complexity.
2. Static Branch Prediction Schemes
There are two methods one can use to statically predict branches: examining the program behavior (branch direction, etc.) and using profile information collected from earlier runs of the program (branch behaviors are often bimodally distributed; that is, an individual branch is often highly biased toward taken or untaken).
Some schemes are: always not taken; always taken; opcode-based (certain classes of branch instructions tend to branch more in one direction than the other); and backward taken, forward not taken (Smith [1]). By taking the direction of the branch into consideration, this last scheme is fairly effective in loop-bound programs, because it mispredicts only once over all iterations of a loop. It does not work well on programs with irregular branches.
3. Dynamic Branch Prediction Schemes
These are hardware-based schemes: the prediction changes if a branch changes its behavior while the program is running.
3.1 One-bit Scheme
The simplest scheme is the branch prediction buffer, or BHT, which stands for Branch History Table. A branch prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not. The shortcoming is that even if a branch is almost always taken, it will likely be mispredicted twice, rather than once, when it is not taken.
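The double misprediction can be seen in a minimal sketch of a one-bit BHT. The table size, the address used, and the loop pattern below are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the paper's design) of a one-bit branch history
# table: indexed by low-order address bits, each entry stores only the
# last outcome of the branch that mapped there.

class OneBitBHT:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [False] * (1 << bits)   # False = predict not taken

    def predict(self, pc):
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        self.table[pc & self.mask] = taken

# A loop branch taken 9 times then not taken costs two mispredictions
# per loop visit: one on loop exit, and one again on re-entry.
bht = OneBitBHT()
outcomes = [True] * 9 + [False]
mispredicts = 0
for _ in range(2):                           # visit the loop twice
    for taken in outcomes:
        if bht.predict(0x40) != taken:
            mispredicts += 1
        bht.update(0x40, taken)
print(mispredicts)  # prints 4: two mispredictions per loop visit
```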
3.2 To provide target instructions quickly: BTBs
The branch performance problem can be divided into two sub-problems. First, a prediction of the branch direction is needed. Second, for taken branches, the instructions from the branch target must be available for execution with minimal delay. One way to provide the target instructions quickly is to use a Branch Target Buffer, or BTB (Lee and Smith, 1984). The BTB is a special instruction cache designed to store the target instructions, and it uses 2-bit saturating up-down counters to collect history information, which is then used to make predictions. In this paper, we focus on predicting branch directions.
3.3 Two-bit prediction scheme (Smith, 1981)
Another name for the two-bit prediction scheme is bimodal branch prediction. It is known that most branches are either usually taken or usually not taken. Bimodal branch prediction takes advantage of this bimodal distribution of branch behavior and attempts to distinguish usually-taken from usually-not-taken branches. It makes a prediction based on the direction the branch went the last few times it was executed.
Smith (1981) proposed a branch prediction scheme that uses a table of 2-bit saturating up-down counters to keep track of the direction a branch is more likely to take. These counters are not decremented past 0, nor are they incremented past 3. Each branch is mapped via its address to a counter. The branch is predicted taken if the most significant bit of the associated counter is set; otherwise, it is predicted not taken. The counters are updated based on the branch outcomes: when a branch is taken, the 2-bit value of the associated saturating counter is incremented by one; otherwise, it is decremented by one.
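The update rule above can be sketched as follows. The table size and the weakly-not-taken initial counter value are assumptions for illustration; the saturation at 0 and 3 and the MSB-based prediction follow the description in the text.

```python
# Sketch of Smith's table of 2-bit saturating up-down counters.
# Counter values 0..3; MSB set (value >= 2) predicts taken.

class BimodalPredictor:
    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.ctr = [1] * (1 << bits)     # weakly not taken initially (assumption)

    def predict(self, pc):
        return self.ctr[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.ctr[i] = min(3, self.ctr[i] + 1)   # saturate at 3
        else:
            self.ctr[i] = max(0, self.ctr[i] - 1)   # saturate at 0

# A single not-taken excursion costs only one misprediction once the
# counter has saturated, unlike the one-bit scheme's two.
bp = BimodalPredictor()
mispredicts = 0
for taken in [True] * 5 + [False] + [True] * 5:
    if bp.predict(0x40) != taken:
        mispredicts += 1
    bp.update(0x40, taken)
print(mispredicts)  # prints 2: one cold-start miss, one on the excursion
```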
3.4 Two-level Branch Predictors
The two-bit predictor schemes use only the recent behavior of a branch to predict
the future behavior of that branch. It may be possible to improve the prediction accuracy
if we also look at the recent behavior of other branches rather than just the branch we are
trying to predict. These predictors are called correlating predictors or two-level
predictors.
Two-level Adaptive Branch Prediction was first introduced by Yeh and Patt (1991). The first-level branch execution history is the history of the last k branches encountered. The second-level pattern history is the branch behavior for the last j occurrences of the specific pattern of these k branches. A prediction is based on the branch behavior for the last j occurrences of the current branch history pattern. Since the history register has k bits, at most 2^k different patterns appear in the history register. Each of these 2^k patterns has a corresponding entry in what is called the Pattern History Table (PHT). The first-level and second-level pattern histories are collected at run-time, eliminating the disadvantage inherent in Lee and Smith's method, namely, the differences between the branch behavior of the profiling data set and that of the run-time data sets. Lee and Smith (1984) proposed a static training scheme, which uses the statistics collected from a pre-run of the program and a history pattern consisting of the last n run-time execution results of the branch to make a prediction. The main disadvantage of this scheme is that the program has to be run first to accumulate the statistics, and the same statistics may not be applicable to different data sets.
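The two levels can be sketched in a GAg-style configuration (defined in Section 3.4). The history length k and the counter initialization are illustrative assumptions; the structure — a k-bit history register indexing a 2^k-entry PHT of 2-bit counters — follows the description above.

```python
# Sketch of a GAg-style two-level predictor: one k-bit global history
# register (first level) indexes a 2^k-entry pattern history table of
# 2-bit saturating counters (second level).

class TwoLevelGAg:
    def __init__(self, k=4):
        self.k = k
        self.ghr = 0                       # first level: last k outcomes
        self.pht = [1] * (1 << k)          # second level: 2-bit counters

    def predict(self):
        return self.pht[self.ghr] >= 2

    def update(self, taken):
        i = self.ghr
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        # shift the new outcome into the history register
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.k) - 1)

# A strictly alternating branch (T, N, T, N, ...) defeats a bimodal
# predictor, but each history pattern trains its own counter here, so
# the predictor becomes perfect after a short warm-up.
p = TwoLevelGAg()
misses = 0
for n in range(100):
    taken = (n % 2 == 0)
    if p.predict() != taken:
        misses += 1
    p.update(taken)
print(misses)  # prints 3: only warm-up mispredictions
```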
The two-level adaptive training scheme was simulated against several other dynamic and static branch prediction schemes, such as Lee and Smith's static training schemes, branch target buffer designs, always taken, and backward taken/forward not taken. The proposed scheme achieves an average prediction accuracy of 97 percent on nine of ten SPEC benchmarks, compared to less than 93 percent for the other schemes; that is, a 3 percent misprediction rate versus 7 percent.
Variations in Two-Level Adaptive Schemes (Yeh and Patt, 1993)
Yeh and Patt classify nine variations of Two-Level Adaptive Branch Prediction into three classes according to the way the first-level branch history is collected. The abbreviations for these nine cases are as follows:
GAg - Global Adaptive Branch Prediction using one global pattern history table.
GAs - Global Adaptive Branch Prediction using per-set pattern history tables.
GAp - Global Adaptive Branch Prediction using per-address pattern history tables.
PAg - Per-address Adaptive Branch Prediction using one global pattern history table.
PAs - Per-address Adaptive Branch Prediction using per-set pattern history tables.
PAp - Per-address Adaptive Branch Prediction using per-address pattern history tables.
SAg - Per-set Adaptive Branch Prediction using one global pattern history table.
SAs - Per-set Adaptive Branch Prediction using per-set pattern history tables.
SAp - Per-set Adaptive Branch Prediction using per-address pattern history tables.
3.4.1 Global History Schemes
In the global history schemes, shown in Figure 1 (also called correlation branch prediction by Pan et al., 1992), the first-level branch history is the actual last k branches encountered; therefore, only a single global history register (GHR) is used. The global history register is updated with the outcomes of all branches. Since the global history register is used for all branch predictions, the prediction of a branch is influenced not only by its own history but also by the history of other branches.
Figure 1. Global History Schemes
3.4.2 Per-address History Schemes (Local)
In the per-address history schemes, shown in Figure 2, the first-level branch history refers to the last k occurrences of the same branch instruction; therefore, one history register is associated with each static conditional branch to distinguish the branch history information of each branch. Such a history register is called a per-address branch history register (PBHR). The per-address history registers are contained in a per-address branch history table (PBHT), in which each entry is indexed by the static branch instruction address. In these schemes, only the execution history of the branch itself has an effect on its prediction; therefore, the branch prediction is independent of other branches' execution history.
Figure 2. Per-Address History Schemes
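The per-address organization can be sketched in a PAg-style configuration. The table sizes and initial values are illustrative assumptions; the structure — per-address history registers in a PBHT feeding one shared PHT — follows the description above.

```python
# Sketch of a PAg-style per-address scheme: each branch address has its
# own k-bit history register in the PBHT; all histories index one
# shared pattern history table of 2-bit saturating counters.

class TwoLevelPAg:
    def __init__(self, k=4, addr_bits=8):
        self.hmask = (1 << k) - 1
        self.amask = (1 << addr_bits) - 1
        self.pbht = [0] * (1 << addr_bits)   # per-address history registers
        self.pht = [1] * (1 << k)            # shared pattern history table

    def predict(self, pc):
        return self.pht[self.pbht[pc & self.amask]] >= 2

    def update(self, pc, taken):
        a = pc & self.amask
        h = self.pbht[a]
        self.pht[h] = min(3, self.pht[h] + 1) if taken else max(0, self.pht[h] - 1)
        self.pbht[a] = ((h << 1) | int(taken)) & self.hmask

# Each branch sees only its own history, so one branch's outcomes do
# not perturb another branch's history register (they interact only
# through the shared PHT).
p = TwoLevelPAg()
for _ in range(20):
    p.update(0x10, True)                 # an always-taken branch
print(p.predict(0x10))  # prints True once its pattern has trained
```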
3.4.3 Per-set History Schemes
In the per-set history schemes, shown in Figure 3, the first-level branch history is the last k occurrences of the branch instructions from the same subset; therefore, one history register is associated with a set of conditional branches. Such a history register is called a per-set history register. The set attribute of a branch can be determined by the branch opcode, a branch class assigned by the compiler, or the branch address. Since the per-set history register is updated with history from, potentially, all the branches in the same set, the other branches in the same set influence the prediction of a branch.
Figure 3. Per-Set History Schemes
3.4.4 Results
We have characterized the global, per-address, and per-set history schemes and
compared them with respect to their branch prediction performance and cost
effectiveness. Global history schemes perform better than other schemes on integer
programs but require higher implementation costs to be effective overall. Per-address
history schemes perform better than other schemes on floating point programs and
require lower implementation costs to be effective overall. Per-set history schemes have
performance similar to global history schemes on integer programs; they also have
performance similar to per-address history schemes on floating point programs. To be
effective, however, per-set history schemes require even higher implementation costs
than global history schemes due to the separate pattern history tables of each set. As a rule of thumb with respect to the cost-effectiveness of the different variations, PAs is the most cost-effective among the low-cost schemes.
3.5 Correlation Branch Prediction (Pan, So, and Rahmeh, 1992)
This scheme is similar to GAp in Yeh and Patt's taxonomy, because the prediction of a branch depends on the history of other branches, and to GAs, where the addresses that contain branch instructions are partitioned into subsets, each of which shares the same second-level pattern table.
3.6 More on Global Branch Prediction:
Gselect (Pan, So, and Rahmeh, 1992) and Gshare (McFarling, 1993)
At this point we have seen local branch prediction, which considers the history of each branch independently and takes advantage of repetitive patterns, and global branch prediction, which uses the combined history of all recent branches in making a prediction.
The bimodal technique (2-bit counters) works well when each branch is strongly biased in a particular direction. The local technique works well for branches with simple repetitive patterns. The global technique works particularly well when the direction taken by sequentially executed branches is highly correlated.
It has been shown that the bimodal predictor performs well at small predictor sizes, while the local-history predictor is better at larger sizes. It is worth examining Gselect and Gshare in a little more detail.
Gselect: Global Predictor with Index Selection
Global history information is less efficient at identifying the current branch than simply using the branch address. This suggests that a more efficient prediction might be made using both the branch address and the global history. Such a scheme was proposed by Pan, So, and Rahmeh (1992): the counter table is indexed by a concatenation of global history and branch address bits. This global scheme with selected address bits is called Gselect. It performs better than either bimodal or global prediction.
Gshare: Global Predictor with Index Sharing
Global history information weakly identifies the current branch. This suggests that there is a lot of redundancy in the counter index used by Gselect. The observation that the PHT entries are not used uniformly when indexed by a concatenation of the global history and branch address bits led to the idea of using the exclusive-OR of the two instead of concatenation, to use the entries in the PHT more evenly.
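The difference between the two index functions can be shown directly. The bit widths below are illustrative assumptions; the point is that gselect keeps only a few low bits of each source, while gshare's XOR lets all bits of both sources contribute.

```python
# Sketch of gselect vs. gshare indexing into a 2^K-entry PHT.

K = 10
MASK = (1 << K) - 1

def gselect_index(pc, ghr, addr_bits=5, hist_bits=5):
    # concatenate low address bits with low history bits
    return ((pc & ((1 << addr_bits) - 1)) << hist_bits) | (ghr & ((1 << hist_bits) - 1))

def gshare_index(pc, ghr):
    # XOR lets all K address bits and K history bits contribute
    return (pc ^ ghr) & MASK

# The same branch under two histories that differ only in their high
# bits aliases to one gselect entry but to two distinct gshare entries.
h1, h2 = 0b1100000, 0b0100000
print(gselect_index(0x40, h1) == gselect_index(0x40, h2))  # prints True  (collision)
print(gshare_index(0x40, h1) == gshare_index(0x40, h2))    # prints False (separated)
```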
3.7 Hybrid Branch Predictors
3.7.1 Where did the idea come from?
Combining Branch Predictors (McFarling, 1993)
Realizing that different prediction schemes have different advantages, McFarling suggested that these advantages can be combined in a new branch predictor. The combined predictor contains two component predictors (bimodal and Gshare) and an additional counter array, made of 2-bit saturating up-down counters, which serves to select the better predictor to use. It is shown that the combined predictor, which keeps track of which predictor is more accurate for the current branch, performs better than either predictor alone. Combined predictors using local and global branch information reach a prediction accuracy of 98.1%, as compared to 97.1% for the previously most accurate known scheme (Yeh and Patt's 2-level branch predictor, 1991). This idea is especially useful as machine designers attempt to exploit ILP and mispredicted branches become a critical performance bottleneck.
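The chooser-counter update can be sketched on its own. This is an illustrative reading of McFarling's selection logic: the counter moves toward whichever component was correct when the two disagree, and is left alone when they agree.

```python
# Sketch of the 2-bit chooser counter in a combined predictor.
# chooser in 0..3; value >= 2 means "use predictor 2".

def update_chooser(chooser, p1_correct, p2_correct):
    if p1_correct and not p2_correct:
        return max(0, chooser - 1)       # move toward predictor 1
    if p2_correct and not p1_correct:
        return min(3, chooser + 1)       # move toward predictor 2
    return chooser                       # both right or both wrong: no change

# If predictor 1 keeps being right while predictor 2 is wrong, the
# chooser saturates toward predictor 1.
c = 2
for _ in range(3):
    c = update_chooser(c, True, False)
print(c)  # prints 0 -> predictor 1 is selected
```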
3.7.2 Branch Classification, Chang, Hao, Yeh and Patt, 1994
Branch classification partitions a program's branches into sets or branch classes.
Once the dynamic behavior of a class of branches is understood, optimization for that
class can be done. Associating each branch class with the most suitable predictor for that
class increases prediction accuracy.
The paper gives a new method for combining branch predictors. Branches are partitioned into different branch classes based not only on run-time information but also on compile-time information. The proposed hybrid branch predictor then associates each branch class with the most suitable predictor for that class.
The hybrid predictor design introduced in this paper uses both dynamic and static predictor selection to further improve prediction accuracy. It is shown that the idea of branch classification allows the construction of predictors that are more effective than previously known predictors.
3.7.3 An alternative Selection Mechanism
The selection mechanism is an important factor in increasing the accuracy of hybrid predictors; this section introduces the 2-level Branch Predictor Selector (BPS). As we know at this point, by using known single-scheme predictors and selection mechanisms, the most effective hybrid predictor implementation can be identified.
Chang, Hao, and Patt (1995) introduced a new selection mechanism, called the 2-level selector, which further improves the performance of the hybrid branch predictor. It is now well known that the two-level branch predictor improves prediction accuracy over previously known single-level branch predictors. The concepts embodied in the two-level predictor can also be applied to the hybrid branch predictor's selection mechanism. Figure 4 shows the structure of the 2-level predictor selection mechanism.
Figure 4. 2-Level Predictor Selection Mechanism
A Branch History Register (BHR) holds the branch outcomes of the last m branches
encountered, where m is the length of the BHR. This first level of branch history
represents the state of branch execution when a branch is encountered. No extra hardware
is required to maintain the first level of history if one of the component predictors already
contains this information. That is, if the component predictor maintains a BHR, then the
2-level BPS mechanism does not need to maintain another copy of the BHR; instead, it
just uses the component predictor’s BHR. The Branch Predictor Selection Table (BPST)
records which predictor was most frequently correct for the times this branch occurred
with the associated branch history. This second level of history keeps track of the more
accurate predictor for branches at different branch execution states. When a branch is
fetched, its instruction address and the current branch history are used to hash into the
BPST. The associated counter is then used to select the appropriate prediction. By using the branch history to distinguish more execution states, the 2-level predictor selection scheme can select the appropriate prediction more accurately.
The performance of 2-level BPS is compared to that of previously proposed selection
mechanisms. Although the 2-level BPS mechanism provides a performance increase over
the 2-bit counter BPS mechanism of McFarling (1993), more effective BPS mechanisms are still required for the hybrid branch predictor to achieve its full performance potential.
Target Cache, Chang, Hao, Patt, 1997
In this paper, a new prediction mechanism, the target cache, is proposed for predicting the targets of indirect jumps. The target cache uses a concept central to the 2-level branch predictor: branch history is used to distinguish different dynamic occurrences of each indirect branch.
The only difference between the target cache and the pattern history table of the 2-level branch predictor is that the target cache's storage structure records branch targets, while the 2-level branch predictor's pattern history table records branch directions.
By using branch history to distinguish different dynamic occurrences of indirect branches, the target cache was able to reduce the total execution time of the perl and gcc benchmarks by 8.92% and 4.27%, respectively.
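The key idea — storing full targets instead of direction counters, indexed by a combination of address and history — can be sketched as follows. The index function, entry count, and the use of a sparse dictionary are illustrative assumptions, not the paper's hardware organization.

```python
# Sketch of a target-cache-style lookup for indirect jumps: each entry,
# indexed here by (address XOR history), holds a full target address
# instead of a 2-bit direction counter.

class TargetCache:
    def __init__(self, bits=8):
        self.mask = (1 << bits) - 1
        self.targets = {}                    # sparse stand-in for the cache

    def index(self, pc, history):
        return (pc ^ history) & self.mask    # illustrative index function

    def predict(self, pc, history):
        return self.targets.get(self.index(pc, history))  # None on a miss

    def update(self, pc, history, target):
        self.targets[self.index(pc, history)] = target

# The same indirect jump goes to different targets under different
# histories; the history distinguishes the two dynamic occurrences.
tc = TargetCache()
tc.update(0x80, 0b0011, 0x1000)
tc.update(0x80, 0b1100, 0x2000)
print(tc.predict(0x80, 0b0011) == 0x1000, tc.predict(0x80, 0b1100) == 0x2000)
```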
3.8 Reducing Pattern History Table Interference
Since the introduction of gshare (McFarling, 1993) and the observation, studied by several authors, that aliasing in the PHT is a major factor in reducing prediction accuracy, several schemes have been proposed to reduce aliasing in the PHT. The main problem reducing the prediction rate in the global schemes is aliasing between two indices, each of which is typically formed from history and address bits, that map to the same entry in the Pattern History Table (PHT). Since the information stored in the PHT entries is either "taken" or "not taken", two aliased indices whose corresponding information is the same will not result in mispredictions. We define this situation as neutral aliasing. On the other hand, two aliased indices with different information might interfere with each other and result in a misprediction. We define this situation as destructive aliasing.
3.8.1 The Bi-Mode Branch Predictor, Lee, Chen, Mudge 1997
The bi-mode predictor, shown in Figure 5, tries to replace destructive aliasing with neutral aliasing in a different manner. It splits the PHT into three even parts. One of the parts is the choice PHT, which is just a bimodal predictor (an array of 2-bit saturating counters) with a slight change in the updating procedure. The other two parts are direction PHTs: one is a "taken" direction PHT and the other is a "not taken" direction PHT. The direction PHTs are indexed by the branch address XORed with the global history. When a branch is fetched, its address points to the choice PHT entry, which in turn chooses between the "taken" direction PHT and the "not taken" direction PHT. The prediction of the direction PHT chosen by the choice PHT serves as the overall prediction, and only the chosen direction PHT is updated. The choice PHT is normally updated too, but not if its prediction contradicts the branch outcome while the chosen direction PHT gives the correct prediction. As a result of this scheme, branches that are biased to be taken will have their predictions in the "taken" direction PHT, and branches that are biased not to be taken will have their predictions in the "not taken" direction PHT. So at any given time most of the information stored in the "taken" direction PHT entries is "taken", and any aliasing is more likely not to be destructive. The same phenomenon occurs in the "not taken" direction PHT. The choice PHT serves to determine the branches' biases dynamically. In contrast to the agree predictor, if the bias is incorrectly chosen the first time the branch is introduced to the BTB, it is not bound to stay that way while the branch is in the BTB and, as a result, pollute the direction PHTs. However, the choice PHT takes a third of all PHT resources just to determine the bias dynamically. The scheme also does not solve the aliasing problem between the instances of a branch that do not agree with the bias and the instances that do.
Figure 5. The Bi-Mode Branch Predictor
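The lookup and the partial-update rule described above can be sketched as follows. Table sizes and counter initializations are illustrative assumptions; the choice-PHT skip condition follows the update rule in the text.

```python
# Sketch of the bi-mode organization: a choice PHT indexed by address
# picks between a "taken" and a "not taken" direction PHT, both
# indexed by address XOR global history.

class BiMode:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.choice = [1] * (1 << bits)              # per-address bias counters
        self.dir_pht = {True: [2] * (1 << bits),     # "taken" direction PHT
                        False: [1] * (1 << bits)}    # "not taken" direction PHT

    def predict(self, pc, ghr):
        """Return (prediction, chosen bias)."""
        bias = self.choice[pc & self.mask] >= 2
        pred = self.dir_pht[bias][(pc ^ ghr) & self.mask] >= 2
        return pred, bias

    def update(self, pc, ghr, taken):
        pred, bias = self.predict(pc, ghr)
        i = (pc ^ ghr) & self.mask
        pht = self.dir_pht[bias]                     # only the chosen PHT is updated
        pht[i] = min(3, pht[i] + 1) if taken else max(0, pht[i] - 1)
        # skip the choice PHT when its choice contradicted the outcome
        # but the chosen direction PHT was nevertheless correct
        if not (bias != taken and pred == taken):
            a = pc & self.mask
            self.choice[a] = min(3, self.choice[a] + 1) if taken else max(0, self.choice[a] - 1)

bm = BiMode()
for _ in range(4):
    bm.update(0x20, 0b01, True)          # a taken-biased branch
print(bm.predict(0x20, 0b01)[0])  # prints True
```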
3.8.2 The Agree Predictor, Sprangle, Chappell, Alsup, Patt, 1997
The agree predictor, shown in Figure 6, assigns a biasing bit to each branch in the Branch Target Buffer (BTB) according to the branch direction just before it is written into the BTB. The PHT information is then changed from "taken" or "not taken" to "agree" or "disagree" with the prediction of the biasing bit. The idea behind the agree predictor is that most branches are highly biased to be either taken or not taken, and the hope is that the first time a branch is introduced into the BTB it will exhibit its biased behavior. If this is the case, most entries in the PHT will be "agreeing", so if aliasing does occur it will more likely be neutral aliasing, which does not result in a misprediction. It is one of the first two-level schemes to take advantage of branches' biased behavior to reduce destructive aliasing by replacing it with neutral aliasing, and it considerably reduces destructive aliasing. However, there is no guarantee that the first time a branch is introduced to the BTB its behavior will correspond to its bias. When such cases occur, the biasing bit stays the same until the branch is replaced in the BTB by a different branch; meanwhile, it pollutes the PHT with "disagree" information. There is still aliasing between the instances of a branch that do not comply with the bias and the instances that do. Furthermore, when a branch is not in the BTB, no prediction is available.
Figure 6. The Agree Predictor
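The agree transformation itself is small enough to sketch directly. The counter encoding below is an illustrative assumption; the point is that two differently biased branches sharing one counter both push it the same way.

```python
# Sketch of the agree transformation: the PHT counter predicts
# agreement with a per-branch biasing bit (stored with the BTB entry)
# rather than taken/not-taken directly.

def agree_predict(counter, bias_bit):
    """counter >= 2 means "agree with the biasing bit"."""
    agree = counter >= 2
    return bias_bit if agree else not bias_bit

def agree_update(counter, bias_bit, taken):
    agreed = (taken == bias_bit)
    return min(3, counter + 1) if agreed else max(0, counter - 1)

# Two branches aliased to the same counter, one biased taken and one
# biased not taken, both following their biases, push the shared
# counter toward "agree" -- the aliasing is neutral, not destructive.
c = 1
for taken, bias in [(True, True), (False, False), (True, True)]:
    c = agree_update(c, bias, taken)
print(c, agree_predict(c, True), agree_predict(c, False))  # prints: 3 True False
```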
3.8.3 The Skewed Branch Predictor, Michaud, Seznec, Uhlig, 1997
The skewed branch predictor, shown in Figure 7, is based on the observation that most aliasing occurs not because the PHT is too small, but because of a lack of associativity in the PHT (the major contributor to aliasing is conflict aliasing, not capacity aliasing). The best way to deal with conflict aliasing is to make the PHT set-associative, but this requires tags and is not cost-effective. Instead, the skewed predictor emulates associativity using a special skewing function.
Figure 7. The Skewed Branch Predictor
The skewed branch predictor splits the PHT into three even banks and hashes each index to a 2-bit saturating counter in each bank using a unique hashing function per bank (f1, f2, and f3). The prediction is made according to a majority vote among the three banks. If the prediction is wrong, all three banks are updated. If the prediction is correct, only the banks that made a correct prediction are updated (partial updating). The skewing functions should have inter-bank dispersion; this is needed to make sure that if a branch is aliased in one bank it will not be aliased in the other two banks, so the majority vote will still produce an unaliased prediction.
The reasoning behind partial updating is that if a bank gives a misprediction while the other two give correct predictions, the bank with the misprediction probably holds information that belongs to a different branch. In order to maintain the accuracy of that other branch, this bank is not updated.
The skewed branch predictor tries to eliminate all aliasing instances and therefore all destructive aliasing. Unlike the other methods, it tries to eliminate destructive aliasing between the branch instances that obey the bias and those that do not. However, to achieve this, the skewed predictor stores each branch outcome in two or three banks. This redundancy of 1/3 to 2/3 of the PHT size creates capacity aliasing but eliminates much more conflict aliasing, resulting in a lower misprediction rate. However, the predictor is slow to warm up on context switches.
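The majority vote and the partial-update rule can be sketched as follows. The hash functions here are simple illustrative stand-ins; the real f1, f2, f3 are chosen specifically for inter-bank dispersion.

```python
# Sketch of the skewed predictor's three banks, majority vote, and
# partial updating. Hash functions are illustrative placeholders.

BITS = 8
MASK = (1 << BITS) - 1

def f1(x): return x & MASK
def f2(x): return ((x >> 3) ^ x) & MASK
def f3(x): return ((x >> 5) ^ (x << 2)) & MASK

FUNCS = (f1, f2, f3)
banks = [[1] * (1 << BITS) for _ in range(3)]    # 2-bit counters per bank

def predict(idx):
    votes = [banks[b][FUNCS[b](idx)] >= 2 for b in range(3)]
    return sum(votes) >= 2                       # majority vote

def update(idx, taken):
    correct = predict(idx) == taken
    for b in range(3):
        e = FUNCS[b](idx)
        bank_pred = banks[b][e] >= 2
        # partial updating: on a correct overall prediction, skip the
        # banks that individually voted wrong (they likely belong to
        # a different, aliased branch)
        if correct and bank_pred != taken:
            continue
        banks[b][e] = min(3, banks[b][e] + 1) if taken else max(0, banks[b][e] - 1)

for _ in range(3):
    update(0x1234, True)
print(predict(0x1234))  # prints True
```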
3.8.4 The YAGS Branch Prediction Scheme, Eden, Mudge, 1998
The motivation behind YAGS, shown in Figure 8, is the observation that for each branch we need to store only its bias and the instances in which it does not agree with that bias. If we employ a bimodal predictor to store the bias, as the choice predictor does in the bi-mode scheme, then all we need to store in the direction PHTs are the instances in which the branch does not comply with its bias. This reduces the amount of information stored in the direction PHTs, and therefore the direction PHTs can be smaller than the choice PHT. To identify those instances in the direction PHTs, we add small tags (6-8 bits) to each entry, referring to them now as direction caches. These tags store the least significant bits of the branch address, and they virtually eliminate aliasing between two consecutive branches. When a branch occurs in the instruction stream, the choice PHT is accessed. If the choice PHT indicates "taken", the "not taken" cache is accessed to check whether this is a special case in which the prediction does not agree with the bias; on a tag hit, the "not taken" cache supplies the prediction, and otherwise the choice PHT's prediction is used. The "not taken" case is handled symmetrically using the "taken" cache.
Figure 8. The YAGS Branch Prediction Scheme
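A YAGS-style lookup can be sketched as follows. Table sizes, the allocation policy, and the simplified choice-PHT update are illustrative assumptions; the tagged exception caches probed against the bias follow the description above.

```python
# Sketch of YAGS: a bimodal choice PHT stores each branch's bias;
# small tagged "direction caches" store only the exceptions to it.

TAG_BITS = 6

class Yags:
    def __init__(self, choice_bits=10, cache_bits=8):
        self.cmask = (1 << choice_bits) - 1
        self.dmask = (1 << cache_bits) - 1
        self.choice = [1] * (1 << choice_bits)       # bimodal bias predictor
        # cache[False] records "not taken" exceptions to a taken bias,
        # cache[True] the reverse; entries are (tag, 2-bit counter) or None
        self.cache = {True: [None] * (1 << cache_bits),
                      False: [None] * (1 << cache_bits)}

    def _tag(self, pc):
        return pc & ((1 << TAG_BITS) - 1)            # low address bits as tag

    def predict(self, pc, ghr):
        bias = self.choice[pc & self.cmask] >= 2
        entry = self.cache[not bias][(pc ^ ghr) & self.dmask]
        if entry is not None and entry[0] == self._tag(pc):
            return entry[1] >= 2                     # a recorded exception wins
        return bias

    def update(self, pc, ghr, taken):
        a = pc & self.cmask
        bias = self.choice[a] >= 2
        i = (pc ^ ghr) & self.dmask
        entry = self.cache[not bias][i]
        if entry is not None and entry[0] == self._tag(pc):
            c = entry[1]
            self.cache[not bias][i] = (self._tag(pc),
                                       min(3, c + 1) if taken else max(0, c - 1))
        elif taken != bias:
            # allocate an exception entry when the bias mispredicts
            self.cache[not bias][i] = (self._tag(pc), 2 if taken else 1)
        self.choice[a] = min(3, self.choice[a] + 1) if taken else max(0, self.choice[a] - 1)

y = Yags()
for _ in range(3):
    y.update(0x44, 0b0001, True)     # mostly taken...
y.update(0x44, 0b1000, False)        # ...except under one history
print(y.predict(0x44, 0b0001), y.predict(0x44, 0b1000))  # prints: True False
```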
4. Summary and Conclusion
In this paper, branch prediction methodology has been surveyed. Two-level branch prediction is the most important step in the field. Hybrid predictors, which combine the advantages of single-scheme predictors, are the most effective branch predictors. The selection of the component predictors in a hybrid predictor requires a good study of branch behavior and depends to a great extent upon the programs. Branch classification could be a promising method for hybrid predictors. Using the 2-level BPS gives better performance than the 2-bit BPS. The Bi-Mode and Agree predictors, which split the PHT into two branch streams, have done a good job of reducing aliasing in global schemes. The YAGS scheme further reduces aliasing by combining the strong points of the previous schemes. Research is still going on to further reduce the penalty of predicting branch directions and branch target addresses.
References
1) The Bi-Mode Branch Predictor,
Chih-Chieh Lee, I-Cheng K.Chen, and
Trevor N.Mudge, Proceedings of the
30th Annual IEEE/ACM International
Symposium
on
Microarchitecture,
Pages 4-13, December 1 - 3, 1997,
Research Triangle Pk United States.
2) Branch prediction for free,
Thomas Ball, and James R.Larus,
Proceedings of the Conference on
Programming Language Design and
Implementation, Pages 300-313, June 21
- 25, 1993, Albuquerque, NM.
3) The Effects of Predicated Execution
on Branch Prediction, Page 196
GaryScottTyson
4) Fast and Accurate Instruction Fetch
and
Branch
Prediction;
B. Calder and D. Grunwald; Proceedings
of the 21st Annual International
Symposium on Computer Architecture ,
1994,Pages2-11
5) Characterizing the Impact of
Predicated Execution on Branch
Prediction; Scott A. Mahlke, Richard E.
Hank, Roger A. Bringmann, John C.
Gyllenhaal, David M. Gallagher and
Wen-mei W. Hwu; Proceedings of the
27th annual international symposium on
Microarchitecture , 1994, Pages 217 227.
6) The Effect of Speculatively Updating
Branch History on Branch Prediction
Accuracy,Revisited; Eric Hao, Po-Yung
Chang and Yale N. Patt; Proceedings of
the 27th annual international symposium
on Microarchitecture , 1994, Pages 228 232
7) Guarded Execution and Branch
Prediction in Dynamic ILP Processors;
D. N. Pnevmatikatos and G. S. Sohi;
Proceedings of the 21st Annual
International symposium on Computer
architecture , 1994, Pages 120 - 129
8) Improving The Accuracy of Static
Branch Prediction Using Branch
Correlation;
Cliff Young and Michael D. Smith;
Proceedings
6th
International
Conference on Architectural Support for
Programming Languages and Operating
Systems , 1994, Pages 232 - 241
9) A Comparative Analysis of Schemes
for Correlated Branch Prediction;
Cliff Young, Nicolas Gloy and Michael
D. Smith; Proceedings of the 22nd
Annual International Symposium on
Computer Architecture , 1995, Pages
276-286
10) The Impact of Unresolved Branches
on
Branch
Prediction
Scheme
Performance;A.
R.
Talcott,
W.
Yamamoto, M. J. Serrano, R. C. Wood
and M. Nemirovsky; Proceedings of the
21st Annual International Symposium on
Computer Architecture , 1994, Pages 12
-21
11) The Role of Adaptivity in TwoLevel Adaptive Branch Prediction;
Stuart Sechrest, Chih-Chieh Lee and
Trevor Mudge; Proceedings of the 28th
Annual International Symposium on
Microarchitecture , 1995, Pages 264 269
12) Using Hybrid Branch Predictors to
Improve Branch Prediction Accuracy in
The Presence of Context Switches;
20
Marius Evers, Po-Yung Chang and Yale
N. Patt; Proceedings of the 23rd Annual
International Symposium on Computer
Architecture , 1996, Pages 3 - 1
13) An Analysis of Dynamic Branch
Prediction
Schemes
on
System
Workloads; Nicolas Gloy, Cliff Young,
J. Bradley Chen and Michael D. Smith;
Proceedings of the 23rd Annual
International Symposium on Computer
Architecture , 1996, Pages 12 - 21
14) Compiler Synthesized Dynamic
Branch
Prediction;
Scott Mahlke and Balas Natarajan;
Proceedings of the 29th Annual
IEEE/ACM International Symposium on
Microarchitecture , 1996, Pages 153 164
15) Dynamic History-Length Fitting: A
Third Level of Adaptivity for Branch
Prediction; Toni Juan, Sanji Sanjeevan
and Juan J. Navarro; Proceedings of the
25th Annual International Symposium
on Computer Architecture , 1998, Pages
155-166
16) Alternative Implementations of Two-Level Adaptive Branch Prediction; Tse-Yu Yeh and Yale N. Patt; 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998, Pages 451-461
17) Variable Length Path Branch Prediction; Jared Stark, Marius Evers and Yale N. Patt; Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, Pages 170-179
18) Alternative Implementations of Hybrid Branch Predictors; Po-Yung Chang, Eric Hao and Yale N. Patt; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 252-257
19) Correlation and Aliasing in Dynamic Branch Predictors; Stuart Sechrest, Chih-Chieh Lee and Trevor Mudge; Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996, Pages 22-32
20) Dynamic Path-Based Branch Correlation; Ravi Nair; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 15-23
21) Predicting Conditional Branch Directions from Previous Runs of a Program; Joseph A. Fisher and Stefan M. Freudenberger; Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, Pages 85-95
22) An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work; Marius Evers, Sanjay J. Patel, Robert S. Chappell and Yale N. Patt; Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, Pages 52-61
23) A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992, Pages 129-139
24) Implementation and Analysis of Path History in Dynamic Branch Prediction Schemes; S. Reches and S. Weiss; Proceedings of the 1997 International Conference on Supercomputing, 1997, Pages 285-292
25) Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation; Shien-Tai Pan, Kimming So and Joseph T. Rahmeh; Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, Pages 76-84
26) Branch History Table Indexing to Prevent Pipeline Bubbles in Wide-Issue Superscalar Processors; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993, Pages 164-175
27) The Cascaded Predictor: Economical and Adaptive Branch Target Prediction; Karel Driesen and Urs Hölzle; Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, 1998, Pages 249-258
28) Branch Classification: A New Mechanism for Improving Branch Predictor Performance; Po-Yung Chang, Eric Hao, Tse-Yu Yeh and Yale N. Patt; Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994, Pages 22-31
29) A Comparison of Dynamic Branch Predictors That Use Two Levels of Branch History; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993, Pages 257-266
30) The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference; Eric Sprangle, Robert S. Chappell, Mitch Alsup and Yale N. Patt; Proceedings of the 24th International Symposium on Computer Architecture, 1997, Pages 284-291
31) Target Prediction for Indirect Jumps; Po-Yung Chang, Eric Hao and Yale N. Patt; Proceedings of the 24th International Symposium on Computer Architecture, Denver, June 1997
32) Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference; Po-Yung Chang, Marius Evers and Yale N. Patt; International Conference on Parallel Architectures and Compilation Techniques, Boston, MA, October 1996
33) A Comprehensive Instruction Fetch Mechanism Design for a Processor Supporting Speculative Execution; T.-Y. Yeh and Y. Patt; The 25th ACM/IEEE International Symposium on Microarchitecture, December 1992
34) Assigning Confidence to Conditional Branch Predictions; Erik Jacobsen, Eric Rotenberg and J. E. Smith; Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, Page 142, December 2-4, 1996, Paris, France
36) The YAGS Branch Prediction Scheme; A. N. Eden and T. Mudge; Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, Pages 69-77, November 30 - December 2, 1998, Dallas, TX, USA
37) Performance Issues in Correlated Branch Prediction Schemes; Nicolas Gloy, Michael D. Smith and Cliff Young; Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995, Pages 3-14
38) Increasing the Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache; Tse-Yu Yeh, Deborah T. Marr and Yale N. Patt; Proceedings of the 1993 International Conference on Supercomputing, 1993, Pages 67-76
39) Two-Level Adaptive Training Branch Prediction; Tse-Yu Yeh and Yale N. Patt; Proceedings of the 24th Annual International Symposium on Microarchitecture, 1991, Pages 51-61
40) The Effects of Predicated Execution on Branch Prediction; Gary Scott Tyson; Proceedings of the 27th Annual International Symposium on Microarchitecture, Page 196, November 30 - December 2, 1994, San Jose, CA, USA
41) The Effects of Mispredicted-Path Execution on Branch Prediction Structures; S. Jourdan, T. H. Hsing, J. Stark and Y. N. Patt; ECE, University of Michigan, Ann Arbor
42) Multiple Branch and Block Prediction; Steven Wallace and Nader Bagherzadeh; Proceedings of the Third International Symposium on High Performance Computer Architecture, February 1-5, 1997, San Antonio, Texas
43) Branch Transition Rate: A New Metric for Improved Branch Classification Analysis; Michael Haungs, Phil Sallee and Matthew Farrens; Department of Computer Science, University of California, Davis
44) A Third Level of Adaptivity for Branch Prediction; T. Juan, S. Sanjeevan and J. J. Navarro; Technical Report UPC-DAC-1998-4, Computer Architecture Department, UPC, Barcelona, March 1998
45) Combining Branch Predictors; Scott McFarling; Technical Note TN-36, DEC-WRL, June 1993
46) Reducing the Branch Penalty in Pipelined Processors; David J. Lilja; IEEE Computer, Pages 47-55, July 1988
47) Branch Prediction Strategies and Branch Target Buffer Design; J. K. F. Lee and A. J. Smith; IEEE Computer, 17, 1, January 1984, Pages 6-22
48) Correlation Based Branch Prediction; S. T. Pan, K. So and J. T. Rahmeh; Technical Report UT-CERC-TR-JTR91-01, University of Texas at Austin, August 1991
49) A Study of Branch Prediction Strategies; J. E. Smith; Proceedings of the 8th Annual International Symposium on Computer Architecture, June 1981, Pages 135-147
50) Optimal 2-Bit Branch Predictors; R. Nair; IEEE Transactions on Computers, Vol. 44, No. 5, Pages 698-702, May 1995
51) The Influence of Branch Prediction Table Interference on Branch Prediction Scheme Performance; A. R. Talcott, M. Nemirovsky and R. C. Wood; Proceedings of the 1995 ACM/IEEE Conference on Parallel Architectures and Compilation Techniques, 1995
52) Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy; M.-D. Tarlescu, K. B. Theobald and G. R. Gao; Proceedings of the 1997 International IEEE Conference on Computer Design, Pages 82-87, 1997