Yale Patt Po-Yung Chang

Branch Classification: a New Mechanism for Improving Branch Predictor Performance
Po-Yung
Chang
Eric
Department
of
Electrical
Engineering
The
Ann
Tse-Yu
Hao
University
Arbor,
and
of
Clara,
48109-2122
CA
Abstract
95051
predictor
minimizes
dicting
is wide
agreement
impediments
to the
pipelined
tive
branches
execution
one of the most
performance
superscalar
ditional
that
in
the
seems
branch
problem,
branch
is mispredicted.
is the
instruction
speculative
Therefore,
predictor;
95% accuracy
This
paper
proposes
branch
prove
the
accuracy
allows
associated
dict
its
with
predictor
predictor
is best
suited.
scheme,
the
predictor
this
predicts
This
a hybrid
several
reported
Keywords:
speculative
in the
such
each
that
branches
suggests
any
for
one
that
im-
cycles
clas-
branch
classification,
ipc denotes
are
executed
of branches
branch
can
one method
branch
predictors
achieve
higher
have
This
super-
paper
sults.
of instructions
if they
to the
reduce
interrupt
instruction
the performance
the
pipeline
steady
[4].
Section
num-
that
r * ipc = 0.9,
a
a prediction
as a technique
of branch
that
predictors.
Using
we analyze
hybrid
several
predictors
any branch
that
predictors
that
as follows:
model,
schemes,
analyzes
and
some
2 presents
Section
proposes
schemes,
4 provides
Section
classification.
3 describes
previously
several
pro-
new
hybrid
presents
simulation
concluding
remarks.
re-
Branch
2
Classification
A branch
Branch
to copy
of the
instructions,
of
supply
into
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association of Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.
MICRO 27, 11/94, San Jose, CA, USA
© 1994 ACM 0-89791-707-3/94/0011..$3.50
ratio
reported.
classification
prediction
of
p denotes
97.77%.
than
of branch
prediction
as
number
10% requires
propose
accuracy
the
and
classification,
and
and
of instructions
C=5
classification
is organized
concept
branch
Introduction
can significantly
algo-
defined
of total
number
than
accuracy
been
the
number
than
of branch
previously
a branch
processors
branch
misses
is
misprediction,
For
branch
previ-
cache
r denotes
cycle.
the
as
C denotes
of less
higher
posed
Branches
if that
microproces-
where
the
p of greater
improve
the
pipelined
in-
work
prediction
penalty
the average
per
We introduce
pro-
such
to a branch
over
penalty
accuracy
scalar.
1
branch
branch
accuracy,
and
predictor,
performance,
due
to pre-
predictor
branch
processor
wasted
the prediction
it
stalls
* r * ipc),
branch
and
the
all speculative
away
accurate
the
ber
literature.
execution,
ignore
to be
which
Because
fetching
to a high-performance
conflicts,
C * ((1-p)
classification
achieves
branch
we
bus
component
predictors,
predictor
than
suited
a hybrid
branch
branch
accuracy
best
approach,
those
paper
Branch
instruction
st ails by pre-
and
be thrown
a very
is important
If
accu-
help
path.
must
of pipeline
branch
sor,
if a
enough.
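As a rough numerical illustration of the model in this section, the term C * ((1-p) * r * ipc) can be evaluated directly. The sketch below is an editorial aid, not part of the original study: the function names are hypothetical, and the inputs r * ipc = 0.9 and C = 5 are the ones quoted in the text. It evaluates the term at 95% accuracy and then solves for the accuracy needed to bring the term down to 10%.

```python
def misprediction_cost(p, r_ipc, penalty):
    """Evaluate C * ((1 - p) * r * ipc).

    p       -- prediction accuracy (0..1)
    r_ipc   -- product r * ipc (branch ratio times instructions per cycle)
    penalty -- C, the misprediction penalty in cycles
    """
    return penalty * ((1.0 - p) * r_ipc)


def accuracy_for_cost(target, r_ipc, penalty):
    """Solve C * ((1 - p) * r * ipc) = target for p."""
    return 1.0 - target / (penalty * r_ipc)


if __name__ == "__main__":
    # Cost term at 95% accuracy with r * ipc = 0.9 and C = 5.
    print(round(misprediction_cost(0.95, 0.9, 5), 3))
    # Accuracy needed to bring the term down to 0.10.
    print(round(accuracy_for_cost(0.10, 0.9, 5), 4))
```

With these inputs the term is 0.225 at 95% accuracy, and an accuracy of about 97.77% is needed to reach 0.10, which agrees with the figures quoted in the text.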
to
predictors.
branch
branch
Using
analyzes
prediction
ously
of branch
can be constructed
branch
poses
classification
an individual
direction.
is not good
of the
that
a branch
rithm
to the
a very
from
beyond
the number
direction
is mispredicted,
Specula-
is discarded
we need
branch
future
of choice
work
the
structions
of con-
stream.
rate
sification
and
presence
to be one solution
but
important
of current
processors
Science
Corporation*
Santa
There
Computer
Patt
Michigan
Michigan
Intel
Yale
Yeh*
without
classification
sets
or
branches
A
can
good
sessing
class;
be
done
classification
similar
thus,
of a class
branch
dynamic
once
partitions
The
statically
scheme
we
can
partitions
the
same
dynamic
optimize
of
dynamically.
branches
into
the
branches
partitioning
and/or
behavior
we understand
of branches,
a program’s
classes.
for
posbranch
behavior
this
class.
For example,
predict
handling
the
the
compiler
branches
or
of these
can try
the
to eliminate
hardware
branches
can
(e.g.
execute
3.2
hard-t~
special
both
paths
their
classification
diction
get.
accuracy
branch
class
class.
For
can be used to maximize
obtained
Prediction
from
accuracy
with
the
we
and
that
are more
hardware
predictor
use
more
difficult
each
that
resources
static
to
to
to predict.
SC2,
piler
collect
replaces
tions
the
the
with
the code
function
branch
predictor
to
prediction
hardware.
a per-branch
not
changed
branch
the
The
these
enough
approach
the
path.
this
mine
whether
schemes
method,
program
This
a real
prediction
the
design.
executable
on
executes
down
is not
files
results
3.3
Static
lisp,
eqntott,
shows
vided
not
from
and testing
Because
with
different
the
eqntott
data
SPECint92
paper
are for
SPECint92
compress,
the training
benchmarks.
in this
the
suite:
the six inespresso,
Table
data
sets for each of these
and
suite
for
gcc.
1
and
they
lisp
or
are
use inputs
not
ford
008.espresso
Cps
Data
are
training.
nine
I bool.
eq.2
026.compress
\ gcc source
072.sc
I loada2
I loadal
085.gcc
I jump.i
stint.
taken
1: Training
and Testing
Data
[1]),
queens
branch
detect
1
and
address
[9].
85%-90%,
have
counter-based
i
1
formation,
Sets of Benchmarks
To
an even
about
further
227 boolean
equations
with
37 different
derivative
variables
benchmark
rate
more
His
accurate
uses
branch
fast
level
direction
technique
predictor
uses
the
hybrid
for
and
more
of branch
target
2-bit
history
inac-
McFarling
two
up-down
its
and
[6, 10, 11, 12].
is currently
predictor
about
bit
prediction
combines
making
logic
instruction
history
accuracy,
predictor
methods
important
accuracies,
keeping
that
and
pre-
hardware
of the
for simple
By
technique
branch;
not-
their
One
stage
prediction
of which
taken
run-time,
90%-95%, can be attained
a new
for each
11].
prediction
[9].
higher
be
prediction
10,
reported
improve
track
9,
the
schemes
Stan-
use hardware
at
dynamic
7,
branch
been
that
in
to
by studying
buffers,
predict
(as
predicting
algorithms
at an early
High
predict
(as in Motorola
history
directions
target
branches
opThe
[3, 5, 9].
prediction
[2,
to
branches
whereas
execution
Many
is
taken
not-taken
all
as branch
direction.
always
accuracy
branch
use information
such
prediction
are
35%
studied
predictors.
of a symbolic
Work
branch
always
about
behavior.
pipeline,
in
or
branch
been
class,
predict
65%
future
to keep
version
Pratt.
Classes
algorithms
Predicting
branch
record
proposed
1Common
Lisp
written by Vaughan
100%
execution,
branch
[8]).
Dynamic
to
to
int-pri-3.eqn
Data
95%
<=
Prediction
branches
achieves
curacy,
Table
<=
Pr(br)
2: Static
to
of
about
predict
Testing
023.eqntott
kind
achieves
pro-
which
bca
deriv.cll
022.li
< Pr(br)
<
prediction
profiles,
MIPS-X
have
Training
90%
program
conditional
vious
Benchmark
predictors.
I
Branch
branch
MC88000
SC,
sets,
! 95%
before
simplest
from
SC6
Previous
accuracy.
presented
I
pre-
be able
to measure
all
programs
known
will
accu-
accurate
Benchmarks
The
classification
SC5
Table
codes
teger
hybrid
is
actual
A more
rates
whether
!
com-
the
however,
deter-
taken
and
I Descriptions
L
program
return
as mixedwe
behavior
Classes
-
simulated
predictors
always
approach,
to fine-tune
of
functions
SC4 branches
similar
previously
refer
classification,
have
on this
will
calls
we can
branch
we
as
as mostly-one-
that
based
is
classes
branch
gathered
3.1
dynamic
branches
condi-
the
the
this
branches
similar
section,
on
as shown
branch
branches
and SC3 and
With
of
to these
SC6
based
com-
functions
of
behavior
uses instrumented
per-branch
state
of various
basis.
branch
of these
the
because
the
the
to generate
Using
conditions;
correct
rate
update
performance
behavior,
calculates
Each
simulator
and
the
that
calls.
prediction
pare
branch
following
and
branches.
to outperform
dynamic
In
the
classified
by profiling
partitioning
refer
ion branches
direction
this
we will
SC5,
are
collected
Because
classes.
have
Experiments
To
2.
SC1,
branches
taken-rate
statically,
direct
predictor
experiment,
Table
diction
3
Classification
dynamic
done
budfor
a simple
dedicate
in
the pre-
by associating
suitable
could
branches
branches
a given
is increased
most
example,
for predictable
handle
In our
of
branch).
Branch
Branch
case the
[6]
branch
counters
more
then
prediction.
accu-
uses the
In
this
bining
paper,
branch
different
branch
formation
but
brid
branch
with
the
we introduce
predictors.
classes
also
thermore,
then
suitable
our
several
based
branch
on not
only
can
for
run-time
diction
hy-
branch
class
with
that
class.
Fur-
to fill
the
advantages
of
predictors.
In
this
branch
and
Results
predictor
a given
implement
section,
we will
classification.
performance
In
our
branch
bit
predictors
address
tern
Two-Level
Predictor
and
using
a modified
ORs
the
implementations
GAg
the global
scheme
history
the branch
history
Advantages
In
our
of
study,
taken-rates
static
are
Branch
more
(SC5,SC6)
performance
is
(SC1).
example,
the
to select
branch
1's.
grouped
together.
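The profile-based classification described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names and the profile layout (branch address mapped to taken and executed counts) are hypothetical. The SC1 to SC3 boundaries follow the class ranges legible in the figures (0 <= Pr(br) <= .05, .05 < Pr(br) <= .10, .10 < Pr(br) <= .50); the SC4 to SC6 boundaries (50% to 90%, 90% to 95%, 95% to 100%) are assumed to mirror them, since Table 2 is only partly legible in this copy. The dynamic weight of a class is computed as the number of dynamic branches belonging to that class divided by the total number of dynamic branches.

```python
from collections import Counter


def static_class(taken_rate):
    """Map a branch's profiled taken rate Pr(br) to a static class SC1..SC6."""
    if not 0.0 <= taken_rate <= 1.0:
        raise ValueError("taken rate must lie in [0, 1]")
    if taken_rate <= 0.05:
        return "SC1"
    if taken_rate <= 0.10:
        return "SC2"
    if taken_rate <= 0.50:
        return "SC3"
    if taken_rate < 0.90:   # assumed boundary
        return "SC4"
    if taken_rate < 0.95:   # assumed boundary
        return "SC5"
    return "SC6"


def classify_profile(profile):
    """profile: {branch_address: (taken_count, executed_count)}."""
    return {addr: static_class(taken / executed)
            for addr, (taken, executed) in profile.items() if executed > 0}


def dynamic_weight(profile, classes):
    """Fraction of all dynamic branch executions that falls in each class."""
    total = sum(executed for _, executed in profile.values())
    weight = Counter()
    for addr, (_, executed) in profile.items():
        if addr in classes:
            weight[classes[addr]] += executed
    return {cls: count / total for cls, count in weight.items()}


if __name__ == "__main__":
    profile = {0x10: (1, 100), 0x20: (30, 100), 0x30: (99, 200)}
    classes = classify_profile(profile)
    print(classes)
    print(dynamic_weight(profile, classes))
```

SC1, SC2, SC5, and SC6 correspond to the mostly-one-direction branches of the text, and SC3 and SC4 to the mixed-direction branches.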
Because
dynamic
branches
in
●
different
classes
optimal
each
have
branch
of
the
on each
branch
prediction
these
analyze
different
scheme
classes.
In
performance
static
class
dynamic
this
may
the
the
be different
section,
of branch
to show
behavior,
we
prediction
benefits
entry.
We
short
branch
of
4, 5, and
the
Mostly
One-direction
of
long
Branches
1, 2, and
on
GAs,
and
the
curve
shows
tern
length
ures,
branch
registers
taken
rence
of the
direction
(1%
with
to
for
as the
by one,
schemes
with
in predicting
(SC1).
With
is required
the
behavior
e.g.
a leading
will
the
the
of the
a repeating
1 followed
require
“l”.
the
of
same
mispredictions
the
same
by
more
PHT
having
a
PHTs.
Branches
Unlike
registers.
between
having
more
a longer
branch
branches
are more
register.
with
long
in predicting
branches
the
these
10%
patterns
By
and
50%,
to
these
due
a longer
history,
likely
branch
with
branch
branch
execution
Thus,
be-
schemes
Because
distinguish
history
are
mostly-one-
mixed-direction
execution
with
accuracy
the
by prediction
taken-rates
schemes
fective
prediction
taken-rates
states.
histories
to remain
the
in
branch
histories
mixed-direction
preare
ef-
branches.
history
The
scheme,
odd
a
occur-
mostly-one-
.
history
by ninety-nine
of
branch
diction
fig-
mostly-not-
branch
100 branch
occurrence
the
history
we can
of correlated
of pat-
short
PAs
im-
the
predicted
see more
In addition,
the
dynamic
characteristics.
history,
branch
in these
the
to capture
dynamic
one
the number
As shown
branch
the
Each
curve,
prediction
capture
respectively.
each
doubles.
ac-
PAs,
accuracy
decreases
taken-rate)
the
On
branches;
pattern
prediction
using
schemes
are effective
history
average
prediction
tables
branches
long
order
gshare
cost.
history
the
benchmarks
the
plementation
history
3 show
integer
mostly
access
conflicts
and
50% (SC3).
dynamic
we will
Figures
consist
causing
6 show
branches,
branch
have
(SC1,SC2,SC5,SC6)
curacy
mostly
Mixed-direction
whose
10% and
direction
of
above
taken,
accessing
register
branches
are
to
these
the
of
branches
tween
classification.
Analysis
reduce
history
are effectively
●
possibly
on the
taken
will
tend
branches
can
Analysis
Figures
and
schemes
performance
PHT,
to different
of
mixed-
mentioned
if branches
will
the
of the SC1 branches.
that
register
branches
and
mostly
to
branches
of each
the
accessed.
(SC3,SC4)
for
report
on
history
are
performance
to that
similar
history
These
due
similar
For
The
predic-
decreases
more
entries
between
occur.
is similar
The
also
branches
branches
branches
the
because
PHT
conflicts
(GAs),
entry.
with
history
more
PHT
SC2 branches
Classification
branches
branch
to
interference
scheme,
scheme
reg-
mapped
of
of different
gshare
mostly-one-direction
part
3.4.1
of the
in
for
scheme.
are
amount
GAs
longer
history
prediction
branches
the
decreases
results
a shorter
history
in
ocpre-
Furthermore,
the
the
pattern
means
direction
Per-
exclusive-
address
table
longer
the
fewer
Branch
tables
[6]) that
with
Thus,
a set of pat-
Two-Level
(gshare
with
pattern
using
Global
of pattern
history
appropriate
the
2-
Branch
are the
accuracy
in
thus,
As
tion
patterns
of the Two-
They
Predictor
(PAs),
a set
predictors.
guided (PG),
the
is reduced.
design
Two-Level
are studied.
Branch
tables
the
the
at ion cost,
PHTs,
also
it takes
history
time.
PHTs
PHT;
between
of
single-scheme
profile
and
different
Predictor
history
branch
different
are simulated:
Three
Branch
advantages
present
hybrid
(2bC),
the
then
three
counter
Predictor.
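To make the gshare variant concrete, the following is a minimal sketch of a predictor that exclusive-ORs the branch address with the global branch history to select a pattern history table (PHT) entry, each entry being a 2-bit up-down counter. It is an illustration only: the class name, the use of a single PHT, the use of the low-order address bits, and the initial counter value are assumptions, not details taken from the paper or from [6].

```python
class Gshare:
    """Single-PHT gshare sketch with k bits of global history."""

    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.history = 0                       # global branch history register
        self.pht = [2] * (1 << history_bits)   # 2-bit counters, start weakly taken (assumed)

    def _index(self, address):
        # XOR the (low-order) address bits with the global history.
        return (address ^ self.history) & self.mask

    def predict(self, address):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.pht[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    g = Gshare(history_bits=4)
    for i in range(200):                 # one branch alternating taken / not-taken
        g.update(0x40, i % 2 == 0)
    print(g.predict(0x40))               # next outcome in the pattern is "taken"
```

After warm-up, the alternating pattern maps the two phases of the branch to different PHT entries, so both phases are predicted correctly.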
show
will
of several
experiment,
up-down
Level
first
We
more
same
A short
predic-
taken
The
scheme
warm-up
more
aken,
if this
because
history.
a faster
the
GAs
history
the pattern
not-t
even
is mispredicted.
of the
branch
means
mostly-
high
branch
accuracy
With
Simulation
are
remains
of the
longer
ister
3.4
branches
accuracy
currence
in-
Our
these
tion
each
for
combine
since
cominto
information.
associates
predictor
technique
method
are partitioned
compile-time
predictor
most
a new
Branches
histories
0s
performance
of
mostly
taken
branches
tioned
above
(SC3).
Weight
of Branch
Let us define the
GAs,
PAs,
(SC4)
is similar
Classes
dynamic
weight
and
gshare
on
to that
men-
of a branch
class
as
in
No.
However,
of dyn.
total
brns
number
belonging
of
dynamic
to that
brn
branches
class
“
[Figure 1: Prediction Accuracy of PAs on the Mostly Not-taken Branches. Static Class 1: 0 <= Pr(br) <= .05. Curves: PAs 35KB, PAs 10KB, PAs 4KB. X-axis: Branch History Length.]
[Figure 2: Prediction Accuracy of GAs on the Mostly Not-taken Branches. Static Class 1: 0 <= Pr(br) <= .05. Curves: GAs 32KB, GAs 8KB, GAs 2KB. X-axis: Branch History Length.]
[Figure 3: Prediction Accuracy of gshare on the Mostly Not-taken Branches. Static Class 1: 0 <= Pr(br) <= .05. Curves: gshare 32KB, gshare 8KB, gshare 2KB. X-axis: Branch History Length.]
[Figure 4: Prediction Accuracy of PAs on the Mixed-direction Branches. Static Class 3: .10 < Pr(br) <= .50. Curves: PAs 35KB, PAs 10KB, PAs 4KB. X-axis: Branch History Length.]
[Figure 5: Prediction Accuracy of GAs on the Mixed-direction Branches. Static Class 3: .10 < Pr(br) <= .50. Curves: GAs 32KB, GAs 8KB, GAs 2KB. X-axis: Branch History Length.]
[Figure 6: Prediction Accuracy of gshare on the Mixed-direction Branches. Static Class 3: 0.10 < Pr(br) <= 0.50. X-axis: Branch History Length.]
Figure
7 shows
class.
the
dynamic
Approximately
are
mostly-one-direction
are
mixed-direction
mance
of
diction
accuracy
a
branches
of each
branches;
branches.
predictor
and
weight
50% of all dynamic
is
on the
the
other
Thus,
the
dependent
on both
the
static
branches
50%
perfor-
on
its
pre-
mostly-one-direction
mixed-direction
branches.
Em=35K”
11
13579
13
15
17
Branch History Length (k)
Avgcanqapwh
Figure
of dynamic
branches
in each static
class
We
have
ration
for
from
shown
the
that
of the
for
both
the
average
The
=
where
of
pattern
ber
of
entries
the
number
The
of
right-most
GAS(17,
[12].
Let
branch
the
PA(x)
denote
the
PA(GAs(5,
of
e.g.
PA(GAs(15,
210))
sensitive
GAs
PA(GAS(15,
<
as
to
scheme,
4))
The
the
e.g.
those
accuracy
212)).
tables,
- PA(GAS(13,
the
of
history
I
I
I
I
I
13
History
I
I
I
I
15
17
Length
schemes
(k)
with
different
branch
history
length
4))
11
15
Length
17
(k)
Figure
the
10:
gshare
with
different
branch
history
length
branch
increases
as
With
28))
a
fixed
prediction
ac-
register
Figures
in-
1, 2, 4,
<...
<
figuration
scheme
is
mostly-one-direction
that
of
branches.
<
configuration
4))
PAs
13
History
in
of
fixed
PA(GAS(13,
the
13579
Branch
presented
history
of
PA(PAs(15,
I
:E)=3”’
this
1, 216).
PA(GAS(5,
PA(GAs(5,
branch
length
accuracy
a
With
performance
I
t is
of
GAs(
accuracy
<
9: Global
performance
prediction
length
Figure
history
example,
point
of
e.g.
the
For
the
increases,
history
num-
and
x.
PA(GAS(11,4))
4)).
the
num-
prediction
match
prediction
PHTs
pattern
increases
creases;
the
scheme
of
cost.
p is the
left-most
the
results
I
Branch
b is the
gshare.
accuracy
shows
Our
length,
curacy
less
1).
number
number
the
point
I
11
ranging
table,
9 shows
prediction
I
I
13579
(bits)
length,
The
GAS2KB
(bits)
history
Figure
GAs.
I
0.89
or
using
X 2)
(PHTs),
in
- - -0
❑
the per-
hardware
x 2)
register
branch
I
(bits)
Zk
(2’
entries
in
prediction
history
<
PHT
size
X
$
0.91
0.90
PAs,
length
is estimated
tables
the
the
(P
+
history
curve
shows
+
different
10 show
indicates
+(px2~x2)
k)
k
in
a 32 K-byte
curve
X
history
of
highest
(b
t) =
k is the
ber
k
=
with
0.93
these
GAs,
at a fixed
schemes
0.94
0.92
[12]:
p)
gshare(k,
history
in the graphs
of a predictor
equations
Pk(k,
branch
the
history
length
0.95
optimally
8, 9, and
using
predictor
cost
GAs(k,p)
of
the
curve
of a branch
hardware
the
accuracy
with
Thus,
Per-address
0.96
;
&
configu-
be configured
Figures
history
is different
branches.
cannot
prediction
scheme
following
predictor
branches
of branches.
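The hardware cost equations are only partly legible in this copy. Assuming they take the form attributed to [12], namely GAs(k, p) = k + (p x 2^k x 2), PAs(k, p, b) = (b x k) + (p x 2^k x 2), and gshare(k, p) = k + (p x 2^k x 2) bits, where k is the history length, p is the number of PHTs, and b is the number of entries in the branch history table, the costs can be computed as below. The function names are hypothetical and the reconstruction of the equations is our reading of the garbled source, not a verified transcription.

```python
def gas_cost_bits(k, p):
    """Global history register (k bits) plus p PHTs of 2^k two-bit counters."""
    return k + p * (2 ** k) * 2


def pas_cost_bits(k, p, b):
    """b per-address history registers of k bits plus p PHTs of 2^k two-bit counters."""
    return b * k + p * (2 ** k) * 2


def gshare_cost_bits(k, p):
    """Same storage as GAs under this model; only the PHT index computation differs."""
    return k + p * (2 ** k) * 2


def to_kbytes(bits):
    return bits / 8 / 1024


if __name__ == "__main__":
    for k in (11, 13, 15):
        print(k, round(to_kbytes(gas_cost_bits(k, 4)), 2))
    print(round(to_kbytes(gshare_cost_bits(17, 1)), 2))
```

Under this reading, GAs(11, 4), GAs(13, 4), and GAs(15, 4) come to roughly 2K, 8K, and 32K bytes, which matches the GAs sizes shown in the figure legends.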
1 to 18. Each
formance
optimal
mixed-direction
predictors
types
gshare
the
mostly-one-direction
single-scheme
from
that
8:
branch
7: Percentage
Figure
than
- PA(PAs(13,
4))
direction
4)).
of GAs
and
and
branches,
branch
also
and
the
the
optimal
for
for
in figures
con-
both
the
mixed-direction
classification,
sub-optimal
as shown
that
is sub-optimal
branches
Without
is
5 show
PAs
the
the
gshare
mostly-one-
3 and
6.
3.4.2
Combining
the
Advantages
of
Different
Per-set
Predictors
In
tors
this
Pamwn
section,
which
we introduce
combine
the
predictors.
We will first
tors
statically
which
branch.
which
namically.
tion
select
a branch
branch
a branch
predictor
branch
branch
predic-
predictor
branch
we summarize
for
predictor
both
and
hybrid
‘-t
dy-
with
GAs
We
Multiple
have
signed
the
Predictor
shown
to
that
optimize
branches.
types
To increase
scheme,
called
GAs.mhl;
for
the
bits
for
not
a
enough
To better
[6] uses
branch
address.
the
to select
is done
to hash
to different
may
only
no
these
branches
Figure
prediction
outperforms
gshare
gshare
the
tively
utilize
the
and
SC6
to predict
the
is able
are .015 and
and
SC6
- X
GAs.mhl
0 .. .. ... . ❑
gshare.
+—
GAs
+
Figure
12:
History
2S6
lK
16K
4K
Predictor
Performance
Size
64K
(bytes)
of GAs
with
Multiple
Branch
Length
rlcw
_
100%?
the
branches
run
and,
this
case,
scheme
static
that
branches.
sizes,
.0098
higher
branches
the
best
at
each
13 shows
class
most
per-
and
By
a short
than
mixed-direction
gshare
in prediction
his-
branches,
those
respectively.
13: Performance
W
.X3
of lK
=6
“’”
Predictors
on each
static
effec-
significant
prediction
W
and
more
GAs.mhl
using
w.
GAs,
GAs.mhl
they
scl
Figure
GAs.mhl,
class.
“-
GAs.mhl
Figure
mostly-one-direction
achieve
GAs.mhl
predictors
because
The
of
shows
of lK-byte
between
to
His-
This
from
In
and gshare.
PHTs.
is on SC1
SC1
predictor
GAs
difference
GAs.mhl
these
each
formance
Branch
set used
testing
the
the
patterns
some
figure
all
GAs
outperform
Multiple
branch
data
performance
of
accuracy
on
and
GAs.mhl
history
with
w
gshare
the
the
This
For
both
the prediction
with
history.
gshare.
cost.
the
entry.
the
:L
each
with
information.
accuracy
hardware
of GAs
x - -
there
identify
PHT
run,
are predicted
and
mulfewer
is different
during
12 compares
GAs
with
history
global
testing
branch
11: Structure
Length
a long
using
scheme,
history
profiled
tory
branches.
branch,
global
information
actual
a long
history
and
GAs
Because
be executed
have
global
to
appropriate
entries.
in the
thus,
is using
global
.
.
pre-
branches,
each
the
.
,,,
on both
length,
are
bits
the frequent
profile
used
we
history
the
PHT
to gather
one
of
Because
As in the gshare
exclusive-ORa
address
structure
both
k
both
branch
mixed-direction
identify
k
XOR
,_J_.
[~
Figure
de-
history
short
mostly-one-direction
be
scheme
that
a new
branches
the
the
Length
accuracy
propose
length.
the
branch.
on
on
uses
I&
I
mostly-one-direction
uses multiple
it
11 shows
may
tory
accuracy
the
prediction
for
history
and
is not
we
which
history
Figure
with
scheme
mostly-one-direction
branch
tiple
History
GAs
and
of branches,
diction
Branch
prediction
mixed-direction
s
)
—.
..
..
i’Hfi
w El
I----iv
Selection
the
History
Tables
(SPHTS)
‘II
Pc
Branch
-.—.
ClMl,
predic-
schemes.
Static
Branch
c~
each
design
statically
these
Global Branch History Register (GBHR)
predic-
of different
hybrid
a hybrid
Finally,
3.4.2.1
●
advantages
introduce
We then show
selects
hybrid
e0iP(8)
accuracies
of gshare
For
the
the
is due
mostly-one-direction
branches
fewer
and
PHT
entries
between
the
direction
branches
branches.
branches,
accuracy
pattern
result
that
improvement
the
fact
are now
history
and
slight
to
in
of
of
the
fewer
the
that
hashed
the
into
conflicts
mostly-one-
mixed-direction
Combination
●
of
static
and
dynamic
predictors
The
static
predictors
can
mostly-one-direction
dictors
tor
for
can
those
be optimized
for
PG+gshare
scheme
to statically
the gshare
the
If
SC1
predicting
the
the
profile
guided
and
SC6 branches
A
0.92
0.91
/
9
the
I
I
0.90
I
6$
training
run,
nated
for
then
the
predicting
dynamic
this
predictor
branch
the
Figure
14
shows
and
and
16 show
and
GAs.mhl
that
on
For
guided
predictor
and
and
SC6
can
branches
because
or mostly
not-taken.
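The static selection used by PG+gshare can be sketched as follows: branches that profile as SC1 or SC6 during the training run are assigned to the profile-guided (PG) predictor, and all other branches are left to the dynamic predictor. This is a minimal sketch under assumptions. The function names, the profile layout, and the callable standing in for gshare are hypothetical, and the 5% and 95% thresholds are our reading of the SC1 and SC6 boundaries.

```python
def build_pg_table(profile):
    """Return {branch_address: predicted_direction} for SC1 and SC6 branches.

    profile: {branch_address: (taken_count, executed_count)} from the training run.
    """
    table = {}
    for addr, (taken, executed) in profile.items():
        if executed == 0:
            continue
        rate = taken / executed
        if rate <= 0.05:        # SC1: mostly not-taken
            table[addr] = False
        elif rate >= 0.95:      # SC6: mostly taken (boundary assumed)
            table[addr] = True
    return table


def predict(addr, pg_table, dynamic_predict):
    """Use the static PG prediction when one exists, else the dynamic predictor."""
    if addr in pg_table:
        return pg_table[addr]
    return dynamic_predict(addr)


if __name__ == "__main__":
    profile = {0x10: (0, 50), 0x20: (50, 50), 0x30: (20, 50)}
    pg = build_pg_table(profile)
    always_taken = lambda addr: True      # stand-in for a gshare predictor
    print(predict(0x10, pg, always_taken), predict(0x30, pg, always_taken))
```

Because the statically predicted branches never consult the dynamic predictor in this sketch, they also do not occupy its PHT entries, which is the reduction in PHT contention the text describes.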
accuracy
GAs.mhl
costs.
on the
PHT
SC1
especially
SC6
of
Static
Class
1:
0 <=
pr(br)
the
mixed-direction
achieves
than
a
hybrid
mixed-direction
tween
the
branches,
direction
branches
ing only
with
correlated
the
the
im-
PG+gshare’s
branch
gshare
scheme
and
exists.
Also,
branches,
likely
Figure
15: Performance
mostly
not-taken
Up
timal
Static
to this
be-
Static
mixed-
~
0.93
by deal-
~
$
0.92
histories
to remain
dictor
[7].
performance
tor.
We
uses
of both
then
propose
both
to further
each
dynamic
improve
statically
branch.
predictors
dynamically
that
we have
for
branch
Predictor
In this
selected
section,
types
of
a new
and
prediction
the
g
O.go
~
0.89
Class
Hybrid
Predictor
branch
predictor
predictor
of
0.841
predictor
is to use 2-bit
ters
2bC)
doing
(i.e.
better
to keep
[6].
saturating
Specifically,
.50
A
.*
- .-A
x - -
PG+gshare
- x
GAs.mtd
x
!
I
Figure
predic-
I
t
I
lK
16: Performance
mixed-direction
design
I
I
I
I
16K
4K
Size
I
64K
@ytes)
of PG+gshare
and
GAs.mhl
on
branches
branch
from
direction
optimal
up-down
BD
I
2S6
selection
the
of which
let
c=
,/
Predictor
Predictor
selecting
track
pr(br)
,
actual
Dynamic
of dynamically
.05<
,/
from
P1 denote
1, and
predictor
2. The
or decremented
predicted
P2 denote
the
counters
based
on the
direc-
predicted
can be inrule
shown
3.
coun-
predictor
denote
in Table
direction,
predictor
cremented
method
on
compare
Selection
One
GAs.mhl
.*..4$
64
accuracy.
with
3:
=A_
pre-
tion
●
and
op-
method
we first
hybrid
static
of PG+gshare
0.91
in the
the optimal
of hybrid
(bytes)
Selection
Another
is to select
Size
the
register.
Dynamic
point,
predictor
combining
the
plus
PG+gshare
GAs.mtd
branches
0.88 E
3.4.2.2
- x
I
in
predicts
contention
history
A
-
t’
1’
accuracy
PHT
are more
---
I
reduce
branches
mixed-direction
branches
A
x-
1’
PG+gshare
only
no longer
.05
branches.
predictor
mostly-one-direction
<=
x
be-
at lower
prediction
Because
GAs.mhl.
PG+gshare
higher
with
outperforms
branches,
slightly
Predictors
these
taken
Predictor
For
(bytes)
Hybrid
Classification
and the mostly-
PG+gshare
and
Branch
15
conflicts
can significantly
Thus,
Static
Size
Performance
!
64K
profile-
are mostly
branches
scheme,
14:
I
16K
respec-
the
predict
branches
branches
ation
branches
branches,
In addition,
of a gshare
plement
SC3
Figure
I
4K
of PG+gshare
accurately
these
the mixed-direction
one-direction
Figures
accuracy
SC1
SC1
I
I
I
outperforms
predictors.
the prediction
tively.
tween
PG+gshare
single-scheme
I
lK
predictor
testing
run.
GAs.mhl
I
2S6
is desig-
during
PG+gshare
GAs.mhl
0.93
and
during
A
– X
0.94
the other
executed
---
X - -
!!$!
predic-
predict
is not
.-~
2
predic-
experiment,
to dynamically
a branch
pre-
our
In
predict
the
static
hardware
accurately
uses
scheme
branches.
the
branches.
predict
Using
branches,
mixed-direction
tor
accurately
branches.
is
In our study, we associate a counter with each entry in the fully
associative branch address cache (BAC).

Table 3: Updating rule for the selection counter

BD  P1  P2   Action
0   0   0    no change
0   0   1    decrement counter
0   1   0    increment counter
0   1   1    no change
1   0   0    no change
1   0   1    increment counter
1   1   0    decrement counter
1   1   1    no change
a
counter
found
in the BAC
predictor
to use,
which
branch
is
the
is then
2bC/gshare
A
>.+
.4”A’
~+z
f“
;
0.915. —
0.895>
counters
A
I
I
!
I
I
I
2S6
64
corresponding
Figure
used to determine
as shown
PAs/gshare
A .. . . . . .
A
selection
fetched,
PG+gshare
---
K
I
I
IK
in Figure
18:
Prediction
I
I
4K
I
L
16K
Predictor
When
“
. A :&*”
0.905. —
predictor
A
M- - - *
A
0.955.
4
Size
Accuracy
64K
(bytes)
of Hybrid
Branch
Pre-
dictors
17.
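The update rule of Table 3 can be expressed compactly: the selection counter changes only when the two component predictors disagree, moving up when predictor 2 was correct and down when predictor 1 was correct. The sketch below assumes a 2-bit saturating counter whose upper half selects predictor 2; that mapping and the function names are our assumptions, chosen to be consistent with the table.

```python
def update_selector(counter, bd, p1, p2, max_value=3):
    """Apply the Table 3 rule.

    bd -- actual branch direction (0 or 1)
    p1 -- direction predicted by predictor 1
    p2 -- direction predicted by predictor 2
    """
    if p1 == p2:
        return counter                        # both right or both wrong: no change
    if p2 == bd:
        return min(max_value, counter + 1)    # predictor 2 was right: increment
    return max(0, counter - 1)                # predictor 1 was right: decrement


def select_predictor(counter):
    """Assumed mapping: counter values 2 and 3 choose predictor 2, else predictor 1."""
    return 2 if counter >= 2 else 1


if __name__ == "__main__":
    c = 1
    for bd, p1, p2 in [(1, 0, 1), (1, 0, 1), (0, 0, 0)]:
        c = update_selector(c, bd, p1, p2)
    print(c, select_predictor(c))
```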
~
+ ---+
‘“O
SC2
0---0
0.8
“:
SC3
A ..,-,.
A
3
%
Scl
s---w
SC4
A
—.—A
SC5
0 . . ..
0
SC6
n —
0
overall
‘“6
0.4
~
0.2
Predcimn
[
Predictor
Figure
17: Structure
Predictor
of Hybrid
Predictor
with
Size
(bytes)
Dynamic
Figure
Selection
19:
Fraction
of gshare
usage
in the
2bC/gshare
scheme
We
simulated
dictors:
two
the
with
gshare
of these
combinations
up-down
(2bC/gshare)
(PAs/gshare),
Figure
hybrid
selection
different
2-bit
for
2bC/gshare
ing
the
and
with
and
pre-
predictor
with
in the
gshare
the performance
the
PG+gshare.
following
PAs
18 compares
predictors
scheme
of
counter
static
The
PAs/gshare
predictor
hardware
cost
able
to outperform
us-
than
equations:
p, a) = (a x 2)+
PAs/gshare(k,
p,a)
((2 x 2)
k+(px2~
+
= (a x 2) + (b x k)+
x2)
(p
k+(px2k
+
single-scheme
2k x 2)
x
x2)
For
the
a is the number
dress
cache,
number
brid
k is the history
of PHTs,
the branch
and
history
predictor
only
considered
urations
For
with
predictors
PG+gshare
scheme.
similar
the
With
the
PAs
implementation
than
outperforms
PG+gshare
twice
p is the
in
SPEC
counters
used
in the
and
and
gshare
config-
16K
PG+gshare
the size of either
of
scheme
at
as
is
in-
a larger
fixed
size,
increases
and,
the gshare
majority
profile-guided
19 shows
how
predictions
in
the gshare
at 256
the
2bC/gshare
portion;
bytes.
With
prediction
thus,
more
scheme.
branches
on the
cost,
is not
is
the
in-
accuracy
Figure
by gshare.
the
study,
a lK-entry
size of 2bC/gshare
predictor,
on
uses
make
of predictions
are made
PG+gshare
In this
Figure
to
increasing
remains
using
scheme
The
a low implementation
performs
PAs or gshare
pre-
predictor
scheme.
used
scheme.
by only
branches
the
of a larger
the
outperforms
counters.
was
2bC/gshare
the
taken
a
2-bit
2bC/gshare
that
the
PAs/gshare
PAs/gshare
gshare
predicted
bytes,
the
lK
the
of gshare
cost.
larger
to outperform
benchmarks,
predictor
and
creasing
of
predictor
integer
BAC
2bC portion
we
predictors
benefits
size
the 2bC/gshare
often
increased
for
was
smaller
is able
the
as the
2bC/gshare
the
the cost of
For PAs/gshare,
same
size, the gshare
approximately
and
combining
smaller
the
is, the cost of the hy-
predictor.
scheme
ad-
of entries
by summing
predictors
branch
length,
number
That
is determined
the optimal
in the
register
b is the
table.
the single-scheme
to select
of entries
hand,
gshare
of two
predictor.
outperforms
where
larger
combination
Because
a combined
an implementa-
the
PAs/gshare
diminishes
creases,
With
bytes,
the
bytes,
PG+gshare.
2bC/gshare(k,
16K
On the other
16K
dictor
scheme.
below
predictors.
costs
are estimated
PAs/gshare
tion
mostly
Since
&
are
19 shows
not-
gshare,
accurate
PG+gshare
mostly-one-direction
out-
mostly
one-direction
GAs with
GAs.mhl
a short
PAs/gshare
Table
4: Summary
mostly
2bC
or GAs
(selected
dynamically)
PAs
or GAs
(selected
dynamically)
PG
19
also
SC4
2bC
branches
well,
shows
mixed-direction
with
Selection
In
section,
we propose
that
exploits
run-time
and
branch
prediction.
the
a new
compile-time
predictor
branches
and
mixed-direction
the
for
performance
of PG+PAs/gshare.
smaller
4K bytes,
provides
hand,
the
for
optimal
PG+PAs/gshare
dictors.
tion
to
cost
SPEC
for
32K
bytes,
prediction
integer
gshare
95.2%
gcc,
PG+PAs/gshare
96.91%,
for
GAs.
to
predictor
schemes
Summary
of
Hybrid
ones
4
schemes.
In
many
this
different
96.4%
is
able
on
the
to
the
and
report,
we present
We
SPEC92
branches,
the
4).
were
best
have
branch
predictor
with
selection
Table
5 lists
several
prediction
omitted.
of
predictors.
The
tion
branches
the
Predictors
most
combining
global
prediction
then
combines
success-
namic
the
proposed
we showed
proposed
the
predictors.
using
a short
Using
and
improves
a long
the
rates
classifica-
history
for
history
for the
performance
the
of
Predictors.
branch
of static
model
taken
branch
Branch
a hybrid
advantages
different
classification
this
Two-Level
as well
of
dynamic
With
that
branches
history
advantages
their
branches
classification
performance
branch
on
profiling.
mostly-one-direction
prediction
predictor
based
during
model,
branch
analyzing
for
gathered
pre-
introduced
for
as a means
groups
[7].
branch
the
(bytes)
of hybrid
static
(see Table
that
as a means
95.7%
accuracy
(PAs/gshare)
hybrid
Size.
Conclusion
We
We examined
GAs
implementa-
prediction
Branch
gshare
+
64K
16K
Performance
mixed-direction
3.4.2.3
❑
+—
other
the
many
for
ful
pre-
For
96.47%
20:
predic-
all other
as compared
contains
achieves
as compared
known
of
4K
dynamic
scheme
the
PG+PAs/gshare
which
Figure
bytes,
a fixed
accuracy
benchmarks,
and
benchmark
viously
with
- M PAs/gshare
❑ ... .. ..
20 shows
On
4K
assist
both
For
than
lK
Predictor
mostly-
PG+gshare
outperforms
example,
of
achieve
larger
scheme
For
the
performance.
predictors
&,”
PG+gshare
scheme
Figure
the
x --
‘
PG+PAs/gshare
A
/
256
scheme
PAs/gshare
branches.
tors
than
the
– +
---
:li??!E-
both
to
PG+PAs/gshare
profile-guided
./’
the
A
D”
*,#(.,...k”
branch
of using
+ - -
...m
,A/#;~:.,
Dy-
hybrid
information
The
+:s
, ,f-,”$.“~
the
on
Both
advantage
dynamically)
Schemes
A’
Predictor
Static
(selected
Thus,
2bC/gshare
Predictor
Hybrid
Prediction
by
branches.
outperforms
Branch
or GAs
mostly-
is outperformed
New
the
branches
40%
by
predict
A
and
the
made
branches.
namic
Hybrid
that
are
can
it
\ PAs
5: Omitted
branches
While
the
one-direction
mixed-direction
branches
dynamically)
Schemes
PAs/GAs
mixed-direction
for
Prediction
(selected
2bC/GAs
also
the
Branch
or gshare
PAs
on
History
dynamically)
] PAs
one-direction
the
uses
(selected
GAs
one-direction
this
or gshare
PG
scheme.
predictor
PAs
PG
Figure
Branch
dynamically)
PG+PAs
of predictions
PG+gshare
(selected
PG+GAs
Table
on
or gshare
of Hybrid
PG+PAs/GAs
gshare
2bC
PG
PG+PAs/gshare
branches
along
gshare
2bC/gshare
2bC
GAs with
History
PG
PG+gshare
branches.
mixed-direction
branches
Branch
predictor
predictors
a profile-guided
that
and
predictor
dyfor
the
mostly-one-direction
be dedicated
the
to
dynamic
rately
the
implementation
GAs,
cost
reducing
time
hybrid
and
posed
of
the
the
miss
rate
In summary,
by
both
to
tecture,
com-
assist
accuracy
[4]
P.M.
the
our
shown
that
construction
of
96.91%
predictor,
effective
than
currently
the
[6]
Lee
S.
is one result
idea
of branch
that
sity
of Michigan.
Intel,
AT&
entific
and
of our
for
We would
ful
HPS
research
provide,
his comments
also like
at
the
of our industrial
of
group
June
the
stimulat-
in particular,
on
work.
the reviewers
for their
help-
Smith,
T.-Y.
the
design
chitecture,
of
MIPS-X,”
International
June
Proceedings
Symposium
on
IEEE
Predictors”,
Equipment
Cor-
“Reducing
the
13th
the cost
International
Architecture,
pp.396-404,
of the
April
88000
RISC
family,”
1989.
of the
of Branch
Prediction
of the 8th International
Architecture,
and
Y.N.
Two-level
of
on
T.-Y.
Patt,
the
19th
June
“Alternative
1981.
Implemen-
Branch
Annual
Strate-
Symposium
pp.135-148,
Adaptive
Computer
Yeh
and
Branch
Prediction,”
International
Architecture,
14th
Ar-
1987.
Patt,
Sym-
pp.124-135,
May
Yeh
History”,
Symposium
November
and
Y.N.
Patt,
Predictors
ternational
Symposium
May
1993.
on
Adap-
of
the
on Micro
24th
archi-
1991.
“A
that
Proceedings
pp.257-266,
“Two-level
Proceedings
International
Branch
Branch
Y.N.
Prediction,”
pp.51-61,
T.-Y.
namic
tradeoffs
Computer
Study
1992.
[12]
Annual
of
design
“A
Yeh
of
posium
References
in
Digital
Hennessy,
pp.26-38,
Proceedings
tecture,
“Architecture
Branch
Computer
“The
Proceedings
Carlos
on this
Horowitz,
Prediction
Design,”
1984.
Proceedings
ACM/IEEE
M.
“Branch
Buffer
TN-36,
and J.L.
MICRO,
tations
[11]
and
Smith,
Target
1986.
, J.E.
[10]
the other
suggestions.
P. Chow
Comput-
1993.
branches,”
tive
[1]
Archi-
of Pipelined
January
Note
on Computer
Sci-
and suggestions
to thank
Computer
1981.
“Combining
Technical
gies,”
appreciated.
for
A.J.
Branch
pp.6-22,
C. Melear,
[9]
partners:
acknowledge
and
on
Proceedings
in high
Univerand
is greatly
to gratefully
they
research
Hewlett-Packard,
Software
we wish
ing environment
Fuentes
support
Motorola,
Engineering
In addition,
members
The
T/GIS,
ongoing
implementation
of Pro-
11/780,”
Architecture
and
and
[7] S. McFarling
predictors.
[8]
of our
computer
May
Characterization
VAX-
McGraw-Hill,
McFarling,
IEEE
paper
Architecture,
Symposium
The
poration,
Acknowledgments
This
in the
Kogge,
Symposium
performance
of
Inter-
to
June
5
“A
Clark,
1984.
Computer,
reduc-
of predictors
known
D.
Annual
Strategies
12.5%.
allow
Evaluation
of the 14th
on Computer
June
[5] J.K.F.
pro-
WRL
will
and
ers, pp.237-243,
PAs,
as compared
known
“An
accuracy.
bytes,
predictor,
benchmark,
Levy,
Proceedings
Performance
of the 11th
suitable
higher
cost of 32K
previously
we have
classification
are more
achieving
intensive
[3] J. Emer
for
dynamically
be used
a prediction
best
of the
a profile-guided
achieved
on gcc, a branch
ing
in
predictor
implementation
gshare
for
can
Symposium
M.
achieved
from
time
national
cessor
done
H.
1989.
16.7%.
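The miss-rate reductions quoted in the conclusion follow directly from the reported accuracies. As a quick check (the function name is hypothetical; the accuracies are the ones stated in the text), the relative reduction is the drop in miss rate divided by the original miss rate:

```python
def miss_rate_reduction(old_accuracy, new_accuracy):
    """Relative reduction in miss rate when accuracy improves from old to new."""
    old_miss = 1.0 - old_accuracy
    new_miss = 1.0 - new_accuracy
    return (old_miss - new_miss) / old_miss


if __name__ == "__main__":
    # 96.47% -> 96.91% (average over the benchmarks at 32K bytes)
    print(round(100 * miss_rate_reduction(0.9647, 0.9691), 1))
    # 95.2% -> 96.0% (gcc)
    print(round(100 * miss_rate_reduction(0.952, 0.960), 1))
```

These evaluate to about 12.5% and 16.7%, matching the reductions reported.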
be
and
Architectures,”
an
to 95.2%
selection
can
DeRosa
Branch
accu-
With
PG+gshare
information
execution
combination
96.47%
by
branch
Thus,
branch
a fixed
and
each
statically.
to more
as compared
rate
[2] J.A.
can
Furthermore,
branches.
bytes,
classification,
for
and/or
With
of 32K
miss
hardware
be optimized
accuracy,
the
branch
predictor
can
more
predictor.
mixed-direction
prediction
With
pile
dynamic
predictor
predict
a 96.0%
branches,
the
Comparison
use
of the
Computer
Two
20th
of DyLevels
Annual
Architecture,
of
In-