Alternative Implementations of Two-Level Adaptive Branch Prediction

Tse-Yu Yeh and Yale N. Patt
Department of Electrical Engineering and Computer Science
The University of Michigan
Ann Arbor, Michigan 48109-2122
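Abstract

As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performance of a wide-issue, deep pipelined microarchitecture. We propose a new dynamic branch predictor (Two-Level Adaptive Branch Prediction) that achieves substantially higher accuracy than any other scheme reported in the literature. The mechanism uses two levels of branch history information to make predictions: the history of the last k branches encountered, and the branch behavior for the last s occurrences of the specific pattern of these k branches. We have identified three variations of Two-Level Adaptive Branch Prediction, depending on how finely we resolve the history information gathered. We compute the hardware costs of implementing each of the three variations, and use these costs in evaluating their relative effectiveness. We measure the branch prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction, along with several other popular proposed dynamic and static prediction schemes, on the SPEC benchmarks. We show that the average prediction accuracy for Two-Level Adaptive Branch Prediction is 97 percent, while the other known schemes achieve at most 94.4 percent average prediction accuracy. We measure the effectiveness of different prediction algorithms and different amounts of history and pattern information. We measure the costs of each variation to obtain the same prediction accuracy.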
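1 Introduction

As the issue rate and depth of pipelining of high performance Superscalar processors increase, the amount of speculative work due to branch prediction becomes much larger. Since all such work must be thrown away if the prediction is incorrect, an excellent branch predictor is vital to delivering the potential performance of a wide-issue, deep pipelined microarchitecture. Even a prediction miss rate of 5 percent results in a substantial loss in performance, due to the number of instructions fetched each cycle and the number of cycles these instructions are in the pipeline before an incorrect branch prediction becomes known.

The literature is full of suggested branch prediction schemes [6, 13, 14, 17]. Some are static in that they use opcode information and profiling statistics to make predictions. Others are dynamic in that they use run-time execution history to make predictions. Static schemes can be as simple as always predicting that the branch is taken, or predicting based on the opcode, or on the direction of the branch, as in "if the branch is backward, predict taken; if forward, predict not taken" [17]. This latter scheme is effective for loop intensive code, but does not work well for programs where the branch behavior is irregular. Also, profiling [6, 13] can be used to predict branches by measuring the tendency of a branch on sample data sets and presetting a static prediction bit in the opcode according to that tendency. Unfortunately, branch behavior for the sample data may be very different from the data that appears at run-time.

Dynamic branch prediction can be as simple as keeping track only of the last execution of a branch instruction and predicting that the branch will behave the same way, or it can be as elaborate as maintaining very large amounts of history information. In all cases, the fact that the dynamic prediction is being made on the basis of run-time history information implies that substantial additional hardware is required. J. Smith [17] proposed utilizing a branch target buffer to store, for each branch, a two-bit saturating up-down counter which collects and subsequently bases its prediction on branch history information about that branch. Lee and A. Smith [14] proposed a Static Training method which uses statistics gathered prior to execution time, coupled with the history pattern of the last k run-time executions of the branch, to make the next prediction as to which way that branch will go. The major disadvantage of Static Training methods is that the statistics are gathered on a sample data set before execution, and they may not be applicable to the data that appears at run-time.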
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
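In this paper, we propose a new dynamic branch predictor. We call our method Two-Level Adaptive Branch Prediction. The scheme uses two levels of branch history information to make predictions. The first level is the history of the last k branches encountered. (Variations of our scheme reflect whether this means the actual last k branches encountered, or the last k occurrences of the same branch instruction.) The second level is the branch behavior for the last s occurrences of the specific pattern of these k branches. Predictions are based on the branch behavior for the last s occurrences of the pattern in question.

© 1992 ACM 0-89791-509-7/92/0005/0124 $1.50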
For example, suppose, for k = 8, the last k branches had the behavior 11100101 (where 1 represents taken and 0 represents not taken). Suppose further that s = 6, and that in each of the last six times the previous eight branches had the pattern 11100101, the branch alternated between taken and not taken. Then the second level would contain the history 101010. Our scheme would predict "taken."

We have identified three variations of Two-Level Adaptive Branch Prediction, depending on how finely we resolve the history information gathered. We compute the hardware costs of implementing each of the three variations, and use these costs in evaluating their relative effectiveness. Using trace-driven simulation of nine of the ten SPEC benchmarks¹, we measure the branch prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction, along with several other popular proposed dynamic and static prediction schemes. We show that the average prediction accuracy for Two-Level Adaptive Branch Prediction is about 97 percent, while the other known schemes achieve at most 94.4 percent average prediction accuracy. We measure the effectiveness of different prediction algorithms and different amounts of history and pattern information, and we measure the costs of each variation to obtain the same prediction accuracy.

This paper is organized in six sections. Section two introduces Two-Level Adaptive Branch Prediction and its three variations. Section three describes implementation considerations and computes the hardware cost of each variation. Section four discusses the simulation model and the traces used in this study. Section five reports the simulation results and our analysis. Section six contains concluding remarks.
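2 Definition of Two-Level Adaptive Branch Prediction

2.1 Overview

Two-Level Adaptive Branch Prediction has two major data structures, the branch history register (HR) and the pattern history table (PHT); see Figure 1. Instead of accumulating statistics by profiling programs, the information on which branch predictions are based is collected at run-time by updating the contents of the history registers and the pattern history bits in the entries of the pattern history table, depending on the outcomes of the branches.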
¹ The Nasa7 benchmark was not simulated because this benchmark consists of seven independent loops. It takes too long to simulate the branch behavior of these seven kernels, so we omitted these loops.
The history register is a k-bit shift register which shifts in bits representing the branch results of the most recent k branches. If the branch was taken, a "1" is recorded; if not, a "0" is recorded. Since there are k bits in the history register, at most 2^k different patterns appear in the history register. For each of these 2^k patterns, there is a corresponding entry in the pattern history table which contains branch results for the last s times the preceding k branches were represented by that specific content of the history register.

Figure 1: Structure of Two-Level Adaptive Branch Prediction.
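When a conditional branch B is being predicted, the content of its history register, HR, denoted as $R_{c-k}R_{c-k+1}\cdots R_{c-1}$ for the last k outcomes of executing the branch, is used to address the pattern history table. The pattern history bits $S_c$ in the addressed entry $PHT_{R_{c-k}R_{c-k+1}\cdots R_{c-1}}$ in the pattern history table are then used for predicting the branch. The prediction of the branch is

$$z_c = \lambda(S_c), \qquad (1)$$

where $\lambda$ is the prediction decision function.

After the conditional branch is resolved, the outcome $R_c$ is shifted left into the history register HR in the least significant bit position and is also used to update the pattern history bits in the pattern history table entry $PHT_{R_{c-k}R_{c-k+1}\cdots R_{c-1}}$. After being updated, the content of the history register becomes $R_{c-k+1}R_{c-k+2}\cdots R_c$, and the state represented by the pattern history bits becomes $S_{c+1}$. The transition of the pattern history bits in the pattern history table entry is done by the state transition function $\delta$, which takes in the old pattern history bits and the outcome of the branch as inputs to generate the new pattern history bits. Therefore, the new pattern history bits $S_{c+1}$ become

$$S_{c+1} = \delta(S_c, R_c). \qquad (2)$$

A straightforward combinational logic circuit is used to implement the function $\delta$ to update the pattern history bits in the entries of the pattern history table. The transition function $\delta$, the prediction decision function $\lambda$, the pattern history bits $S$, and the outcome $R$ of the branch comprise a finite-state Moore machine, characterized by equations (1) and (2).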
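The prediction and update steps in equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the authors' hardware: it assumes a 2-bit saturating up-down counter (J. Smith's counter, mentioned in the introduction) as the default automaton, and the initial values (every PHT entry in state 3, history register cleared to 0) are hypothetical choices.

```python
class TwoLevelPredictor:
    """Sketch: one k-bit history register (HR) indexing one pattern history table (PHT).

    Assumptions (not from the paper): PHT entries start in state 3, HR starts at 0,
    and the default automaton is a 2-bit saturating up-down counter.
    """

    def __init__(self, k: int, delta=None, lam=None, init_state: int = 3):
        self.k = k
        self.mask = (1 << k) - 1
        self.hr = 0                          # R_{c-k} ... R_{c-1}, newest outcome in the LSB
        self.pht = [init_state] * (1 << k)   # 2^k entries of pattern history bits S
        # State transition function delta: saturating increment / decrement.
        self.delta = delta or (lambda s, r: min(s + 1, 3) if r else max(s - 1, 0))
        # Prediction decision function lambda: taken when the counter is >= 2.
        self.lam = lam or (lambda s: s >= 2)

    def predict(self) -> bool:
        """Equation (1): z_c = lambda(S_c), where S_c = PHT[HR]."""
        return self.lam(self.pht[self.hr])

    def update(self, taken: bool) -> None:
        """Equation (2): S_{c+1} = delta(S_c, R_c), then shift R_c into the LSB of HR."""
        self.pht[self.hr] = self.delta(self.pht[self.hr], taken)
        self.hr = ((self.hr << 1) | int(taken)) & self.mask


p = TwoLevelPredictor(k=2)
misses = 0
for n in range(100):
    taken = n % 2 == 0                       # alternating: taken, not taken, taken, ...
    misses += p.predict() != taken
    p.update(taken)
```

Replaying an alternating branch with k = 2, the sketch mispredicts only twice while the PHT entry for history 01 warms up; afterwards the history register simply alternates between 01 and 10, and each pattern maps to the correct prediction.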
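The state diagrams of the finite-state Moore machines used in this study for updating the pattern history in the pattern history table entry and for predicting which path the branch will take are shown in Figure 2. The automaton Last-Time (LT) stores in the pattern history only the outcome of the last execution of the branch when the history pattern appeared. The next time the same history pattern appears, the prediction will be what happened last time. Only one bit is needed to store that pattern history information. The automaton A1 records the results of the last two times the same history pattern appeared. Only when there is no taken branch recorded will the next execution of the branch, when the history register has the same history pattern, be predicted as not taken; otherwise, the branch will be predicted as taken. The automaton A2 is a saturating up-down counter, similar to the counter used in J. Smith's branch target buffer design for keeping track of the branch history of a certain branch. The counter is incremented when the branch is taken and is decremented when the branch is not taken. The branch will be predicted as taken when the counter value is greater than or equal to two; otherwise, it will be predicted as not taken. Automata A3 and A4 are variations of A2.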
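The LT, A1, and A2 automata described above can be written as (δ, λ) pairs over a small integer state. This is a hedged sketch that covers only the three automata whose behavior the text spells out; A3 and A4 are omitted because their exact transitions are given only in Figure 2. The table name `AUTOMATA` and the helper `replay` are hypothetical names used for illustration.

```python
# Each automaton is a pair (delta, lam) acting on an integer state.

# LT (Last-Time): one bit holding the last outcome seen with this history pattern.
def lt_delta(state: int, taken: bool) -> int:
    return int(taken)

def lt_lambda(state: int) -> bool:
    return state == 1

# A1: two bits holding the last two outcomes seen with this history pattern.
def a1_delta(state: int, taken: bool) -> int:
    return ((state << 1) | int(taken)) & 0b11

def a1_lambda(state: int) -> bool:
    # Predict not taken only when neither recorded outcome was taken.
    return state != 0

# A2: 2-bit saturating up-down counter.
def a2_delta(state: int, taken: bool) -> int:
    return min(state + 1, 3) if taken else max(state - 1, 0)

def a2_lambda(state: int) -> bool:
    return state >= 2

AUTOMATA = {
    "LT": (lt_delta, lt_lambda),
    "A1": (a1_delta, a1_lambda),
    "A2": (a2_delta, a2_lambda),
}

def replay(name: str, outcomes, state: int):
    """Feed a sequence of outcomes through one automaton and return its predictions."""
    delta, lam = AUTOMATA[name]
    predictions = []
    for taken in outcomes:
        predictions.append(lam(state))
        state = delta(state, taken)
    return predictions
```

For a mostly-taken pattern with one not-taken outcome (taken, taken, not taken, taken, taken), LT mispredicts the execution right after the not-taken one, while A1 and A2 both tolerate the single deviation and keep predicting taken.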
Both Static Training [14] and Two-Level Adaptive Branch Prediction are dynamic branch predictors, because their predictions are based on run-time information, i.e. the dynamic branch history. The major difference between these two schemes is that the pattern history information in the pattern history table changes dynamically in Two-Level Adaptive Branch Prediction, but is preset in Static Training from profiling. In Static Training, the input to the prediction decision function, $\lambda$, for a given branch history pattern is known before execution. Therefore, the output of $\lambda$ is determined before execution for a given branch history pattern. That is, the same branch predictions are made if the same history pattern appears at different times during execution. Two-Level Adaptive Branch Prediction, on the other hand, updates the pattern history information kept in the pattern history table with the actual results of branches. As a result, given the same branch history pattern, different pattern history information can be found in the pattern history table; therefore, the input to $\lambda$ can be different. Predictions of Two-Level Adaptive Branch Prediction change adaptively as the program executes.

Since the pattern history bits change in Two-Level Adaptive Branch Prediction, the predictor can adjust to the current branch execution behavior of the program to make proper predictions. With these run-time updates, Two-Level Adaptive Branch Prediction can be highly accurate over many different programs and data sets. Static Training, on the contrary, may not predict well if changing data sets brings about different execution behavior.
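A small, hypothetical example makes the contrast concrete. Suppose profiling on the training data set found that the branch was taken after one particular history pattern, but on the run-time data set the branch is never taken after that pattern. The sketch below compares a preset prediction bit (Static Training) with an A2 pattern history entry that is updated at run time; the function names and the trace are illustrative only.

```python
def a2_delta(state: int, taken: bool) -> int:
    """A2 transition: saturating up-down counter."""
    return min(state + 1, 3) if taken else max(state - 1, 0)

def static_training_misses(preset_taken: bool, outcomes) -> int:
    # Static Training: the PHT bit is preset from profiling and never changes,
    # so lambda returns the same prediction every time this pattern appears.
    return sum(preset_taken != taken for taken in outcomes)

def adaptive_misses(initial_state: int, outcomes) -> int:
    # Two-Level Adaptive: the PHT entry is an A2 counter updated with actual outcomes.
    state, misses = initial_state, 0
    for taken in outcomes:
        misses += (state >= 2) != taken
        state = a2_delta(state, taken)
    return misses

run_time_outcomes = [False] * 20   # behavior on the run-time data set
```

With 20 not-taken executions, the preset bit mispredicts all 20, while the adaptive entry, starting from the strongly-taken state, mispredicts twice and then follows the new behavior.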
Figure 2: State diagrams of the finite-state Moore machines used for making predictions and updating the pattern history table entry.

2.2 Alternative Implementations of Two-Level Adaptive Branch Prediction

There are three alternative implementations of Two-Level Adaptive Branch Prediction, as shown in Figure 3. They are differentiated as follows:
Two-Level Adaptive Branch Prediction Using a Global History Register and a Global Pattern History Table (GAg)

In GAg, there is only a single global history register (GHR) and a single global pattern history table (GPHT) used by the Two-Level Adaptive Branch Prediction. All branch predictions are based on the same global history register and global pattern history table, which are updated after each branch is resolved. This variation is therefore called Global Two-Level Adaptive Branch Prediction using a global pattern history table (GAg).

Since the outcomes of different branches update the same history register and the same pattern history table, the information of both branch history and pattern history is influenced by results of different branches. The prediction for a conditional branch in this scheme is actually dependent on the outcomes of other branches.

Figure 3: Global view of three variations of Two-Level Adaptive Branch Prediction.
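Two-Level Adaptive Branch Prediction Using a Per-address Branch History Table and a Global Pattern History Table (PAg)

In order to reduce the interference in the first level branch history information, one history register is associated with each distinct static conditional branch to collect branch history information individually. The history registers are contained in a per-address branch history table (PBHT), in which each entry is accessed by one specific static branch instruction address. The prediction for a branch is based on the branch's own history in its history register and on the pattern history bits, shared by all branches, in the global pattern history table. Since the same pattern history table is updated by all branches, interference still exists in the second level. This variation is called Per-address Two-Level Adaptive Branch Prediction using a global pattern history table (PAg).

Two-Level Adaptive Branch Prediction Using a Per-address Branch History Table and Per-address Pattern History Tables (PAp)

In order to completely remove the interference in both levels, each static branch has its own pattern history table, a set of which is called per-address pattern history tables (PPHT). Therefore, a per-address history register and a per-address pattern history table are associated with each static conditional branch. All history registers are grouped in a per-address branch history table. Since this variation of Two-Level Adaptive Branch Prediction keeps separate history and pattern information for each distinct static conditional branch, it is called Per-address Two-Level Adaptive Branch Prediction using per-address pattern history tables (PAp).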
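The three variations differ only in which history register and which pattern history table a branch uses. The sketch below makes that selection explicit. It assumes an ideal, unbounded per-address branch history table modelled as a Python dict, A2 counters starting in state 3, and history registers starting at 0; `TwoLevelVariant` and `count_misses` are hypothetical names, not part of the paper.

```python
class TwoLevelVariant:
    """Sketch of how GAg, PAg and PAp select the history register and the PHT."""

    def __init__(self, scheme: str, k: int):
        if scheme not in ("GAg", "PAg", "PAp"):
            raise ValueError(f"unknown scheme {scheme!r}")
        self.scheme = scheme
        self.k = k
        self.mask = (1 << k) - 1
        self.hrs = {}    # history registers, keyed by branch address (single key for GAg)
        self.phts = {}   # pattern history tables, keyed by branch address (single key for GAg/PAg)

    def _hr_key(self, addr):
        return None if self.scheme == "GAg" else addr      # GAg: one global HR

    def _pht(self, addr):
        key = addr if self.scheme == "PAp" else None        # PAp: one PHT per static branch
        return self.phts.setdefault(key, [3] * (1 << self.k))

    def predict(self, addr) -> bool:
        hr = self.hrs.get(self._hr_key(addr), 0)
        return self._pht(addr)[hr] >= 2                     # A2 decision function

    def update(self, addr, taken: bool) -> None:
        key = self._hr_key(addr)
        hr = self.hrs.get(key, 0)
        pht = self._pht(addr)
        pht[hr] = min(pht[hr] + 1, 3) if taken else max(pht[hr] - 1, 0)  # A2 transition
        self.hrs[key] = ((hr << 1) | int(taken)) & self.mask


def count_misses(scheme: str, trace, k: int = 2):
    """Replay (address, taken) pairs; return the predictor and its misprediction count."""
    p, misses = TwoLevelVariant(scheme, k), 0
    for addr, taken in trace:
        misses += p.predict(addr) != taken
        p.update(addr, taken)
    return p, misses


trace = [(0xA, True), (0xB, False)] * 50   # one always-taken and one never-taken branch
```

On this toy trace all three schemes settle after two mispredictions; the difference is in the bookkeeping. GAg keeps one history register and one PHT, PAg keeps one history register per branch but a single PHT, and PAp keeps both per branch.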
3 Implementation Considerations

3.1 Pipeline Timing of Branch Prediction and Information Update

Two-Level Adaptive Branch Prediction requires two sequential table accesses to make a prediction: the branch's history register is read from the branch history table, and the pattern history table is then indexed with that history. It is difficult to squeeze the two accesses into one cycle, yet high performance requires that a prediction be made within one cycle from the time the branch address is known. To satisfy this requirement, the two accesses are performed at different times. After a conditional branch is resolved, its history register is updated by appending the branch result to the old history, and the pattern history table is accessed with the updated history. The prediction derived from the pattern history table is then stored along with the branch's history in its branch history table entry. The next time the branch is encountered, its prediction is available as soon as the branch history table is accessed with the branch address.

The branch history can be updated either speculatively, with the predicted result, or when the branch result is known. If a subsequent instance of the same branch must be predicted before the previous instance is resolved, the obsolete branch history would be used for making the prediction, and the prediction accuracy is degraded. Updating the branch history speculatively enhances the accuracy in such cases; when a misprediction occurs, the speculatively updated history can either be reinitialized or repaired, depending on the hardware budget available. Also, if two branches occur in consecutive cycles, the latency of prediction can be reduced for the second branch by using the prediction fetched from the pattern history table directly.
3.2 Target Address Caching

After the direction of a branch is predicted, there is still the possibility of a pipeline bubble due to the time it takes to generate the target address. To eliminate this bubble, we cache the target address of each branch in the branch history table. One extra field is required in each entry of the branch history table for doing this. When a branch is predicted taken, the cached target address is used to fetch the following instructions; otherwise, the fall-through address is used. Caching the target addresses makes it possible to fetch new instructions in consecutive cycles without waiting for the branch instruction to be decoded.
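3.3 Per-address Branch History Table Implementation

PAg and PAp branch predictors have per-address branch history tables in their structures. It is not feasible to have a branch history table large enough to hold all branches' execution history in real implementations. Therefore, a practical approach for the per-address branch history table is proposed here. The per-address branch history table is implemented as a set-associative or direct-mapped cache. A fixed number of entries in the table are grouped together as a set. Within a set, a Least-Recently-Used (LRU) algorithm is used for replacement. The lower part of a branch address is used to index into the table, and the higher part is stored as a tag in the entry associated with that branch. When a conditional branch is to be predicted, the branch's entry in the branch history table is located first. If the tag in the entry matches the accessing address, the branch information in the entry is used to predict the branch. If the tag does not match the address, a new entry is allocated for the branch.

In this study, both the above practical approach and an Ideal Branch History Table (IBHT), in which there is a history register for each static conditional branch, were simulated for Two-Level Adaptive Branch Prediction. The IBHT simulation data is provided to show the accuracy loss due to the history interference in a practical branch history table implementation.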
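The set-associative lookup just described can be sketched as follows. Assumptions not taken from the paper: the address passed in is already a word address (no byte-offset bits), h is a multiple of 2^j, and a newly allocated history register starts at all 1's. Entries are kept in most-recently-used order, so evicting the last element of a full set implements LRU replacement. The class name `SetAssocBHT` is hypothetical.

```python
class SetAssocBHT:
    """Sketch of a per-address BHT: h entries organised as a 2^j-way set-associative cache."""

    def __init__(self, h: int, j: int, k: int):
        self.ways = 1 << j
        self.num_sets = h // self.ways
        self.k = k
        # Each set is a list of [tag, history] entries, most recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def lookup(self, addr: int):
        """Return (entry, hit). Lower address bits select the set; higher bits form the tag."""
        index, tag = addr % self.num_sets, addr // self.num_sets
        entries = self.sets[index]
        for pos, entry in enumerate(entries):
            if entry[0] == tag:                      # tag matches: use this branch's history
                entries.insert(0, entries.pop(pos))  # move to the most-recently-used position
                return entry, True
        if len(entries) == self.ways:                # set is full: evict the LRU entry
            entries.pop()
        entry = [tag, (1 << self.k) - 1]             # allocate a new entry for this branch
        entries.insert(0, entry)
        return entry, False


bht = SetAssocBHT(h=4, j=1, k=4)                     # 2 sets of 2 ways
hits = [bht.lookup(a)[1] for a in (0, 2, 0, 4, 2, 0)]
```

Addresses 0, 2 and 4 all map to set 0 here, so the third distinct branch evicts whichever of the other two was used least recently; with j = 0 the same class behaves as a direct-mapped table.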
3.4 Hardware Cost Estimates

The chip area required for a run-time branch prediction mechanism is not inconsequential. The following hardware cost estimates are proposed to characterize the relative costs of the three variations. The branch history table and the pattern history table are the two major parts. Detailed items include the storage space for keeping history information, prediction bits, tags, and LRU bits, and the accessing and updating logic of the tables. The accessing and updating logic consists of comparators, MUXes, LRU bit incrementors, and address decoders for the branch history table, and address decoders and pattern history bit updating circuits for the pattern history table. The storage space for caching target addresses is not included here.

The following assumptions are made: there are a address bits; the branch history table size is h, and i is equal to log2 h; the branch history table is 2^j-way set-associative, where j is a non-negative integer; each history register contains k bits; each pattern history table entry contains s bits; the number of pattern history tables is p (in GAg and PAg, p is always equal to one; in PAp, p is equal to h); and each pattern history table always has 2^k entries. The hardware cost of Two-Level Adaptive Branch Prediction is as follows:

$$
\begin{aligned}
Cost_{scheme}\big(BHT(h,j,k),\ p \times PHT(2^k,s)\big)
 &= Cost_{BHT}(h,j,k) + p \times Cost_{PHT}(2^k,s) \\
 &= \{BHT\_Storage\_Space + BHT\_Accessing\_Logic + BHT\_Updating\_Logic\} \\
 &\quad + p \times \{PHT\_Storage\_Space + PHT\_Accessing\_Logic + PHT\_Updating\_Logic\} \\
 &= \{[h \times (Tag_{(a-i+j)\,bit} + History_{k\,bit} + Prediction\_Bit_{1\,bit} + LRU\_Bits_{j\,bit})] \\
 &\qquad + [1 \times Address\_Decoder_{(i-j)\,bit} + 2^j \times Comparator_{(a-i+j)\,bit} + 1 \times (2^j{\times}1\ MUX)_{k\,bit}] \\
 &\qquad + [h \times Shifter_{k\,bit} + 2^j \times LRU\_Incrementor_{j\,bit}]\} \\
 &\quad + p \times \{[2^k \times History\_Bits_{s\,bit}] + [1 \times Address\_Decoder_{k\,bit}] + [State\_Updater_{s\,bit}]\} \\
 &= \{h \times [(a-i+j) + k + 1 + j] \times C_s + [h \times C_d + 2^j \times (a-i+j) \times C_c + 2^j \times k \times C_m] \\
 &\qquad + [h \times k \times C_{sh} + 2^j \times j \times C_i]\} \\
 &\quad + p \times \{[2^k \times s \times C_s] + [2^k \times C_d] + [s \times 2^{s+1} \times C_u]\}, \qquad a + j \ge i, \qquad (3)
\end{aligned}
$$

where C_s, C_d, C_c, C_m, C_sh, C_i, and C_u are the constant base costs for the storage, the decoder, the comparator, the multiplexer, the shifter, the incrementor, and the finite-state machine, respectively.
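In GAg, only one history register and one global pattern history table are used, so h and p are both equal to one. No tag and no branch history table accessing logic are necessary for the single history register. Besides, the pattern history state updating logic is small compared to the other two terms in the pattern history table cost. Therefore, the cost estimation function for GAg can be simplified from Function (3) to the following function:

$$Cost_{GAg}\big(BHT(1,\,,k),\ 1 \times PHT(2^k,s)\big) = Cost_{BHT}(1,\,,k) + 1 \times Cost_{PHT}(2^k,s) \approx \{[k+1] \times C_s + k \times C_{sh}\} + \{2^k \times (s \times C_s + C_d)\}. \qquad (4)$$

It is clear to see that the cost of GAg grows exponentially with respect to the history register length.

In PAg, only one pattern history table is used, so p is equal to one. Since j and s are usually small compared to the other variables, by using Function (3), the estimated cost for PAg using a branch history table is as follows:

$$Cost_{PAg}\big(BHT(h,j,k),\ 1 \times PHT(2^k,s)\big) = Cost_{BHT}(h,j,k) + 1 \times Cost_{PHT}(2^k,s) \approx \{h \times [(a + 2 \times j + k + 1 - i) \times C_s + C_d + k \times C_{sh}]\} + \{2^k \times (s \times C_s + C_d)\}, \qquad a + j \ge i. \qquad (5)$$

The cost of a PAg scheme grows exponentially with respect to the history register length and linearly with respect to the branch history table size.

In a PAp scheme using a branch history table as defined above, h pattern history tables are used, so p is equal to h. By using Function (3), the estimated cost for PAp is as follows:

$$Cost_{PAp}\big(BHT(h,j,k),\ h \times PHT(2^k,s)\big) = Cost_{BHT}(h,j,k) + h \times Cost_{PHT}(2^k,s) \approx \{h \times [(a + 2 \times j + k + 1 - i) \times C_s + C_d + k \times C_{sh}]\} + h \times \{2^k \times (s \times C_s + C_d)\}, \qquad a + j \ge i. \qquad (6)$$

When the history register is sufficiently large, the cost of a PAp scheme grows exponentially with respect to the history register length and linearly with respect to the branch history table size. However, the branch history table size becomes a more dominant factor than it is in a PAg scheme.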
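Functions (4) through (6) can be evaluated directly to compare the variations. The sketch below encodes only the simplified functions, not the full Function (3). It assumes h is a power of two (so that i = log2 h), and the example parameter values and unit base costs are hypothetical, not configurations from the paper.

```python
def cost_pht(k: int, s: int, Cs: float, Cd: float) -> float:
    """Simplified PHT cost: 2^k entries of s storage bits plus the k-bit address decoder."""
    return 2 ** k * (s * Cs + Cd)

def cost_gag(k: int, s: int, Cs: float, Cd: float, Csh: float) -> float:
    """Function (4): a single k-bit history register plus prediction bit, and one PHT."""
    return (k + 1) * Cs + k * Csh + cost_pht(k, s, Cs, Cd)

def cost_bht(h: int, j: int, k: int, a: int, Cs: float, Cd: float, Csh: float) -> float:
    """Simplified per-address BHT term shared by Functions (5) and (6)."""
    if h <= 0 or h & (h - 1):
        raise ValueError("h is assumed to be a power of two")
    i = h.bit_length() - 1                 # i = log2 h
    if a + j < i:
        raise ValueError("Functions (5) and (6) assume a + j >= i")
    return h * ((a + 2 * j + k + 1 - i) * Cs + Cd + k * Csh)

def cost_pag(h, j, k, s, a, Cs, Cd, Csh):
    """Function (5): per-address BHT plus one global PHT."""
    return cost_bht(h, j, k, a, Cs, Cd, Csh) + cost_pht(k, s, Cs, Cd)

def cost_pap(h, j, k, s, a, Cs, Cd, Csh):
    """Function (6): per-address BHT plus h per-address PHTs."""
    return cost_bht(h, j, k, a, Cs, Cd, Csh) + h * cost_pht(k, s, Cs, Cd)


unit = dict(Cs=1, Cd=1, Csh=1)             # hypothetical unit base costs
```

With all base costs set to 1, k = 12, s = 2, h = 512, j = 2 and a = 30, the sketch gives 12,313 for GAg, 38,400 for PAg and 6,317,568 for PAp, which shows how the h pattern history tables dominate the cost of PAp.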
4 Simulation Model

Trace-driven simulations were used in this study. A Motorola 88100 instruction level simulator is used for generating instruction traces. The instruction and address traces are fed into the branch prediction simulator, which decodes instructions, predicts branches, and verifies the predictions with the branch results to collect statistics for branch prediction accuracy.

4.1 Description of Traces

Nine benchmarks from the SPEC benchmark suite are used in this branch prediction study. Five are floating point benchmarks and four are integer benchmarks. The integer benchmarks include eqntott, espresso, gcc, and li; the floating point benchmarks include doduc, fpppp, matrix300, spice2g6, and tomcatv. Nasa7 is not included because it takes too long to capture the branch behavior of all seven kernels. Among the five floating point benchmarks, fpppp, matrix300, and tomcatv have repetitive loop execution; thus, a very high prediction accuracy is attainable, independent of the predictors used. Doduc and spice2g6 are more irregular in their branch behavior. The integer benchmarks tend to have many conditional branches and irregular branch behavior. Therefore, it is on the integer benchmarks where a branch predictor's mettle is tested.

Each benchmark was simulated for up to twenty million instructions. The number of static conditional branches in each benchmark is listed in Table 1, and the training and testing data sets of the benchmarks are listed in Table 2.

Table 1: Number of static conditional branches in each benchmark (eqntott, espresso, gcc, li, doduc, fpppp, matrix300, spice2g6, tomcatv).
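Table 2: Training and testing data sets of benchmarks.

Benchmark    Training Data Set    Testing Data Set
eqntott
espresso     cps                  bca
gcc          cexp.i               dbxout.i
li           tower of hanoi       eight queens
doduc        tiny doducin         doducin
fpppp        NA                   natoms
matrix300    NA                   Built-in
spice2g6     short greycode.in    greycode.in
tomcatv      NA                   Built-in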
In
the
about
traces
generated
24 percent
integer
of
benchmarks
namic
instructions
instructions.
cent
dynamic
of the
branches;
ditional
diction
and
about
the
the
sets,
content
in each
entry.
for
the
pattern
history
table
the
dy-
Figure
data
5 percent
floating
of
point
benchmarks
4 shows
about
instructions
prediction
branches
is the
mechanisms
for
testing
instructions
Figure
branch
therefore,
the
dynamic
for
are branch
with
the
part
are conditional
mechanism
most important
different
classes
for
Figure 4: Distribution of dynamic branch instructions.

When a branch misses in the branch history table, an entry is allocated for it and its history register is initialized to all 1's. Since taken branches are more likely, this biases the first predictions for the branch toward taken. Each branch result is then shifted into the history register when the branch resolves. When context switches are simulated, the branch history table is flushed at every context switch.
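The branch history table handling described above can be sketched in Python. This is a minimal illustration, not the simulator used in the study. The class name `BranchHistoryTable`, the modulo set indexing, the use of the full address as the tag, and LRU replacement within a set are all assumptions. The sketch does reproduce the behavior the text states: a miss allocates an entry whose history register is all 1's, each outcome is shifted in, and a flush empties the table.

```python
from collections import OrderedDict


class BranchHistoryTable:
    """Set-associative table of per-address k-bit history registers.

    Assumptions for this sketch (not taken from the paper): the set index is
    the branch address modulo the number of sets, the full address is the
    tag, and entries within a set are replaced in LRU order.
    """

    def __init__(self, entries: int, assoc: int, k: int):
        if entries % assoc:
            raise ValueError("entries must be a multiple of assoc")
        self.num_sets = entries // assoc
        self.assoc = assoc
        self.all_ones = (1 << k) - 1      # value of a freshly allocated register
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def _set(self, addr: int) -> OrderedDict:
        return self.sets[addr % self.num_sets]

    def history(self, addr: int) -> int:
        """Return addr's history register, allocating an entry on a miss."""
        s = self._set(addr)
        if addr in s:
            s.move_to_end(addr)           # mark as most recently used
            return s[addr]
        self.misses += 1
        if len(s) >= self.assoc:
            s.popitem(last=False)         # evict the least recently used entry
        s[addr] = self.all_ones           # new history register: all 1's
        return s[addr]

    def update(self, addr: int, taken: bool) -> None:
        """Shift the resolved outcome into addr's history register."""
        h = self.history(addr)
        self._set(addr)[addr] = ((h << 1) | int(taken)) & self.all_ones

    def flush(self) -> None:
        """Invalidate every entry, as on a simulated context switch."""
        for s in self.sets:
            s.clear()
```

For example, `BranchHistoryTable(512, 4, 12)` plays the role of a BHT(512,4,12sr) configuration in the naming convention.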
The three variations of Two-Level Adaptive Branch Prediction were simulated with the configurations listed in Table 3. Other known dynamic and static branch prediction schemes were also simulated. To distinguish the different configurations analyzed, the following naming convention is used:

Scheme(History(Size, Associativity, Entry_Content), Pattern_Table_Set_Size × Pattern(Size, Entry_Content), Context_Switch)

Scheme specifies the prediction scheme, for example GAg, PAg, PAp, or the Branch Target Buffer design (BTB) [17]. In History(Size, Associativity, Entry_Content), History is the entity used to keep the history information of branches: HR (a single history register), IBHT (an ideal branch history table), or BHT (a branch history table). Size specifies the number of entries in that entity, Associativity is the associativity of the table, and Entry_Content specifies the content of each entry. When Associativity is set to 1, the branch history table is direct-mapped. The content of an entry in the branch history table can be any automaton shown in Figure 2, or simply a history register.
Table 3: Configurations of simulated branch predictors.

GAg(HR(1,,r-sr), 1 × PHT(2^r, A2), [c])
PAg(BHT(256,1,r-sr), 1 × PHT(2^r, A2), [c])
PAg(BHT(256,4,r-sr), 1 × PHT(2^r, A2), [c])
PAg(BHT(512,1,r-sr), 1 × PHT(2^r, A2), [c])
PAg(BHT(512,4,r-sr), 1 × PHT(2^r, Atm), [c]), Atm = A1, A2, A3, A4, or LT
PAg(IBHT(inf,,r-sr), 1 × PHT(2^r, A2), [c])
PAp(BHT(512,4,r-sr), 512 × PHT(2^r, A2), [c])
BTB(BHT(512,4,Atm), , [c]), Atm = A2 or LT

GAg: Global Two-Level Adaptive Branch Prediction using a global pattern history table. PAg: Per-address Two-Level Adaptive Branch Prediction using a global pattern history table. PAp: Per-address Two-Level Adaptive Branch Prediction using per-address pattern history tables. GSg: Global Static Training using a preset global pattern history table. PSg: Per-address Static Training using a preset global pattern history table. BTB: Branch Target Buffer design. HR: History Register. BHT: Branch History Table. IBHT: Ideal Branch History Table. PHT: Pattern History Table. Atm: Automaton. LT: Last-Time. PB: Preset Bit. r-sr: r-bit Shift Register. inf: Infinite. [c]: Context_Switch flag.
In Pattern_Table_Set_Size × Pattern(Size, Entry_Content), Pattern_Table_Set_Size is the number of pattern history tables used in the scheme, and Pattern is the implementation used for keeping pattern history information. Size specifies the number of entries in that implementation, and Entry_Content specifies the content of each entry; for the Static Training schemes, each entry holds a Preset Bit (PB). Context_Switch is a flag specifying whether context switches are simulated. If a predictor does not have a certain feature in the naming convention, the corresponding field is left blank. For example, PAg(BHT(512,4,12sr), 1 × PHT(2^12, A2), c) denotes a PAg scheme with the following components:

- a 512-entry, four-way set-associative branch history table of 12-bit shift registers;
- one 4096-entry pattern history table of A2 automata;
- simulated context switches.
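To make the naming convention concrete, the hypothetical helper below assembles a configuration name from its fields. It only formats strings. `PredictorConfig` and its field names are inventions for illustration that mirror the fields defined above. Missing features print as blank fields, and an ASCII `x` stands in for the multiplication sign.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictorConfig:
    """Hypothetical helper that spells out a configuration in the form
    Scheme(History(Size,Associativity,Entry_Content),
    Pattern_Table_Set_Size x Pattern(Size,Entry_Content),Context_Switch).
    Fields a predictor does not have are None/empty and print as blanks."""
    scheme: str                         # e.g. "GAg", "PAg", "PAp", "BTB"
    history: str                        # "HR", "BHT" or "IBHT"
    hist_size: str                      # number of entries, e.g. "512" or "inf"
    hist_assoc: Optional[int]           # None when not applicable (HR, IBHT)
    hist_content: str                   # e.g. "12sr" (12-bit shift register) or "A2"
    pht_sets: Optional[int] = None      # Pattern_Table_Set_Size
    pht_size: str = ""                  # e.g. "2^12"
    pht_content: str = ""               # e.g. "A2", "LT" or "PB"
    context_switch: bool = False

    def name(self) -> str:
        assoc = "" if self.hist_assoc is None else str(self.hist_assoc)
        hist = f"{self.history}({self.hist_size},{assoc},{self.hist_content})"
        pattern = ("" if self.pht_sets is None
                   else f"{self.pht_sets}xPHT({self.pht_size},{self.pht_content})")
        flag = "c" if self.context_switch else ""
        return f"{self.scheme}({hist},{pattern},{flag})"
```

For instance, `PredictorConfig("PAg", "BHT", "512", 4, "12sr", 1, "2^12", "A2", True).name()` yields `PAg(BHT(512,4,12sr),1xPHT(2^12,A2),c)`, and a BTB with no pattern table prints its pattern field blank.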
The pattern history bits in the pattern history table entries are initialized at the beginning of execution. Since taken branches are more likely, all entries of the pattern history tables using automata A1, A2, A3, and A4 are initialized to state 3. For Last-Time, all entries are initialized to state 1, so that branches at the beginning of execution are more likely to be predicted taken. It is not necessary to reinitialize the pattern history tables during execution.
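The sketch below covers the two pattern history automata whose behavior this section fully determines. A2 is the 2-bit up-down saturating counter, and Last-Time is a single bit holding the last outcome. The state numbering (0 to 3 for A2, with states 2 and 3 predicting taken) is an assumption, chosen so that the initial state 3 is taken-biased, as the initialization rule intends. A1, A3, and A4 are defined only in Figure 2, so they are left out rather than guessed.

```python
class A2:
    """Automaton A2 modeled as a 2-bit up-down saturating counter.

    States 0..3 (numbering assumed); states 2 and 3 predict taken.
    New entries start in state 3, the taken-biased initial state.
    """

    def __init__(self, state: int = 3):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Count up on taken, down on not taken, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class LastTime:
    """Last-Time: one bit holding the last outcome; starts at 1 (taken)."""

    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state == 1

    def update(self, taken: bool) -> None:
        self.state = int(taken)
```

A single not-taken outcome moves A2 from state 3 to state 2, which still predicts taken. This is the extra tolerance to deviations that Last-Time lacks.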
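In addition to the Two-Level Adaptive schemes, several other schemes were simulated for comparison purposes: Lee and A. Smith's Static Training schemes, Branch Target Buffer designs, and some dynamic and static branch prediction schemes. Lee and A. Smith's Static Training scheme is similar in structure to the per-address Two-Level Adaptive scheme with an IBHT. The difference is that the prediction for a given pattern is pre-determined by profiling. In this study, Lee and A. Smith's Static Training is identified as PSg, meaning Per-address history registers and a Static (preset) global pattern history table. Similarly, the scheme with the same structure as GAg but with a preset global pattern history table is identified as GSg. Static Training with per-address pattern history tables (PSp) was not simulated, because keeping preset pattern history for every branch requires a lot of storage. For a fair comparison, the pattern history tables of the Static Training schemes are preset before execution starts by profiling with the training data sets. Prediction accuracy is then measured on the testing data sets, as for the Two-Level Adaptive schemes.

Branch Target Buffer designs were simulated with a 512-entry, four-way set-associative table, holding either automaton A2 (a 2-bit up-down saturating counter) or Last-Time in each entry. Last-Time predicts that a branch will go in the same direction as it did the last time it was executed.

Three simpler schemes were also simulated:

- Always Taken predicts taken for all branches.
- Backward Taken and Forward Not Taken (BTFN) predicts a branch taken if its target is backward, as for the loop-bound branch at the bottom of a loop, and not taken if its target is forward.
- The Profiling scheme counts how often each static branch is taken and not taken during a profiling execution of the program with its training data set. Each branch is then predicted in the direction it took more frequently when the program is executed with its testing data set.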
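The comparison schemes described above are simple enough to sketch directly. The Python below assumes a hypothetical trace of (branch address, target address, taken) records. `train_profile` builds the Profiling scheme's table from a training run. `train_gsg` presets a GSg-style pattern history table by recording, for each global history pattern, the majority outcome seen in training. Tie-breaking toward taken and the all-1's starting history are assumptions, not details from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One conditional-branch record: (branch_address, target_address, taken).
# This trace format is an assumption made for the sketch.
Record = Tuple[int, int, bool]


def always_taken(addr: int, target: int) -> bool:
    return True


def btfn(addr: int, target: int) -> bool:
    # Backward branch (target at or below the branch address) -> predict taken.
    return target <= addr


def train_profile(trace: Iterable[Record]) -> Dict[int, bool]:
    """Profiling: majority direction of each static branch in the training run."""
    taken, total = Counter(), Counter()
    for addr, _, t in trace:
        total[addr] += 1
        taken[addr] += int(t)
    return {a: 2 * taken[a] >= total[a] for a in total}   # ties -> taken (assumption)


def train_gsg(trace: Iterable[Record], k: int) -> List[bool]:
    """GSg: preset bit for each k-bit global history pattern, set to the
    majority outcome that followed that pattern in the training run.
    Unseen patterns and ties default to taken (an assumption)."""
    mask = (1 << k) - 1
    ghr = mask                                   # start as all 1's (assumption)
    taken, seen = [0] * (mask + 1), [0] * (mask + 1)
    for _, _, t in trace:
        seen[ghr] += 1
        taken[ghr] += int(t)
        ghr = ((ghr << 1) | int(t)) & mask       # newest outcome in bit 0
    return [2 * taken[p] >= seen[p] for p in range(mask + 1)]


def accuracy(predict, trace: Iterable[Record]) -> float:
    """Fraction of records whose outcome matches predict(addr, target)."""
    records = list(trace)
    hits = sum(predict(a, tgt) == t for a, tgt, t in records)
    return hits / len(records)
```

The trained tables are used unchanged on the testing trace. An example is `accuracy(lambda a, t: prof.get(a, True), test_trace)`, where branches unseen in training default to taken (again an assumption).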
5 Branch Prediction Simulation Results

Figures 5 through 11 show the prediction accuracy of the branch predictors described in the previous section on the nine benchmarks. On the horizontal axis, "Int GMean" is the geometric mean across all the integer benchmarks, "FP GMean" is the geometric mean across all the floating point benchmarks, and "Tot GMean" is the geometric mean across all nine benchmarks. The vertical axis shows the prediction accuracy.
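The summary columns in the figures are geometric means. Here is a small sketch of that calculation, grouping the nine benchmarks into integer and floating point sets as listed earlier. The dictionary of per-benchmark accuracies passed to `summary` would come from a simulator and is hypothetical here. Benchmarks missing from it are skipped, as when a Static Training scheme has no training data set for a benchmark.

```python
import math
from typing import Dict, Iterable

INTEGER = ("eqntott", "espresso", "gcc", "li")
FLOATING_POINT = ("doduc", "fpppp", "matrix300", "spice2g6", "tomcatv")


def gmean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid underflow."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def summary(acc: Dict[str, float]) -> Dict[str, float]:
    """Int/FP/Tot GMean over the benchmarks present in acc."""
    return {
        "Int GMean": gmean(acc[b] for b in INTEGER if b in acc),
        "FP GMean": gmean(acc[b] for b in FLOATING_POINT if b in acc),
        "Tot GMean": gmean(acc.values()),
    }
```

The geometric mean of 0.5 and 2.0 is 1.0, whereas their arithmetic mean is 1.25. This is why a single very low accuracy pulls a GMean column down more than it would pull down an arithmetic average.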
5.1 Evaluation of the Parameters of Two-Level Adaptive Branch Prediction

5.1.1 Effect of Pattern History Table Automata

Figure 5 shows the efficiency of using different finite-state automata. Five automata (A1, A2, A3, A4, and Last-Time) were simulated with a PAg scheme using 12-bit history registers in a four-way set-associative, 512-entry BHT. A1, A2, A3, and A4 all perform better than Last-Time. The four-state automata maintain more history information than Last-Time, which records only what happened the last time. They are therefore more tolerant of deviations in the execution history. Among the four-state automata, A1 performs worse than the others. The performances of A2, A3, and A4 are very close to each other, but A2 usually performs best. To keep the following figures clear, each Two-Level Adaptive scheme is shown only with automaton A2.

Figure 5: Comparison of Two-Level Adaptive Branch Predictors using different finite-state automata. Curves: PAg(BHT(512,4,12sr), 1 × PHT(2^12, X)) for X = A1, A2, A3, A4, and LT.
5.1.2 Effect of History Register Length

Figure 6 shows the prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction when every scheme is simulated with the same history register length. Among the variations, PAp performs the best, PAg second, and GAg the worst. GAg is not effective with 6-bit history registers, because every branch updates the same history register, causing excessive interference. PAg performs better than GAg because its branch history table reduces the interference in branch history. PAp predicts the best because the interference in the pattern history is removed as well.

Figure 6: Prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction with history registers of the same length.
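The interference argument can be explored with a compact simulator of the three variations. This is a sketch under stated simplifications, not the study's simulator. History registers are unbounded (like an IBHT), every pattern history entry is an A2 counter starting in state 3, and PAp gets one pattern history table per static branch. Consider a single loop branch whose outcomes repeat taken, taken, taken, not taken. With 4-bit history, all three variations mispredict only while the pattern history warms up: three misses in 400 executions. Differences between the variations appear only when several branches share the global history register or the global pattern history table.

```python
from typing import Dict, List, Sequence, Tuple


class TwoLevel:
    """Minimal GAg / PAg / PAp predictor with A2 (2-bit saturating) entries.

    Simplifications for this sketch: per-address history registers live in
    an unbounded dict (an ideal branch history table), and PAp allocates one
    pattern history table per static branch on first use.
    """

    def __init__(self, variant: str, k: int):
        if variant not in ("GAg", "PAg", "PAp"):
            raise ValueError(variant)
        self.variant = variant
        self.k = k
        self.mask = (1 << k) - 1
        self.ghr = self.mask                   # global history register, all 1's
        self.bht: Dict[int, int] = {}          # per-address history registers
        self.phts: Dict[int, List[int]] = {}   # pattern history table(s)

    def _history(self, addr: int) -> int:
        if self.variant == "GAg":
            return self.ghr
        return self.bht.setdefault(addr, self.mask)

    def _pht(self, addr: int) -> List[int]:
        key = addr if self.variant == "PAp" else 0   # GAg and PAg share one table
        return self.phts.setdefault(key, [3] * (1 << self.k))

    def predict(self, addr: int) -> bool:
        return self._pht(addr)[self._history(addr)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        pht, h = self._pht(addr), self._history(addr)
        pht[h] = min(3, pht[h] + 1) if taken else max(0, pht[h] - 1)
        new_h = ((h << 1) | int(taken)) & self.mask
        if self.variant == "GAg":
            self.ghr = new_h
        else:
            self.bht[addr] = new_h


def run(pred: TwoLevel, trace: Sequence[Tuple[int, bool]]) -> float:
    """Predict, then update, for each (address, taken) record; return accuracy."""
    hits = 0
    for addr, taken in trace:
        hits += pred.predict(addr) == taken
        pred.update(addr, taken)
    return hits / len(trace)
```

To see interference, interleave two branches with different patterns in one trace and compare `run(TwoLevel("GAg", k), trace)` against the PAg and PAp variants.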
5.1.3 Hardware Cost Efficiency

In Figure 6 the three variations are compared with history registers of the same length, even though their hardware costs differ considerably. To investigate further the effect of history register length, Figure 7 shows the prediction accuracy of the GAg, PAg, and PAp schemes with history registers ranging from 6 bits to 18 bits. Lengthening the history register increases the prediction accuracy of every scheme. The effect is most obvious on GAg, whose single global history register suffers the most interference. Lengthening the history register has a smaller effect on PAp, which already has less interference because of its per-address history registers and multiple pattern history tables.

Figure 7: Effect of history register length on the three variations of Two-Level Adaptive Branch Prediction.
Figure 8 compares the three variations when each is configured to reach about 97 percent prediction accuracy. To achieve this accuracy, GAg requires an 18-bit history register, PAg requires 12-bit history registers, and PAp requires 6-bit history registers. According to the cost estimates, PAg is the cheapest of the three variations at this level of accuracy.

Figure 8: Cost efficiency of the three variations of Two-Level Adaptive Branch Prediction.
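As a rough check on this cost comparison, the sketch below counts only storage bits (history registers plus pattern history tables) for the three configurations that reach about 97 percent accuracy. It is not Function 3. Tags, valid and replacement bits, decoders, and update logic are ignored, and `tag_bits` is a hypothetical parameter. Assume 2-bit pattern entries and a 512-entry branch history table. Then GAg with an 18-bit register needs 524,306 bits, PAg with 12-bit registers needs 14,336 bits plus tags, and PAp with 6-bit registers needs 68,608 bits plus tags. Even this crude count puts PAg lowest.

```python
def storage_bits(variant: str, k: int, s: int = 2,
                 bht_entries: int = 512, tag_bits: int = 0) -> int:
    """Storage-only bit count: history registers plus pattern history tables.

    Not the paper's cost function: logic, decoders, and valid/LRU bits are
    ignored, and tag_bits is a hypothetical per-entry tag width.
    """
    pht_bits = (1 << k) * s                        # one table of 2^k s-bit entries
    if variant == "GAg":
        return k + pht_bits                        # one k-bit global history register
    bht_bits = bht_entries * (tag_bits + k)        # per-address history registers
    if variant == "PAg":
        return bht_bits + pht_bits                 # one global pattern history table
    if variant == "PAp":
        return bht_bits + bht_entries * pht_bits   # one table per BHT entry
    raise ValueError(variant)
```

Because the BHT term is identical for PAg and PAp with the same number of entries, any tag width adds the same amount to both. The ordering PAg < PAp therefore holds for any `tag_bits`.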
5.1.4 Effect of Context Switch

The Two-Level Adaptive Branch Prediction schemes keep branch history in the branch history table and pattern history in the pattern history tables, so context switches can affect their accuracy. Figure 9 shows the prediction accuracy of the three variations with and without context switches. During simulation, a context switch occurs whenever a trap occurs in the instruction trace, or every 500,000 instructions if no trap occurs. The value 500,000 is derived by assuming a 10 ms context switch interval on a machine with a 50 MHz clock and 1 IPC. After a context switch, the branch history table is flushed and its history registers are re-initialized. The pattern history table is not re-initialized, because the saved pattern history is more likely to resemble the current process's pattern history than a re-initialized table would.

The degradations in prediction accuracy are less than 1 percent for all three schemes, except for gcc when GAg is used. The degradation for gcc under GAg is greater because of the large number of traps in gcc. In fpppp, the prediction accuracy actually increases when context switches are simulated. There are very few conditional branches in fpppp, and all of them have regular behavior, so initializing the global history register helps clear out noise.

Figure 9: Effect of context switch on prediction accuracy.
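The 500,000-instruction interval follows from the stated assumptions: 10 ms × 50 MHz × 1 instruction per cycle = 500,000 instructions. The sketch below encodes that arithmetic, plus a hypothetical scheduler that places context switches in a trace at every trap, or whenever the interval elapses with no trap. Restarting the interval at each switch is an assumption, because the text does not say how the timer interacts with traps.

```python
from typing import Iterable, List


def interval_instructions(interval_s: float, clock_hz: float, ipc: float) -> int:
    """Instructions executed during one scheduling interval."""
    return round(interval_s * clock_hz * ipc)


def switch_points(traps: Iterable[int], n_instructions: int, interval: int) -> List[int]:
    """Instruction counts at which a context switch is simulated: at each
    trap, or once `interval` instructions pass without one. Restarting the
    interval after every switch is an assumption of this sketch."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    pending = sorted(traps)
    points: List[int] = []
    last, i = 0, 0
    while True:
        timer = last + interval
        if i < len(pending) and pending[i] <= timer:
            nxt = pending[i]          # a trap arrives before the interval expires
            i += 1
        else:
            nxt = timer               # no trap: the interval expires
        if nxt >= n_instructions:
            return points
        if not points or nxt != points[-1]:
            points.append(nxt)        # ignore duplicate trap positions
        last = nxt
```

For example, a 2,000,000-instruction trace with one trap at instruction 700,000 gets switches at 500,000, 700,000, 1,200,000, and 1,700,000 under this policy.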
5.1.5 Effect of Branch History Table Implementation

Figure 10 illustrates the effects of the size and associativity of the branch history table in the presence of context switches. Four practical branch history table implementations were simulated: 256 or 512 entries, each either direct-mapped or four-way set-associative. An ideal branch history table was simulated as well. The prediction accuracy of the 512-entry, four-way set-associative branch history table is very close to that of the ideal branch history table, because most programs fit in a table of that size. Prediction accuracy improves as table size and associativity increase, because the branch history table miss rate goes down.

Figure 10: Effect of branch history table implementation on PAg schemes. Curves: PAg(BHT(256 or 512, 1 or 4, 12sr), 1 × PHT(2^12, A2), c) and PAg(IBHT(inf,,12sr), 1 × PHT(2^12, A2), c).
5.2 Comparison of Two-Level Adaptive Branch Prediction and Other Prediction Schemes

Figure 11 compares the branch prediction schemes. PAg(BHT(512,4,12sr), 1 × PHT(2^12, A2), c) is chosen to represent Two-Level Adaptive Branch Prediction because it is simple to implement. It achieves about 97 percent average prediction accuracy, which is at least 2.6 percent higher than any of the other schemes. The best of the other schemes, Static Training, achieves 94.4 percent, and the PSg and GSg curves are similar. Data points for the Static Training schemes on eqntott, fpppp, matrix300, and tomcatv are not graphed, because appropriate training data sets are unavailable for those benchmarks. The Branch Target Buffer designs achieve about 93 percent with 2-bit up-down saturating counters [17] and about 89 percent with Last-Time. The Profiling scheme achieves about 91 percent prediction accuracy. The simple static schemes, Always Taken and BTFN, have much lower accuracy.

Figure 11: Comparison of branch prediction schemes.
6 Concluding Remarks

In this paper we have proposed a new dynamic branch predictor (Two-Level Adaptive Branch Prediction) that achieves substantially higher accuracy than any other scheme that we are aware of. We computed the hardware costs of implementing three variations of this scheme. We determined that the most effective implementation of Two-Level Adaptive Branch Prediction utilizes a per-address branch history table and a global pattern history table.

We have measured the prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction and of several other popular proposed dynamic and static prediction schemes, using trace-driven simulation of nine of the ten SPEC benchmarks. We have shown that the average prediction accuracy of Two-Level Adaptive Branch Prediction is about 97 percent, while the other known schemes achieve at most 94.4 percent average prediction accuracy.

We have measured the effects of varying the parameters of the Two-Level Adaptive predictors. We noted the sensitivity to k, the length of the history register, and to s, the size of each entry in the pattern history table. We reported on the effectiveness of the various prediction algorithms that use the pattern history table information. We showed the effects of context switching.

Finally, we should point out that we feel our 97 percent prediction accuracy figures are not good enough and that future research in branch prediction is still needed. High performance computing engines in the future will increase the issue rate and the depth of the pipeline. Together, these will further increase the amount of speculative work that must be thrown out after a branch prediction miss. Thus, the 3 percent prediction miss rate needs improvement. We are examining that 3 percent to try to characterize it and, hopefully, reduce it.
Acknowledgments

The authors wish to acknowledge with gratitude the other members of the HPS research group at Michigan for the stimulating environment they provide, and in particular for their comments and suggestions on this work. We are also grateful to NCR Corporation for the gift of an NCR Tower, No. 32, which was very useful in our work. We also thank Motorola Corporation for technical and financial support.
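References

[1] T. C. Chen, "Parallelism, Pipelining and Computer Efficiency", Computer Design, Vol. 10, No. 1, (Jan. 1971), pp.69-74.
[2] T-Y Yeh and Y.N. Patt, "Two-Level Adaptive Branch Prediction", The 24th ACM/IEEE International Symposium on Microarchitecture, (Nov. 1991), pp.51-61.
[3] M. Butler, T-Y Yeh, Y.N. Patt, M. Alsup, H. Scales, and M. Shebanow, "Instruction Level Parallelism is Greater Than Two", Proceedings of the 18th International Symposium on Computer Architecture, (May 1991), pp.276-286.
[4] D. R. Kaeli and P. G. Emma, "Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns", Proceedings of the 18th International Symposium on Computer Architecture, (May 1991), pp.34-42.
[5] Motorola Inc., "M88100 User's Manual", Phoenix, Arizona, (March 13, 1989).
[6] W.W. Hwu, T. M. Conte, and P. P. Chang, "Comparing Software and Hardware Schemes for Reducing the Cost of Branches", Proceedings of the 16th International Symposium on Computer Architecture, (May 1989).
[7] N.P. Jouppi and D. Wall, "Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines", Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, (April 1989), pp.272-282.
[8] D. J. Lilja, "Reducing the Branch Penalty in Pipelined Processors", IEEE Computer, (July 1988), pp.47-55.
[9] W.W. Hwu and Y.N. Patt, "Checkpoint Repair for Out-of-order Execution Machines", IEEE Transactions on Computers, (December 1987), pp.1496-1514.
[10] P. G. Emma and E. S. Davidson, "Characterization of Branch and Data Dependencies in Programs for Evaluating Pipeline Performance", IEEE Transactions on Computers, (July 1987), pp.859-876.
[11] J. A. DeRosa and H. M. Levy, "An Evaluation of Branch Architectures", Proceedings of the 14th International Symposium on Computer Architecture, (June 1987), pp.10-16.
[12] D.R. Ditzel and H.R. McLellan, "Branch Folding in the CRISP Microprocessor: Reducing Branch Delay to Zero", Proceedings of the 14th International Symposium on Computer Architecture, (June 1987), pp.2-9.
[13] S. McFarling and J. Hennessy, "Reducing the Cost of Branches", Proceedings of the 13th International Symposium on Computer Architecture, (1986), pp.396-403.
[14] J. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", IEEE Computer, (January 1984), pp.6-22.
[15] T.R. Gross and J. Hennessy, "Optimizing Delayed Branches", Proceedings of the 15th Annual Workshop on Microprogramming, (Oct. 1982), pp.114-120.
[16] D.A. Patterson and C.H. Sequin, "RISC-I: A Reduced Instruction Set VLSI Computer", Proceedings of the 8th International Symposium on Computer Architecture, (May 1981), pp.443-458.
[17] J.E. Smith, "A Study of Branch Prediction Strategies", Proceedings of the 8th International Symposium on Computer Architecture, (May 1981), pp.135-148.
[18] T-Y Yeh and Y.N. Patt, "Two-Level Adaptive Branch Prediction", Technical Report CSE-TR-117-91, Computer Science and Engineering Division, Department of EECS, The University of Michigan, (Nov. 1991).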