Branch Prediction

Lecture Objectives:
1) Define branch prediction.
2) Draw a state machine for a 2-bit branch prediction scheme.
3) Explain the impact on the compiler of branch delay.
Control Hazards
• Consider:
add $t1, $zero, $zero     # t1=0
beq $t1, $zero, Ifequal
Notequal:
addi $v0, $zero, 4
Ifequal:
addi $v0, $zero, 17
• Branch determines flow of control
– Fetching next instruction depends on branch
outcome
Chapter 4 — The Processor —
2
Stall on Branch
• Wait until branch outcome determined before
fetching next instruction
– Pipeline can’t determine next instruction until MEM
stage of beq
• Still working on ID stage of beq when IF should begin!
add $t1, $zero, $zero
beq $t1, $zero, Ifequal
addi $v0, $zero, 4 #Notequal
addi $v0, $zero, 17 #Ifequal
(Pipeline diagram: the next instruction is determined in the MEM stage of beq.)
Deciding earlier helps a little…
• Extra hardware can be designed to test registers
and update the PC in the ID stage
– Then IF of next inst can be done one step earlier
• Still have a 1-cycle stall, however
add $t1, $zero, $zero
beq $t1, $zero, Ifequal
addi $v0, $zero, 4 #Notequal
addi $v0, $zero, 17 #Ifequal
(Pipeline diagram: with the extra hardware, the next instruction is determined in the ID stage of beq.)
Performance penalty of stalling on branch:
• 17% of instructions executed in the SPECint2006 benchmark are branch instructions
– If we always stalled for 1 clock cycle on a branch,
what performance penalty would we have?
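The question above can be worked through with a short calculation. This is only a sketch: it assumes an ideal base CPI of 1 for all other instructions (an assumption, not stated on the slide), and the names `effective_cpi`, `BRANCH_FRACTION`, and `STALL_CYCLES` are made up for illustration.

```python
# Assumption: every instruction takes 1 cycle when nothing stalls (base CPI = 1).
BRANCH_FRACTION = 0.17   # from the slide: 17% of executed instructions are branches
STALL_CYCLES = 1         # from the slide: one stall cycle on every branch


def effective_cpi(base_cpi, branch_fraction, stall_cycles):
    """Average cycles per instruction when every branch adds stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


cpi = effective_cpi(1.0, BRANCH_FRACTION, STALL_CYCLES)
slowdown = cpi / 1.0     # relative to the ideal, stall-free pipeline
print(f"CPI = {cpi:.2f}, slowdown = {slowdown:.2f}x")
```

Under these assumptions the CPI rises from 1 to 1.17, so the program runs about 17% slower than on a pipeline with no branch stalls.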
CS2710 Computer Organization
5
Branch Prediction
– A method of resolving branch hazards that
assumes a given outcome for the branch and
proceeds from that assumption rather than
waiting to ascertain the actual outcome
1-bit Dynamic Branch Prediction
– One possibility is to have each branch instruction reserve a
bit that retains the “history” of the last decision
• 0: branch not taken
• 1: branch taken
– To execute a branch
• Check history bit, expect the same outcome
• Start fetching from fall-through (next instruction) or branch target
• If wrong, flush pipeline and flip prediction bit
add $t1, $zero, $zero
beq $t1, $zero, Ifequal
addi $v0, $zero, 4 #Notequal
(Pipeline diagram: marks the point at which the actual next instruction is determined.)
Problems with 1-bit Dynamic Branch Prediction
– Consider a loop that branches 9 times in a row, then is not
taken once (end of loop condition is met)
• Branch taken 9 times, not taken 1 time
– At steady state
• The first branch decision will be incorrect (from previous
execution)
• The final branch decision will be incorrect
• Thus, only 8 of the 10 predictions are correct: the prediction accuracy is only 80%, even though the branch is taken 90% of the time
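The 80% figure can be checked with a small simulation. This is a minimal sketch of a 1-bit predictor for a single branch; the function name `one_bit_accuracy` and the list-of-booleans encoding of branch outcomes are assumptions made for illustration.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Fraction of correct predictions for one branch with a 1-bit history.

    outcomes: list of booleans, True meaning the branch was taken.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken   # the history bit always records the latest outcome
    return correct / len(outcomes)


# One execution of the loop: branch taken 9 times, then not taken once.
loop_run = [True] * 9 + [False]

# Steady state: the previous execution of the loop left the bit at "not taken".
steady_state_accuracy = one_bit_accuracy(loop_run, initial_prediction=False)
print(steady_state_accuracy)
```

The first and last predictions of each loop execution are wrong, so the simulation reports 8 correct out of 10.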
2-Bit Predictor
• Only change prediction on two successive
mispredictions
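One possible encoding of this rule as a four-state machine is sketched below. The two fields (`prediction` and `missed_once`) stand for the two bits; this representation and the class name `TwoBitPredictor` are assumptions for illustration, and other 2-bit schemes (such as a saturating counter) behave slightly differently.

```python
class TwoBitPredictor:
    """Flip the prediction only after two successive mispredictions."""

    def __init__(self, prediction=True):
        self.prediction = prediction   # bit 1: current prediction (True = taken)
        self.missed_once = False       # bit 2: was the previous prediction wrong?

    def predict(self):
        return self.prediction

    def update(self, taken):
        if taken == self.prediction:
            self.missed_once = False   # a correct prediction clears the miss
        elif self.missed_once:
            self.prediction = taken    # second miss in a row: change prediction
            self.missed_once = False
        else:
            self.missed_once = True    # first miss: keep the prediction for now


def accuracy(predictor, outcomes):
    """Run the predictor over a list of outcomes (True = taken)."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


loop_run = [True] * 9 + [False]        # taken 9 times, then not taken once
predictor = TwoBitPredictor(prediction=True)
first_run = accuracy(predictor, loop_run)
second_run = accuracy(predictor, loop_run)   # same predictor, next loop execution
print(first_run, second_run)
```

On the 9-taken, 1-not-taken loop, only the loop exit is mispredicted. The single miss does not flip the prediction, so the first iteration of the next execution is predicted correctly and accuracy is 90% instead of 80%.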
Loops and Static Branch Prediction
• Consider the following loop of code
– Which branch might we reliably predict?
.text
main:
li $t0, 100
loop:
addi $t0, $t0, -1
add $t0, $t0, $zero
bnez $t0, loop
#other instructions follow here…
Example 2: Assembly while-loop
Which branch is more probable?
.text
main:
li $t0, 10
loop: beqz $t0, exitLoop
addi $t0, $t0, -1
add $t0, $t0, $zero
j loop
exitLoop:
# Goto main
j main
Static prediction based on code analysis (done by compiler)
• Assume all branches to a previous address are
always taken
• Assume all branches to a subsequent address
are not taken
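The two rules reduce to a single address comparison. The sketch below is illustrative only: the function name `predict_taken` and all addresses are hypothetical, chosen to resemble the two loop examples in this lecture.

```python
def predict_taken(branch_address, target_address):
    """Static rule: backward branches are predicted taken, forward not taken."""
    return target_address < branch_address


# Hypothetical addresses for "bnez $t0, loop" at the bottom of a loop:
# the target label lies at a previous (lower) address.
BNEZ_ADDRESS = 0x0040000C
LOOP_LABEL = 0x00400004
bottom_of_loop = predict_taken(BNEZ_ADDRESS, LOOP_LABEL)

# Hypothetical addresses for "beqz $t0, exitLoop" at the top of a while-loop:
# the target label lies at a subsequent (higher) address.
BEQZ_ADDRESS = 0x00400004
EXIT_LABEL = 0x00400014
top_of_loop = predict_taken(BEQZ_ADDRESS, EXIT_LABEL)

print(bottom_of_loop, top_of_loop)
```

With these addresses the backward `bnez` is predicted taken and the forward `beqz` is predicted not taken, which matches the common case in both loops.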
Dynamic Versus Static Branch Prediction
• Static branch prediction
– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
– Hardware measures actual branch behavior
• e.g., record recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history
Survey
• The branch prediction methods we just
discussed were examples of
A. Static Branch Prediction
B. Dynamic Branch Prediction
C. I haven’t a clue
MIPS approach: delayed branching
• Always assume “branch not taken”.
– This means the instruction immediately following the
branch instruction will always begin to execute
– The decision whether or not to branch is not made until after that
instruction begins to execute!
– Leaves it to the compiler to insert a “useful” instruction right after the
branch that would have needed to execute whether or not the branch
was taken
add $t1, $zero, $zero
beq $t1, $zero, Ifequal
#next inst after beq!!
(Pipeline diagram: with the extra hardware, the actual next instruction is determined in the ID stage of beq.)
Delayed branching example
Before:
#previous instructions
add $s1, $s2, $s3
beq $s2, $zero, Ifequal
# no-branch instructions
Ifequal:
# branch instructions

After:
#previous instructions
beq $s2, $zero, Ifequal
add $s1, $s2, $s3
# no-branch instructions
Ifequal:
# branch instructions
• MIPS always assumes “branch not taken”, so the pipeline will automatically
begin executing the next instruction following the beq.
• The actual branch will be delayed until AFTER the next instruction executes
• The compiler must help out by inserting a “useful” instruction after the beq to
execute while the branch decision is being made by the processor
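The effect of the delay slot on execution order can be sketched with a toy model. It is not a MIPS simulator: the function name `execution_order` and the list-of-strings program representation are assumptions for illustration, and the model handles exactly one branch with one delay slot.

```python
def execution_order(program, branch_index, target_index, branch_taken):
    """Order in which instructions execute with one branch delay slot.

    program: list of instruction strings; program[branch_index] is the branch
    and program[branch_index + 1] is its delay slot.
    """
    order = []
    pc = 0
    while pc < len(program):
        order.append(program[pc])
        if pc == branch_index:
            order.append(program[pc + 1])   # the delay slot always executes
            pc = target_index if branch_taken else pc + 2
        else:
            pc += 1
    return order


# The "after" version of the example: add has been moved into the delay slot.
program = [
    "beq $s2, $zero, Ifequal",
    "add $s1, $s2, $s3",
    "# no-branch instructions",
    "# branch instructions (Ifequal)",
]
taken_order = execution_order(program, 0, 3, branch_taken=True)
not_taken_order = execution_order(program, 0, 3, branch_taken=False)
print(taken_order)
print(not_taken_order)
```

In both runs the `add` executes immediately after the `beq`, which is why it must be an instruction that is needed whether or not the branch is taken.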
Delayed branching pitfall
Before:
#previous instructions
add $s2, $s1, $s3
beq $s2, $zero, Ifequal
# no-branch instructions
Ifequal:
# branch instructions

Not possible:
#previous instructions
beq $s2, $zero, Ifequal
add $s2, $s1, $s3
# no-branch instructions
Ifequal:
# branch instructions
• In this case, the beq instruction depends on $s2 being up-to-date before the
branching decision is made
• If the compiler moves the add instruction to after the beq, then $s2 will be
updated too late: beq would use a “stale” value of $s2!!
• The compiler in this case would have to search for a different instruction that it
could insert after the beq
• If no such instruction can be found (which is rare), the compiler fills the
slot with a nop, which wastes the cycle just as a pipeline stall would
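The dependence check behind this pitfall can be sketched as follows. It is a deliberately simplified test that looks only at the register the candidate instruction writes and the registers the branch reads; a real compiler must check further dependences. The function name `can_fill_delay_slot` and its parameters are hypothetical.

```python
def can_fill_delay_slot(candidate_dest, branch_sources):
    """True if an instruction placed before the branch may move after it.

    candidate_dest: register written by the candidate instruction.
    branch_sources: set of registers the branch compares.
    Simplification: only the candidate-to-branch dependence is checked.
    """
    return candidate_dest not in branch_sources


# beq $s2, $zero, Ifequal reads $s2 and $zero.
beq_sources = {"$s2", "$zero"}

# add $s1, $s2, $s3 writes $s1, which the branch does not read: safe to move.
earlier_example = can_fill_delay_slot("$s1", beq_sources)

# add $s2, $s1, $s3 writes $s2, which the branch reads: not safe to move.
pitfall_example = can_fill_delay_slot("$s2", beq_sources)

print(earlier_example, pitfall_example)
```

The first call corresponds to the earlier delayed branching example and the second to the pitfall, where moving the add would make the beq compare a stale $s2.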