ControlHazard

Csci 136 Computer Architecture II
– Branch Hazards, Exceptions
Xiuzhen Cheng
cheng@gwu.edu
Announcement
Homework assignment #10: due before class on April 12
Readings: Sections 6.4 – 6.5
Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them
will be graded. Your TA will give hints in the lab sections.)
Project #3 is due on April 10, 2005
Quiz #4: April 12, 2005
Final: Thursday, May 12, 12:40PM-2:40PM
Note: you must pass the final to pass this course!
Review on Data Hazards, Forwarding, Stall
When does a data hazard happen?
Data dependencies
Using forwarding to overcome data hazards
Data is available after ALU stage
Forwarding conditions
Stall the pipeline for load-use instructions
Data is available after MEM stage (lw instruction)
Hazard detection conditions
Why in ID stage?
LW and SW
lw $5, 0($15)
sw $5, 100($15)

lw $5, 0($15)
beq $5, $0, Exit
sw $5, 100($15)

lw $5, 0($15)
add $8, $8, $8
sw $5, 100($15)
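SW is in MEM Stage
lw $5, 0($15)
sw $5, 100($15)
[Figure: datapath with sw in the MEM stage and lw in the WB stage; the loaded value is forwarded to the data memory.]
Forwarding condition:
MEM/WB.RegWrite and EX/MEM.MemWrite and
MEM/WB.RegisterRd = EX/MEM.RegisterRd and
MEM/WB.RegisterRd != 0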
SW is in EX Stage
[Figure: datapath with sw in the EX stage and lw in the WB stage.]
Forwarding condition:
ID/EX.MemWrite and MEM/WB.RegWrite and
MEM/WB.RegisterRd = ID/EX.RegisterRt and
MEM/WB.RegisterRd != 0
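The two lw-to-sw forwarding conditions above can be written as boolean predicates. This is a minimal sketch, assuming each pipeline register is modeled as a Python dict keyed by the field names used on the slides; the function names are hypothetical and chosen only for illustration.

```python
def forward_when_sw_in_mem(mem_wb, ex_mem):
    """Condition from the "SW is in MEM Stage" slide."""
    return bool(
        mem_wb["RegWrite"]
        and ex_mem["MemWrite"]
        and mem_wb["RegisterRd"] == ex_mem["RegisterRd"]
        and mem_wb["RegisterRd"] != 0   # register $0 is never forwarded
    )


def forward_when_sw_in_ex(mem_wb, id_ex):
    """Condition from the "SW is in EX Stage" slide."""
    return bool(
        id_ex["MemWrite"]
        and mem_wb["RegWrite"]
        and mem_wb["RegisterRd"] == id_ex["RegisterRt"]
        and mem_wb["RegisterRd"] != 0
    )


# lw $5, 0($15) has reached WB while sw $5, 100($15) is in MEM.
lw_in_wb = {"RegWrite": True, "RegisterRd": 5}
sw_in_mem = {"MemWrite": True, "RegisterRd": 5}
print(forward_when_sw_in_mem(lw_in_wb, sw_in_mem))
```

Both predicates only decide whether to forward; selecting the forwarded value is the job of the multiplexers in the datapath.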
More Cases
lw $15, 0($8)
sw $5, 100($15)    # load-use, stall pipeline
R-Type followed by sw? Two cases:
The R-Type result is the value that sw saves into memory
The R-Type result overwrites the base register used by sw
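The load-use case on this slide can be checked with a small hazard-detection predicate. This is a sketch under stated assumptions: the pipeline registers are modeled as dicts, the `is_store` flag is a hypothetical field standing for the decoded opcode of the instruction in ID, and the lw-to-sw data forwarding paths from the earlier slides are assumed to exist, so a store stalls only when its base register (rs) depends on the load.

```python
def must_stall(id_ex, if_id):
    """Return True if the instruction in ID must wait for a load in EX."""
    if not id_ex["MemRead"]:            # only a load in EX can cause a load-use stall
        return False
    load_dest = id_ex["RegisterRt"]
    if load_dest == 0:                  # writes to $0 never create a dependence
        return False
    if if_id["is_store"]:
        # Store data (rt) can be forwarded later; the base register (rs) cannot wait.
        return load_dest == if_id["RegisterRs"]
    return load_dest in (if_id["RegisterRs"], if_id["RegisterRt"])


# lw $15, 0($8) followed by sw $5, 100($15): base register depends on the load.
lw = {"MemRead": True, "RegisterRt": 15}
sw = {"is_store": True, "RegisterRs": 15, "RegisterRt": 5}
print(must_stall(lw, sw))
```

Without the lw-to-sw forwarding paths, the store case would use the same test as every other instruction and stall on a match with either rs or rt.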
An Example
40: lw $2, 20($1)
44: and $4, $2, $5
48: or $8, $2, $4
[Figures: pipelined datapath, one diagram per clock cycle, showing the load-use stall.]
Clock 1: lw in IF
Clock 2: and in IF, lw in ID
Clock 3: or in IF, and in ID, lw in EX
Clock 4: or held in IF, and held in ID, bubble in EX, lw in MEM
Clock 5: or in ID, and in EX, bubble in MEM, lw in WB
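The clock-by-clock behavior of this example can be reproduced with a short scheduling sketch. It assumes full forwarding, so the only stall is the one-cycle load-use stall between a load and the instruction immediately after it; the tuple layout and the `schedule` function are hypothetical, introduced only for this illustration.

```python
def schedule(program):
    """program: list of (text, dest, sources, is_load).
    Returns {clock: {stage: instruction text}}; a missing stage is a bubble or empty."""
    timeline = {}
    prev = None  # (id_start, ex_start, dest, is_load) of the previous instruction
    for text, dest, sources, is_load in program:
        if prev is None:
            if_start, id_start, ex_start = 1, 2, 3
        else:
            p_id, p_ex, p_dest, p_load = prev
            if_start, id_start = p_id, p_ex   # advance when the previous one advances
            # Load-use: the consumer spends one extra cycle in ID.
            ex_start = p_ex + (2 if p_load and p_dest in sources else 1)
        for clock in range(if_start, id_start):
            timeline.setdefault(clock, {})["IF"] = text
        for clock in range(id_start, ex_start):
            timeline.setdefault(clock, {})["ID"] = text
        for offset, stage in enumerate(["EX", "MEM", "WB"]):
            timeline.setdefault(ex_start + offset, {})[stage] = text
        prev = (id_start, ex_start, dest, is_load)
    return timeline


program = [
    ("lw $2, 20($1)", 2, [1], True),
    ("and $4, $2, $5", 4, [2, 5], False),
    ("or $8, $2, $4", 8, [2, 4], False),
]
timeline = schedule(program)
for clock in sorted(timeline):
    row = [timeline[clock].get(s, "-") for s in ["IF", "ID", "EX", "MEM", "WB"]]
    print(clock, row)
```

In the printed rows, a "-" in EX at clock 4 and in MEM at clock 5 corresponds to the bubble inserted by the hazard detection unit.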
Branch Hazards
Control hazard: attempt to make a decision before condition is evaluated
[Figure: pipeline diagram; the branch decision is made in the MEM stage, and the three instructions fetched after the branch are flushed.]
Observations
In the current, non-optimized design the branch decision is not made until the MEM stage, so 3 clock cycles (CCs) are wasted.
Is it possible to reduce branch delay?
YES
Decide in the EX stage?
Two-CC branch delay
Decide in the ID stage?
One-CC branch delay
How? For beq $x, $y, label, compute $x xor $y and then OR all the result bits; this is much faster than an ALU operation. A separate ALU computes the branch address.
3 strategies
Delayed branch; static branch prediction; dynamic branch prediction
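The fast equality test mentioned above (XOR the two registers, then OR all the result bits) can be sketched bit by bit. This is an illustration of the logic only, assuming 32-bit registers; in hardware the OR reduction is a single wide gate, not a loop, and the function name is hypothetical.

```python
def branch_equal(x, y, width=32):
    """Return True when two register values are equal, using XOR then OR-reduce."""
    diff = (x ^ y) & ((1 << width) - 1)   # a 1 bit marks every position that differs
    any_bit_set = 0
    for i in range(width):                # OR all bits of the XOR result together
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0               # beq is taken when no bit differs


print(branch_equal(5, 5), branch_equal(5, 4))
```

No carry chain is involved, which is why this test fits in the ID stage while a full ALU subtraction would not.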
Delayed Branch
The instruction following the branch (the branch delay slot) is always executed.
Only one instruction occupies the delay slot
Filled by the compiler or assembler
About a 50% success rate in filling the slot with a useful instruction
Losing popularity
Why?
More pipeline stages
Superscalar
Scheduling the Branch Delay Slot
An independent instruction from before the branch is the best choice
Option B is good when the probability that the branch is taken is high.
It must be OK to execute the sub instruction when the branch goes in the unexpected direction
Static Branch Prediction
Assume the branch will not be taken; if the prediction is wrong, discard the effects of the sequentially fetched instructions.
How to discard instructions in the pipeline?
Branch decision is made at MEM stage: instructions in IF, ID, EX
stages need to be discarded.
Branch decision is made at ID stage: only flush IF/ID pipeline register!
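[Figure: pipeline diagram with the branch decision made in the MEM stage; three instructions are flushed.]
[Figure: datapath with the branch decision moved to the ID stage, using the IF.Flush control signal.]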
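Pipelined Branch – An Example
[Figures: two consecutive clock cycles of a branch resolved in the ID stage. Instructions at addresses 36, 40, and 44 are in the pipeline, IF.Flush is asserted, and the instruction at address 72 is fetched next.]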
Dynamic Branch Prediction
Static branch prediction is crude!
Take history into consideration
If a branch was taken last time, predict that it is taken again and fetch the next instruction from the same place as last time
Branch prediction buffer: indexed by the lower bits of the branch instruction's address
This memory contains a bit (or bits) which tells whether the branch
was recently taken or not
Is the prediction correct? Any bad effect?
[Figure: the 1-bit prediction scheme and the state diagram of the 2-bit prediction scheme. The 2-bit scheme has two "Prediction Taken" states and two "Prediction not Taken" states, with transitions labeled "taken" and "not taken".]
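A branch prediction buffer with one bit per entry can be sketched as a small table indexed by the low bits of the branch address. The class name, the default table size, and the choice to drop the two byte-offset bits before indexing are assumptions made for this illustration, not details given on the slides.

```python
class BranchPredictionBuffer:
    """1-bit prediction buffer indexed by low-order bits of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Assumption: every entry starts as "not taken".
        self.table = [False] * (1 << index_bits)

    def _index(self, pc):
        # Instructions are word aligned, so the two lowest address bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        self.table[self._index(pc)] = taken


bpb = BranchPredictionBuffer(index_bits=4)
bpb.update(0x40, True)          # the branch at 0x40 was taken
print(bpb.predict(0x40))        # same branch: predicted taken
print(bpb.predict(0x80))        # different branch, same index: also predicted taken
```

The last line shows one answer to "Any bad effect?": two branches whose addresses share the same low bits use the same entry, so one branch's history can drive the prediction for the other.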
Observation
Since we move the branch decision to the ID stage, the forwarding-control hardware must be duplicated in the ID stage too!
beq following lw
The hazard detection unit should still handle this case.
In-Class Exercise
Consider a loop branch that branches nine times in a
row, then is not taken once. What is the prediction
accuracy for this branch, assuming the prediction bit
for this branch remains in the prediction buffer?
1-bit prediction?
With 2-bit prediction?
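The exercise can be checked by simulation. This sketch assumes the loop runs many times so that steady-state accuracy is what matters, and it models the 2-bit scheme as a saturating counter; the exact transitions in the slide's state diagram may differ, so treat the counter as an assumption. The function name and parameters are hypothetical.

```python
def steady_state_accuracy(pattern, bits, passes=100):
    """Prediction accuracy for one branch after a warm-up pass over `pattern`."""
    max_state = (1 << bits) - 1
    state = max_state                       # assumption: start in the strongest taken state
    correct = 0
    total = 0
    for run in range(passes + 1):
        for taken in pattern:
            predict_taken = state > max_state // 2
            if run > 0:                     # run 0 is the warm-up and is not counted
                total += 1
                correct += predict_taken == taken
            # Saturating counter update: move toward the actual outcome.
            state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return correct / total


loop_branch = [True] * 9 + [False]          # taken nine times, then not taken once
print(steady_state_accuracy(loop_branch, bits=1))
print(steady_state_accuracy(loop_branch, bits=2))
```

Under these assumptions the 1-bit predictor mispredicts twice per ten executions (the loop exit and the first iteration of the next run), while the 2-bit predictor mispredicts only the loop exit.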
Performance Comparison
Compare the performance of single-cycle, multi-cycle
and pipelined datapaths
200ps for memory access, 100ps for ALU operation, 50ps for
register file access
25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops
For the pipelined datapath,
50% of loads are immediately followed by an instruction that uses the
result
Branch delay on misprediction is 1 clock cycle and 25% of branches are
mispredicted
Jump delay is 1 clock cycle
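The comparison can be worked through numerically. The instruction mix, latencies, and pipeline penalties come from the slide; the multi-cycle cycle counts per instruction class (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps) and the 200ps clock for the multi-cycle and pipelined designs are assumptions based on the usual textbook datapath.

```python
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
MEM_PS, ALU_PS, REG_PS = 200, 100, 50

# Single-cycle: CPI is 1 and the clock must fit the longest instruction (lw):
# instruction fetch + register read + ALU + data memory + register write.
single_clock_ps = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS
single_time_ps = 1 * single_clock_ps

# Multi-cycle: clock set by the slowest unit; cycle counts are assumed.
CLOCK_PS = max(MEM_PS, ALU_PS, REG_PS)
MULTI_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_time_ps = multi_cpi * CLOCK_PS

# Pipelined: base CPI of 1 plus the average stall cycles per class.
PIPE_CYCLES = {
    "load": 1 + 0.50 * 1,      # half of the loads cause a 1-cycle load-use stall
    "store": 1,
    "branch": 1 + 0.25 * 1,    # a quarter of the branches cost 1 extra cycle
    "jump": 1 + 1,             # every jump costs 1 extra cycle
    "alu": 1,
}
pipe_cpi = sum(MIX[k] * PIPE_CYCLES[k] for k in MIX)
pipe_time_ps = pipe_cpi * CLOCK_PS

print(f"single-cycle: {single_time_ps} ps per instruction")
print(f"multi-cycle:  CPI {multi_cpi:.2f}, {multi_time_ps:.0f} ps per instruction")
print(f"pipelined:    CPI {pipe_cpi:.4f}, {pipe_time_ps:.1f} ps per instruction")
```

With these assumptions the average time per instruction is 600ps for single-cycle, about 824ps for multi-cycle, and about 234.5ps for the pipelined datapath.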
Exceptions
Exceptions: events other than branches or jumps that
change the normal flow of instruction execution
Arithmetic overflow, undefined instruction, etc.
Internal to the processor
Interrupts come from outside the processor: I/O interrupts
Use arithmetic overflow as an example
When an overflow is detected, we need to transfer control to the
exception handling routine at location 0x80000180
immediately because we do not want this invalid value to
contaminate other registers or memory locations
Similar idea to the branch hazard
Detected in the EX stage
De-assert all control signals in the EX and ID stages, flush IF/ID
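The overflow check and the redirect to the handler can be sketched as follows. This is a simplified model, assuming register values are passed as 32-bit unsigned bit patterns; the function names are hypothetical, and the sketch covers only the choice of the next PC, not the flushing of the pipeline.

```python
EXCEPTION_VECTOR = 0x80000180   # handler address from the slide


def add32(a, b):
    """Signed 32-bit add on bit patterns; returns (result, overflow)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a + b) & 0xFFFFFFFF
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    # Overflow: both operands have the same sign but the result's sign differs.
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow


def next_pc(pc, a, b):
    """Next fetch address after an add at `pc` with operand values a and b."""
    _, overflow = add32(a, b)
    return EXCEPTION_VECTOR if overflow else pc + 4


print(hex(next_pc(0x4C, 0x7FFFFFFF, 1)))   # largest positive value plus 1 overflows
print(hex(next_pc(0x4C, 1, 2)))            # no overflow: continue sequentially
```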
[Figure: datapath extended for exceptions; 80000180 is an additional input for the PC.]
Example
sub $11, $2, $4
and $12, $2, $5
or $13, $2, $6
add $1, $2, $1     -- overflow occurs
slt $15, $6, $7
lw $16, 50($7)
Exception handling routine:
0x80000180: sw $25, 1000($0)
0x80000184: sw $26, 1004($0)
[Figures: pipeline state for the example in clock 6 and clock 7, with 80000180 as the exception address.]
Questions?