Section 8

advertisement
Pipelining review
Pipelining review
instr1
instr2
Instructions
instr3
instr4
Clock
Cycle 1
Clock
Cycle 2
IM
Reg
IM
Clock
Cycle 3
Clock
Cycle 4
Clock
Cycle 5
Clock
Cycle 6
Clock
Cycle 7
DM
Reg
IM
DM
Reg
IM
DM
Reg
DM
Time
Pipeline Control – big picture
Control
Unit
ID/EX
EX/Mem
Mem/WB
IF/ID
IF
ID
EX
Mem
WB
Pipeline in action
ID
IF
EX
MEM
Ctrl
Instruction
memory
CLK
clk
PC
0x1234
Let’s execute a program
at address 0x1234:
lw $1, 4($3)
add $2, $3, $4
or $3, $1, $2
bne $1, $3, L
Register
file
ALU
SE
Data
Memory
Suppose: $1 = 1, $2 = 2, $3 = 1000, $4 = 6
WB
Pipeline in action 1
ID
IF
EX
MEM
lw $1, 4($3)
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1234
Clock:
1
$1 = 1, $2 = 2, $3 = 1000, $4 = 4
WB
Pipeline in action 2
ID
IF
EX
MEM
add $2, $3, $4
Instruction
memory
CLK
Register
file
ALU
SE
PC
lw $1, 4($3)
0x1238
Data
Memory
Clock:
1
2
$1 = 1, $2 =2, $3 =1000, $4 = 6
WB
Pipeline in action 3
ID
IF
EX
MEM
or $3, $1, $2
Instruction
memory
CLK
Register
file
ALU
SE
PC
add $2, $3, $4
0x123C
lw $1, 4($3)
Data
Memory
Clock:
1
2
3
$1 = 1, $2 =2, $3 =1000, $4 = 6
WB
Pipeline in action 4
ID
IF
(anything wrong?)
EX
MEM
WB
bne $1, $3, L
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1240
or $3, $1, $2
add $2, $3, $4
lw $1, 4($3)
Clock:
1
2
3
4
Let’s assume memory returns 42 for lw.
$1 = 1, $2 =2, $3 =1000, $4 = 6
Data Hazards!
ID
IF
EX
MEM
WB
bne $1, $3, L
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1240
or $3, $1, $2
add $2, $3, $4
lw $1, 4($3)
Clock:
1
2
3
4
$1 = 1, $2 = 2, $3 = 1000, $4 = 6
Pipeline in action 5
ID
IF
EX
MEM
WB
sll $4, $3, 5
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1244
bne $1, $3, L
Clock:
1
2
3
4
5
or $3, $1, $2
add $2, $3, $4
$1 and $2 both
wrong for or
because we got old values!
lw $1, 4($3)
$1 = 42, $2 = 2, $3 = 1000, $4 = 6
Data Hazards

Data dependence



result of an operation needed before it is stored back in reg. file:
lw $1, 4($3)
add $2, $3, $4
or $3, $1, $2
The data hazard above is read after write (RAW)
Data dependence (RAW) occurs when:


An instruction wants to read a register in stage 2, and
One instruction in stage 3 or stage 4 is going to write that
register


Note: if the instruction writing the register is in stage 5, this is fine
since we can write a register and read it in the same cycle
Hazard detection unit compares register fields across
stages.
Resolving data dependencies

Have the compiler generate no-ops


Stall the pipeline, i.e., create bubbles


Don’t have to deal with data hazards in hardware.
the resulting delays are the same as for no-ops
Send the result generated in stage 3 or stage 4
to the appropriate input of the ALU.


This is forwarding or bypassing.
More performance at the cost of more hardware


For one simple pipeline, cost is slightly more control and
extra buses
For several pipelines, say n, communication grows as O(n2)
Forwarding from WB to EX
ID
IF
EX
MEM
WB
sll $4, $3, 5
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1244
bne $1, $3, L
Clock:
1
2
3
4
5
or $3, $1, $2
$1 fixed,
$2 still wrong
add $2, $3, $4
lw $1, 4($3)
$1 = 42, $2 = 2, $3 = 1000, $4 = 6
Forwarding from MEM to EX
ID
IF
EX
MEM
WB
sll $4, $3, 5
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1244
bne $1, $3, L
Clock:
or $3, $1, $2
add $2, $3, $4
lw $1, 4($3)
1
2
3
4
5
$1 = 42, $2 = 2, $3 = 1000, $4 = 6
Generally, need to forward to both inputs!
ID
IF
EX
MEM
WB
sll $4, $3, 5
Instruction
memory
CLK
Register
file
ALU
SE
PC
Data
Memory
0x1244
beq $1, $3, L
Clock:
or $3, $1, $2
What if we had:
or $3, $1, $1
1
2
3
4
5
add $2, $3, $4
lw $1, 4($3)
Need to forward $1 to both ALU inputs
$1 = 42, $2 = 2, $3 = 1000, $4 = 6
Forwarding a register twice?

What if had:




add $1, $2, $2
or $1, $3, $4
and $5, $1, $1
Where do you forward $1 from?
Another look at forwarding…
Clock
Cycle 1
Clock
Cycle 2
IM
Reg
lw $1, 4($3)
Clock
Cycle 3
Clock
Cycle 4
Clock
Cycle 5
Clock
Cycle 6
Clock
Cycle 7
DM
Forward $1
add $2, $3, $4
IM
Reg
DM
Forward $2
or $3, $1, $2
IM
Reg
DM
Instructions
fwd $3
bne $1, $3, L
IM
Reg
DM
Time
Data hazards and loads
Clock
Cycle 1
lw $1, 4($3)
IM
or $3, $1, $2
Clock
Cycle 2
Clock
Cycle 3
Reg
IM
Clock
Cycle 4
Clock
Cycle 5
DM
Reg
Clock
Cycle 6
Clock
Cycle 7
Can’t do forwarding
in this case: can’t
go back in time.
?
Have to insert a
bubble between lw
and or
DM
What if we switch these?
Instructions
add $2, $3, $4
bne $1, $3, L
IM
Reg
IM
DM
Reg
Time
DM
Inserting a bubble
• Have to do it when you load into a register which is used in the next instruction
Clock
Cycle 1
Clock
Cycle 2
IM
Reg
lw $1, 4($3)
Clock
Cycle 3
Clock
Cycle 4
Clock
Cycle 5
Clock
Cycle 6
Clock
Cycle 7
Time
DM
Now we can forward $1
or $3, $1, $2
IM
Reg
DM
do nothing
Instructions
add $2, $3, $4
bne $1, $3, L
Insert
bubble
here
IM
Reg
IM
DM
Reg
DM
…
Other hazards

We’ve seen data hazards


when an instruction in the pipeline is dependent on another
instruction still in the pipeline.
Resolved by:



Bypassing/forwarding
Stalling
Other types of hazards:

Structural hazards



where two instructions at different stages want to use the same
resource.
Solved by using more resources (e.g., instruction and data memory;
several ALU’s). Won’t happen in our pipeline.
Control hazards


happens on a taken branch.
Evaluation of branch condition and calculation of branch target is
not completed before next instruction is fetched.
Branch path
ID
IF
EX
MEM
WB
instr at L
Instruction
memory
CLK
Register
file
ALU
SE
sll $4, $3, 5
PC
L:
Data
Memory
This is wrong!
(Should be instr at L)
result of comparison
Determine new PC
Clock:
=?
bne $1, $3, L
1
2
3
4
5
or $3, $1, $2
add $2, $3, $4
6
$1 = 42, $2 = 1006, $3 = 1000, $4 = 6
Resolving control hazards

Stall until result of the condition & target are known


too slow
Reduce penalty by redesigning the pipeline:


move branch calculation to ID stage
move branch comparison to ID stage



use both edges of the clock
Delay slots


how to do quick compare?
specify in ISA that instruction after branch is always executed.
Branch prediction


in IF stage, guess branch outcome
correct it if turns out to be wrong.
Lw & sw
ID
IF
EX
MEM
Ctrl
Instruction
memory
CLK
clk
PC
Consider a memory copy
program…
loop:
lw $1, 0($3)
sw $1, 0($4)
addi $3, $3, 4
addi $4, $4, 4
bne $4, $5, Loop
Register
file
ALU
SE
Data
Memory
Too many stalls!
How do we fix this?
WB
Download