inst.eecs.berkeley.edu/~cs61c/su05
CS61C : Machine Structures
Lecture #19: Pipelining II
2005-07-21
Andy Carle
CS 61C L19 Pipelining II (1)
A Carle, Summer 2005 © UCB
Review: Datapath for MIPS
[Figure: MIPS datapath. PC and +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory]
1. Instruction Fetch  2. Decode/Register Read  3. Execute  4. Memory  5. Write Back
• Use datapath figure to represent pipeline
IFtch Dcd Exec Mem WB
I$    Reg ALU  D$  Reg
Review: Problems for Computers
• Limits to pipelining: Hazards prevent
next instruction from executing during
its designated clock cycle
• Structural hazards: HW cannot support
this combination of instructions (single
person to fold and put clothes away)
• Control hazards: Pipelining of branches &
other instructions stalls the pipeline until
the hazard is resolved; “bubbles” in the pipeline
• Data hazards: Instruction depends on
result of prior instruction still in the
pipeline (missing sock)
Review: C.f. Branch Delay vs. Load Delay
• Load Delay occurs only if necessary
(dependent instructions).
• Branch Delay always happens (part of
the ISA).
• Why not have Branch Delay
interlocked?
• Answer: Interlocks only work if you can
detect hazard ahead of time. By the time
we detect a branch, we already need its
value … hence no interlock is possible!
FYI: Historical Trivia
• First MIPS design did not interlock and
stall on load-use data hazard
• Real reason for name behind MIPS:
Microprocessor without Interlocked Pipeline Stages
• Word Play on acronym for
Millions of Instructions Per Second,
also called MIPS
• Load/Use => Wrong Answer!
Outline
• Pipeline Control
• Forwarding Control
• Hazard Control
Piped Proc So Far …
New Representation: Regs more explicit
[Figure: pipelined datapath with explicit pipeline registers. PC / Next PC, Inst. Mem, IF/DE (IR), Reg File, DE/EX (A, B), Exec, EX/ME (S, D), Data Mem, ME/WB (S, M), Reg. File]
IF/DE.IR = Instruction
DE/EX.A = BusA out of Reg
EX/ME.S = AluOut
EX/ME.D = BusB pass-through for sw
ME/WB.S = AluOut pass-through
ME/WB.M = Mem Result from lw
New Representation: Regs more explicit
[Figure: the same pipelined datapath with explicit pipeline registers (IF/DE, DE/EX, EX/ME, ME/WB)]
• What’s Missing???
Pipelined Processor (almost) for slides
Idea: Parallel Piped Control …
[Figure: pipelined datapath with a parallel control pipeline. Control: Dcd Ctrl (IR), Ex Ctrl (IRex), Mem Ctrl (IRmem), WB Ctrl (IRwb), Valid. Datapath: PC, Next PC, Inst. Mem, Reg File, A, B, Exec, Equal, S, D, Mem Access, Data Mem, M, Reg. File]
Pipelined Control
[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, Equal, S, D, Mem Access, Data Mem, M, Reg. File) annotated with the register transfers of each stage]
Fetch:   IR <- Mem[PC]; PC <- PC+4;
Decode:  A <- R[rs]; B <- R[rt]
Exec:    S <- A + B;  |  S <- A or ZX;  |  S <- A + SX;  |  S <- A + SX;  |  If Cond PC <- PC+SX;
Mem:     M <- Mem[S]  |  Mem[S] <- B
WB:      R[rd] <- S;  |  R[rt] <- S;  |  R[rd] <- M;
Data Stationary Control
• The Main Control generates the control signals during Reg/Dec
• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: control signals carried through the pipeline registers]
IF/ID Register  -> Reg/Dec: Main Control generates ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr
ID/Ex Register  -> Exec: uses ExtOp, ALUSrc, ALUOp, RegDst; passes on MemWr, Branch, MemtoReg, RegWr
Ex/Mem Register -> Mem: uses MemWr, Branch; passes on MemtoReg, RegWr
Mem/Wr Register -> Wr: uses MemtoReg, RegWr
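A minimal Python sketch of data-stationary control, under stated assumptions: the control values for lw and the helper names `main_control_lw` and `carry` are hypothetical and only illustrate how the control word generated in Reg/Dec shrinks as each stage consumes its own signals.

```python
# Sketch only: the control values for lw below are illustrative assumptions.
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")

def main_control_lw():
    """Control word the Main Control might generate for lw during Reg/Dec."""
    return {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0,
            "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1}

def carry(control, consumed):
    """Drop the signals used by the stage just completed; pass the rest on."""
    return {k: v for k, v in control.items() if k not in consumed}

id_ex = main_control_lw()                # latched in the ID/Ex register
ex_mem = carry(id_ex, EXEC_SIGNALS)      # Exec used its signals 1 cycle later
mem_wr = carry(ex_mem, MEM_SIGNALS)      # Mem used its signals 2 cycles later

print(sorted(ex_mem))   # signals still travelling after Exec
print(sorted(mem_wr))   # signals left for Wr, used 3 cycles later
```

The point of the design is that each instruction carries its own control bits with it, so no stage has to re-decode the instruction.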
Let’s Try it Out
10   lw    r1, 36(r2)
14   addI  r2, r2, 3
20   sub   r3, r4, r5
24   beq   r6, r7, 100
30   ori   r8, r9, 17
34   add   r10, r11, r12
100  and   r13, r14, 15
Start: Fetch 10
[Figure: pipelined datapath with parallel control (Decode, Exec, Mem Ctrl, WB Ctrl); PC = 10; IR and later stage slots marked n]
IF  10   lw    r1, 36(r2)
    14   addI  r2, r2, 3
    20   sub   r3, r4, r5
    24   beq   r6, r7, 100
    30   ori   r8, r9, 17
    34   add   r10, r11, r12
    100  and   r13, r14, 15
Fetch 14, Decode 10
[Figure: IR = lw r1, 36(r2); PC = 14; later stage slots marked n]
ID  10   lw    r1, 36(r2)
IF  14   addI  r2, r2, 3
    20   sub   r3, r4, r5
    24   beq   r6, r7, 100
    30   ori   r8, r9, 17
    34   add   r10, r11, r12
    100  and   r13, r14, 15
Fetch 20, Decode 14, Exec 10
[Figure: IR = addI r2, r2, 3; Exec holds lw r1 with operands r2 and 36; PC = 20; later stage slots marked n]
EX  10   lw    r1, 36(r2)
ID  14   addI  r2, r2, 3
IF  20   sub   r3, r4, r5
    24   beq   r6, r7, 100
    30   ori   r8, r9, 17
    34   add   r10, r11, r12
    100  and   r13, r14, 15
Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: IR = sub r3, r4, r5; Exec holds addI r2, r2, 3 with operands r2 and 3; Mem holds lw r1 with address r2+36; PC = 24]
M   10   lw    r1, 36(r2)
EX  14   addI  r2, r2, 3
ID  20   sub   r3, r4, r5
IF  24   beq   r6, r7, 100
    30   ori   r8, r9, 17
    34   add   r10, r11, r12
    100  and   r13, r14, 15
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: IR = beq r6, r7, 100; Exec holds sub r3 with operands r4 and r5; Mem holds addI r2 with result r2+3; WB holds lw r1 with M[r2+36]; PC = 30]
Note Delayed Branch: always execute ori after beq
WB  10   lw    r1, 36(r2)
M   14   addI  r2, r2, 3
EX  20   sub   r3, r4, r5
ID  24   beq   r6, r7, 100
IF  30   ori   r8, r9, 17
    34   add   r10, r11, r12
    100  and   r13, r14, 15
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: IR = ori r8, r9, 17; Exec holds beq with operands r6 and r7 and target 100; Mem holds sub r3 with result r4-r5; WB holds addI r2 with result r2+3; r1 = M[r2+36] is written to the register file; PC = 100]
    10   lw    r1, 36(r2)
WB  14   addI  r2, r2, 3
M   20   sub   r3, r4, r5
EX  24   beq   r6, r7, 100
ID  30   ori   r8, r9, 17
    34   add   r10, r11, r12
IF  100  and   r13, r14, 15
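The walkthrough above can be reproduced with a short Python sketch. It assumes an ideal pipeline with no stalls and takes the fetch order straight from the slides (beq, then its delay slot at 30, then the target at 100); the names `FETCH_ORDER` and `occupancy` are hypothetical, and addresses are kept as strings exactly as the slides print them.

```python
STAGES = ("IF", "ID", "EX", "M", "WB")

# Fetch order read off the slides: the branch at 24 is followed by its
# delay slot (30) and then by the branch target (100).
FETCH_ORDER = [
    ("10", "lw r1, 36(r2)"),
    ("14", "addI r2, r2, 3"),
    ("20", "sub r3, r4, r5"),
    ("24", "beq r6, r7, 100"),
    ("30", "ori r8, r9, 17"),
    ("100", "and r13, r14, 15"),
]

def occupancy(cycle):
    """Map stage name -> address of the instruction in that stage (cycle is 0-based)."""
    stages = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - depth          # instruction fetched `depth` cycles ago
        if 0 <= index < len(FETCH_ORDER):
            stages[stage] = FETCH_ORDER[index][0]
    return stages

for cycle in range(len(FETCH_ORDER)):
    row = occupancy(cycle)
    print(cycle, " ".join(f"{s} {row[s]}" for s in STAGES if s in row))
```

Each printed row corresponds to one slide title, for example cycle 5 gives the "Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14" state.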
????
[Figure: pipelined datapath with the parallel control pipeline (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl; IR, IRex, IRmem, IRwb; Valid)]
• Remember: [symbol shown in figure] means triggered on edge.
• What is wrong here?
Double-Clocked Signals
[Figure: the same pipelined datapath with parallel control pipeline, highlighting the double-clocked signals]
• Some signals are double clocked!
• In general: Inputs to edge components are their
own pipeline regs
• Watch out for stalls and such!
Administrivia
• Proj 2 – Due Sunday
• HW6 – Due Tuesday
• Midterm 2:
• Friday, July 29: 11:00 – 2:00
• Location TBD
• If you are really so concerned about the
drop deadline that this is a problem for
you, talk to me about the possibility of
taking the exam on Thursday
Megahertz Myth?
Outline
• Pipeline Control
• Forwarding Control
• Hazard Control
Review: Forwarding
Fix by Forwarding result as soon as we have it
to where we need it:
[Figure: pipeline diagram, stages IF, ID/RF, EX, MEM, WB (I$, Reg, ALU, D$, Reg), with forwarding paths from add to the later instructions]
add $t0,$t1,$t2
sub $t4,$t0,$t3
and $t5,$t0,$t6
or $t7,$t0,$t8 *
xor $t9,$t0,$t10
* “or” hazard solved by register hardware
Forwarding
In general:
• For each stage i that has reg inputs
• For each stage j after i that has reg output
- If i.reg == j.reg => forward j value back to i.
- Some exceptions ($0, invalid)
In particular:
• ALUinput <- (ALUResult, MemResult)
• MemInput <- (MemResult)
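The general rule can be sketched in Python for a single ALU operand. This is an illustration under assumptions, not the course's hardware description: the function name `forward_alu_input` and the dictionary fields (`valid`, `reg_write`, `rw`, `value`) are hypothetical stand-ins for what the later pipeline registers hold, and the nearest later stage is checked first.

```python
def forward_alu_input(src_reg, regfile_value, ex_mem, mem_wb):
    """Pick the value for one ALU operand.

    ex_mem / mem_wb describe the pending write held in each later pipeline
    register: valid, reg_write, rw (destination register), value.
    """
    if src_reg != 0:                       # exception: $0 is never forwarded
        for later in (ex_mem, mem_wb):     # nearest stage first
            if later["valid"] and later["reg_write"] and later["rw"] == src_reg:
                return later["value"]      # exception: invalid entries are skipped
    return regfile_value                   # no pending write: use the register file

# Register 8 has pending writes in both later stages; the nearer one wins.
pending_ex_mem = {"valid": True, "reg_write": True, "rw": 8, "value": 42}
pending_mem_wb = {"valid": True, "reg_write": True, "rw": 8, "value": 7}
print(forward_alu_input(8, 0, pending_ex_mem, pending_mem_wb))
```

Checking the nearer stage first matters when two in-flight instructions write the same register: the younger result is the one the consumer should see.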
Pending Writes In Pipeline Registers
[Figure: pipeline (IAU/npc, PC, I mem, Regs, alu, D mem, Regs) with op, rw, rs, rt fields in decode, operand latches A, B, im, results S and m, and stage slots marked n]
Pending Writes In Pipeline Registers
[Figure: pipeline (IAU/npc, PC, I mem, Regs, alu, D mem, Regs) with op, rw, rs, rt in decode and n, op, rw carried in each later pipeline register]
• Current operand registers
• Pending writes
• hazard <=
((rs == rwex) & regWex) OR
((rs == rwmem) & regWme) OR
((rs == rwwb) & regWwb) OR
((rt == rwex) & regWex) OR
((rt == rwmem) & regWme) OR
((rt == rwwb) & regWwb)
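The hazard expression above is the OR of six comparisons, which a short Python sketch can express as a single loop over the pending writes. The function name `hazard` and the list-of-pairs argument are hypothetical conveniences; like the slide's expression, the sketch does not special-case register 0.

```python
def hazard(rs, rt, pending):
    """True if rs or rt matches a pending write.

    pending: (rw, reg_write) pairs for the EX, MEM and WB stages, i.e.
    (rwex, regWex), (rwmem, regWme), (rwwb, regWwb).
    """
    return any(reg_write and rw in (rs, rt) for rw, reg_write in pending)

# rs = 2 is being written by the instruction currently in EX.
print(hazard(2, 3, [(2, True), (0, False), (0, False)]))
```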
Forwarding Muxes
[Figure: pipeline (IAU/npc, PC, I mem, Regs, alu, D mem, Regs) with a forward mux in front of the A and B operand latches; n, op, rw carried in each later pipeline register]
• Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe
• Increase muxes to add paths from pipeline registers
• Data Forwarding = Data Bypassing
What about memory operations?
Tricky situation:
MIPS:
lw 0($t0)
sw 0($t1)
RTL:
R1 <- Mem[ R2 + I ];
Mem[R3+34] <- R1
[Figure: two pipeline stages, each carrying op Rd Ra Rb; registers A, B, D, R, T around Mem; Rd and T go to the reg file]
What about memory operations?
[Figure: the lw / sw situation from the previous slide, with the bypass handled in the memory stage]
Solution:
Handle with bypass in memory stage!
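A minimal Python sketch of the memory-stage bypass, assuming the sw sits in the memory stage while the lw that feeds it is one stage ahead in write back. The function name `store_data` and its parameters are hypothetical; `ex_me_d` and `me_wb_m` stand for the EX/ME.D and ME/WB.M values from the earlier register diagram.

```python
def store_data(sw_data_reg, ex_me_d, wb_is_load, wb_rw, me_wb_m):
    """Select the data a sw writes in the memory stage.

    sw_data_reg : register number the sw wants to store
    ex_me_d     : Bus B pass-through (may be stale if a load is still pending)
    wb_is_load  : True if the instruction one stage ahead is a load
    wb_rw       : destination register of that instruction
    me_wb_m     : memory result that load just produced
    """
    if wb_is_load and wb_rw != 0 and wb_rw == sw_data_reg:
        return me_wb_m      # bypass: value the preceding lw just read
    return ex_me_d          # normal path: Bus B pass-through

# lw into register 1 followed directly by sw of register 1.
print(store_data(1, 111, True, 1, 999))
```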
Outline
• Pipeline Control
• Forwarding Control
• Hazard Control
Data Hazard: Loads (1/4)
• Forwarding works if value is available (but not
written back) before it is needed. But consider …
[Figure: pipeline diagram, stages IF, ID/RF, EX, MEM, WB (I$, Reg, ALU, D$, Reg)]
lw $t0,0($t1)
sub $t3,$t0,$t2
• Need result before it is calculated!
• Must stall use (sub) 1 cycle and then forward. …
Data Hazard: Loads (2/4)
• Hardware must stall pipeline
• Called “interlock”
[Figure: pipeline diagram, stages IF, ID/RF, EX, MEM, WB; a bubble delays sub, and, and or by one cycle]
lw $t0, 0($t1)
sub $t3,$t0,$t2
and $t5,$t0,$t4
or $t7,$t0,$t6
Data Hazard: Loads (3/4)
• Instruction slot after a load is called
“load delay slot”
• If that instruction uses the result of the
load, then the hardware interlock will
stall it for one cycle.
• If the compiler puts an unrelated
instruction in that slot, then no stall
• Letting the hardware stall the instruction
in the delay slot is equivalent to putting
a nop in the slot (except the latter uses
more code space)
Data Hazard: Loads (4/4)
• Stall is equivalent to nop
[Figure: pipeline diagram; the nop after lw occupies a row of bubbles]
lw $t0, 0($t1)
nop
sub $t3,$t0,$t2
and $t5,$t0,$t4
or $t7,$t0,$t6
Hazards / Stalling
In general:
• For each stage i that has reg inputs
• If i’s reg is being written later on in the pipe
but is not ready yet
- Stages 0 to i: Stall (Turn CEs off so no change)
- Stage i+1: Make a bubble (do nothing)
- Stages i+2 onward: As usual
In particular:
• ALUinput <- (MemResult)
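The per-stage rule can be written out as a small Python sketch. The stage numbering (0 = IF through 4 = WB) and the function name `stall_actions` are assumptions made for illustration.

```python
def stall_actions(num_stages, i):
    """Per-stage action when stage i must wait for a register that is not ready."""
    actions = []
    for stage in range(num_stages):
        if stage <= i:
            actions.append("stall")      # clock enables off, contents held
        elif stage == i + 1:
            actions.append("bubble")     # do nothing this cycle
        else:
            actions.append("as usual")
    return actions

# Five stages, hazard seen by stage 2 (the ALU input waiting on a MemResult).
print(stall_actions(5, 2))
```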
Hazards / Stalling
Alternative Approach:
• Detect non-forwarding hazards in decode
• Possible since our hazards are formal.
- Not always the case.
• Stalling then becomes:
- Issue nop to EX stage
- Turn off nextPC update (refetch same inst)
- Turn off InstReg update (re-decode same inst)
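The three stalling steps can be sketched as a tiny issue-level simulation in Python. It is a simplification under assumptions: only the load-use case is treated as a non-forwarding hazard, instructions are (name, destination, sources) tuples, and the names `NOP`, `PROGRAM`, and `issue_order` are hypothetical.

```python
NOP = ("nop", None, ())

# (name, destination register, source registers)
PROGRAM = [
    ("lw",  "$t0", ("$t1",)),
    ("sub", "$t3", ("$t0", "$t2")),
    ("and", "$t5", ("$t0", "$t4")),
]

def issue_order(program):
    """Sequence of instructions entering EX, one per cycle, with stall-on-issue."""
    issued = []
    pc = 0                                   # next instruction to decode
    while pc < len(program):
        name, dest, sources = program[pc]
        prev = issued[-1] if issued else NOP # instruction currently in EX
        if prev[0] == "lw" and prev[1] in sources:
            issued.append(NOP)               # issue nop to EX
            continue                         # PC and InstReg not updated: re-decode
        issued.append(program[pc])
        pc += 1
    return [inst[0] for inst in issued]

print(issue_order(PROGRAM))
```

The stall lasts exactly one cycle here because the nop that was just issued is what the next decode sees in EX, so the hazard test no longer fires.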
Stall Logic
• 1. Detect nonresolving hazards.
• 2a. Insert Bubble
• 2b. Stall nextPC, IF/DE
[Figure: pipeline (IAU/npc, PC, I mem, Regs, alu, D mem, Regs) with forward mux; op, rw, rs, rt in decode and n, op, rw carried in each later pipeline register]
Stall Logic
• Stall-on-issue is used quite a bit
• More complex processors: many cases
that stall on issue.
• More complex processors: cases that
can’t be detected at decode
- E.g. value needed from mem is not in cache
– proc must stall multiple cycles
By the way …
• Notice that our forwarding and stall
logic is stateless!
• Big Idea: Keep it simple!
• Option 1: Store old fetched inst in reg
(“stall_temp”), keep state reg that says
whether to use stall_temp or value
coming off inst mem.
• Option 2: Re-fetch old value by turning
off PC update.