CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

2014-1-30
John Lazzaro
(not a prof - “John” is always OK)
TA: Eric Love
www-inst.eecs.berkeley.edu/~cs152/
CS 152: L4 Pipelining
UC Regents Spring 2014 © UCB

Motorola 68000
Next week we will return to the microcode story ... Today is the anti-microcode story: pipelining!
[Figure: block diagram of a RISC CPU: caches, data path, and control]

Today: Pipelining
Pipelining: an idea from assembly line production applied to CPU design.
Why pipelining is hard: data hazards, control hazards, structural hazards.
Visualizing pipelines to evaluate hazard detection and resolution.
Short break.
A tool kit for hazard resolution.

Starting Point: Performance Equation
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Goal is to optimize execution time, not individual equation terms.
Instructions/Program: machines are optimized with respect to program workloads.
Cycles/Instruction: the CPI of the program. Reflects the program's instruction mix.
Seconds/Cycle: the clock period. Optimize jointly with machine CPI.
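
As a quick illustration of why the goal is execution time rather than any single term, the sketch below plugs two hypothetical designs into the equation. The instruction count, CPIs, and clock periods are invented for illustration, not taken from the lecture.

```python
def execution_time(instructions: int, cpi: float, clock_period_s: float) -> float:
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instructions * cpi * clock_period_s


n = 1_000_000  # hypothetical instruction count for one program

# Design A: CPI of 1, but a long 10 ns clock period.
design_a = execution_time(n, cpi=1.0, clock_period_s=10e-9)
# Design B: worse CPI, but a 5 ns clock period.
design_b = execution_time(n, cpi=1.5, clock_period_s=5e-9)

print(f"A: {design_a * 1e3:.2f} ms, B: {design_b * 1e3:.2f} ms")
```

Design B has the higher CPI yet finishes first (7.50 ms versus 10.00 ms), because its shorter clock period more than compensates.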
Pipelining

Recall: Our single-cycle processor
Challenge: speed up the clock while keeping CPI == 1.
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Cycles/Instruction: CPI == 1. This is good.
Seconds/Cycle: slow. This is bad.
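
Recall: An R-format CPU design
Decode fields to get: ADD $8 $9 $10
R-format fields: opcode | rs | rt | rd | shamt | funct
[Figure: control Logic decodes op and drives the register file's WE; the 5-bit rs, rt, and rd fields drive the RegFile ports rs1, rs2, and ws; the 32-bit outputs rd1 and rd2 feed the ALU, whose 32-bit result returns to wd.]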
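
A minimal decoding sketch, assuming the standard MIPS R-format bit layout (6-bit opcode, three 5-bit register fields, 5-bit shamt, 6-bit funct) and that ADD $8 $9 $10 means rd = $8, rs = $9, rt = $10 with funct 0x20. The hex word below is computed from that layout, not copied from the slide.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit MIPS R-format instruction into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26 (0 for R-format)
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, selects the ALU operation
    }


fields = decode_r_format(0x012A4020)  # ADD $8 $9 $10
print(fields)
```

The three 5-bit register fields are exactly the rs1, rs2, and ws address inputs of the register file in the figure.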

Reminder: How data flows after posedge
[Figure: the PC register (D/Q) addresses the instruction memory (Addr → Data) while an adder computes PC + 0x4; the fetched instruction's fields then flow through the decode Logic, RegFile, and ALU of the R-format datapath.]

Next posedge: Update state and repeat
[Figure: on the next rising clock edge, the PC captures its new value and the RegFile (write enable WE) stores the value on wd into register ws; then the cycle repeats.]

Observation: Logic idle most of cycle
For most of the cycle, the ALU is either “waiting” for its inputs or “holding” its output.
Ideal: a CPU architecture where each part is always “working”.

Inspiration: Automobile assembly line
The assembly line moves on a steady clock. Each station does the same task on each car.
[Figure: car body shells and car chassis move along the line, timed by the clock, through a merge station and a bolting station.]
Simpler station tasks → more cars per hour. Simple tasks take less time, so the clock is faster.
Line speed is limited by the slowest task. The line is most efficient if all tasks take the same time to do.
Simpler tasks, complex car → long line! These lines go 24 x 7 and rarely shut down.

Lessons from car assembly lines
Faster line movement yields more cars per hour off the line.
Faster line movement requires more stages, each doing simpler tasks.
To maximize efficiency, all stages should take the same amount of time (if not, workers in fast stages are idle).
“Filling”, “flushing”, and “stalling” the assembly line are all bad news.
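
The “slowest stage sets the line speed” lesson can be checked numerically. This sketch assumes a synchronous line whose clock period equals its slowest stage delay; the stage delays are hypothetical.

```python
def line_stats(stage_delays_ns):
    """Clock period and per-stage idle fraction for a synchronous line."""
    period = max(stage_delays_ns)  # the slowest stage sets the clock
    idle = [1 - d / period for d in stage_delays_ns]  # fraction of each cycle a stage waits
    return period, idle


# Hypothetical stage delays (ns): same 10 ns of total work, split two ways.
unbalanced = [2.0, 1.0, 3.0, 2.0, 2.0]
balanced = [2.0, 2.0, 2.0, 2.0, 2.0]
for name, delays in (("unbalanced", unbalanced), ("balanced", balanced)):
    period, idle = line_stats(delays)
    print(name, period, [round(f, 2) for f in idle])
```

The unbalanced line needs a 3 ns clock and leaves its fast stages idle for part of every cycle, while the balanced line runs at 2 ns with no idle time.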

Key analogy: The instruction is the car
[Figure: a five-stage pipeline, Stage #1 (Instruction Fetch) through Stage #5. An instruction register (IR) sits between each pair of stages and travels with its instruction; the IR entering each of stages 2, 3, 4, and 5 controls the hardware in that stage.]
“Data-stationary control”

Example: Decode & Register Fetch stage
[Figure: Stage #1 (Instr Fetch), Stage #2 (Decode & Reg Fetch), and Stage #3, separated by IR registers, with the sample program's instructions in flight.]
A sample program:
ADD R4,R3,R2
OR R7,R6,R5
SUB R10,R9,R8
R's chosen so that instructions are independent, like cars on the line.

Performance Equation and Pipelining
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Cycles/Instruction: CPI == 1. Once the pipe is filled, one instruction completes per cycle.
Seconds/Cycle: the clock period is shorter, since there is less work to do in each cycle. To get the shortest clock period, balance the work to do in each pipeline stage.
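
To see how pipelining shrinks the Seconds/Cycle term while keeping CPI near 1, the sketch below compares total time for N instructions on a single-cycle machine and on an ideal k-stage pipeline. It assumes perfectly balanced, hypothetical stage delays and ignores pipeline-register overhead and hazards.

```python
def single_cycle_time(n_instr: int, stage_delays: list[int]) -> int:
    # One long cycle per instruction: the clock must cover every stage's work.
    return n_instr * sum(stage_delays)


def pipelined_time(n_instr: int, stage_delays: list[int]) -> int:
    # The first instruction needs k cycles to fill the pipe; after that,
    # one instruction completes per cycle. The slowest stage sets the clock.
    k = len(stage_delays)
    return (k + n_instr - 1) * max(stage_delays)


stages_ns = [2, 2, 2, 2, 2]  # hypothetical balanced split of a 10 ns datapath
print(single_cycle_time(1000, stages_ns), pipelined_time(1000, stages_ns))
```

For 1000 instructions this gives 10000 ns versus 2008 ns. The speedup approaches k = 5 as N grows, because the fill cost of k - 1 extra cycles is paid only once.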

Hazards: An instruction is not a car ...
New sample program:
ADD R4,R3,R2
OR R5,R4,R2
[Figure: OR is in Decode & Reg Fetch while ADD is one stage ahead.] R4 is not written yet ... so the wrong value of R4 is fetched from the RegFile, and the contract with the programmer is broken! Oops!
An example of a “hazard”: we must (1) detect and (2) resolve all hazards to make a CPU that matches the ISA.
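
The broken contract can be reproduced with a toy model. The sketch below is a hypothetical simulator, not the course's datapath: it runs the two-instruction program once with sequential ISA semantics and once on a five-stage pipeline with no hazard handling, in which each instruction reads registers in its ID cycle and writes its result in its WB cycle. Reads in a cycle are assumed to see the register file before that cycle's write, and the initial register values are made up.

```python
import operator


def run_sequential(program, regs):
    """ISA semantics: each instruction sees every earlier instruction's result."""
    regs = dict(regs)
    for op, rd, rs, rt in program:
        regs[rd] = op(regs[rs], regs[rt])
    return regs


def run_naive_pipeline(program, regs):
    """Instruction i reads its operands in ID (cycle i + 1) and writes its
    result in WB (cycle i + 4); nothing detects or resolves hazards."""
    regs = dict(regs)
    results = {}
    for cycle in range(len(program) + 4):
        i = cycle - 1  # instruction in ID/RF this cycle
        if 0 <= i < len(program):
            op, rd, rs, rt = program[i]
            results[i] = op(regs[rs], regs[rt])  # operands come from the RegFile now
        j = cycle - 4  # instruction in WB this cycle
        if 0 <= j < len(program):
            regs[program[j][1]] = results[j]
    return regs


program = [
    (operator.add, "R4", "R3", "R2"),  # ADD R4,R3,R2
    (operator.or_, "R5", "R4", "R2"),  # OR  R5,R4,R2
]
initial = {"R2": 1, "R3": 2, "R4": 0, "R5": 0}
print(run_sequential(program, initial)["R5"], run_naive_pipeline(program, initial)["R5"])
```

The sequential run gives R5 = 3, but the naive pipeline reads the stale R4 = 0 and produces R5 = 1.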

Performance Equation and Hazards
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Cycles/Instruction: some ways to cope with hazards make CPI > 1 (“stalling the pipeline”).
Seconds/Cycle: added logic to detect and resolve hazards increases the clock period.
“Software slows the machine down.” (Seymour Cray)
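
The two effects can be weighed against each other. This sketch assumes each stall cycle is a bubble in which no instruction completes, so CPI = 1 + stall cycles / instructions; the instruction count, stall count, and clock periods are hypothetical.

```python
def effective_cpi(n_instr: int, stall_cycles: int) -> float:
    # Ideal pipeline CPI is 1; every stall cycle is a bubble with no completion.
    return 1.0 + stall_cycles / n_instr


def exec_time_ns(n_instr: int, stall_cycles: int, clock_period_ns: float) -> float:
    return n_instr * effective_cpi(n_instr, stall_cycles) * clock_period_ns


# Option 1: resolve hazards by stalling (simple logic, 2.0 ns clock, 20 stalls).
stall_only = exec_time_ns(100, stall_cycles=20, clock_period_ns=2.0)
# Option 2: extra hazard logic removes the stalls but stretches the clock to 2.2 ns.
extra_logic = exec_time_ns(100, stall_cycles=0, clock_period_ns=2.2)
print(f"CPI {effective_cpi(100, 20):.2f}: {stall_only:.1f} ns vs {extra_logic:.1f} ns")
```

With these made-up numbers the extra logic wins (220 ns versus 240 ns); with only 5 stalls, stalling would win (210 ns), which is why CPI and clock period are optimized jointly.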

A (simplified) 5-stage pipelined CPU
Stage 1, “IF”: Instruction Fetch
Stage 2, “ID/RF”: Decode & Register Fetch
Stage 3, “EX”: Execution
Stage 4, “MEM”: Memory
Stage 5, “WB”: Write Back
[Figure: the datapath divided into five stages, with an IR register between adjacent stages; the write-back mux logic is controlled by WE and MemToReg.]

Sometimes, “contract” is a challenge
Sample program:
LW R4,0(R0)
OR R5,R4,R2
[Figure: the 5-stage pipeline with OR R5,R4,R2 in the ID/RF stage and LW R4,0(R0) one stage ahead.] OR wants R4 now ... but we haven't even started the load yet!
One approach: change the contract!

From Lecture 1: Delayed Loads ...
LW uses the “I-Format”: opcode | rs | rt | offset
Instruction Fetch: fetch the load instruction from memory.
Instruction Decode: decode fields to get: LW $1, 32($2)
Operand Fetch: “retrieve” register value: $2
Execute: compute memory address: 32 + $2
Result Store: load memory address contents into: $1
Next Instruction: prepare to fetch the instruction that follows the LW in the program. Depending on load semantics, the new $1 is visible to that instruction, or not until the following instruction (“delayed loads”).
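
The difference between the two load semantics can be sketched with a tiny, hypothetical interpreter. It supports only LW and OR, uses a dictionary as word-addressed memory, and models a delayed load by letting the instruction in the load's delay slot see the old register value, which is one behavior the delayed-load contract permits.

```python
def run(program, regs, mem, delayed_loads):
    """Interpret a list of instructions. With delayed_loads=True, a loaded
    value becomes visible only after the next instruction has executed."""
    regs = dict(regs)
    pending = None  # (register, value) from the previous instruction's load
    for instr in program:
        new_pending = None
        if instr[0] == "LW":        # ("LW", rt, offset, base)
            _, rt, offset, base = instr
            value = mem[regs[base] + offset]
            if delayed_loads:
                new_pending = (rt, value)
            else:
                regs[rt] = value
        elif instr[0] == "OR":      # ("OR", rd, rs, rt)
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] | regs[rt]
        if pending is not None:     # the delay slot has executed: commit the load
            regs[pending[0]] = pending[1]
        pending = new_pending
    if pending is not None:         # a load at the very end still completes
        regs[pending[0]] = pending[1]
    return regs


program = [("LW", "R4", 0, "R0"), ("OR", "R5", "R4", "R2")]  # LW R4,0(R0); OR R5,R4,R2
regs = {"R0": 0, "R2": 1, "R4": 8, "R5": 0}  # R4 holds a stale value
mem = {0: 6}
print(run(program, regs, mem, delayed_loads=False)["R5"],
      run(program, regs, mem, delayed_loads=True)["R5"])
```

Without delayed loads, OR sees the loaded 6 and R5 = 7. With delayed loads, OR sits in the delay slot, sees the stale 8, and R5 = 9. Either way R4 ends up as 6.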

After we change the contract ...
Sample program:
LW R4,0(R0)
OR R5,R4,R2
[Figure: the same pipeline snapshot, with OR R5,R4,R2 in ID/RF and LW R4,0(R0) one stage ahead.] ... the “delayed load” contract does not guarantee that the new R4 is seen.
This only partially solves the problem ... soon, we finish the story.
Visualizing Pipelines

Pipeline Representation #1: Timeline
Good for visualizing pipeline fills.
Sample program:
I1: ADD R4,R3,R2
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
I4: XOR R3,R2,R1
I5: OR R7,R6,R5

Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID   EX   MEM  WB
I3:             IF   ID   EX   MEM  WB
I4:                  IF   ID   EX   MEM  WB
I5:                       IF   ID   EX   MEM
I6:                            IF   ID   EX
The pipeline is “full” at t5.
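
A timeline like this can be generated mechanically. This sketch assumes an ideal five-stage pipeline with no stalls, so instruction i enters IF in cycle i; it reproduces the shape of the table rather than modeling any hardware.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timeline(n_instr: int, n_cycles: int) -> list[list[str]]:
    """One row per instruction, one column per cycle; '' means not in the pipe."""
    rows = []
    for i in range(n_instr):
        row = []
        for t in range(n_cycles):
            s = t - i  # instruction i enters IF in cycle i (0-based)
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        rows.append(row)
    return rows


tl = timeline(6, 8)
print("Inst " + "".join(f"t{t + 1:<4}" for t in range(8)))
for i, row in enumerate(tl, start=1):
    print(f"I{i}:  " + "".join(f"{cell:<5}" for cell in row))
```

Reading down the t5 column shows every stage busy, which is what “full” means.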

Representation #2: Resource Usage
Good for visualizing pipeline stalls.
Sample program:
I1: ADD R4,R3,R2
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
I4: XOR R3,R2,R1
I5: OR R7,R6,R5

Stage  t1   t2   t3   t4   t5   t6   t7   t8
IF:    I1   I2   I3   I4   I5   I6   I7   I8
ID:         I1   I2   I3   I4   I5   I6   I7
EX:              I1   I2   I3   I4   I5   I6
MEM:                  I1   I2   I3   I4   I5
WB:                        I1   I2   I3   I4
The pipeline is “full” at t5.
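
To show why this view suits stalls, the hypothetical simulator below builds the resource-usage table cycle by cycle and lets one instruction wait in ID. The two-cycle stall of I2 is an arbitrary example, not tied to a particular hazard.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def resource_usage(instrs, n_cycles, stalls=None):
    """Return {stage: [occupant in each cycle]}. `stalls` maps an instruction
    to the number of extra cycles it must wait in ID; while it waits, IF and
    ID hold their instructions and a bubble ("--") enters EX."""
    stalls = dict(stalls or {})
    pipe = [None] * len(STAGES)  # pipe[k] is the instruction in stage k
    fetch = 0                    # index of the next instruction to fetch
    usage = {s: [] for s in STAGES}
    for _ in range(n_cycles):
        held = pipe[1] is not None and stalls.get(pipe[1], 0) > 0
        if held:
            stalls[pipe[1]] -= 1
            pipe = [pipe[0], pipe[1], None] + pipe[2:4]  # bubble into EX
        else:
            nxt = instrs[fetch] if fetch < len(instrs) else None
            fetch += 1
            pipe = [nxt] + pipe[:4]  # everything advances one stage
        for k, s in enumerate(STAGES):
            usage[s].append(pipe[k] or "--")
    return usage


instrs = [f"I{i}" for i in range(1, 9)]
usage = resource_usage(instrs, 8, stalls={"I2": 2})
for stage in STAGES:
    print(f"{stage:<4}", " ".join(f"{x:<3}" for x in usage[stage]))
```

The stall appears as I2 repeating in the ID row while bubbles move down the EX, MEM, and WB rows.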
Hazard Taxonomy

Structural Hazards
Several pipeline stages need to use the same hardware resource at the same time.
Solution #1: Add extra copies of the resource (only works sometimes).
Solution #2: Change the resource so that it can handle concurrent use.
Solution #3: Stages “take turns” by stalling parts of the pipeline.

Structural Hazard Example: One Memory
[Figure: the 5-stage pipeline with a single memory, used by both the IF stage (fetching at the PC) and the MEM stage (loads and stores). The EX stage also feeds the branch logic; the write-back mux is controlled by WE and MemToReg.]
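
With one memory, every load or store collides with the instruction fetch in the same cycle. The sketch below lists those cycles for a hypothetical program, assuming no stalls, that IF fetches every cycle, and that only LW and SW use the memory in the MEM stage.

```python
def memory_conflict_cycles(program: list[str]) -> list[int]:
    """1-based cycles in which IF and MEM both need the single memory.
    Instruction i (0-based) is fetched in cycle i + 1 and is in MEM in cycle i + 4."""
    return [i + 4 for i, instr in enumerate(program)
            if instr.split()[0] in ("LW", "SW")]


program = ["LW R4,0(R0)", "OR R5,R4,R2", "SW R5,4(R0)", "ADD R6,R3,R2"]
print(memory_conflict_cycles(program))
```

Each listed cycle needs one of the three fixes: an extra copy of the memory, a memory that serves two requests at once, or stalling the fetch.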

A solution: “Extra copies” of memory
[Figure: the 5-stage pipeline (IF, ID/RF, EX, MEM, WB) with separate instruction and data memories.]
I and D caches are a hybrid solution.

Alternatively: Concurrent use ...
[Figure: the 5-stage pipeline (IF, ID/RF, EX, MEM, WB).]
The ID and WB stages use the register file in the same clock cycle.
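
One way to make the register file handle this concurrent use, assumed here rather than taken from the slide, is to apply the WB write before the ID read within the same cycle (for example, writing in the first half of the cycle and reading in the second). The hypothetical class below models that ordering and also keeps R0 at zero, as in MIPS.

```python
class RegFile:
    """32-entry register file; within one cycle the write (from WB) is applied
    before the reads (from ID), so ID sees a value written in that same cycle."""

    def __init__(self, n_regs: int = 32):
        self.regs = [0] * n_regs

    def cycle(self, read_addrs, write=None):
        if write is not None:
            addr, value = write
            if addr != 0:            # R0 always reads as zero
                self.regs[addr] = value
        return [self.regs[a] for a in read_addrs]


rf = RegFile()
# WB writes R4 = 3 while ID reads R4 and R2 in the same cycle.
print(rf.cycle([4, 2], write=(4, 3)))
```

If the reads happened before the write, ID would see the old R4 = 0, and a dependent instruction would have to wait an extra cycle.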

Data Hazards: 3 Types (RAW, WAR, WAW)
Several pipeline stages read or write the same data location in an incompatible way.
Read After Write (RAW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes “too early” and reads the wrong copy of the data.
Note “data value”, not “register”. Data hazards are possible for any architected state (such as main memory). In practice, main memory hazard avoidance is the job of the memory system.

Recall: RAW example
Sample program:
ADD R4,R3,R2
OR R5,R4,R2
[Figure: OR is in Decode & Reg Fetch while ADD is one stage ahead.] R4 is not written yet ... so the wrong value of R4 is fetched from the RegFile, and the contract with the programmer is broken! Oops!
This is what we mean when we say Read After Write (RAW) hazard.

Data Hazards: 3 Types (RAW, WAR, WAW)
Write After Read (WAR) hazards. Instruction I2 expects to write over a data value after an earlier instruction I1 reads it. But instead, I2 writes too early, and I1 sees the new value.
Write After Write (WAW) hazards. Instruction I2 writes over data that an earlier instruction I1 also writes. But instead, I1 writes after I2, and the final data value is incorrect.
WAR and WAW hazards are not possible in our 5-stage pipeline, because every instruction reads registers in ID and writes them in WB, in program order. But they are possible in other pipeline designs.
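
The three definitions differ only in which accesses overlap. The sketch below, with instructions reduced to hypothetical (destinations, sources) tuples, reports which kinds of hazard a later instruction I2 could have with an earlier I1; whether a given pipeline actually exposes them depends on when it reads and writes.

```python
def possible_hazards(i1, i2):
    """i1 is earlier than i2; each is (destination registers, source registers)."""
    d1, s1 = set(i1[0]), set(i1[1])
    d2, s2 = set(i2[0]), set(i2[1])
    kinds = []
    if d1 & s2:
        kinds.append("RAW")  # I2 reads a value that I1 writes
    if s1 & d2:
        kinds.append("WAR")  # I2 writes a value that I1 reads
    if d1 & d2:
        kinds.append("WAW")  # I1 and I2 write the same location
    return kinds


add = (["R4"], ["R3", "R2"])                          # ADD R4,R3,R2
print(possible_hazards(add, (["R5"], ["R4", "R2"])))  # OR R5,R4,R2
print(possible_hazards(add, (["R3"], ["R5", "R6"])))  # OR R3,R5,R6
print(possible_hazards(add, (["R4"], ["R5", "R6"])))  # OR R4,R5,R6
```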

Control Hazards: A taken branch/jump
Sample program (ISA without branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8

Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID
I3:             IF
The EX stage computes whether the branch is taken. If the branch is taken, I2 and I3 MUST NOT complete!
Note: with a branch delay slot, I2 MUST complete, and I3 MUST NOT complete.

Hazards Recap
Structural hazards
Data hazards (RAW, WAR, WAW)
Control hazards (taken branches and jumps)
On each clock cycle, we must detect the presence of all of these hazards, and resolve them before they break the “contract with the programmer”.
Break
Hazard Resolution Tools

The Hazard Resolution Toolkit
Stall earlier instructions in the pipeline.
Forward results computed in later pipeline stages to earlier stages.
Add new hardware or rearrange the hardware design to eliminate the hazard.
Change the ISA to eliminate the hazard.
Kill earlier instructions in the pipeline.
Make hardware handle concurrent requests to eliminate the hazard.

Resolving a RAW hazard by stalling
Sample program:
ADD R4,R3,R2
OR R5,R4,R2
Let ADD proceed to the WB stage, so that R4 is written to the regfile.
Keep executing the OR instruction (in Decode & Reg Fetch) until R4 is ready. Until then, send NOPs to IR 2/3.
Freeze PC and IR until the stall is over.
New datapath hardware: (1) a mux into IR 2/3 to feed in a NOP; (2) a write enable on PC and IR 1/2.
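
The stall decision and its cost can be sketched as two hypothetical functions for the 5-stage pipeline without forwarding: `must_stall` is the interlock test that would drive the NOP mux and the PC/IR write enables, and `stall_cycles` counts the bubbles for a consumer a given number of instructions after its producer. The `write_then_read` flag assumes a register file that can be written in WB and read in ID in the same cycle; that is an assumption, not something the slide specifies.

```python
def must_stall(id_sources, dest_in, write_then_read=True):
    """Interlock for the instruction in ID. dest_in maps "EX", "MEM", "WB" to
    the register that stage's instruction will write (None if it writes none).
    While this returns True: freeze PC and IR 1/2, and mux a NOP into IR 2/3."""
    watched = ("EX", "MEM") if write_then_read else ("EX", "MEM", "WB")
    return any(dest_in.get(s) not in (None, 0) and dest_in.get(s) in id_sources
               for s in watched)


def stall_cycles(distance, write_then_read=True):
    """Bubbles needed, without forwarding, when the consumer is `distance`
    instructions after its producer. The producer, fetched in cycle 0, writes
    the RegFile in WB (cycle 4); the consumer reaches ID in cycle distance + 1."""
    earliest_read = 4 if write_then_read else 5
    return max(0, earliest_read - (distance + 1))


# OR R5,R4,R2 in ID while ADD R4,R3,R2 is in EX:
print(must_stall({4, 2}, {"EX": 4}), stall_cycles(1))
```

For the ADD/OR pair the interlock fires while ADD is in EX and MEM, giving two bubbles; without same-cycle write-then-read it would also fire during WB, giving three.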

Resolving a RAW hazard by forwarding
Sample program:
ADD R4,R3,R2
OR R5,R4,R2
[Figure: stages 1 (“IF”, Instr Fetch), 2 (“ID/RF”, Decode & Reg Fetch), and 3 (“EX”, Execution); OR R5,R4,R2 is in ID/RF while ADD R4,R3,R2 is in EX.]
The ALU computes R4 in the EX stage, so ... just forward it back!
Unlike stalling, forwarding does not change CPI. It may hurt cycle time.

Control Hazards: Fix with more hardware
Sample program (ISA without branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8

Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID
I3:             IF
The EX stage computes whether the branch is taken. If the branch is taken, I2 and I3 MUST NOT complete!
If we add hardware, can we move the branch decision here, into the ID stage?

Resolving control hazard with hardware
[Figure: Stage #1 (Instr Fetch), Stage #2 (Decode & Reg Fetch), and Stage #3; an equality comparator (==) in Stage #2 compares the two register values and sends its result to the branch control logic.]

Control Hazards: After more hardware
Sample program (ISA without branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8

Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF
The ID stage computes whether the branch is taken. If the branch is taken, I2 MUST NOT complete!
If we change the ISA, can we always let I2 complete (“branch delay slot”) and eliminate the control hazard?

From Lecture 1: BEQ $1,$2,25
BEQ uses the “I-Format”: opcode | rs | rt | offset
Instruction Fetch: fetch the branch instruction from memory.
Instruction Decode: decode fields to get: BEQ $1, $2, 25
Operand Fetch: “retrieve” register values: $1, $2
Execute: compute whether we take the branch: $1 == $2 ?
Next Instruction: ALWAYS prepare to fetch the instruction that follows the BEQ in the program (“delayed branch”). IF we take the branch, the instruction we fetch AFTER that instruction is at PC + 4 + 100 (the offset of 25 counts 4-byte instructions).
(PC == “Program Counter”)
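
The delayed-branch fetch order can be written as a small function. It assumes MIPS-style byte addressing with 4-byte instructions and a signed word offset, so an offset of 25 adds 100 bytes, matching PC + 4 + 100; the starting PC of 0x1000 is hypothetical.

```python
def fetch_order_after_beq(pc: int, offset: int, taken: bool) -> tuple[int, int]:
    """Addresses of the two instructions fetched after a BEQ at `pc` under the
    delayed-branch contract. `offset` is the signed word offset in the instruction."""
    delay_slot = pc + 4                    # ALWAYS fetched and executed
    if taken:
        after = pc + 4 + (offset << 2)     # offset counts 4-byte instructions
    else:
        after = pc + 8                     # fall through past the delay slot
    return delay_slot, after


print([hex(a) for a in fetch_order_after_beq(0x1000, 25, taken=True)])
print([hex(a) for a in fetch_order_after_beq(0x1000, 25, taken=False)])
```

In both cases the delay-slot instruction at 0x1004 is fetched; only the instruction after it depends on the branch outcome.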

Resolve control hazard by killing instr
Sample program (no delay slot):
J 200
OR R5,R4,R2
[Figure: J 200 is in Decode & Reg Fetch while OR R5,R4,R2 is in Instr Fetch.]
Detect the J instruction, and mux a NOP into IR 1/2.
Compute the new PC using hardware not shown.
This hurts CPI. Can we do better?

Structural hazard solution: concurrent use
[Figure: the 5-stage pipeline (IF, ID/RF, EX, MEM, WB).]
The ID and WB stages use the register file in the same clock cycle. This does not come for free ...
Hazard Diagnosis

Data Hazards: Read After Write
Read After Write (RAW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes “too early” and reads the wrong copy of the data.
Classic solution: use forwarding heavily, and fall back on stalling when forwarding won't work or slows down the critical path too much.

Full bypass network ...
[Figure: ID (Decode), EX, MEM, and WB stages separated by IR registers, with bypass paths (including one “From WB”) feeding later-stage results back to the operand registers A and B.]

Common bug: Multiple forwards ...
[Figure: the full bypass network, with ADD R4,R3,R2 in ID; OR R2,R3,R1 and AND R2,R2,R1, which both write R2, are further down the pipeline.]
Which do we forward from?

Common bug: Multiple forwards II ...
[Figure: the full bypass network, with ADD R4,R0,R2 in ID; OR R0,R3,R1 and AND R0,R2,R1, which both write R0, are further down the pipeline.]
Which do we forward from?
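
One common way to resolve both bugs, sketched here as an assumption rather than as the course's reference design: check the youngest producer first, so the most recent write of a register wins, and never forward for R0, which must always read as zero. The function models the operand mux for one source register; `newer` and `older` are hypothetical names for the (destination, value) results of the producers one and two instructions ahead, or None when no register is written. The example values are made up.

```python
def forward_operand(src, regfile_value, newer=None, older=None):
    """Select the value for source register `src`.
    newer: (dest, value) of the instruction one ahead (youngest producer)
    older: (dest, value) of the instruction two ahead"""
    for producer in (newer, older):  # priority: most recent write wins
        if producer is not None and producer[0] == src and src != 0:
            return producer[1]
    return regfile_value             # no match (or R0): use the RegFile read


# Bug I: ADD R4,R3,R2 needs R2, and two instructions ahead of it both write R2;
# the younger writer's value (7 here) must win.
print(forward_operand(2, regfile_value=1, newer=(2, 7), older=(2, 5)))
# Bug II: ADD R4,R0,R2 needs R0; pending writes to R0 must be ignored.
print(forward_operand(0, regfile_value=0, newer=(0, 7), older=(0, 5)))
```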

LW and Hazards
No load “delay slot”

Questions about LW and forwarding
[Figure: the full bypass network, with ADDIU R1,R1,24 in ID; OR R3,R3,R2 and LW R1,128(R29) are further down the pipeline.]
Do we need to stall?

Questions about LW and forwarding
[Figure: the full bypass network, with ADDIU R1,R1,24 in ID; LW R1,128(R29) and OR R1,R3,R1 are further down the pipeline.]
Do we need to stall?
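
The two questions differ in where the load sits. The hypothetical check below assumes the full bypass network can forward a loaded value once the load has finished MEM, so it flags a stall only when the instruction directly ahead of the one in ID is a load whose destination it reads, because that value does not exist until the end of MEM. Instructions are simplified to (opcode, destination, sources) tuples, and the stage positions in the comments are inferred from the figures' layout.

```python
def load_use_stall(id_instr, ex_instr):
    """True if the instruction in ID must wait one cycle for a load in EX."""
    ex_op, ex_dest, _ = ex_instr
    _, _, id_sources = id_instr
    return ex_op == "LW" and ex_dest != 0 and ex_dest in id_sources


addiu = ("ADDIU", 1, (1,))  # ADDIU R1,R1,24 in ID
# First question: OR R3,R3,R2 in EX; LW R1,128(R29) already in MEM.
print(load_use_stall(addiu, ("OR", 3, (3, 2))))
# Second question: LW R1,128(R29) in EX, directly ahead of ADDIU.
print(load_use_stall(addiu, ("LW", 1, (29,))))
```

Under these assumptions the first case needs no stall (the loaded R1 is forwarded from WB a cycle later), while the second needs one stall cycle; the OR R1,R3,R1 further ahead must then lose to the younger LW, as in the multiple-forwards bug.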

Branches and Hazards
Single “delay slot”

Recall: Control hazard and hardware
[Figure: Stage #1 (Instr Fetch), Stage #2 (Decode & Reg Fetch), and Stage #3; an equality comparator (==) in Stage #2 compares the two register values and sends its result to the branch control logic.]

Recall: After more hardware, change ISA
Sample program (ISA without branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8

Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF
The ID stage computes whether the branch is taken. If the branch is taken, I2 MUST NOT complete!
If we change the ISA, can we always let I2 complete (“branch delay slot”) and eliminate the control hazard?

Question about branch and forwards:
BEQ R1,R3,label
OR R3,R3,R1
[Figure: the full bypass network plus the == comparator in ID, which sends its result to the branch control logic; BEQ R1,R3,label is in ID, and OR R3,R3,R1 is ahead of it in the pipeline.]
Will this work as shown?

Lessons learned
Pipelining is hard.
Study every instruction.
Write test code in advance.
Think about interactions ... between forwarding, branch and jump delay slots, R0 issues, LW issues ... a long list!
Control Implementation

Recall: What is single cycle control?
[Figure: the single-cycle datapath (Instr Mem, RegFile, Ext, ALU). A block of combinational logic (only gates, no flip-flops) reads the instruction and the Equal signal and drives RegDest, PCSrc, RegWr, ExtOp, ALUsrc, ALUctr, MemWr, and MemToReg.]
Just specify logic functions!

In pipelines, all IR registers are used
[Figure: the IR registers of the ID, EX, MEM, and WB stages all feed one block of combinational logic (only gates, no flip-flops; add extra state outside!), which also takes Equal as an input and drives RegDest, PCSrc, RegWr, ExtOp, and MemToReg.]
A “conceptual” design: for the shortest critical path, the IR registers may hold decoded info, not the complete 32-bit instruction.
On Tuesday
Quantitative instruction set architecture ...
Also, we will revisit the 68000 CPU design, and the topic of microcode.
Have a good weekend!