Lecture 12: Pipelined Processor
•  Last time
–  Simple processor organization
–  Logic & control
–  Motivation & idea behind pipelining
•  Today
–  Take QUIZ 8 over P&H 4.7-10, before 11:59pm today
–  Homework 4 due Thursday March 4, 2010
–  Pipelining in the real world
UTCS 352, Lecture 12
Pipelining Lessons
[Figure: tasks A through D listed in task order against a time axis that starts at 6 PM and is marked in 30-minute intervals.]
•  Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
•  Multiple tasks operate simultaneously using different resources
•  Potential speedup = number of pipe stages
•  Pipeline rate is limited by the slowest pipeline stage
•  Unbalanced lengths of pipe stages reduce speedup
•  Time to “fill” the pipeline and time to “drain” it reduce speedup
•  Stall for dependences
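The lessons above can be checked with a little arithmetic. The sketch below is a minimal model, assuming every task passes through every stage in order and the pipeline advances at the pace of its slowest stage. The function names and the stage times are illustrative and do not come from the lecture.

```python
def unpipelined_time(stage_times, n_tasks):
    """Total time when each task finishes all stages before the next one starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Total time when tasks overlap and every stage takes one 'beat'.

    The beat is set by the slowest stage. The first task needs one beat per
    stage (fill); each later task completes one beat after the previous one.
    """
    beat = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * beat


def speedup(stage_times, n_tasks):
    """Ratio of unpipelined to pipelined time for the same workload."""
    return unpipelined_time(stage_times, n_tasks) / pipelined_time(stage_times, n_tasks)


balanced = [30, 30, 30, 30]    # four equal stages (hypothetical times)
unbalanced = [30, 40, 20, 30]  # same total work, uneven stages

print(speedup(balanced, 1))        # one task: latency is unchanged
print(speedup(balanced, 4))        # few tasks: fill and drain time dominate
print(speedup(balanced, 1000))     # many tasks: approaches the number of stages
print(speedup(unbalanced, 1000))   # the slowest stage limits the rate
```

With many tasks the balanced case approaches a speedup of 4 (the number of stages), while the unbalanced case stays below 3 because every beat lasts as long as the 40-unit stage.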
Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as five boxes, one per pipeline stage: IM, Reg, ALU, DM, Reg.]
•  Can help with answering questions like:
–  How many cycles does it take to execute this code?
–  What is the ALU doing during cycle 4?
–  Is there a hazard, why does it occur, and how can it be
fixed?
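The first two questions can also be answered by indexing into the diagram. The sketch below assumes one instruction enters the pipeline per cycle and nothing stalls; the names `stage_of`, `total_cycles`, and `alu_user` are made up for this illustration.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]  # fetch, register read, execute, memory, write back
ALU_POSITION = 2                            # index of the ALU stage in STAGES


def stage_of(instr_index, cycle):
    """Stage position of instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None if the instruction is not in the pipeline during that cycle.
    """
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return position
    return None


def total_cycles(n_instructions):
    """Cycles to run n instructions with no stalls: fill time plus one per instruction."""
    return len(STAGES) + n_instructions - 1


def alu_user(cycle, n_instructions):
    """Index of the instruction using the ALU in `cycle`, or None if the ALU is idle."""
    for i in range(n_instructions):
        if stage_of(i, cycle) == ALU_POSITION:
            return i
    return None


print(total_cycles(5))  # 9
print(alu_user(4, 5))   # 1
```

For a five-instruction program this reports 9 cycles in total, and in cycle 4 the ALU is working on the second instruction (index 1).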
Why Pipeline? For Performance!
[Figure: pipeline diagram with time in clock cycles across and instruction order down. Inst 0 through Inst 4 each pass through IM, Reg, ALU, DM, Reg, starting one cycle apart. The first cycles are labeled “Time to fill the pipeline”.]
•  Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
Can Pipelining Get Us Into Trouble?
•  Yes: Pipeline Hazards
–  structural hazards: attempt to use the same resource by
two different instructions at the same time
–  data hazards: attempt to use data before it is ready
•  An instruction’s source operand(s) are produced by a
prior instruction still in the pipeline
–  control hazards: attempt to make a decision about
program control flow before the condition has been
evaluated and the new PC target address calculated
•  branch instructions
•  Can always resolve hazards by waiting
–  pipeline control must detect the hazard
–  and take action to resolve hazards
A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram with time in clock cycles. lw is followed by Inst 1 through Inst 4, all sharing a single memory (Mem). In the cycle where lw is reading data from memory, a later instruction is reading its instruction from the same memory.]
•  Fix with separate instr and data memories (I$ and D$)
How About Register File Access?
[Figure: pipeline diagram with time in clock cycles for the sequence below. The first add writes $1 in the same cycle in which the last add reads it.]
    add $1,
    Inst 1
    Inst 2
    add $2,$1,
•  Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: the same pipeline diagram, annotated with the clock edge that controls register writing and the clock edge that controls loading of the pipeline state registers.]
Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram of the sequence below. add writes $1 in its last stage, while the instructions after it read $1 in their Reg stage.]
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Read before write is ready: a write-read data hazard
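A dependence check of this kind can be sketched in a few lines. The example assumes a five-stage pipeline without forwarding whose register file is written in the first half of a cycle and read in the second half, so only the two instructions immediately after a producer read too early. The tuple encoding and the name `raw_hazards` are illustrative, and the source operands of the first add are left empty because the slide does not show them.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    Each instruction is (op, dest, sources). Only the `window` instructions
    right after a producer are checked, since later ones read the register
    after it has been written.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest == 0:  # $0 is hard-wired to zero, so it never carries a dependence
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, later_sources = program[j]
            if dest in later_sources:
                hazards.append((i, j, dest))
            if later_dest == dest:  # a newer write hides the older value
                break
    return hazards


program = [
    ("add", 1, []),       # source operands are not shown on the slide
    ("sub", 4, [1, 5]),
    ("and", 6, [1, 7]),
    ("or",  8, [1, 9]),
    ("xor", 4, [1, 5]),
]

print(raw_hazards(program))  # [(0, 1, 1), (0, 2, 1)]
```

Under these assumptions sub and and read $1 too early, while or and xor are far enough behind the add to see the written value.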
Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram of the sequence below. lw produces $1 only after its DM stage, and the instructions after it read $1.]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
•  Load-use data hazard
Resolving Hazards: Pipeline Stalls
•  Can resolve any type of hazard
–  data, control, or structural
•  Detect the hazard
•  Freeze the pipeline up to the dependent stage
until the hazard is resolved
One Way to “Fix” a Data Hazard
[Figure: pipeline diagram in which add $1, is followed by sub $4,$1,$5 and and $6,$1,$7, both held back until $1 has been written.]
•  Can fix a data hazard by waiting (a stall), but this impacts CPI
Another Way to “Fix” a Data Hazard
[Figure: pipeline diagram of the sequence below, with the add result forwarded to the later instructions that read $1.]
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
•  Fix data hazards by forwarding results as soon as they are available to where they are needed
Data Forwarding (aka Bypassing)
•  Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle
•  For the ALU functional unit: the inputs can come from any pipeline register by
–  adding multiplexors to the inputs of the ALU
–  connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage’s Rs and Rt ALU mux inputs
–  adding the proper control hardware to control the new muxes
•  Other functional units may need forwarding logic (e.g., the DM)
•  Forwarding can achieve a CPI of 1 even in the presence of data dependencies
Datapath with Forwarding Hardware
[Figure: pipelined datapath (PC, Instruction Memory, Register File, ALU, Data Memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers) with a Forward Unit added. The Forward Unit takes ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd as inputs and controls the muxes on the ALU inputs.]
Forwarding Illustration
[Figure: pipeline diagram of the sequence below. The add result reaches sub through EX/MEM hazard forwarding and reaches and through MEM/WB hazard forwarding.]
    add $1,
    sub $4,$1,$5
    and $6,$7,$1
Data Forwarding Control Conditions
EX/MEM hazard:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
Forwards the result from the previous instr. to either input of the ALU
MEM/WB hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
Forwards the result from the second previous instr. to either input of the ALU
Yet Another Complication!
•  Another potential data hazard can occur when there is
a conflict between the result of the WB stage
instruction and the MEM stage instruction – which
should be forwarded?
[Figure: pipeline diagram of the sequence below. When the third add is in the EX stage, both the MEM stage and the WB stage hold a result destined for $1.]
    add $1,$1,$2
    add $1,$1,$3
    add $1,$1,$4
Corrected Data Forwarding Control Conditions
MEM/WB hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
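The control conditions can be exercised with a small model. This is a minimal sketch, not the lecture's hardware description: the pipeline registers are represented as dictionaries keyed by the slide's signal names, the function name `forwarding_unit` is made up, and the value "00" for "no forwarding, use the register file" is an assumption based on the usual textbook convention. Priority is given to EX/MEM by testing that hazard first, so the most recent result wins when both stages hold a value for the same register.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) as two-bit strings.

    ex_mem, mem_wb: dicts with "RegWrite" (bool) and "RegisterRd" (int).
    id_ex: dict with "RegisterRs" and "RegisterRt" (ints).
    "10" selects the EX/MEM result, "01" the MEM/WB result,
    "00" (assumed encoding) the value read from the register file.
    """
    def select(source_register):
        ex_hazard = (ex_mem["RegWrite"]
                     and ex_mem["RegisterRd"] != 0
                     and ex_mem["RegisterRd"] == source_register)
        mem_hazard = (mem_wb["RegWrite"]
                      and mem_wb["RegisterRd"] != 0
                      and mem_wb["RegisterRd"] == source_register)
        if ex_hazard:
            return "10"  # the more recent result takes priority
        if mem_hazard:
            return "01"
        return "00"

    return select(id_ex["RegisterRs"]), select(id_ex["RegisterRt"])


# Third instruction of: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4
ex_mem = {"RegWrite": True, "RegisterRd": 1}  # second add, now in MEM
mem_wb = {"RegWrite": True, "RegisterRd": 1}  # first add, now in WB
id_ex = {"RegisterRs": 1, "RegisterRt": 4}    # third add, now in EX

print(forwarding_unit(ex_mem, mem_wb, id_ex))  # ('10', '00')
```

For the three-add sequence, the third add takes $1 from EX/MEM (the second add's result) and not from the older value in MEM/WB.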
Memory Data Hazards
[Figure: pipeline diagram of the sequence below. lw reads $1 from the data memory, and the instructions after it use $1.]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $2,$7,$9
•  Does forwarding solve all our problems?
Forwarding with Load-use Data Hazards
[Figure: pipeline diagram of the sequence below with forwarding paths drawn. The loaded value leaves the DM stage in the same cycle in which sub needs it at the ALU.]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $2,$7,$9
•  Still must stall one cycle even with forwarding!
[Figure: pipeline diagram of the sequence below, with a one-cycle stall inserted between lw and sub so that the loaded value can be forwarded.]
    lw  $1,4($2)
    stall
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Load-use Hazard Detection Unit
•  Need a Hazard detection Unit in the ID stage that inserts a
stall between the load and its use
ID Hazard Detection
if (ID/EX.MemRead
and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline
•  The first line tests to see if the instruction now in
the EX stage is a lw; the next two lines check to see
if the destination register of the lw matches either
source register of the instruction in the ID stage
(the load-use instruction)
•  After this one cycle stall, the forwarding logic can
handle the remaining data hazards
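The ID-stage check translates almost directly into code. The sketch below represents the ID/EX and IF/ID pipeline registers as dictionaries keyed by the slide's field names; that representation and the function name `must_stall` are illustrative assumptions.

```python
def must_stall(id_ex, if_id):
    """Load-use check made in the ID stage.

    id_ex: dict with "MemRead" (bool) and "RegisterRt" (int) for the instruction in EX.
    if_id: dict with "RegisterRs" and "RegisterRt" (ints) for the instruction in ID.
    """
    return bool(id_ex["MemRead"]
                and id_ex["RegisterRt"] in (if_id["RegisterRs"], if_id["RegisterRt"]))


lw_in_ex = {"MemRead": True, "RegisterRt": 1}  # lw  $1,4($2)
sub_in_id = {"RegisterRs": 1, "RegisterRt": 5}  # sub $4,$1,$5
xor_in_id = {"RegisterRs": 7, "RegisterRt": 9}  # xor $2,$7,$9

print(must_stall(lw_in_ex, sub_in_id))  # True
print(must_stall(lw_in_ex, xor_in_id))  # False
```

The sub directly after the lw triggers the stall because it reads $1, while the xor does not touch $1 and can follow the load immediately.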
Stall Hardware
•  We have to detect this case & implement the stall
•  Prevent the instructions in the IF and ID stages from
progressing down the pipeline – done by preventing the PC
register and the IF/ID pipeline register from changing
–  Hazard detection Unit controls the writing of the PC (PC.write)
and IF/ID (IF/ID.write) registers
•  Insert a “bubble” between the lw instruction (in the EX stage)
and the load-use instruction (in the ID stage) (i.e., insert a noop
in the execution stream)
–  Set the control bits in the EX, MEM, and WB control fields of the
ID/EX pipeline register to 0 (noop). The Hazard Unit controls the
mux that chooses between the real control values and the 0’s.
•  Let the lw instruction and the instructions after it in the
pipeline (before it in the code) proceed normally down the
pipeline
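The effect of these control signals on one clock edge can be sketched as follows. This is a simplified model under stated assumptions: `PCWrite` and `IFIDWrite` stand for the slide's PC.write and IF/ID.write, `ZeroControl` is a made-up name for the select line of the control mux, instructions are plain strings, and the PC simply advances by 4.

```python
def hazard_unit(id_ex, if_id):
    """Derive the stall-related control signals from the load-use condition."""
    stall = bool(id_ex["MemRead"]
                 and id_ex["RegisterRt"] in (if_id["RegisterRs"], if_id["RegisterRt"]))
    return {"PCWrite": not stall, "IFIDWrite": not stall, "ZeroControl": stall}


def clock_edge(pc, if_id_instr, fetched_instr, decoded_control, signals):
    """Values latched at the end of the cycle.

    Returns (next PC, next IF/ID contents, control bits entering ID/EX).
    On a stall the PC and IF/ID keep their values and the control bits are
    zeroed, which sends a bubble (noop) down the pipeline.
    """
    next_pc = pc + 4 if signals["PCWrite"] else pc
    next_if_id = fetched_instr if signals["IFIDWrite"] else if_id_instr
    if signals["ZeroControl"]:
        next_control = {name: 0 for name in decoded_control}
    else:
        next_control = dict(decoded_control)
    return next_pc, next_if_id, next_control


# lw $1,4($2) is in EX, sub $4,$1,$5 is in ID, and the and instruction was just fetched.
signals = hazard_unit({"MemRead": True, "RegisterRt": 1},
                      {"RegisterRs": 1, "RegisterRt": 5})
sub_control = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0}

print(clock_edge(100, "sub $4,$1,$5", "and $6,$1,$7", sub_control, signals))
```

During the stall cycle the PC stays at 100, sub remains in IF/ID to be decoded again, and an all-zero control word enters ID/EX in its place.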
Adding the Hazard Hardware
[Figure: the pipelined datapath with forwarding, extended with a Hazard Unit. The Hazard Unit takes ID/EX.MemRead and ID/EX.RegisterRt as inputs, controls writing of the PC and IF/ID registers, and drives the mux that selects between the real control values and 0 for the ID/EX control fields.]
Compiler Instruction Scheduling
•  The compiler can rearrange instructions to eliminate the load-use hazard!
•  Proebsting & Fischer (1991) show how to optimally
schedule a straight line sequence of instructions,
given sufficient registers and a delay of one pipeline
stage.
•  Approach
–  Build a dependence graph that describes the partial order of instruction definitions and uses
–  Schedule R independent loads (load; load; load; ..)
–  Each load requires a register, thus R is the minimum number of live registers
–  Schedule an operation independent of the previous load and another load in a pair (operation; load)
Compiler Scheduling to Avoid Load-use Data Hazards
[Figure: pipeline diagram of the original instruction order below. sub uses $1 immediately after the lw that loads it.]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $2,$7,$9
[Figure: pipeline diagram after scheduling. xor $2,$7,$9, which does not use $1, has been moved between lw and sub.]
    lw  $1,4($2)
    xor $2,$7,$9
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
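The benefit of the reordering can be counted directly. The sketch below assumes a five-stage pipeline with full forwarding, where the only remaining stall is one cycle for an instruction that reads a register loaded by the lw immediately before it. The tuple encoding and the function names are illustrative.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls, assuming full forwarding.

    Each instruction is (op, dest, sources). A stall is charged whenever an
    instruction reads the register that the immediately preceding lw loads.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        if previous[0] == "lw" and previous[1] != 0 and previous[1] in current[2]:
            stalls += 1
    return stalls


def cycles(program, n_stages=5):
    """Fill time, plus one cycle per instruction, plus stall cycles."""
    return n_stages + len(program) - 1 + load_use_stalls(program)


before = [
    ("lw",  1, [2]),     # lw  $1,4($2)
    ("sub", 4, [1, 5]),  # sub $4,$1,$5
    ("and", 6, [1, 7]),  # and $6,$1,$7
    ("or",  8, [1, 9]),  # or  $8,$1,$9
    ("xor", 2, [7, 9]),  # xor $2,$7,$9
]
# Scheduled order from the slide: the xor moves into the slot after the lw.
after = [before[0], before[4], before[1], before[2], before[3]]

print(load_use_stalls(before), cycles(before))  # 1 10
print(load_use_stalls(after), cycles(after))    # 0 9
```

Moving the xor is legal here because it neither reads $1 nor writes a register that a later instruction in this sequence reads; it only overwrites $2 after the lw has already read it.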
Types of Data Hazards
•  RAW (read after write)
–  only hazard for ‘fixed’ pipelines
–  later instruction must read after earlier write
•  WAW (write after write)
–  variable-length pipeline
–  later instruction must write after earlier write
•  WAR (write after read)
–  pipelines with late read
–  later instruction must write after earlier read
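The three categories can be told apart by comparing the registers two instructions read and write. This is a minimal sketch: the (dest, sources) encoding and the name `classify` are assumptions made for illustration, and the function reports register dependences only, not whether a particular pipeline would actually stall on them.

```python
def classify(earlier, later):
    """Return the set of dependence types between two instructions in program order.

    Each instruction is (dest, sources); dest is None if nothing is written.
    """
    earlier_dest, earlier_sources = earlier
    later_dest, later_sources = later
    kinds = set()
    if earlier_dest is not None and earlier_dest in later_sources:
        kinds.add("RAW")  # later reads what earlier writes
    if earlier_dest is not None and earlier_dest == later_dest:
        kinds.add("WAW")  # both write the same register
    if later_dest is not None and later_dest in earlier_sources:
        kinds.add("WAR")  # later overwrites what earlier reads
    return kinds


lw_instr = (1, [2])      # lw  $1,4($2)
sub_instr = (4, [1, 5])  # sub $4,$1,$5
xor_instr = (2, [7, 9])  # xor $2,$7,$9

print(classify(lw_instr, sub_instr))  # {'RAW'}
print(classify(lw_instr, xor_instr))  # {'WAR'}
```

In the fixed five-stage pipeline of this lecture only the RAW case needs stalls or forwarding; the WAR pair above is harmless because the lw reads $2 well before the xor writes it.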
Memory-to-Memory Copies
•  For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input.
–  Would need to add a Forward Unit and a mux to the memory access stage
[Figure: pipeline diagram of the sequence below. The value loaded by lw is forwarded from MEM/WB to the data memory input used by sw.]
    lw $1,4($2)
    sw $1,4($3)
Summary
•  The real world of pipelining
–  Just stall
–  Forwarding for register and memory hazards
•  Next Time
–  Prediction for control hazards
–  Multiple issue and out-of-order processors
–  Homework 4 due Thursday March 4, 2010
•  Reading: P&H 4.11-15