18-447 Computer Architecture Lecture 9: Data Dependence Handling

Prof. Onur Mutlu
Carnegie Mellon University
Spring 2013, 2/6/2013
Reminder: Homework 2

Homework 2 out




Due February 11 (next Monday)
LC-3b microcode
ISA concepts, ISA vs. microarchitecture, microcoded machines
Remember: Homework 1 solutions were out
Reminder: Lab Assignment 2

Lab Assignment 1.5



Verilog practice
Not to be turned in
Lab Assignment 2





Due Feb 15
Single-cycle MIPS implementation in Verilog
All labs are individual assignments
No collaboration; please respect the honor code
Do not forget the extra credit portion!
Homework 1 Grades
[Figure: HW 1 Score Distribution histogram. x-axis: Score (0% to 100%); y-axis: Number of Students.]
Average = 92%
Standard Deviation = 8%
Lab 1 Grades
[Figure: Lab 1 Score Distribution histogram. x-axis: Score (0% to 100%); y-axis: Number of Students.]
Average = 81%
Standard Deviation = 17%
Readings for Next Few Lectures


P&H Chapter 4.9-4.11
Smith and Sohi, “The Microarchitecture of Superscalar
Processors,” Proceedings of the IEEE, 1995



More advanced pipelining
Interrupt and exception handling
Out-of-order and superscalar execution concepts
Today’s Agenda

Deep dive into pipelining

Dependence handling
Review: Pipelining: Basic Idea

Idea:



Divide the instruction processing cycle into distinct “stages” of
processing
Ensure there are enough hardware resources to process one
instruction in each stage
Process a different instruction in each stage



Instructions consecutive in program order are processed in
consecutive stages
Benefit: Increases instruction processing throughput (1/CPI)
Downside: ???
Review: Execution of Four Independent ADDs

Multi-cycle: 4 cycles per instruction
Time →
F D E W
        F D E W
                F D E W
                        F D E W

Pipelined: 4 cycles per 4 instructions (steady state)
Time →
F D E W
  F D E W
    F D E W
      F D E W

Is life always this beautiful?
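
As a rough check on the two timing diagrams, the sketch below counts total cycles for the same instruction stream on a multi-cycle and on a pipelined machine. The function names are illustrative (not from the lecture), and it assumes one cycle per stage and no stalls.

```python
def multicycle_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction occupies the whole machine for n_stages cycles."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """First instruction fills the pipeline; each later one finishes 1 cycle after
    its predecessor (assumes no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Four independent ADDs on a 4-stage (F, D, E, W) machine
print(multicycle_cycles(4, 4))  # 16
print(pipelined_cycles(4, 4))   # 7
```

For long instruction streams the pipelined count approaches one cycle per instruction, which is the steady-state figure quoted on the slide.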
Review: Pipelined Operation Example
[Figure: five-stage pipelined datapath (Instruction fetch, Instruction decode, Execution, Memory, Write back; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) animated over successive clock cycles as lw $10, 20($1) followed by sub $11, $2, $3 move through the stages.]
Is life always this beautiful?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Review: Instruction Pipeline: Not An Ideal Pipeline

Identical operations ... NOT!
→ different instructions do not need all stages
- Forcing different instructions to go through the same multi-function pipe
→ external fragmentation (some pipe stages idle for some instructions)

Uniform suboperations ... NOT!
→ difficult to balance the different pipeline stages
- Not all pipeline stages do the same amount of work
→ internal fragmentation (some pipe stages are too fast but take the same clock cycle time)

Independent operations ... NOT!
→ instructions are not independent of each other
- Need to detect and resolve inter-instruction dependencies to ensure the pipeline operates correctly
→ Pipeline is not always moving (it stalls)
Review: Fundamental Issues in Pipeline Design

Balancing work in pipeline stages


How many stages and what is done in each stage
Keeping the pipeline correct, moving, and full in the
presence of events that disrupt pipeline flow

Handling dependences




Data
Control
Handling resource contention
Handling long-latency (multi-cycle) operations

Handling exceptions, interrupts

Advanced: Improving pipeline throughput

Minimizing stalls
Review: Data Dependences

Types of data dependences




Flow dependence (true data dependence – read after write)
Output dependence (write after write)
Anti dependence (write after read)
Which ones cause stalls in a pipelined machine?



For all of them, we need to ensure the semantics of the program are correct
Flow dependences always need to be obeyed because they constitute true dependence on a value
Anti and output dependences exist due to the limited number of architectural registers
They are dependences on a name, not a value
We will later see what we can do about them
Data Dependence Types
Flow dependence: Read-after-Write (RAW)
r3 ← r1 op r2
r5 ← r3 op r4

Anti dependence: Write-after-Read (WAR)
r3 ← r1 op r2
r1 ← r4 op r5

Output dependence: Write-after-Write (WAW)
r3 ← r1 op r2
r5 ← r3 op r4
r3 ← r6 op r7
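
The three definitions can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Instr` record with one destination and a tuple of sources; it compares only two instructions and ignores any instruction in between that might redefine a register.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


def dependences(older: Instr, younger: Instr) -> Set[str]:
    """Return the dependence types of `younger` on `older` (program order)."""
    kinds = set()
    if older.dest in younger.srcs:
        kinds.add("RAW")     # flow: younger reads what older writes
    if younger.dest in older.srcs:
        kinds.add("WAR")     # anti: younger overwrites what older reads
    if younger.dest == older.dest:
        kinds.add("WAW")     # output: both write the same register
    return kinds


i0 = Instr("r3", ("r1", "r2"))
print(dependences(i0, Instr("r5", ("r3", "r4"))))  # {'RAW'}
print(dependences(i0, Instr("r1", ("r4", "r5"))))  # {'WAR'}
print(dependences(i0, Instr("r3", ("r6", "r7"))))  # {'WAW'}
```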
Pipelined Operation Example
[Figure: the same five-stage pipelined datapath as in the review example, animated over successive clock cycles as lw $10, 20($1) followed by sub $11, $2, $3 move through the stages.]
What if the SUB were dependent on LW?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
How to Handle Data Dependences

Anti and output dependences are easier to handle

write to the destination in one stage and in program order

Flow dependences are more interesting

Five fundamental ways of handling flow dependences



Detect and wait until value is available in register file
Detect and forward/bypass data to dependent instruction
Detect and eliminate the dependence at the software level



No need for the hardware to detect dependence
Predict the needed value(s), execute “speculatively”, and verify
Do something else (fine-grained multithreading)

No need to detect
Interlocking

Detection of dependence between instructions in a
pipelined processor to guarantee correct execution

Software based interlocking
vs.
Hardware based interlocking

MIPS acronym?

Approaches to Dependence Detection (I)

Scoreboarding

Each register in register file has a Valid bit associated with it
An instruction that is writing to the register resets the Valid bit
An instruction in Decode stage checks if all its source and destination registers are Valid
Yes: No need to stall… No dependence
No: Stall the instruction

Advantage:
Simple. 1 bit per register

Disadvantage:
Need to stall for all types of dependences, not only flow dep.
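
A minimal sketch of the valid-bit scoreboard described above. The class and method names are illustrative assumptions, not from the lecture, and registers are plain integer indices.

```python
class Scoreboard:
    """One Valid bit per register; a pending write clears the bit."""

    def __init__(self, num_regs: int = 32):
        self.valid = [True] * num_regs

    def can_issue(self, srcs, dest=None) -> bool:
        # Decode-stage check: all source AND destination registers must be Valid.
        regs = list(srcs) + ([dest] if dest is not None else [])
        return all(self.valid[r] for r in regs)

    def issue(self, dest=None) -> None:
        # An instruction that will write `dest` resets its Valid bit.
        if dest is not None:
            self.valid[dest] = False

    def writeback(self, dest: int) -> None:
        # The write has completed; the register is Valid again.
        self.valid[dest] = True


sb = Scoreboard(8)
sb.issue(dest=3)                          # r3 <- r1 op r2 is in flight
print(sb.can_issue(srcs=[3, 4], dest=5))  # False: flow dependence on r3
print(sb.can_issue(srcs=[6, 7], dest=3))  # False: output dependence on r3 also stalls
sb.writeback(3)
print(sb.can_issue(srcs=[3, 4], dest=5))  # True
```

The second check illustrates the disadvantage on the slide: because the destination's Valid bit is checked too, an instruction stalls even when it has no flow dependence.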
Not Stalling on Anti and Output Dependences

What changes would you make to the scoreboard to enable
this?
Approaches to Dependence Detection (II)

Combinational dependence check logic

Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded
Yes: stall the instruction/pipeline
No: no need to stall… no flow dependence

Advantage:
No need to stall on anti and output dependences

Disadvantage:
Logic is more complex than a scoreboard
Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)
Once You Detect the Dependence in Hardware





What do you do afterwards?
Observation: Dependence between two instructions is
detected before the communicated data value becomes
available
Option 1: Stall the dependent instruction right away
Option 2: Stall the dependent instruction only when necessary → data forwarding/bypassing
Option 3: …
Data Forwarding/Bypassing





Problem: A consumer (dependent) instruction has to wait in
decode stage until the producer instruction writes its value
in the register file
Goal: We do not want to stall the pipeline unnecessarily
Observation: The data value needed by the consumer
instruction can be supplied directly from a later stage in the
pipeline (instead of only from the register file)
Idea: Add additional dependence check logic and data
forwarding paths (buses) to supply the producer’s value to
the consumer right after the value is available
Benefit: Consumer can move in the pipeline until the point the value can be supplied → less stalling
A Special Case of Data Dependence

Control dependence

Data dependence on the Instruction Pointer / Program Counter
Control Dependence


Question: What should the fetch PC be in the next cycle?
Answer: The address of the next instruction
All instructions are control dependent on previous ones. Why?

If the fetched instruction is a non-control-flow instruction:
Next Fetch PC is the address of the next-sequential instruction
Easy to determine if we know the size of the fetched instruction

If the instruction that is fetched is a control-flow instruction:
How do we determine the next Fetch PC?

In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
Data Dependence Handling:
More Depth & Implementation
Remember: Data Dependence Types
Flow dependence: Read-after-Write (RAW)
r3 ← r1 op r2
r5 ← r3 op r4

Anti dependence: Write-after-Read (WAR)
r3 ← r1 op r2
r1 ← r4 op r5

Output dependence: Write-after-Write (WAW)
r3 ← r1 op r2
r5 ← r3 op r4
r3 ← r6 op r7
How to Handle Data Dependences

Anti and output dependences are easier to handle

write to the destination in one stage and in program order

Flow dependences are more interesting

Five fundamental ways of handling flow dependences



Detect and wait until value is available in register file
Detect and forward/bypass data to dependent instruction
Detect and eliminate the dependence at the software level



No need for the hardware to detect dependence
Predict the needed value(s), execute “speculatively”, and verify
Do something else (fine-grained multithreading)

No need to detect
RAW Dependence Handling

The following flow dependences lead to conflicts in the 5-stage pipeline

addi ra r- -    IF  ID  EX  MEM WB
addi r- ra -        IF  ID  EX  MEM WB
addi r- ra -            IF  ID  EX  MEM
addi r- ra -                IF  ID  EX
addi r- ra -                    IF  ?ID
addi r- ra -                        IF
Register Data Dependence Analysis
       R/I-Type   LW         SW        Br        J    Jr
IF
ID     read RF    read RF    read RF   read RF        read RF
EX
MEM
WB     write RF   write RF

For a given pipeline, when is there a potential conflict
between 2 data dependent instructions?



dependence type: RAW, WAR, WAW?
instruction types involved?
distance between the two instructions?
Safe and Unsafe Movement of Pipeline
j:_rk
stage X
Reg Read
iOj
i:rk_
j:rk_
Reg Write
iAj
stage Y
Reg Write
RAW Dependence
i:_rk
j:rk_
Reg Write
iDj
Reg Read
WAR Dependence
i:rk_
Reg Write
WAW Dependence
dist(i,j)  dist(X,Y)  Unsafe
??
to keep j moving
dist(i,j) > dist(X,Y)  Safe
??
RAW Dependence Analysis Example
       R/I-Type   LW         SW        Br        J    Jr
IF
ID     read RF    read RF    read RF   read RF        read RF
EX
MEM
WB     write RF   write RF

Instructions IA and IB (where IA comes before IB) have RAW dependence iff
IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW)
dist(IA, IB) ≤ dist(ID, WB) = 3
What about WAW and WAR dependence?
What about memory data dependence?
Pipeline Stall: Resolving Data Dependence
[Figure: pipeline timing diagram over cycles t0 to t5 with stages IF, ID, ALU, MEM, WB and instructions Insth through Instl. Instruction i writes rx (i: rx ← _) and instruction j reads it (j: _ ← rx); bubbles separate i from j, and the cases dist(i,j) = 1, 2, 3 and 4 are marked.]

Stall == make the dependent instruction wait until its source data value is available
1. stop all up-stream stages
2. drain all down-stream stages
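
The number of bubbles needed for each distance follows from the rule dist(i,j) ≤ dist(ID, WB) = 3. The sketch below is an illustration under two assumptions: the register file does not forward internally (a value written in WB can be read in ID only in a later cycle), and no forwarding paths exist. The function name is hypothetical.

```python
DIST_ID_TO_WB = 3  # stages between register read (ID) and register write (WB)


def stall_cycles(dist: int) -> int:
    """Bubbles needed before dependent instruction j, `dist` instructions after i.

    Assumes no forwarding and no internal forwarding in the register file.
    """
    if dist < 1:
        raise ValueError("dist must be >= 1")
    return max(0, DIST_ID_TO_WB - dist + 1)


for d in (1, 2, 3, 4):
    print(d, stall_cycles(d))  # 1 3 / 2 2 / 3 1 / 4 0
```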
How to Implement Stalling
[Figure: pipelined MIPS datapath with control, showing pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, the Control unit, and the control signals PCSrc, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg, RegWrite.]

Stall


disable PC and IR latching; ensure stalled instruction stays in its stage
Insert “invalid” instructions/nops into the stage following the stalled one
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Stall Conditions

Instructions IA and IB (where IA comes before IB) have RAW
dependence iff



IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW)
dist(IA, IB) ≤ dist(ID, WB) = 3
In other words, must stall when IB in ID stage wants to read a
register to be written by IA in EX, MEM or WB stage
Stall Conditions

Helper functions
rs(I) returns the rs field of I
use_rs(I) returns true if I requires RF[rs] and rs!=r0

Stall when
(rs(IR_ID)==dest_EX) && use_rs(IR_ID) && RegWrite_EX or
(rs(IR_ID)==dest_MEM) && use_rs(IR_ID) && RegWrite_MEM or
(rs(IR_ID)==dest_WB) && use_rs(IR_ID) && RegWrite_WB or
(rt(IR_ID)==dest_EX) && use_rt(IR_ID) && RegWrite_EX or
(rt(IR_ID)==dest_MEM) && use_rt(IR_ID) && RegWrite_MEM or
(rt(IR_ID)==dest_WB) && use_rt(IR_ID) && RegWrite_WB

It is crucial that the EX, MEM and WB stages continue to advance normally during stall cycles
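
The stall condition can be written as a small predicate. This is a sketch rather than the lecture's implementation: `StageState` and `must_stall` are hypothetical names, and `use_rs` / `use_rt` are taken as inputs that already fold in the rs!=r0 check, as in the helper definition above.

```python
from typing import NamedTuple


class StageState(NamedTuple):
    dest: int        # destination register of the instruction in this stage
    reg_write: bool  # RegWrite control bit for that instruction


def must_stall(rs: int, use_rs: bool, rt: int, use_rt: bool,
               ex: StageState, mem: StageState, wb: StageState) -> bool:
    """True if the instruction in ID reads a register still to be written
    by an older instruction in EX, MEM or WB."""
    for stage in (ex, mem, wb):
        if not stage.reg_write:
            continue  # bubbles and non-writing instructions never cause a stall
        if use_rs and rs == stage.dest:
            return True
        if use_rt and rt == stage.dest:
            return True
    return False


bubble = StageState(dest=0, reg_write=False)
print(must_stall(3, True, 4, True, StageState(3, True), bubble, bubble))  # True
print(must_stall(3, True, 4, True, StageState(5, True), bubble, bubble))  # False
```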
Impact of Stall on Performance

Each stall cycle corresponds to 1 lost ALU cycle

For a program with N instructions and S stall cycles,
Average CPI=(N+S)/N

S depends on



frequency of RAW dependences
exact distance between the dependent instructions
distance between dependences
suppose i1,i2 and i3 all depend on i0, once i1’s dependence is
resolved, i2 and i3 must be okay too
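
A quick numeric illustration of the CPI formula above. The instruction and stall counts are made-up values, not measurements, and the function name is hypothetical.

```python
def average_cpi(n_instructions: int, stall_cycles: int) -> float:
    """Average CPI = (N + S) / N for a pipeline with an ideal CPI of 1."""
    if n_instructions <= 0:
        raise ValueError("need at least one instruction")
    return (n_instructions + stall_cycles) / n_instructions


# Hypothetical program: 100 instructions, 25 stall cycles
print(average_cpi(100, 25))  # 1.25
```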
Sample Assembly (P&H)

for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { ...... }
         addi $s1, $s0, -1
for2tst: slti $t0, $s1, 0            3 stalls
         bne  $t0, $zero, exit2      3 stalls
         sll  $t1, $s1, 2
         add  $t2, $a0, $t1          3 stalls
         lw   $t3, 0($t2)            3 stalls
         lw   $t4, 4($t2)
         slt  $t0, $t4, $t3          3 stalls
         beq  $t0, $zero, exit2      3 stalls
         .........
         addi $s1, $s1, -1
         j    for2tst
exit2:
Data Forwarding (or Data Bypassing)

It is intuitive to think of RF as state
“add rx ry rz” literally means get values from RF[ry] and RF[rz] respectively and put result in RF[rx]

But, RF is just a part of a computing abstraction
“add rx ry rz” means 1. get the results of the last instructions to define the values of RF[ry] and RF[rz], respectively, and 2. until another instruction redefines RF[rx], younger instructions that refer to RF[rx] should use this instruction’s result

What matters is to maintain the correct “dataflow” between operations, thus

add  ra r- r-    IF  ID  EX  MEM WB
addi r- ra r-        IF  ID  EX  MEM WB
(in the slide, the ID stall cycles of addi are overlaid with EX, MEM, WB)
Resolving RAW Dependence with Forwarding

Instructions IA and IB (where IA comes before IB) have RAW
dependence iff



IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW)
dist(IA, IB) ≤ dist(ID, WB) = 3
In other words, if IB in ID stage reads a register written by IA in EX, MEM or WB stage, then the operand required by IB is not yet in RF
→ retrieve operand from datapath instead of the RF
→ retrieve operand from the youngest definition if multiple definitions are outstanding
Data Forwarding Paths (v1)
[Figure: ALU operand forwarding, panels "a. No forwarding" and "b. With forwarding". Muxes controlled by ForwardA and ForwardB select each ALU operand (Rs, Rt) from the register file, from EX/MEM (dist(i,j)=1) or from MEM/WB (dist(i,j)=2); a forwarding unit compares Rs and Rt against EX/MEM.RegisterRd and MEM/WB.RegisterRd. The dist(i,j)=3 case is marked at the register file with the question "internal forward?".]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Data Forwarding Paths (v2)
[Figure: ALU operand forwarding, panels "a. No forwarding" and "b. With forwarding". Muxes controlled by ForwardA and ForwardB select each ALU operand (Rs, Rt) from the register file, from EX/MEM (dist(i,j)=1) or from MEM/WB (dist(i,j)=2); a forwarding unit compares Rs and Rt against EX/MEM.RegisterRd and MEM/WB.RegisterRd. The dist(i,j)=3 case is handled at the register file.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Assumes RF forwards internally
Data Forwarding Logic (for v2)
if (rs_EX!=0) && (rs_EX==dest_MEM) && RegWrite_MEM then
    forward operand from MEM stage  // dist=1
else if (rs_EX!=0) && (rs_EX==dest_WB) && RegWrite_WB then
    forward operand from WB stage   // dist=2
else
    use A_EX (operand from register file)  // dist >= 3
Ordering matters!! Must check youngest match first
Why doesn’t use_rs( ) appear in the forwarding logic?
What does the above not take into account?
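
The priority order in the forwarding logic can be sketched as a mux-select function for the rs operand. The function name and the string return values are illustrative assumptions; the same check would be repeated for rt.

```python
def forward_select(rs_ex: int,
                   dest_mem: int, reg_write_mem: bool,
                   dest_wb: int, reg_write_wb: bool) -> str:
    """Pick the source of the rs operand for the instruction in EX.

    The youngest matching producer (MEM stage, dist=1) is checked first.
    """
    if rs_ex != 0 and rs_ex == dest_mem and reg_write_mem:
        return "MEM"   # dist = 1
    if rs_ex != 0 and rs_ex == dest_wb and reg_write_wb:
        return "WB"    # dist = 2
    return "RF"        # dist >= 3: value read from the register file


# Both older instructions write r5: the younger one (in MEM) must win.
print(forward_select(5, dest_mem=5, reg_write_mem=True,
                     dest_wb=5, reg_write_wb=True))   # MEM
print(forward_select(5, dest_mem=7, reg_write_mem=True,
                     dest_wb=5, reg_write_wb=True))   # WB
```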
Data Forwarding (Dependence Analysis)
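[Table: for each instruction type (R/I-Type, LW, SW, Br, J, Jr), the pipeline stage (IF, ID, EX, MEM, WB) in which it uses its register operands and the stage in which it produces its result.]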

Even with data-forwarding, RAW dependence on an immediately
preceding LW instruction requires a stall
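
A sketch of the remaining load-use check, assuming it is evaluated while the consumer is in ID and the LW is in EX, and that register 0 is hardwired to zero. The function and parameter names are hypothetical.

```python
def load_use_stall(ex_is_load: bool, ex_dest: int,
                   id_rs: int, id_use_rs: bool,
                   id_rt: int, id_use_rt: bool) -> bool:
    """True if the instruction in ID needs the result of an LW currently in EX.

    The loaded value only exists after MEM, so forwarding alone cannot
    supply it in time and one bubble is required.
    """
    if not ex_is_load or ex_dest == 0:
        return False
    return bool((id_use_rs and id_rs == ex_dest) or
                (id_use_rt and id_rt == ex_dest))


# lw $t4, 4($t2) followed by slt $t0, $t4, $t3   ($t4 = r12, $t3 = r11)
print(load_use_stall(True, 12, id_rs=12, id_use_rs=True,
                     id_rt=11, id_use_rt=True))   # True
```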
Sample Assembly, Revisited (P&H)

for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { ...... }
         addi $s1, $s0, -1
for2tst: slti $t0, $s1, 0
         bne  $t0, $zero, exit2
         sll  $t1, $s1, 2
         add  $t2, $a0, $t1
         lw   $t3, 0($t2)
         lw   $t4, 4($t2)
         nop
         slt  $t0, $t4, $t3
         beq  $t0, $zero, exit2
         .........
         addi $s1, $s1, -1
         j    for2tst
exit2:
Pipelining the LC-3b
Pipelining the LC-3b

Let’s remember the single-bus datapath

We’ll divide it into 5 stages






Fetch
Decode/RF Access
Address Generation/Execute
Memory
Store Result
Conservative handling of data and control dependences


Stall on branch
Stall on flow dependence
An Example LC-3b Pipeline
Control of the LC-3b Pipeline

Three types of control signals

Datapath Control Signals
Control signals that control the operation of the datapath

Control Store Signals
Control signals (microinstructions) stored in control store to be used in pipelined datapath (can be propagated to stages later than decode)

Stall Signals
Ensure the pipeline operates correctly in the presence of dependencies
Control Store in a Pipelined Machine
Stall Signals



Pipeline stall: Pipeline does not move because an operation
in a stage cannot complete
Stall Signals: Ensure the pipeline operates correctly in the
presence of such an operation
Why could an operation in a stage not complete?