Lecture 18: Pipelining
• Today’s topics:
  - Hazards and instruction scheduling
  - Branch prediction
  - Out-of-order execution
Example 5
• Show the instruction occupying each stage in each cycle (with bypassing)
if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
Identify the input latch for each input operand.

       CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6  CYC-7  CYC-8
IF     I1     I2     I3     I4     I5
D/R           I1     I2     I3     I4
ALU                  I1     I2     I3
DM                          I1     I2     I3
RW                                 I1     I2     I3

Input latches at the ALU stage (first operand, second operand):
I1 (CYC-3): L3, L3
I2 (CYC-4): L4, L3
I3 (CYC-5): L5, L3
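The stage-by-cycle grid for Example 5 can be generated mechanically. The sketch below is a minimal illustration, assuming a stall-free five-stage pipeline in which instruction i enters IF in cycle i and advances one stage per cycle; the function name `occupancy` is hypothetical, and only I1 to I3 are tracked.

```python
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]

def occupancy(num_instrs, num_cycles):
    """Return {stage: [instruction name or '' per cycle]} for a stall-free pipeline."""
    table = {stage: [""] * num_cycles for stage in STAGES}
    for i in range(num_instrs):
        for s, stage in enumerate(STAGES):
            cycle = i + s  # 0-based cycle in which instruction i occupies stage s
            if cycle < num_cycles:
                table[stage][cycle] = f"I{i + 1}"
    return table

# Assumption: follow only I1..I3 through the pipeline, over 8 cycles.
table = occupancy(num_instrs=3, num_cycles=8)
for stage in STAGES:
    print(f"{stage:<5}" + "".join(f"{cell:<5}" for cell in table[stage]))
```

Because bypassing removes every stall in this example, each instruction simply trails its predecessor by one cycle in every stage.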
Example 1
add $1, $2, $3
lw $4, 8($1)
[Pipeline diagram: four rows of stages IF, D/R, ALU, DM, RW]
Example 2
lw $1, 8($2)
lw $4, 8($1)
[Pipeline diagram: four rows of stages IF, D/R, ALU, DM, RW]
Example 3
lw $1, 8($2)
sw $1, 8($3)
[Pipeline diagram: four rows of stages IF, D/R, ALU, DM, RW]
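One way to reason about Examples 1 to 3 is to compare the stage at whose end the producer's value exists with the stage at whose start the consumer needs it. The sketch below is a rough model, not the slides' worked solution. It assumes full bypassing, that an ALU result is ready at the end of ALU, that a load value is ready at the end of DM, that an address register is needed at the start of ALU, and that store data is needed at the start of DM. The helper name `stalls_back_to_back` is hypothetical.

```python
STAGE_INDEX = {"IF": 0, "D/R": 1, "ALU": 2, "DM": 3, "RW": 4}

def stalls_back_to_back(produced_at_end_of, needed_at_start_of):
    """Stall cycles for a consumer fetched one cycle after its producer.

    The producer is in stage p during cycle p; the consumer reaches stage c
    in cycle 1 + c + stalls. The value must exist before that cycle starts,
    so stalls >= p - c.
    """
    gap = STAGE_INDEX[produced_at_end_of] - STAGE_INDEX[needed_at_start_of]
    return max(0, gap)

# (producer stage, consumer stage) under the assumptions stated above
examples = {
    "Example 1: add $1,$2,$3 ; lw $4,8($1)": ("ALU", "ALU"),
    "Example 2: lw $1,8($2) ; lw $4,8($1)": ("DM", "ALU"),
    "Example 3: lw $1,8($2) ; sw $1,8($3)": ("DM", "DM"),
}

results = {name: stalls_back_to_back(p, c) for name, (p, c) in examples.items()}
for name, stalls in results.items():
    print(f"{name}: {stalls} stall cycle(s)")
```

Under these assumptions only the load followed by a dependent address calculation (Example 2) needs a stall; a pipeline without a bypass path into DM would behave differently for Example 3.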
Control Hazards
• Simple techniques to handle control hazard stalls:
  - for every branch, introduce a stall cycle (note: roughly every 6th instruction is a branch!)
  - assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware must cancel the effect of the wrong-path instruction
  - fetch the next instruction (branch delay slot) and execute it anyway; if the instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
  - make a smarter guess and fetch instructions from the expected target
Branch Delay Slots
[Figure omitted. Source: H&P textbook]
Pipeline without Branch Predictor
[Diagram labels: PC; IF (br); Reg Read, Compare, Br-target; PC + 4]
Pipeline with Branch Predictor
[Diagram labels: PC; Branch Predictor; IF (br); Reg Read, Compare, Br-target]
Bimodal Predictor
• 14 bits of the branch PC index a table of 16K entries of 2-bit saturating counters
2-Bit Prediction
• For each branch, maintain a 2-bit saturating counter:
if the branch is taken: counter = min(3,counter+1)
if the branch is not taken: counter = max(0,counter-1)
… sound familiar?
• If (counter >= 2), predict taken, else predict not taken
• The counter attempts to capture the common case for
each branch
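The counter rule and the 16K-entry table can be combined into a small simulation. The sketch below is a minimal illustration, not a description of any real processor. The class name `BimodalPredictor`, the initial counter value of 1, and the choice to drop the two low PC bits before indexing are all assumptions.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by bits of the branch PC."""

    def __init__(self, index_bits=14, initial=1):
        self.mask = (1 << index_bits) - 1
        self.counters = [initial] * (1 << index_bits)  # 2**14 = 16K entries

    def _index(self, pc):
        # Assumption: instructions are 4-byte aligned, so skip the 2 low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        """Predict taken when the counter is 2 or 3."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with the real outcome, then train the counter."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop-closing branch: taken 9 times, then not taken; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
predictor = BimodalPredictor()
mispredicts = count_mispredictions(predictor, pc=0x400100, outcomes=outcomes)
print(mispredicts, "mispredictions out of", len(outcomes))
```

The second bit of hysteresis is what keeps a single loop exit from flipping the prediction: after the not-taken outcome the counter drops from 3 to 2, so the first iteration of the next loop execution is still predicted taken.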
Slowdowns from Stalls
• Perfect pipelining with no hazards → an instruction
completes every cycle (total cycles ~ num instructions)
→ speedup = increase in clock speed = num pipeline stages
• With hazards and stalls, some cycles (= stall time) go by
during which no instruction completes, and then the stalled
instruction completes
• Total cycles = number of instructions + stall cycles
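As a quick worked example of this formula, the sketch below plugs in the earlier note that every 6th instruction is a branch, and assumes each branch costs exactly one stall cycle with no other hazards. The instruction count of 600 and the function names are illustrative assumptions, and pipeline fill time is ignored, as in the approximation above.

```python
def total_cycles(num_instructions, stall_cycles):
    """Total cycles = number of instructions + stall cycles."""
    return num_instructions + stall_cycles

def cycles_per_instruction(num_instructions, stall_cycles):
    return total_cycles(num_instructions, stall_cycles) / num_instructions

num_instructions = 600                 # illustrative program length
branches = num_instructions // 6       # every 6th instruction is a branch
stall_cycles = branches * 1            # assumption: one stall cycle per branch

cycles = total_cycles(num_instructions, stall_cycles)
cpi = cycles_per_instruction(num_instructions, stall_cycles)
print(f"{cycles} cycles, CPI = {cpi:.3f}")
```

With these numbers the stalls add 100 cycles to the ideal 600, so the pipeline loses about one seventh of its ideal throughput.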
Multicycle Instructions
• Multiple parallel pipelines – each pipeline can have a different
number of stages
• Instructions can now complete out of order – must make sure
that writes to a register happen in the correct order
An Out-of-Order Processor Implementation
[Block diagram, reconstructed as text]
Front end: Branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename
Back end: Issue Queue (IQ) → three ALUs; results written to ROB and tags broadcast to IQ
Register File: R1-R32
Reorder Buffer (ROB): Instr 1 to Instr 6, with tags T1 to T6

Instructions as fetched:
R1 ← R1+R2
R2 ← R1+R3
BEQZ R2
R3 ← R1+R2
R1 ← R3+R2

After Decode & Rename:
T1 ← R1+R2
T2 ← T1+R3
BEQZ T2
T4 ← T1+T2
T5 ← T4+T2
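The renaming step in this diagram can be sketched in a few lines. This is a simplified illustration, assuming every instruction (including the branch) is allocated the next ROB tag in program order and that sources are looked up before the destination mapping is updated. The function name `rename` and the tuple encoding of instructions are hypothetical.

```python
def rename(program):
    """Rename architectural registers to ROB tags T1, T2, ... in program order."""
    rename_map = {}  # architectural register -> tag of its latest producer
    renamed = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        tag = f"T{n}"  # each instruction, branches included, takes one ROB entry
        # Read sources first, so "R1 <- R1+R2" sees the old mapping of R1.
        new_srcs = [rename_map.get(reg, reg) for reg in srcs]
        if dest is not None:
            rename_map[dest] = tag
            renamed.append(f"{tag} <- {' + '.join(new_srcs)}")
        else:
            renamed.append(f"{op} {' '.join(new_srcs)}")
    return renamed

# (opcode, destination or None, source registers)
program = [
    ("ADD", "R1", ["R1", "R2"]),
    ("ADD", "R2", ["R1", "R3"]),
    ("BEQZ", None, ["R2"]),
    ("ADD", "R3", ["R1", "R2"]),
    ("ADD", "R1", ["R3", "R2"]),
]

renamed = rename(program)
for line in renamed:
    print(line)
```

The branch consumes tag T3 without writing a register, which is why the renamed code jumps from T2 to T4. A source with no in-flight producer keeps its architectural name and is read from the register file.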
Example Code
Completion times

Instruction           with in-order   with ooo
ADD  R1, R2, R3       5               5
ADD  R4, R1, R2       6               6
LW   R5, 8(R4)        7               7
ADD  R7, R6, R5       9               9
ADD  R8, R7, R5       10              10
LW   R9, 16(R4)       11              7
ADD  R10, R6, R9      13              9
ADD  R11, R10, R9     14              10