Pipelining (Chapter 6) ELEC 5200-001/6200-001 Computer Architecture and Design Spring 2016

advertisement
ELEC 5200-001/6200-001
Computer Architecture and Design
Spring 2016
Pipelining (Chapter 6)
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
1
ILP: Instruction Level Parallelism
Single-cycle and multi-cycle datapaths
execute one instruction at a time.
How can we get better performance?
Answer: Execute multiple instructions at a
time:
Pipelining – Enhance a multi-cycle datapath to
fetch one instruction every cycle.
Parallelism – Fetch multiple instructions every
cycle.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
2
Automobile Team Assembly
1 hour
1 hour
1 hour
1 hour
1 car assembled every four hours
6 cars per day
180 cars per month
2,040 cars per year
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
3
Automobile Assembly Line
Task 1
1 hour
Mecahnical
Task 2
1 hour
Task 3
1 hour
Task 4
1 hour
Electrical
Painting
Testing
First car assembled in 4 hours (pipeline latency)
thereafter, 1 car completed per hour
21 cars on first day, thereafter 24 cars per day
717 cars per month
8,637 cars per year
What gives 4X increase?
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
4
Throughput: Team Assembly
Red car
started
Red car
completed
Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing
Blue car
started
Blue car
completed
Time of assembling one car
Time
=
n
hours
where n is the number of nearly equal subtasks,
each requiring 1 unit of time
Throughput
Spring 2016, Feb 29 . . .
=
1/n
cars per unit time
ELEC 5200-001/6200-001 Lecture 6
5
Throughput: Assembly Line
Car 1 Mechanical Electrical
Car 2
Painting
Testing
Mechanical Electrical
Car 3
Painting
Mechanical Electrical
Painting
Mechanical Electrical
Car 4
Car 1
complete
.
.
Testing
Time to complete first car
=n
Cars completed in time T
=T–n+1
Throughput
= 1 – (n – 1)/ T
Throughput (assembly line)
───────────────────
Throughput (team assembly)
Spring 2016, Feb 29 . . .
Testing
Painting
Testing
Car 2
complete
time
time units (latency)
=
cars per unit time
1 – (n – 1)/ T
──────── =
1/n
ELEC 5200-001/6200-001 Lecture 6
n(n – 1)
n – ───── →
T
n
as T→∞
6
Some Features of Assembly Line
Electrical parts
delivered (JIT)
Task 1
1 hour
Mechanical
Stall assembly line
to fix the cause of
defect
Spring 2016, Feb 29 . . .
Task 2
1 hour
Task 3
1 hour
Task 4
1 hour
Electrical
Painting
Testing
3 cars in the assembly line are suspects,
to be removed (flush pipeline)
ELEC 5200-001/6200-001 Lecture 6
Defect
found
7
Pros and Cons
Advantages:
Efficient use of labor.
Specialists can do better job.
Just in time (JIT) methodology eliminates warehouse cost.
Disadvantages:
Penalty of defect latency.
Lack of flexibility in production.
Assembly line work is monotonous and boring.
https://www.youtube.com/watch?v=IjarLbD9r30
https://www.youtube.com/watch?v=ANXGJe6i3G8
https://www.youtube.com/watch?v=5lp4EbfPAtI
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
8
A Single Cycle Example
Delay of 1-bit full adder = 1ns
Clock period ≥ 32ns
a31
.
.
.
a2
a1
a0
b31
.
.
.
b2
b1
b0
1-b full
adder
1-b full
adder
1-b full
adder
c32
s31
.
.
.
s2
s1
s0
1-b full
adder
0
Time of adding words ~ 32ns
Time of adding bytes ~ 32ns
Clock
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
9
a31
.
.
.
a2
a1
a0
b31
.
.
.
b2
b1
b0
Delay of 1-bit full adder = 1ns
Clock period ≥ 1ns
Time of adding words ~ 32ns
Time of adding bytes ~ 8ns
1-b full
adder
c32
FF
Initialize
to 0
s31
.
.
.
s2
s1
s0
Shift
Shift
Shift
A Multicycle Implementation
Clock
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
10
A Pipeline Implementation
Delay of 1-bit full adder = 1ns
Clock period ≥ 1ns
a31
.
.
.
a2
a1
a0
b31
.
.
.
b2
b1
b0
P1
P2
P3 .. P31
1-b full
adder
1-b full
adder
1-b full
adder
c32
s31
.
.
.
s2
s1
s0
1-b full
adder
0
Time of adding words ~ 1ns
Time of adding bytes ~ 1ns
Clock
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
11
Pipelining in a Computer
Divide datapath into nearly equal tasks, to be
performed serially and requiring non-overlapping
resources.
Insert registers at task boundaries in the
datapath; registers pass the output data from
one task as input data to the next task.
Synchronize tasks with a clock having a cycle
time just enough to complete the longest task.
Break each instruction down into a set of tasks
so that instructions can be executed in a
staggered fashion.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
12
Pipelining of RISC Instructions
(From Lecture 3, Slide 6)
Fetch
Instruction
Examine
Opcode
ALU
Operation
Memory
Read/Write
Store
Result
IF
ID
EX
MEM
WB
Instruction
Fetch
Decode
instruction and
Fetch operands
Execute
Memory
Operation
Write
Back
to Reg
file
Although an instruction takes five clock cycles,
one instruction is completed every cycle.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
13
Pipelining a Single-Cycle Datapath
Instruction
class
Instr.
Instr.
Execution
Data
(ALU
access
fetch Decode
(IF) (also reg. Operation) (MEM)
file read)
(EX)
(ID)
Write
Back
(Reg.
file
write)
(WB)
Total
time
lw
2ns
1ns
2ns
2ns
1ns 8ns
sw
2ns
1ns
2ns
2ns
8ns
R-format
2ns
1ns
2ns
1ns 8ns
2ns
1ns
2ns
8ns
add, sub, and, or, slt
B-format, beq
No operation on data; idle times equalize instruction lengths.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
14
Execution Time: Single-Cycle
0
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2
IF ID
4
EX
6
8
10
12
14
16
.
.
Time (ns)
MEM WB
IF ID
EX MEM WB
IF ID
EX MEM WB
Clock cycle time = 8 ns
Total time for executing three lw instructions = 24 ns
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
15
Pipelined Datapath
Instruction
class
Instr.
fetch
(IF)
Instr.
Decode
(also reg.
file read)
(ID)
Execution
(ALU
Operation)
(EX)
Data
access
(MEM)
Write
Back
(Reg.
file
write)
(WB)
2ns
1ns
2ns
2ns
1ns
2ns
R-format: add,
sub, and, or, slt
2ns
1ns
2ns
2ns
2ns
1ns
2ns
B-format:
beq
2ns
1ns
2ns
2ns
2ns
1ns
2ns
lw
sw
2ns
2ns
Total
time
2ns
1ns
2ns
10ns
2ns
1ns
2ns
10ns
10ns
10ns
No operation on data; idle time inserted to equalize instruction lengths.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
16
Execution Time: Pipeline
0
lw $1, 100($0)
2
IF
lw $2, 200($0)
lw $3, 300($0)
4
6
8
ID
EX
MEM
IF
ID
EX
IF
ID
10
12
14
16
.
.
Time (ns)
RW
MEM RW
EX
MEM RW
Clock cycle time = 2 ns, four times faster than single-cycle clock
Total time for executing three lw instructions = 14 ns
Performance ratio
Spring 2016, Feb 29 . . .
=
Single-cycle time
────────────
Pipeline time
ELEC 5200-001/6200-001 Lecture 6
=
24
── = 1.7
14
17
Pipeline Performance
Clock cycle time = 2 ns
1,003 lw instructions:
Total time for executing 1,003 lw instructions
Performance ratio
=
=
2,014 ns
Single-cycle time
────────────
Pipeline time
=
8,024
──── = 3.98
2,014
80,024 / 20,014
=
3.998 → Clock cycle ratio (4)
10,003 lw instructions:
Performance ratio
=
Pipeline performance approaches clock-cycle ratio for long programs.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
18
Single-Cycle Datapath
Instr.
mem.
16-20
11-15
RegWrite
ALU
1 mux 0
MEM: mem.
access
MemtoReg
ALUSrc
zero
MemWrite
MemRead
Data
mem.
0 mux 1
PC
1 mux 0
21-25
Branch
ALU
26-31
EX: Execute,
address calc.
1 mux 0
opcode
Reg. File
Add
4
CONTROL
ID: Instr. decode,
reg. file read
IF: Instr. fetch
WB:
write
back
RegDst
0-15
Sign
ext.
Shift
left 2
ALUOp
ALU
Cont.
0-5
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
19
This requires a
CONTROL not too
different from
single-cycle
16-20
11-15
ALU
zero
MemWrite
MemRead
Data
mem.
0 mux 1
Instr.
mem.
ALUSrc
ALU
PC
1 mux 0
21-25
MemtoReg
MEM/WB
RegWrite
1 mux 0
26-31
EX/MEM
Branch
Reg. File
opcode
CONTROL
4
ID/EX
Add
IF/ID
1 mux 0
Pipeline Registers
RegDst
0-15
Sign
ext.
Shift
left 2
ALUOp
ALU
Cont.
0-5
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
20
Pipeline Register Functions
Four pipeline registers are added:
Register
name
Data held
IF/ID
PC+4, Instruction word (IW)
ID/EX
PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
EX/MEM
PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20)
MEM/WB M[ALUResult], ALUResult, IW(11-15) or IW(16-20)
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
21
EX/MEM
Shift
left 2
opcode
ALU
4
ID/EX
Add
IF/ID
1 mux 0
Pipelined Datapath
MEM/WB
26-31
16-20
mem
Data
mem.
0 mux 1
PC
ALU
Instr
1 mux 0
21-25
Reg. File
zero
Sign
ext.
11-15 for R-type
16-20 for I-type lw
0-15
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
22
Five-Cycle Pipeline
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
CC5
MEM/WB
REG.
FILE
WRITE
DM
CC4
EX/MEM
CC3
ALU
ID, REG.
FILE
READ
ID/EX
CC2
IF/ID
IM
PC
CC1
23
Add Instruction
add
$t0, $s1, $s2
Machine instruction word
000000 10001 10010 01000 00000 100000
opcode $s1 $s2 $t0
function
IF
Spring 2016, Feb 29 . . .
ID
read $s1
read $s2
EX
add
$s1+$s2
MEM
ELEC 5200-001/6200-001 Lecture 6
REG.
FILE
WRITE
CC5
MEM/WB
DM
CC4
EX/MEM
CC3
ALU
ID, REG.
FILE
READ
ID/EX
CC2
IF/ID
IM
PC
CC1
WB
write $t0
24
EX/MEM
Shift
left 2
opcode
PC
mem
16-20
s2
Spring 2016, Feb 29 . . .
$s2
Sign
ext.
11-15 for R-type
16-20 for I-type lw
t0
$s1
addr
Data
mem
0 mux 1
Instr
zero
ALU
21-25
MEM/WB
s1
Reg. File
26-31
1 mux 0
4
ALU
ID/EX
Add
IF/ID
1 mux 0
Pipelined Datapath Executing add
data
0-15
ELEC 5200-001/6200-001 Lecture 6
25
Load Instruction
lw
$t0, 1200 ($t1)
100011 01001 01000 0000 0100 1000 0000
opcode $t1
$t0
1200
IF
Spring 2016, Feb 29 . . .
ID
read $t1
sign ext
1200
EX
MEM
add
read
$t1+1200 M[addr]
ELEC 5200-001/6200-001 Lecture 6
REG.
FILE
WRITE
CC5
MEM/WB
DM
CC4
EX/MEM
CC3
ALU
ID, REG.
FILE
READ
ID/EX
CC2
IF/ID
IM
PC
CC1
WB
write $t0
26
EX/MEM
Shift
left 2
opcode
PC
16-20
mem
Sign
ext.
11-15 for R-type
16-20 for I-type lw
t0
Spring 2016, Feb 29 . . .
$t1
addr
Data
mem
0 mux 1
Instr
zero
ALU
21-25
MEM/WB
t1
Reg. File
26-31
1 mux 0
4
ALU
ID/EX
Add
IF/ID
1 mux 0
Pipelined Datapath Executing lw
data
0-15
1200
ELEC 5200-001/6200-001 Lecture 6
27
Store Instruction
sw
$t0, 1200 ($t1)
101011 01001 01000 0000 0100 1000 0000
opcode $t1
$t0
1200
IF
Spring 2016, Feb 29 . . .
ID
read $t1
sign ext
1200
EX
MEM
add
write
$t1+1200 M[addr]
(addr)
← $t0
ELEC 5200-001/6200-001 Lecture 6
REG.
FILE
WRITE
CC5
MEM/WB
DM
CC4
EX/MEM
CC3
ALU
ID, REG.
FILE
READ
ID/EX
CC2
IF/ID
IM
PC
CC1
WB
28
Shift
left 2
opcode
PC
mem
16-20
t0
$t1
$t0
Sign
ext.
11-15 for R-type
16-20 for I-type lw
addr
Data
mem
0 mux 1
Instr
zero
ALU
21-25
MEM/WB
t1
Reg. File
26-31
ALU
EX/MEM
1 mux 0
4
ID/EX
Add
IF/ID
1 mux 0
Pipelined Datapath Executing sw
data
0-15
1200
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
29
Executing a Program
Consider a five-instruction segment:
lw
sub
add
lw
add
Spring 2016, Feb 29 . . .
$10, 20($1)
$11, $2, $3
$12, $3, $4
$13, 24($1)
$14, $5, $6
ELEC 5200-001/6200-001 Lecture 6
30
add $14, $5, $6
Spring 2016, Feb 29 . . .
EX/MEM
DM
MEM/WB
REG.
FILE
WRITE
ALU
EX/MEM
DM
MEM/WB
REG.
FILE
WRITE
IF/ID
ID, REG.
FILE
READ
ID/EX
ALU
EX/MEM
ELEC 5200-001/6200-001 Lecture 6
MEM/WB
REG.
FILE
WRITE
DM
MEM/WB
REG.
FILE
WRITE
EX/MEM
MEM/WB
REG.
FILE
WRITE
DM
DM
ALU
CC5
lw
$10, 20($1)
sub
$11, $2, $3
Program instructions
ALU
ALU
EX/MEM
IF/ID
ID, REG.
FILE
READ
ID/EX
CC4
IM
IF/ID
ID, REG.
FILE
READ
ID/EX
IM
IF/ID
ID, REG.
FILE
READ
ID/EX
CC3
PC
lw $13, 24($1)
IM
IF/ID
ID, REG.
FILE
READ
ID/EX
CC2
PC
add $12, $3, $4
PC
IM
PC
CC1
IM
Program Execution
time
31
CC5
4
ID/EX
EX/MEM
Add
IF/ID
Shift
left 2
opcode
1 mux 0
ID: lw $13, 24($1)
ALU
IF: add $14, $5, $6
MEM:
WB:
EX: add $12, $3, $4 sub $11, $2, $3 lw $10, 20($1)
MEM/WB
26-31
16-20
mem
Data
mem.
0 mux 1
PC
ALU
Instr
1 mux 0
21-25
Reg. File
zero
Sign
ext.
11-15 for R-type
16-20 for I-type lw
0-15
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
32
Advantages of Pipeline
After the fifth cycle (CC5), one instruction is
completed each cycle; CPI ≈ 1, neglecting the
initial pipeline latency of 5 cycles.
– Pipeline latency is defined as the number of stages in
the pipeline, or
– The number of clock cycles after which the first
instruction is completed.
The clock cycle time is about four times shorter
than that of single-cycle datapath and about the
same as that of multicycle datapath.
For multicycle datapath, CPI = 3. ….
So, pipelined execution is faster, but . . .
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
33
Science is always wrong. It never solves a problem
without creating ten more.
George Bernard Shaw
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
34
Pipeline Hazards
Definition: Hazard in a pipeline is a
situation in which the next instruction
cannot complete execution one clock cycle
after completion of the present instruction.
Three types of hazards:
– Structural hazard (resource conflict)
– Data hazard
– Control hazard
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
35
Structural Hazard
Two instructions cannot execute due to a
resource conflict.
Example: Consider a computer with a
common data and instruction memory.
The fourth cycle of a lw instruction
requires memory access (memory read)
and at the same time the first cycle of the
fourth instruction requires instruction fetch
(memory read). This will cause a memory
resource conflict.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
36
lw
$13, 24($1)
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
REG.
FILE
WRITE
IM/DM
IM/DM
REG.
FILE
WRITE
MEM/WB
REG.
FILE
WRITE
CC5
lw
$10, 20($1)
sub
$11, $2, $3
Program instructions
REG.
FILE
WRITE
MEM/WB
MEM/WB
IM/DM
ALU
EX/MEM
EX/MEM
ALU
MEM/WB
EX/MEM
IM/DM
ALU
CC4
ID, REG.
FILE
READ
ID/EX
IF/ID
EX/MEM
ALU
CC3
IM/DM
IF/ID
ID, REG.
FILE
READ
ID/EX
IM/DM
ID/EX
ID, REG.
FILE
READ
ID/EX
IF/ID
CC2
PC
Common data
and instr. Mem.
add $12, $3, $4
PC
IF/ID
ID, REG.
FILE
READ
CC1
IM/DM
PC
IM/DM
Example of Structural Hazard
time
Nedded by two instructions
37
Possible Remedies for Structural
Hazards
Provide duplicate hardware resources in
datapath.
Control unit or compiler can insert delays
(no-op cycles) between instructions. This
is known as pipeline stall or bubble.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
38
lw $13, 24($1)
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
REG.
FILE
WRITE
IM/DM
Stall (bubble)
Program instructions
MEM/WB
REG.
FILE
WRITE
MEM/WB
EX/MEM
CC5
lw
$10, 20($1)
sub
$11, $2, $3
MEM/WB
REG.
FILE
WRITE
IM/DM
EX/MEM
ALU
IF/ID
ID, REG.
FILE
READ
ID/EX
IM/DM
ALU
MEM/WB
REG.
FILE
WRITE
IM/DM
CC4
IM/DM
EX/MEM
ALU
EX/MEM
ALU
IF/ID
ID, REG.
FILE
READ
ID/EX
CC3
ID, REG.
FILE
READ
ID/EX
IF/ID
IM/DM
IF/ID
ID, REG.
FILE
READ
ID/EX
CC2
PC
add $12, $3, $4
PC
IM/DM
PC
CC1
IM/DM
Stall (Bubble) for Structural Hazard
time
39
Data Hazard
Data hazard means that an instruction
cannot be completed because the needed
data, being generated by another
instruction in the pipeline, is not available.
Example: consider two instructions:
add $s0, $t0, $t1
sub $t2, $s0, $t3
Spring 2016, Feb 29 . . .
# needs $s0
ELEC 5200-001/6200-001 Lecture 6
40
Example of Data Hazard
time
CC5
Write s0 in CC5
sub
$s0, $t0, $t1
$t2, $s0, $t3
Program instructions
REG.
FILE
WRITE
add
MEM/WB
DM
MEM/WB
REG.
FILE
WRITE
EX/MEM
ALU
ALU
ID/EX
IF/ID
ID, REG.
FILE
READ
ID/EX
IM
DM
CC4
EX/MEM
CC3
IF/ID
ID, REG.
FILE
READ
CC2
PC
IM
PC
CC1
Read s0 and t3 in CC3
We need to read s0 from reg file in cycle 3
But s0 will not be written in reg file until cycle 5
However, s0 will only be used in cycle 4
And it is available at the end of cycle 3
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
41
Forwarding or Bypassing
Output of a resource used by an
instruction is forwarded to the input of
some resource being used by another
instruction.
Forwarding can eliminate some, but not
all, data hazards.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
42
Forwarding for Data Hazard
time
CC5
sub
$s0, $t0, $t1
$t2, $s0, $t3
Program instructions
DM
REG.
FILE
WRITE
REG.
FILE
WRITE
add
MEM/WB
MEM/WB
Write s0 in CC5
EX/MEM
DM
ALU
EX/MEM
CC4
ID, REG.
FILE
READ
ID/EX
IF/ID
IM
ALU
CC3
IF/ID
ID, REG.
FILE
READ
ID/EX
CC2
PC
IM
PC
CC1
Read s0 and t3 in CC3
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
43
Forwarding Unit Hardware
Source reg.
IDs from
opcode
Spring 2016, Feb 29 . . .
MEM/WB
Data
Mem.
MUX
ALU
Data
to reg.
file
FORW. MUX
EX/MEM
FORW. MUX
ID/EX
Destination registers
Forwarding
Unit
ELEC 5200-001/6200-001 Lecture 6
44
Forwarding Alone May Not Work
time
CC5
lw
$s0, 20($s1)
sub
$t2, $s0, $t3
Program instructions
DM
REG.
FILE
WRITE
REG.
FILE
WRITE
MEM/WB
MEM/WB
Write s0 in CC5
EX/MEM
DM
ALU
ALU
EX/MEM
IM
ID/EX
CC4
IF/ID
ID, REG.
FILE
READ
ID/EX
CC3
IF/ID
ID, REG.
FILE
READ
CC2
PC
IM
PC
CC1
Read s0 and t3 in CC3
data needed by sub
(data hazard)
data available from memory
only at the end of cycle 4
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
45
Use Bubble and Forwarding
time
CC5
MEM/WB
REG.
FILE
WRITE
ALU
Write s0 in CC5
ID/EX
ALU
DM
CC4
EX/MEM
CC3
ID/EX
CC2
IF/ID
ID, REG.
FILE
READ
IM
PC
CC1
lw
$s0, 20($s1)
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
Program instructions
REG.
FILE
WRITE
MEM/WB
DM
EX/MEM
IF/ID
ID, REG.
FILE
READ
$t2, $s0, $t3
IM
sub
PC
stall
(bubble)
46
Hazard Detection Unit Hardware
Instruction
IF/ID
to reg.
file
register
IDs from
prev. instr.
Spring 2016, Feb 29 . . .
ALU
0
PC
EX/MEM
FORW. MUX
Control
ID/EX
FORW. MUX
Hazard
Detection
Unit
NOP MUX
Disable
write
Forwarding
Unit
ELEC 5200-001/6200-001 Lecture 6
MEM/WB
Data
Mem.
Control
signals
47
Resolving Hazards
Hazards are resolved by Hazard detection
and forwarding units.
Compiler’s understanding of how these
units work can improve performance.
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
48
Avoiding Stall by Code Reorder
C code:
A = B + E;
C = B + F;
MIPS code:
lw
$t1,
lw
$t2,
add
$t3,
sw
$t3,
lw
$t4,
add
$t5,
sw
$t5,
Spring 2016, Feb 29 . . .
0($t0)
4($t0)
$t1, $t2
12($t0)
8($t0)
$t1, $t4
16($t0)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
ELEC 5200-001/6200-001 Lecture 6
$t1 written
$t2 written
$t1, $t2 needed
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
$t4 written
$t4 needed
.
.
.
.
.
49
Reordered Code
C code:
A = B + E;
C = B + F;
MIPS code:
lw
$t1,
lw
$t2,
lw
$t4,
add
$t3,
sw
$t3,
add
$t5,
sw
$t5,
Spring 2016, Feb 29 . . .
0($t0)
4($t0)
8($t0)
$t1, $t2
12($t0)
$t1, $t4
16($t0)
no hazard
no hazard
ELEC 5200-001/6200-001 Lecture 6
50
Control Hazard
Instruction to be fetched is not known!
Example: Instruction being executed is
branch-type, which will determine the next
instruction:
40
Spring 2016, Feb 29 . . .
add $4, $5, $6
beq $1, $2, 40
next instruction
...
and $7, $8, $9
ELEC 5200-001/6200-001 Lecture 6
51
next instruction or
and $7, $8, $9
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
Program instructions
CC5
MEM/WB
REG.
FILE
WRITE
DM
add
EX/MEM
ALU
MEM/WB
REG.
FILE
WRITE
MEM/WB
REG.
FILE
WRITE
CC4
DM
EX/MEM
Stall (bubble)
IF/ID
ID, REG.
FILE
READ
ID/EX
DM
CC3
ALU
EX/MEM
ALU
CC2
IM
IF/ID
ID, REG.
FILE
READ
ID/EX
IM
IF/ID
ID, REG.
FILE
READ
ID/EX
CC1
PC
beq $1, $2, 40
PC
IM
PC
Stall on Branch
time
$4, $5, $6
52
Why Only One Stall?
Extra hardware in ID phase:
Additional ALU to compute branch address
Comparator to generate zero signal
Hazard detection unit writes the branch address
in PC
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
53
Ways to Handle Branch
Stall or bubble
Branch prediction:
– Heuristics
Next instruction
Prediction based on statistics (dynamic)
Hardware decision (dynamic)
– Prediction error: pipeline flush
Delayed branch
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
54
Delayed Branch Example
Stall on branch
add $4, $5, $6
beq $1, $2, skip
next instruction
...
skip or $7, $8, $9
Delayed branch
beq $1, $2, skip
add $4, $5, $6
next instruction
...
skip or $7, $8, $9
Instruction executed irrespective
of branch decision
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
55
next instruction or
skip or $7, $8, $9
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
DM
ALU
Program instructions
REG.
FILE
WRITE
MEM/WB
DM
CC5
MEM/WB
REG.
FILE
WRITE
DM
MEM/WB
REG.
FILE
WRITE
EX/MEM
ID/EX
EX/MEM
ALU
EX/MEM
CC4
ID, REG.
FILE
READ
IF/ID
ALU
CC3
IM
ID/EX
IF/ID
ID, REG.
FILE
READ
ID/EX
CC2
IM
IF/ID
ID, REG.
FILE
READ
PC
add $4, $5, $6
IM
CC1
PC
beq $1, $2, skip
PC
Delayed Branch
time
56
Summary: Hazards
Structural hazards
– Cause: resource conflict
– Remedies: (i) hardware resources, (ii) stall (bubble)
Data hazards
– Cause: data unavailablity
– Remedies: (i) forwarding, (ii) stall (bubble), (iii) code
reordering
Control hazards
– Cause: out-of-sequence execution (branch or jump)
– Remedies: (i) stall (bubble), (ii) branch prediction/pipeline
flush, (iii) delayed branch/pipeline flush
Spring 2016, Feb 29 . . .
ELEC 5200-001/6200-001 Lecture 6
57
Download