ECE232: Hardware Organization and Design
Part 11: Pipelining
Chapter 4/6
http://www.ecs.umass.edu/ece/ece232/
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
CPI Calculation
CPI stands for average number of Cycles Per Instruction
Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
CPI = 0.24 * 5 + 0.12 * 4 + 0.44 * 4 + 0.18 * 3 + 0.02 * 3 = 4.04
Speedup?
Question: Can we achieve a CPI of 1???
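The weighted sum on this slide can be checked with a short Python sketch. The per-class cycle counts (5, 4, 4, 3, 3) are read off the CPI formula above; the names `MIX` and `average_cpi` are illustrative only.

```python
# Instruction mix from the slide: class -> (fraction of instructions, cycles per instruction)
MIX = {
    "load":     (0.24, 5),
    "store":    (0.12, 4),
    "r_format": (0.44, 4),
    "branch":   (0.18, 3),
    "jump":     (0.02, 3),
}

def average_cpi(mix):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())

print(round(average_cpi(MIX), 2))  # 4.04
```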
ECE232: Pipelining I 2
Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass
Koren
Speeding up through pipelining
Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 30 minutes
• “Folder” takes 30 minutes
• “Stasher” takes 30 minutes to put clothes into drawers
Sequential Laundry
[Figure: task order A, B, C, D against time from 6 PM to 2 AM; each load occupies four consecutive 30-minute slots, 16 slots in total]
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
[Figure: task order A, B, C, D against time starting at 6 PM; each load starts 30 minutes after the previous one, seven 30-minute slots in total]
Pipelined laundry takes 3.5 hours for 4 loads!
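The 8-hour and 3.5-hour figures follow from counting 30-minute slots. A minimal sketch, assuming four equal stages (washer, dryer, folder, stasher) and that a new load can start as soon as the washer is free; the function names are illustrative only.

```python
STAGE_MINUTES = 30  # washer, dryer, folder, stasher each take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4       # Ann, Brian, Cathy, Dave

def sequential_minutes(loads, stages, stage_minutes):
    # Each load runs through every stage before the next load starts.
    return loads * stages * stage_minutes

def pipelined_minutes(loads, stages, stage_minutes):
    # The first load needs `stages` slots; every later load finishes one slot after the previous one.
    return (stages + loads - 1) * stage_minutes

print(sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)  # 8.0
print(pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)   # 3.5
```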
Pipelining Lessons
[Figure: pipelined laundry timeline, tasks A, B, C, D in order against time from 6 PM, seven 30-minute slots]
Pipelining doesn’t help latency of a single task, it helps throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
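The effect of fill and drain time on speedup can be illustrated numerically. This is a sketch under the assumption of perfectly balanced stages, with time counted in stage-length slots; `pipeline_speedup` is a hypothetical helper, not something defined in the course material.

```python
def pipeline_speedup(num_tasks, num_stages):
    """Speedup of a balanced pipeline over sequential execution, counted in stage slots."""
    sequential_slots = num_tasks * num_stages
    pipelined_slots = num_stages + num_tasks - 1  # fill + one completion per slot afterwards
    return sequential_slots / pipelined_slots

for n in (4, 40, 400):
    print(n, round(pipeline_speedup(n, 4), 2))
# 4 2.29
# 40 3.72
# 400 3.97
```

With only four loads the speedup is 8 h / 3.5 h, about 2.29; as the number of tasks grows, the speedup approaches the number of stages.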
Pipelining Instructions
Fetch = 10 ns
Decode = 6 ns
Execute = 8 ns
Memory = 10 ns
Write back = 6 ns
[Figure: instructions against time (in cycles); six instructions each pass through F, D, EX, M, W, each starting one cycle after the previous one]
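The stage latencies above determine the pipeline clock. A minimal sketch, assuming the clock period equals the slowest stage and ignoring pipeline-register overhead; the variable and function names are illustrative only.

```python
# Stage latencies in ns, as listed on the slide
STAGE_NS = {"Fetch": 10, "Decode": 6, "Execute": 8, "Memory": 10, "Write back": 6}

clock_ns = max(STAGE_NS.values())                 # pipeline rate is set by the slowest stage
unpipelined_ns = sum(STAGE_NS.values())           # one instruction with no pipelining
pipelined_latency_ns = clock_ns * len(STAGE_NS)   # one instruction through all five stages

def pipelined_total_ns(num_instructions):
    """Total time for a stream of instructions, including the cycles needed to fill the pipeline."""
    return (len(STAGE_NS) - 1 + num_instructions) * clock_ns

print(clock_ns, unpipelined_ns, pipelined_latency_ns)  # 10 40 50
print(pipelined_total_ns(6))                           # 100
```

The latency of a single instruction does not improve (50 ns instead of 40 ns under these assumptions); the gain is in throughput, one instruction completing every 10 ns once the pipeline is full.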
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing diagrams for the three implementations over Cycle 1 to Cycle 10]
Single Cycle Implementation: Load and Store each take one long cycle; part of the Store cycle is marked Waste
Multiple Cycle Implementation: Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type starts with Ifetch
Pipeline Implementation: Load, Store, and R-type each start with Ifetch in successive cycles and overlap in the later stages (Reg, Exec, Mem, Wr)
Why Pipeline?
Suppose we execute 100 instructions
Single Cycle Machine
• 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine
• 10 ns/cycle x 4.04 CPI (for the given inst mix) x 100 inst = 4040 ns
• Instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
Ideal pipelined machine (with 5 stages)
• 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Speedup = 4.33 vs. single-cycle, 3.88 vs. multi-cycle (for the given inst mix)
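The three execution times and the two speedups can be reproduced as follows. The cycle times, CPI values, and the 4-cycle drain are taken from the slide; the function names are illustrative only.

```python
NUM_INSTRUCTIONS = 100

def single_cycle_ns(n, cycle_ns=45, cpi=1):
    return cycle_ns * cpi * n

def multicycle_ns(n, cycle_ns=10, cpi=4.04):
    return cycle_ns * cpi * n

def ideal_pipeline_ns(n, cycle_ns=10, stages=5):
    # One instruction completes per cycle, plus (stages - 1) extra cycles for the pipeline to drain.
    return cycle_ns * (n + stages - 1)

single = single_cycle_ns(NUM_INSTRUCTIONS)
multi = multicycle_ns(NUM_INSTRUCTIONS)
pipe = ideal_pipeline_ns(NUM_INSTRUCTIONS)

print(single, round(multi), pipe)  # 4500 4040 1040
print(round(single / pipe, 2))     # 4.33
print(round(multi / pipe, 2))      # 3.88
```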
Why Pipeline? Because the resources are there!
[Figure: Inst 1 to Inst 5 in instruction order against time (clock cycles); each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles, so the overlapped instructions occupy different resources in any given cycle]
Pipelining Rules
[Figure: Inst 1 to Inst 5 occupying the IMem, Reg, ALU, DMem, Reg stages]
Forward-traveling signals at each stage are latched
Only perform logic on signals in the same stage
• signal labeling is useful to prevent errors, e.g., IRR, IRA, IRM, IRW
Backward-traveling signals at each stage represent hazards
MIPS Pipelined Datapath
State registers between pipeline stages to isolate them
[Figure: MIPS pipelined datapath with five stages (IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack) holding Inst 5 down to Inst 1. Components: PC, Instruction Memory, Add (PC + 4), Register File, Sign Extend (16 to 32 bits), Shift left 2, Add, ALU, Data Memory. Pipeline registers: IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB, all driven by the System Clock]
Pipeline Hazards
Data hazards: an instruction uses the result of a previous instruction (RAW)
ADD R1, R2, R3
SUB R4, R1, R5
or
SW R1, 4(R2)
LW R3, 4(R2)
Control hazards: the address of the next instruction to be executed depends on a previous instruction
BEQ R1,R2,CONT
SUB R6,R7,R8
…
CONT: ADD R3,R4,R5
Structural hazards: two instructions need access to the same resource
• e.g., single memory shared for instruction fetch and load/store
Structural Hazard
[Figure: lw followed by Inst 1 to Inst 4 in instruction order against time (clock cycles); each passes through Mem, Reg, ALU, Mem, Reg. In the cycle where lw is reading data from memory, a later instruction is reading its instruction from the same memory]
Fix with separate instruction and data memories (I$ and D$)
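The conflict in the figure can be located mechanically. This is a sketch under stated assumptions: one instruction is fetched per cycle with no stalls, the five stages are IF, ID, EX, MEM, WB, and a single memory port serves both instruction fetch and load/store. The function `memory_conflicts` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(program):
    """Return (cycle, data_access_index, fetch_index) for each clash on a single shared memory.

    program: list of opcode strings; instruction i is fetched in cycle i + 1.
    """
    conflicts = []
    for i, opcode in enumerate(program):
        if opcode in ("lw", "sw"):
            mem_cycle = i + 1 + STAGES.index("MEM")  # cycle in which instruction i is in MEM
            fetch_index = mem_cycle - 1              # instruction being fetched in that cycle
            if fetch_index < len(program):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts

program = ["lw", "inst1", "inst2", "inst3", "inst4"]
print(memory_conflicts(program))  # [(4, 0, 3)]
```

Here lw accesses data memory in cycle 4, the same cycle in which Inst 3 is fetched. With separate instruction and data memories the two accesses no longer compete.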
Data Hazards (RAW)
[Figure: instructions against time (in cycles)]
ADD R1, R2, R3: F D EX M W (Write Data to R1 Here, in W)
SUB R4, R1, R5: F D EX M W (Get data from R1 Here, in D)
One Way to handle a Data Hazard
By waiting (introducing stalls), but this impacts CPI
[Figure: instructions in order against time; add $1,… passes through IM, Reg, ALU, DM, Reg; sub $4,$1,$5 is delayed by three stall cycles and then passes through IM, Reg, ALU, DM, Reg]
Must allow Wr/Rd in REG in same cycle
Split cycle into two halves
[Figure: Inst 1 to Inst 5 in instruction order against time (clock cycles); each passes through Im, Reg, ALU, Dm, Reg]
Only two stall cycles
Write in 1st half, Read in 2nd half
[Figure: instructions in order against time; add $1,… passes through IM, Reg, ALU, DM, Reg; sub $4,$1,$5 is delayed by two stall cycles; and $6,$1,$7 follows sub]
Another Way to “Fix” a Data Hazard
by forwarding
[Figure: add $1,… followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order against time; each passes through IM, Reg, ALU, DM, Reg]
Register File (write and then read)
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: add $1,…, Inst 1, Inst 2, and or $8,$1,$9 in instruction order against time (clock cycles); each passes through IM, Reg, ALU, DM, Reg; the clock edge that controls loading of pipeline state registers is marked]
Internal data forwarding
Fix data hazards by forwarding results as soon as they are available to where they are needed
[Figure: add $1,… followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order; each passes through IM, Reg, ALU, DM, Reg]
ALU-to-ALU forwarding vs. full forwarding
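One way to picture the forwarding decision is as a selector for each ALU operand. The sketch below follows the textbook-style condition (forward when an older instruction still in the pipeline will write the register being read, and that register is not $0). The `PipeReg` fields and the `forward_select` function are assumptions made for illustration, not signals named on these slides.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    reg_write: bool  # the instruction held in this pipeline register will write a register
    rd: int          # its destination register number

def forward_select(src_reg, exec_mem, mem_wb):
    """Pick where an ALU operand comes from: 'Exec/Mem', 'Mem/WB', or 'RegFile'."""
    # The most recent producer wins, so check Exec/Mem before Mem/WB.
    if exec_mem.reg_write and exec_mem.rd != 0 and exec_mem.rd == src_reg:
        return "Exec/Mem"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "Mem/WB"
    return "RegFile"

# sub $4,$1,$5 in EX while add $1,... sits in Exec/Mem
print(forward_select(1, PipeReg(True, 1), PipeReg(False, 0)))  # Exec/Mem
# and $6,$1,$7 in EX: sub ($4) is in Exec/Mem, add ($1) is in Mem/WB
print(forward_select(1, PipeReg(True, 4), PipeReg(True, 1)))   # Mem/WB
# operand $7 has no producer in flight
print(forward_select(7, PipeReg(True, 4), PipeReg(True, 1)))   # RegFile
```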
Forwarding with Load-use Data Hazards
[Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order; each passes through IM, Reg, ALU, DM, Reg]
sub needs to stall
Will still need one stall cycle even with forwarding
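The stall counts seen so far (three, two, zero, and one for load-use) can be derived from when a value becomes available and when it is first needed. This is a sketch for a 5-stage pipeline (F, D, EX, M, W) with the producer fetched in cycle 1; `stall_cycles` and its parameters are hypothetical names.

```python
def stall_cycles(distance, forwarding=False, split_regfile=False, producer_is_load=False):
    """Stall cycles a consumer needs when it is issued `distance` instructions after its producer."""
    if forwarding:
        # Value is ready at the end of M (cycle 4) for a load, end of EX (cycle 3) otherwise.
        ready_cycle = 4 if producer_is_load else 3
        first_use_cycle = ready_cycle + 1     # earliest cycle the consumer's EX may run
        unstalled_use_cycle = 3 + distance    # consumer's EX cycle if it never stalls
    else:
        write_cycle = 5                       # producer's W stage
        # With write in the 1st half and read in the 2nd half, the read may share the write cycle.
        first_use_cycle = write_cycle if split_regfile else write_cycle + 1
        unstalled_use_cycle = 2 + distance    # consumer's D cycle if it never stalls
    return max(0, first_use_cycle - unstalled_use_cycle)

print(stall_cycles(1))                                          # 3
print(stall_cycles(1, split_regfile=True))                      # 2
print(stall_cycles(1, forwarding=True))                         # 0
print(stall_cycles(1, forwarding=True, producer_is_load=True))  # 1
```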
Injecting Bubbles
Program order: Inst -2, Inst -1, lw, sub, and
Stage:        IF    ID    EX      MEM      WB
Cycle n:      and   sub   lw      Inst -1  Inst -2
Cycle n + 1:  and   sub   bubble  lw       Inst -1
[Figure: the MIPS pipelined datapath (PC, Instruction Memory, Register File, Sign Extend, Shift left 2, Add, ALU, Data Memory, with pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB and the System Clock) showing these instructions in their stages]
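The two pipeline snapshots on this slide can be modeled as a list of five stage slots. This is a minimal sketch, assuming a stall freezes IF and ID and inserts a bubble into EX; the `step` function is hypothetical.

```python
def step(pipeline, next_instr, stall=False):
    """Advance a 5-stage pipeline [IF, ID, EX, MEM, WB] by one cycle."""
    fetch, decode, execute, memory, _writeback = pipeline
    if stall:
        # IF and ID keep their instructions; a bubble takes the EX slot.
        return [fetch, decode, "bubble", execute, memory]
    return [next_instr, fetch, decode, execute, memory]

state = ["and", "sub", "lw", "Inst -1", "Inst -2"]
state = step(state, next_instr="next", stall=True)
print(state)  # ['and', 'sub', 'bubble', 'lw', 'Inst -1']
```

After the stall cycle, sub is still in ID, so lw reaches MEM one cycle ahead of sub reaching EX, which leaves time for the loaded value to be forwarded.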
3 Types of Data Hazards
RAW (read after write)
• only hazard for ‘fixed’ pipelines
• later instruction must read after earlier instruction writes
add $1,$2,$3: F D EX M W
sub $4,$1,$5: F D EX M W
WAW (write after write)
• variable-length pipeline
• later instruction must write after earlier instruction writes
div $1,$4,$3: F D E1 E2 E3 E4 E5 W
add $1,$2,$5: F D EX M W
WAR (write after read)
• instruction with late read (e.g., waiting for an execution unit)
• later instruction must write after earlier instruction reads
mlt $4,$1,$3: F D s1 s2 s3 s4 s5 E1 E2 E3 W
add $1,$2,$5: F D EX M W
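The three dependence patterns can be told apart from the register names alone. The sketch below classifies a pair of instructions by destination and source registers; whether a dependence becomes an actual hazard still depends on the pipeline's timing, as the bullets above note. The tuple encoding and `data_dependences` are assumptions made for illustration.

```python
def data_dependences(earlier, later):
    """Each instruction is (dest_register, source_registers). Returns a sorted list of names."""
    earlier_dest, earlier_srcs = earlier
    later_dest, later_srcs = later
    found = set()
    if earlier_dest in later_srcs:
        found.add("RAW")  # later reads what earlier writes
    if earlier_dest == later_dest:
        found.add("WAW")  # both write the same register
    if later_dest in earlier_srcs:
        found.add("WAR")  # later writes what earlier reads
    return sorted(found)

# add $1,$2,$3 ; sub $4,$1,$5
print(data_dependences((1, {2, 3}), (4, {1, 5})))  # ['RAW']
# div $1,$4,$3 ; add $1,$2,$5
print(data_dependences((1, {4, 3}), (1, {2, 5})))  # ['WAW']
# mlt $4,$1,$3 ; add $1,$2,$5
print(data_dependences((4, {1, 3}), (1, {2, 5})))  # ['WAR']
```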
Control Hazard
[Figure: instructions against time (in cycles)]
JR R25: F D EX M W (Destination Available Here)
...
XX: ADD ...: F D EX M W (Need Destination Here)
Simple solution: Flush Instruction fetch until branch resolved
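The cost of holding off instruction fetch can be estimated by adding stall cycles to the ideal CPI of 1. In this sketch the control-instruction fraction (18% branches plus 2% jumps) comes from the instruction mix used earlier in the deck; the penalty per control instruction is a hypothetical parameter, since the slide does not state in which stage the destination becomes available.

```python
def cpi_with_control_stalls(control_fraction, stall_cycles_per_control_instr, base_cpi=1.0):
    """Average CPI when every control instruction costs a fixed number of stall cycles."""
    return base_cpi + control_fraction * stall_cycles_per_control_instr

control_fraction = 0.18 + 0.02  # branches + jumps from the earlier instruction mix

for penalty in (1, 2, 3):  # assumed penalties, for illustration only
    print(penalty, round(cpi_with_control_stalls(control_fraction, penalty), 2))
# 1 1.2
# 2 1.4
# 3 1.6
```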