Why Pipeline?

advertisement
IKI20210
Pengantar Organisasi Komputer
Kuliah no. 25: Pipeline
Sumber:
1. Hamacher. Computer Organization, ed-4.
2. Materi kuliah CS152, th. 1997, UCB.
10 Januari 2003
Bobby Nazief (nazief@cs.ui.ac.id)
Johny Moningka (moningka@cs.ui.ac.id)
bahan kuliah: http://www.cs.ui.ac.id/~iki20210/
1
Pipeline
Salah Satu Cara Mempercepat Eksekusi Instruksi
2
Pipelining is Natural!
° Laundry Example
° Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold
A
B
C
D
° Washer takes 30 minutes
° Dryer takes 40 minutes
° “Folder” takes 20 minutes
3
Sequential Laundry
6 PM
7
8
9
10
11
Midnight
Time
30 40 20 30 40 20 30 40 20 30 40 20
T
a
s
k
A
B
O
r
d
e
r
C
D
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry
take?
4
Pipelined Laundry: Start work ASAP
6 PM
7
8
9
10
11
Midnight
Time
30 40
T
a
s
k
40
40
40 20
A
B
O
r
d
e
r
C
D
° Pipelined laundry takes 3.5 hours for 4 loads
5
Pipelining Lessons
6 PM
7
8
9
Time
30 40
T
a
s
k
O
r
d
e
r
40
40
40 20
° Pipelining doesn’t help
latency of single task, it
helps throughput of entire
workload
° Pipeline rate limited by
slowest pipeline stage
A
° Multiple tasks operating
simultaneously using
different resources
B
° Potential speedup =
Number pipe stages
C
° Unbalanced lengths of
pipe stages reduces
speedup
D
° Time to “fill” pipeline and
time to “drain” it reduce
speedup
° Stall for Dependences
6
Pipelining Instruction Execution
7
Kilas Balik: Tahapan Eksekusi Instruksi
Instruksi:
Add
R1,(R3)
; R1  R1 + M[R3]
Langkah-langkah:
1.
Fetch instruksi
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, WMFC
3. MDRout, IRin
2.
Fetch operand #1 (isi lokasi memori yg ditunjuk oleh R3)
4. R3out, MARin, Read
5. R1out, Yin, WMFC
3.
Lakukan operasi penjumlahan
6. MDRout, Add, Zin
4.
Simpan hasil penjumlahan di R1
7. Zout, R1in, End
8
The Five Stages of (MIPS) Load Instruction
Cycle 1 Cycle 2
Load Ifetch
Reg/Dec
Cycle 3 Cycle 4 Cycle 5
Exec
Mem
Wr
° Ifetch: Instruction Fetch
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
Load/Store Architecture:
access to/from memory only by Load/Store instructions
9
Pipelined Execution
Time
IFetch Dcd
Exec
IFetch Dcd
Mem
WB
Exec
Mem
WB
Exec
Mem
WB
Exec
Mem
WB
Exec
Mem
WB
Exec
Mem
IFetch Dcd
IFetch Dcd
IFetch Dcd
Program Flow
IFetch Dcd
WB
° Overlapping instruction execution
° Maximum number instructions executed
simultaneously = number of stages
10
Why Pipeline?
° Non-pipeline machine
• 10 ns/cycle x 4.6 CPI (due to instr mix) x 100 inst = 4600 ns
° Ideal pipelined machine
• 10 ns/cycle x (4 cycle fill + 1 CPI x 100 inst) = 1040 ns
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Clk
Non-pipeline Implementation:
Load
Ifetch
Store
Reg
Exec
Mem
Wr
Exec
Mem
Wr
Reg
Exec
Mem
Ifetch
R-type
Reg
Exec
Mem
Ifetch
Pipeline Implementation:
Load Ifetch
Reg
Store Ifetch
R-type Ifetch
Reg
Exec
Wr
Mem
Wr
11
Why Pipeline? Because the resources are there!
Time (clock cycles)
Inst 4
Reg
Im
Reg
Dm
Reg
Dm
Im
Reg
Im
Reg
Reg
Reg
Dm
Reg
ALU
Inst 3
Im
Dm
ALU
Inst 2
Reg
ALU
Inst 1
Im
ALU
O
r
d
e
r
Inst 0
ALU
I
n
s
t
r.
Dm
Reg
12
Restructuring Datapath
13
Partitioning the Datapath (1/2)
Store
Source
(Register)
Operands
Store
Results
Result Store
MemWr
RegDst
RegWr
Reg.
File
Data
Mem
MemRd
MemWr
Exec
Mem
Access
ALUctr
ALUSrc
ExtOp
Store
Instruction
Operand
Fetch
Instruction
Fetch
PC
Next PC
nPC_sel
° Add registers between smallest steps
Store
Read-Data
(from
Memory)
14
Partitioning the Datapath (2/2)
Load
Reg.
File
Mem
Access
B
Cycle 1
Cycle 2
Cycle 3
Ifetch
Reg/Dec
Exec
Cycle 4
Mem
M
Data
Mem
PC
Next PC
Equal
WB Ctrl
IRwb
IRmem
R
Mem Ctrl
Ex Ctrl
IRexe
A
ALU
Dcd Ctrl
Reg
File
IR
Inst. Mem
Valid
Cycle 5
Wr
15
Pipeline Hazards
16
Can pipelining get us into trouble?
° Yes: Pipeline Hazards
• structural hazards: attempt to use the same resource two
different ways at the same time
- E.g., combined washer/dryer would be a structural hazard
or folder busy doing something else (watching TV)
• data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold
until get sock from washer through dryer
- instruction depends on result of prior instruction still in the
pipeline
• control hazards: attempt to make a decision before condition is
evaluated
- E.g., washing football uniforms and need to get proper
detergent level; need to see after dryer before next load in
- branch instructions
° Can always resolve hazards by waiting
• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
17
Single Memory is a Structural Hazard
Time (clock cycles)
Instr 4
Reg
Mem
Reg
Mem
Reg
Mem
Reg
Mem
Reg
Mem
Reg
Mem
Reg
ALU
Instr 3
Reg
ALU
Instr 2
Mem
Mem
ALU
Instr 1
Reg
ALU
O
r
d
e
r
Load
Mem
ALU
I
n
s
t
r.
Mem
Reg
Detection is easy in this case! (right half highlight means read, left half write)
18
Control Hazard Solutions
° Stall: wait until decision is clear
• Its possible to move up decision to 2nd stage by adding
hardware to check registers as being read
Beq
Load
Reg
Mem
Mem
Reg
Reg
Mem
Reg
Mem
Reg
ALU
Add
Mem
ALU
O
r
d
e
r
Time (clock cycles)
ALU
I
n
s
t
r.
Mem
Reg
° Impact: 2 clock cycles per branch instruction
=> slow
19
Control Hazard Solutions
° Predict: guess one direction then back up if wrong
• Predict not taken
Beq
Load
Reg
Mem
Mem
Reg
Reg
Mem
Reg
Mem
Reg
ALU
Add
Mem
ALU
O
r
d
e
r
Time (clock cycles)
ALU
I
n
s
t
r.
Mem
Reg
° Impact: 1 clock cycles per branch instruction if
right, 2 if wrong (right 50% of time)
° More dynamic scheme: history of 1 branch ( 90%)
20
Control Hazard Solutions
° Redefine branch behavior (takes place after next
instruction) “delayed branch”
Misc
Load
Mem
Mem
Reg
Reg
Mem
Reg
Mem
Reg
Mem
Reg
Mem
Reg
ALU
Beq
Reg
ALU
Add
Mem
ALU
O
r
d
e
r
Time (clock cycles)
ALU
I
n
s
t
r.
Mem
Reg
° Impact: 0 clock cycles per branch instruction if can
find instruction to put in “slot” ( 50% of time)
° As launch more instruction per clock cycle, less useful
21
Data Hazard on r1
add r1 ,r2,r3
sub r4, r1 ,r3
and r6, r1 ,r7
or r8, r1 ,r9
xor r10, r1 ,r11
22
Data Hazard on r1:
• Dependencies backwards in time are hazards
Time (clock cycles)
IF
Dm
Im
Reg
Dm
Im
Reg
Dm
Im
Reg
Dm
Im
Reg
ALU
xor r10,r1,r11
Reg
ALU
or r8,r1,r9
WB
ALU
and r6,r1,r7
MEM
ALU
O
r
d
e
r
sub r4,r1,r3
Im
EX
ALU
I
n
s
t
r.
add r1,r2,r3
ID/RF
Reg
Reg
Reg
Reg
Dm
Reg
23
Data Hazard Solution:
• “Forward” result from one stage to another
Time (clock cycles)
IF
Dm
Im
Reg
Dm
Im
Reg
Dm
Im
Reg
Dm
Im
Reg
ALU
xor r10,r1,r11
Reg
ALU
or r8,r1,r9
WB
ALU
and r6,r1,r7
MEM
ALU
O
r
d
e
r
sub r4,r1,r3
Im
EX
ALU
I
n
s
t
r.
add r1,r2,r3
ID/RF
Reg
Reg
Reg
Reg
Dm
Reg
24
Forwarding Structure
IAU
npc
I mem
Regs
op rw rs rt
Forward
mux
B
A
im
n op rw
alu
S
n op rw
PC
° Detect nearest
valid write op
operand register
and forward into
op latches,
bypassing
remainder of the
pipe
• Increase muxes to
add paths from
pipeline registers
• Data Forwarding =
Data Bypassing
D mem
m
n op rw
Regs
25
Forwarding (or Bypassing): What about Loads
• Dependencies backwards in time are hazards
Time (clock cycles)
IF
MEM
Reg
Dm
Im
Reg
ALU
sub r4,r1,r3
Im
EX
ALU
lw r1,0(r2)
ID/RF
WB
Reg
Dm
Reg
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
26
Execution Delay/Stall
Time (clock cycles)
IF
Reg
MEM
WB
Dm
Reg
Im
Reg
ALU
Im
EX
ALU
lw r1,0(r2)
ID/RF
no-op
sub r4,r1,r3
Dm
Reg
27
Download