Pipelining
CS365
Lecture 9
Outline
Today’s topic
Pipelining is an implementation technique in which multiple instructions are overlapped in execution
Subset of MIPS instructions
lw, sw, and, or, add, sub, slt, beq
Outline
Pipeline high-level introduction
Stages, hazards
Pipelined datapath and control design
D. Barbara
2
Pipeline CS465
Pipelining is Natural!
Laundry example
Ann, Brian, Cathy, Dave each has one load of clothes
A B C D to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes
D. Barbara
3
Pipeline CS465
e r
O r d
T a s k
Sequential Laundry
6 PM 7 8 9
Time
10 11 Midnight
30 40 20 30 40 20 30 40 20 30 40 20
A
B
C
D
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipeline CS465 D. Barbara
4
Pipelined Laundry
6 PM 7 8 9
Time
10 11 Midnight
30 40 40 40 40 20 s k
T a
A
B d e
O r r
C
D
Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads
Pipeline CS465 D. Barbara
5
T a s k e r
O r d
Pipelining Lessons (I)
6 PM 7 8 9
Time
Multiple tasks operating simultaneously using different resources
30 40 40 40 40 20
A
Pipelining doesn’t help latency of single task, it helps throughput of entire workload
B
C
D
Pipeline rate is limited by slowest pipeline stage
Unbalanced lengths of pipeline stages reduces speedup
D. Barbara
6
Pipeline CS465
e r
O r d
T a s k
Pipelining Lessons (II)
6 PM 7 8 9
Potential speedup =
Number pipeline stages
Time
A
30 40 40 40 40 20
Time to “ fill ” pipeline and time to “ drain ” it reduces speedupstartup and wind down
B
C
D
Stall for dependencies
D. Barbara
7
Pipeline CS465
Five Stages of Workload
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Load Ifetch Reg/Dec Exec Mem Wr
Ifetch : Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec : Registers Fetch and Instruction
Decode
Exec : Calculate the memory address
Mem : Read the data from the Data Memory
Wr : Write the data back to the register file
D. Barbara
8
Pipeline CS465
Clk
Single Cycle, Multi-Cycle, Pipeline
Cycle 1 Cycle 2
Single Cycle Implementation:
Load Store Waste
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Clk
Multiple Cycle Implementation:
Load
Ifetch Reg Exec Mem Wr
Store
Ifetch Reg Exec Mem
R-type
Ifetch
Pipeline Implementation:
Load Ifetch Reg Exec Mem Wr
Store Ifetch Reg Exec Mem Wr
Pipeline
R-type Ifetch Reg Exec Mem Wr
CS465 D. Barbara
9
Why Pipeline? (Performance)
Suppose we execute 100 instructions
Single cycle machine
45 (ns/cycle) x 1 (CPI) x 100 (inst) = 4500 ns
Multicycle machine
10 (ns/cycle) x 4.4 (CPI) (due to inst mix) x
100 (inst) = 4400 ns
Ideal pipelined machine
10 (ns/cycle) x (1 (CPI) x 100 (inst) + 4 cycle drain) = 1040 ns
10
D. Barbara Pipeline CS465
Pipelining Throughput
Ideal speedup is no. of stages in the pipeline; in practice:
Pipeline stage time are limited by the slowest resource, either the
ALU or memory access
Fill and drain time
Pipeline CS465
11
D. Barbara
Why Pipeline? (Resource)
Time (clock cycles) s t r.
I n
Inst 0
Inst 1
O r d e r
Inst 2
Inst 3
Inst 4
Im Reg
Im Reg
Dm Reg
Im Reg
Dm Reg
Im Reg
Dm Reg
Im Reg
Dm Reg
Dm Reg
12
D. Barbara Pipeline CS465
Pipeline Hazards
Hazards prevent next instruction from executing during its designated clock cycle
Structural hazards : attempt to use the same resource two different ways at the same time
E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
One memory port
Data hazards : attempt to use data before it is ready
E.g., one sock of pair in dryer and one in washer; can’t fold until you get sock from washer through dryer
Instruction depends on result of prior instruction still in the pipeline
Control hazards : attempt to make a decision before condition is evaluated
Branch instructions
13
D. Barbara Pipeline CS465
Structural Hazard: One Memory
Time (clock cycles) s t
I n r.
Load
Instr 1
O r d e r
Instr 2
Instr 3
Instr 4
Mem Reg
Mem Reg
Mem Reg
Mem Reg
Mem
Mem Reg
Reg
Mem Reg
Mem Reg
Mem Reg
Mem Reg
Pipeline
•Solution 1: add more HW
•Hazards can always be resolved by waiting
CS465
14
D. Barbara
Structural Hazard: One Memory
Time (clock cycles) s t
I n r.
Load
Instr 1
O r d e r
Instr 2 stall
Instr 3
Mem Reg
Mem Reg
Mem Reg
Mem Reg
Mem Reg
Mem Reg
Bubble Bubble Bubble Bubble Bubble
Mem Reg Mem Reg
Pipeline
•Hazards can always be resolved by waiting
CS465
15
D. Barbara
Data Hazard Example
Data hazard: an instruction depends on the result of a previous instruction still in the pipeline add r1 ,r2,r3 sub r4, r1 ,r3 and r6, r1 ,r7 or r8, r1 ,r9 xor r10, r1 ,r11
16
D. Barbara Pipeline CS465
O r d e r
I n s t r.
Data Hazard Example
Dependences backward in time are hazards
Time (clock cycles)
I add r1 ,r2,r3
ID/R E
X
ME
M
Dm
W sub r4, r1 ,r3 and r6, r1 ,r7
Im Reg
Im Reg
Dm Reg
Dm or r8, r1 ,r9 xor r10, r1 ,r11
Im Reg
Im Reg
Reg
Dm Reg
Dm Reg
Compilers can help, but it gets messy and difficult
17
D. Barbara Pipeline CS465
Data Hazard Solution
Time (clock cycles)
I add r1 ,r2,r3
I n s t r.
sub r4, r1 ,r3 and r6, r1 ,r7
O r d e r or r8, r1 ,r9 xor r10, r1 ,r11
ID/R E
X
Im Reg
Im
ME
M
Dm
Reg
Im
Dm
W
Reg
Im
Reg
Dm
Reg
Reg
Dm Reg
Dm Reg
Solution : “forward” result from one stage to another
Pipeline CS465
18
D. Barbara
Data Hazard Even with Forwarding
Time (clock cycles)
I lw r1 ,0(r2) sub r4, r1 ,r3
ID/R E
X
Im Reg
ME W
Dm Reg
Can’t go back in time! Must delay/stall instruction dependent on loads
19
D. Barbara Pipeline CS465
Data Hazard Even with Forwarding
Time (clock cycles)
I lw r1 ,0(r2) sub r4, r1 ,r3
ID/R E
X
Stall
Im
ME W
Reg Dm Reg
Must delay/stall instruction dependent on loads
Sometimes the instruction sequence can be reordered to avoid pipeline stalls
Pipeline CS465
20
D. Barbara
Control Hazards
Branch instructions may change execution flow
Suppose we can do decoding/branch decision/branch target computation at stage 2
Still introduce 1-cycle stall
Implementation details later
Pipeline CS465
21
D. Barbara
Control Hazard Solution: Predict
Predict: guess one direction then back up if wrong
Impact: 0 lost cycles per branch instruction if right, 1 if wrong
Need to “Squash” and restart following instruction if wrong
Prediction scheme
Random prediction: correct 50% of time
History-based prediction: correct 90% of time
22
D. Barbara Pipeline CS465
Control Hazard Solution: Predict
Pipeline CS465
23
D. Barbara
Pipeline Overview Summary
Pipelining is a fundamental concept
Multiple steps using distinct resources
Utilize capabilities of the datapath by pipelined instruction processing
Start next instruction while working on the current one
Detect and resolve hazards
Structural hazards, data hazards, control hazards
All hazards can be solved by stall
Other approaches: forwarding, prediction, reordering
In modern processors, what really makes it hard:
Exception handling
Out-of-order execution
Next: datapath design for pipeling
24
D. Barbara Pipeline CS465
Single Cycle Datapath
Pipeline CS465
25
D. Barbara
Multi Cycle Datapath
Divide the work into stages; internal registers
Pipeline CS465
26
D. Barbara
Single-Cycle Pipeline Datagram
What do we need to add to split the datapath into stages?
Pipeline CS465
27
D. Barbara
Pipelined Datapath
64 128 97 64
How many bits stored in each pipeline register?
Pipeline CS465
28
D. Barbara
Observations
5-stage pipeline
IF, ID, EX, MEM, WB
Left-to-right flow of instructions
Instructions and data move generally from left to right
Two exceptions: WB stage and the selection of PC
May lead to data hazards and control hazards
Why there is no pipeline register at the end of the
WB stage?
Last stage must update either register file, or memory, or PC
Pipeline CS465
29
D. Barbara
Pipelining the Load Instruction
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Clock
1st lw Ifetch
2nd lw
Reg/Dec
Ifetch
Exec
Reg/Dec
Mem Wr
Exec Mem Wr
3rd lw Ifetch Reg/Dec Exec Mem Wr
The five independent functional units in the pipeline datapath are:
Instruction Memory for the IF stage
Register File’s Read Ports (busA and busB) for the ID stage
ALU for the EXE stage
Data Memory for the MEM stage
Register File’s Write port (bus W) for the WB stage
Pipeline CS465
30
D. Barbara
The Four Stages of R-type
Cycle 1 Cycle 2 Cycle 3 Cycle 4
R-type Ifetch Reg/Dec Exec Wr
IF : Instruction Fetch
Fetch the instruction from the Instruction Memory
ID : Registers Fetch and Instruction Decode
EXE : ALU operates on the two register operands
WB : Write the ALU output back to the register file
Pipeline CS465
31
D. Barbara
Pipelining R-type and Load Instruction
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Clock
R-type Ifetch
R-type
Reg/Dec Exec Wr
Ifetch Reg/Dec Exec Wr
Oops! We have a problem!
Load Ifetch
R-type
Reg/Dec
Ifetch
Exec
Reg/Dec
Mem Wr
Exec Wr
R-type Ifetch Reg/Dec Exec Wr
We have pipeline conflict or structural hazard:
Two instructions try to write to the register file at the same time!
Only one write port
Pipeline CS465
32
D. Barbara
Important Observation
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions
Delay Rtype’s register write by one cycle:
Now Rtype instructions also use Reg File’s write port at Stage 5
Mem stage is a NO-OP stage: nothing is being done
Pipeline
R-type
1 2 3
Ifetch Reg/Dec Exec
Store Ifetch Reg/Dec Exec
Beq Ifetch Reg/Dec Exec
CS465
4
Mem
Mem
Mem
5
Wr
Wr
Wr
33
D. Barbara
Pipelined Execution
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Clock
R-type Ifetch Reg/Dec Exec
R-type Ifetch Reg/Dec
Mem Wr
Exec Mem Wr
Load Ifetch Reg/Dec Exec Mem Wr
Exec Mem Wr R-type Ifetch Reg/Dec
R-type Ifetch Reg/Dec Exec Mem Wr
All instruction types have five pipeline stages
Some stages may be wasted for some instructions
34
D. Barbara Pipeline CS465
Pipelined Execution of Load Instruction
Pipeline CS465
35
D. Barbara
Pipelined Execution of Load Instruction
Pipeline CS465
36
D. Barbara
Pipelined Execution of Load Instruction
Pipeline CS465
37
D. Barbara
Pipelined Execution of Load Instruction
Pipeline CS465
38
D. Barbara
Pipelined Execution of Load Instruction
Pipeline CS465
39
D. Barbara
Pipelined Execution of Store Instruction
Pipeline CS465
40
D. Barbara
Pipelined Execution of Store Instruction
Pipeline CS465
41
D. Barbara
Observations from Load and Store
Pass information needed from an earlier stage to a latter stage
Each logical component of the datapath – such as IM, Reg read ports, ALU, DM, Reg write port – can be used only within a single pipeline stage.
Otherwise, we would have structural hazard
A bug in the pipelined datapath for load. Can you tell?
42
D. Barbara Pipeline CS465
Modified Datapath
•For basic R-Type, LW/SW, and BEQ
Pipeline CS465
43
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
44
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
45
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
46
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
47
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
48
D. Barbara
Pipelined Execution for Multiple Instr.
Pipeline CS465
49
D. Barbara
Pipelined Datapath Control
Fig. 6.22
Pipeline CS465
50
D. Barbara
Overview on Datapath Control op
RegDst
ALUSrc
MemtoReg
RegWrite
MemWrite
Branch
Jump
ExtOp
ALUop
00 0000
R-type
1
1
0
0
0
0
0 x
“R-type”
00 1101 10 0011 10 1011 00 0100 00 0010 ori
0 lw
0 sw x beq x jump x
1
0
1
0
1
1
1
0
1 x
0
1
0 x
0
0 x x
0
0
0
Or
0
0
0
0
1
Add
0
0
1
Add
1
0 x
Subtract x xx
0
1
For the subset of instructions under consideration
ALUOp = 00 for Add, 01 for Sub, and 10 for R-type
51
D. Barbara Pipeline CS465
Observations
No write control for all pipeline registers and PC since they are updated at every clock cycle
To specify the control for the pipeline, set the control values during each pipeline stage
Control lines can be divided into 5 groups:
IF
ID
ALU –
–
–
MEM –
WB –
NONE
NONE
RegDst, ALUOp, ALUSrc
Branch, MemRead, MemWrite
MemtoReg, RegWrite
Group these nine control lines into 3 subsets:
ALUControl, MEMControl, WBControl
Control signals are generated at ID stage, how to pass them to other stages?
Pipeline CS465
52
D. Barbara
Pass Control Signals
Extend the pipeline registers to include control information
Pipeline CS465
53
D. Barbara
The Complete Pipelined Datapath
Fig 6.27
Pipeline CS465
54
D. Barbara
Example Pipeline Execution
Show the five instructions going through the pipeline: lw $10, 20($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9
Note that these instructions are independent from each other!
Pipeline CS465
55
D. Barbara
Clock1
Pipeline CS465
56
D. Barbara
Clock2
Pipeline CS465
57
D. Barbara
Clock3
Pipeline CS465
58
D. Barbara
Clock4
Pipeline CS465
59
D. Barbara
Clock5
Pipeline CS465
60
D. Barbara
Clock6
Pipeline CS465
61
D. Barbara
Clock7
Pipeline CS465
62
D. Barbara
Clock8
Pipeline CS465
63
D. Barbara
Clock9
Pipeline CS465
64
D. Barbara
Summary
Overview of pipeline
Stages
Hazards
Pipelined datapath
Pipeline registers
Pipelined execution
Pipelined control
Different signals for different stages
Propagate control signals
Pipeline CS465
65
D. Barbara
Next Lecture
Topic:
Pipeline hazards and solutions
Exception handling
Reading
Patterson & Hennessy Ch6.4-6.9
Pipeline CS465
66
D. Barbara