
L01 A Pipelining Basic Concepts (1)

Course on: “Advanced Computer Architectures”
Pipelining: Basic Concepts
Prof. Cristina Silvano
Politecnico di Milano
email: cristina.silvano@polimi.it
Outline








Reduced Instruction Set of MIPS Processor
Implementation of MIPS Processor
Performance Optimization through Pipelining
MIPS Processor Pipeline
The Problem of Pipeline Hazards
The Solution of Data Hazards
MIPS Optimized Pipeline
Performance Evaluation in Pipelining
Cristina Silvano – Politecnico di Milano
-2-
Spring 2021
Main Characteristics of MIPS Architecture


RISC (Reduced Instruction Set Computer) Architecture
Based on the concept of executing only simple instructions in a
reduced basic cycle, to improve performance with respect to CISC CPUs.
LOAD/STORE Architecture
ALU operands come from the CPU general purpose registers and they
cannot directly come from the memory.
Dedicated instructions are necessary to:
• load data from memory to registers
• store data from registers to memory
Pipeline Architecture:
Performance optimization technique based on the overlapping of the
execution of multiple instructions derived from a sequential
execution flow.
Reduced Instruction Set of MIPS Processor

ALU instructions:
add $s1, $s2, $s3        # $s1 <- $s2 + $s3
addi $s1, $s1, 4         # $s1 <- $s1 + 4

Load/store instructions:
lw $s1, offset($s2)      # $s1 <- M[$s2+offset]
sw $s1, offset($s2)      # M[$s2+offset] <- $s1

Branch instructions to control the flow of the program:
• Conditional branches: the branch is taken only if the condition is satisfied.
  Examples: beq (branch on equal) and bne (branch on not equal)
  beq $s1, $s2, L1         # go to L1 if ($s1 == $s2)
  bne $s1, $s2, L1         # go to L1 if ($s1 != $s2)
• Unconditional jumps: the branch is always taken.
  Examples: j (jump) and jr (jump register)
  j L1                     # go to L1
  jr $s1                   # go to the address contained in $s1
Formats of MIPS 32-bit Instructions



Type R (Register)
• ALU instructions
Type I (Immediate)
• Immediate instructions
• Load/store instructions
• Conditional branch instructions
Type J (Jump)
• Unconditional jump instructions
Field layout (bit positions from 31 down to 0):

      31-26   25-21   20-16   15-11   10-6    5-0
      6-bit   5-bit   5-bit   5-bit   5-bit   6-bit
R     op      rs      rt      rd      shamt   funct
I     op      rs      rt      offset/immediate (bits 15-0)
J     op      address (bits 25-0)
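The fixed field positions of the three formats can be illustrated with a minimal Python sketch. The function names `decode` and `encode_r` and the field values used in the example are illustrative assumptions, not part of the lecture material.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "op":    (word >> 26) & 0x3F,      # bits 31-26 (all formats)
        "rs":    (word >> 21) & 0x1F,      # bits 25-21 (R and I formats)
        "rt":    (word >> 16) & 0x1F,      # bits 20-16 (R and I formats)
        "rd":    (word >> 11) & 0x1F,      # bits 15-11 (R format)
        "shamt": (word >> 6) & 0x1F,       # bits 10-6  (R format)
        "funct": word & 0x3F,              # bits 5-0   (R format)
        "imm":   word & 0xFFFF,            # bits 15-0  (I format)
        "addr":  word & 0x3FFFFFF,         # bits 25-0  (J format)
    }


def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Illustrative field values (hypothetical example word)
word = encode_r(op=0, rs=18, rt=19, rd=17, shamt=0, funct=0x20)
fields = decode(word)
print(hex(word), fields["rs"], fields["rt"], fields["rd"])
```

Because every field sits at a fixed position, the register numbers can be extracted before the opcode is fully decoded, which is what makes the fixed-field decoding of the ID phase possible.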
Phases of execution of MIPS Instructions
Every instruction in the MIPS subset can be implemented
in at most 5 clock cycles (phases) as follows:
1) Instruction Fetch (IF):
• Send the content of the Program Counter (PC) register to the Instruction
  Memory and fetch the current instruction from the Instruction
  Memory.
• Update the PC to the next sequential address by adding 4 to the
  PC (since each instruction is 4 bytes).
2) Instruction Decode and Register Read (ID):
• Decode the current instruction (fixed-field decoding) and read
  from the Register File the one or two registers specified in the
  instruction fields.
• Sign-extend the offset field of the instruction in case it is
  needed.
3) Execution (EX):
The ALU operates on the operands prepared in the previous
cycle depending on the instruction type:
• Register-Register ALU Instructions:
  ALU executes the specified operation on the operands read from the RF.
• Register-Immediate ALU Instructions:
  ALU executes the specified operation on the first operand read from the
  RF and the sign-extended immediate operand.
• Memory Reference:
  ALU adds the base register and the offset to calculate the effective
  address.
• Conditional branches:
  Compare the two registers read from the RF and compute the possible branch
  target address by adding the sign-extended offset to the incremented PC.
4) Memory Access (ME):
• Load instructions require a read access to the Data Memory using
  the effective address.
• Store instructions require a write access to the Data Memory
  using the effective address to write the data from the source
  register read from the RF.
• Conditional branches can update the content of the PC with the
  branch target address, if the conditional test yielded true.
5) Write-Back Cycle (WB):
• Load instructions write the data read from memory into the
  destination register of the RF.
• ALU instructions write the ALU results into the destination
  register of the RF.
Phases of execution of MIPS Instructions
ALU Instructions: op $x,$y,$z             # $x <- $y op $z
  1. Instr. Fetch & PC Increm.
  2. Read of Source Regs. $y and $z
  3. ALU Op. ($y op $z)
  4. Write Back of Destinat. Reg. $x
Load Instructions: lw $x,offset($y)       # $x <- M[$y + offset]
  1. Instr. Fetch & PC Increm.
  2. Read of Base Reg. $y
  3. ALU Op. ($y+offset)
  4. Read Mem. M($y+offset)
  5. Write Back of Destinat. Reg. $x
Store Instructions: sw $x,offset($y)      # M[$y + offset] <- $x
  1. Instr. Fetch & PC Increm.
  2. Read of Base Reg. $y & Source $x
  3. ALU Op. ($y+offset)
  4. Write Mem. M($y+offset)
Conditional Branch: beq $x,$y,offset
  1. Instr. Fetch & PC Increm.
  2. Read of Source Regs. $x and $y
  3. ALU Op. ($x-$y) & (PC+4+offset)
  4. Write PC
Implementation of MIPS Processor
Basic Implementation of MIPS data path
[Figure: basic data path connecting the PC, the Instruction memory, the Registers, the ALU and the Data memory]


Instruction Memory (read-only memory) separated from Data
Memory
32 General-Purpose Registers organized in a Register File (RF) with
2 read ports and 1 write port.
Implementation of ALU and Load/Store Instructions
[Figure: data path for ALU and load/store instructions: Register File (Read Register 1 and Read Register 2 from instruction fields [25-21] and [20-16], Write Register, Write Data), sign extension of field [15-0] from 16 to 32 bits, ALU (Result and Zero outputs), Data Memory (Read Address, Write Address, Write Data, Read Data) and multiplexers]
Implementation of Conditional Branch Instructions
[Figure: data path for conditional branch instructions: Register File read of instruction fields [25-21] and [20-16], ALU comparison (Zero output), sign extension of field [15-0] from 16 to 32 bits, 2-bit left shifter, and adder computing the Branch Target Address from PC+4 (from fetch)]
Implementation of MIPS data path
[Figure: complete MIPS data path: PC with +4 adder, Instruction Memory, Register File, sign extension from 16 to 32 bits, 2-bit left shifter with branch-target adder, ALU (Result and Zero outputs), Data Memory and multiplexers]
Implementation of MIPS data path with Control Unit
[Figure: complete MIPS data path with the Control Unit, driven by the opcode field [31-26] and generating the signals Branch, MemRD, MemWR, MemToReg, RegWR, ALU_op, ALU_opB and Destination Register, and with the ALU Control Unit, driven by the field [5-0]]
MIPS PIPELINING
Pipelining




Performance optimization technique based on the overlap of the
execution of multiple instructions deriving from a sequential
execution flow.
Pipelining exploits the parallelism among instructions in a sequential
instruction stream.
Basic idea:
The execution of an instruction is divided into different phases
(pipeline stages), each requiring a fraction of the time necessary to
complete the instruction.
The stages are connected one to the next to form the pipeline:
instructions enter the pipeline at one end, progress through the
stages, and exit from the other end, as in an assembly line.



Advantage: the technique is transparent to the programmer.
The technique is similar to an assembly line: a new car exits
from the assembly line in the time necessary to
complete one of the phases.
An assembly line does not reduce the time necessary to
complete a car, but it increases the number of cars
in production at the same time and the rate at which cars
are completed.
Sequential vs. Pipelining Execution
Sequential execution (one instruction completed every 10 ns):

      |<------ 10 ns ----->|<------ 10 ns ----->|
I1    IF  ID  EX  MEM WB
I2                         IF  ID  EX  MEM WB
…

Pipelined execution (one instruction started every 2 ns):

Time ->  (2 ns per stage)
I1    IF  ID  EX  MEM WB
I2        IF  ID  EX  MEM WB
I3            IF  ID  EX  MEM WB
I4                IF  ID  EX  MEM WB
I5                    IF  ID  EX  MEM WB
Pipelining




The time to advance an instruction by one stage in the
pipeline corresponds to a clock cycle.
The pipeline stages must be synchronized: the duration
of a clock cycle is defined by the time required by the
slowest stage of the pipeline (i.e. 2 ns).
The goal is to balance the length of each pipeline stage.
If the stages are perfectly balanced, the ideal speedup
due to pipelining is equal to the number of pipeline
stages.
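As a rough illustration of why balanced stages matter, the sketch below derives the clock period from the slowest stage and compares it with the time of an unpipelined instruction. It assumes that the unpipelined instruction time is the sum of the stage delays and ignores pipeline register overhead; the unbalanced delays are hypothetical values.

```python
def clock_period(stage_delays_ns):
    """The clock period is set by the slowest pipeline stage."""
    return max(stage_delays_ns)


def ideal_speedup(stage_delays_ns):
    """Unpipelined instruction time divided by the pipelined clock period.

    Assumes the unpipelined time is the sum of the stage delays and that
    pipeline registers add no delay.
    """
    return sum(stage_delays_ns) / clock_period(stage_delays_ns)


balanced = [2, 2, 2, 2, 2]      # five stages of 2 ns, as in the slides
unbalanced = [2, 1, 2, 2, 1]    # hypothetical delays, for comparison only

print(clock_period(balanced), ideal_speedup(balanced))
print(clock_period(unbalanced), ideal_speedup(unbalanced))
```

With balanced stages the speedup equals the number of stages; with the hypothetical unbalanced delays the clock period stays at 2 ns, so the shorter stages bring no benefit.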
Performance Improvement

Ideal case (asymptotically): if we consider the multi-cycle
unpipelined CPU3 composed of 5 cycles of 2 ns and
the pipelined CPU2 with 5 stages of 2 ns:
• The latency (total execution time) of each instruction
  does not change (10 ns)
• The throughput (number of instructions completed per
  unit of time) improves by a factor of 5:
  (1 instruction completed every 10 ns) vs.
  (1 instruction completed every 2 ns)
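The word "asymptotically" can be made concrete with a small sketch that compares the total time for n instructions on the two CPUs. The function names are illustrative, and the model assumes an ideal pipeline without stalls.

```python
def sequential_time_ns(n, stages=5, cycle_ns=2):
    """Each instruction takes all its cycles before the next one starts."""
    return n * stages * cycle_ns


def pipelined_time_ns(n, stages=5, cycle_ns=2):
    """The first instruction needs `stages` cycles; afterwards one
    instruction completes in every clock cycle (no stalls assumed)."""
    return (stages + n - 1) * cycle_ns


def speedup(n, stages=5, cycle_ns=2):
    return sequential_time_ns(n, stages, cycle_ns) / pipelined_time_ns(n, stages, cycle_ns)


for n in (1, 5, 100, 10000):
    print(n, sequential_time_ns(n), pipelined_time_ns(n), round(speedup(n), 3))
```

For a single instruction there is no gain (10 ns in both cases); as n grows, the speedup approaches the number of stages, 5.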
Pipeline Execution of MIPS Instructions
Stages: IF (Instruction Fetch), ID (Instruction Decode), EX (Execution),
ME (Memory Access), WB (Write Back)

ALU Instructions: op $x,$y,$z             # $x <- $y op $z
  IF: Instr. Fetch & PC Increm.
  ID: Read of Source Regs. $y and $z
  EX: ALU Op. ($y op $z)
  ME: (not used)
  WB: Write Back Destinat. Reg. $x
Load Instructions: lw $x,offset($y)       # $x <- M[$y + offset]
  IF: Instr. Fetch & PC Increm.
  ID: Read of Base Reg. $y
  EX: ALU Op. ($y+offset)
  ME: Read Mem. M($y+offset)
  WB: Write Back Destinat. Reg. $x
Store Instructions: sw $x,offset($y)      # M[$y + offset] <- $x
  IF: Instr. Fetch & PC Increm.
  ID: Read of Base Reg. $y & Source $x
  EX: ALU Op. ($y+offset)
  ME: Write Mem. M($y+offset)
  WB: (not used)
Conditional Branches: beq $x,$y,offset
  IF: Instr. Fetch & PC Increm.
  ID: Read of Source Regs. $x and $y
  EX: ALU Op. ($x-$y) & (PC+4+offset)
  ME: Write PC
  WB: (not used)
Implementation of MIPS pipeline

Dividing the execution of each instruction into 5
stages implies that in each clock cycle 5 instructions are
in execution
=> the implementation of a pipelined CPU with 5 stages
must be composed of 5 modules corresponding to the 5
execution stages
=> we need pipeline registers to separate the different
stages
[Figure: pipelined MIPS data path divided into the five stages IF (Instruction Fetch), ID (Instruction Decode), EX (Execution), MEM (Memory Access) and WB (Write Back), separated by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB]
Resources used during the pipeline execution
Time ->  (2 ns per clock cycle)
I1    IM   REG  ALU  DM   REG
I2         IM   REG  ALU  DM   REG
I3              IM   REG  ALU  DM   REG
I4                   IM   REG  ALU  DM   REG
I5                        IM   REG  ALU  DM   REG

IM = Instruction Memory
REG = Register File
DM = Data Memory
The Problem of Pipeline Hazards



A hazard (conflict) is created whenever there is a
dependence between instructions, and instructions are
close enough that the overlap caused by pipelining would
change the order of access to the operands involved in
the dependence.
Hazards prevent the next instruction in the pipeline from
executing during its designated clock cycle.
Hazards reduce the performance from the ideal speedup
gained by pipelining.
Three Classes of Hazards
1) Structural Hazards: Attempt to use the same resource
by different instructions simultaneously
• Example: Single memory for instructions and data
2) Data Hazards: Attempt to use a result before it is ready
• Example: Instruction depending on a result of a
previous instruction still in the pipeline
3) Control Hazards: Attempt to make a decision on the
next instruction to execute before the condition is
evaluated
• Example: Conditional branch execution
• Control hazards will be studied in the next lesson
Structural Hazards

No structural hazards in MIPS architecture:
• Instruction Memory separated from Data Memory
• Register File used in the same clock cycle: Read access by an
instruction and write access by another instruction
[Figure: resources used by instructions I1 to I5 over time (IM, REG, ALU, DM, REG), one instruction starting every 2 ns]
Data Hazards


If the instructions executing in the pipeline are
dependent, data hazards can arise when the instructions are
too close together.
Example:
sub $2, $1, $3       # Reg. $2 written by sub
and $12, $2, $5      # 1st operand ($2) depends on sub
or  $13, $6, $2      # 2nd operand ($2) depends on sub
add $14, $2, $2      # 1st ($2) & 2nd ($2) operands depend on sub
sw  $15, 100($2)     # Base reg. ($2) depends on sub
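The dependences in this example can be found mechanically by tracking the last writer of each register. The following is a minimal sketch: the tuple representation of the instructions and the name `raw_dependencies` are assumptions made for illustration.

```python
# Each instruction: (text, destination register or None, source registers)
program = [
    ("sub $2, $1, $3",   "$2",  ["$1", "$3"]),
    ("and $12, $2, $5",  "$12", ["$2", "$5"]),
    ("or $13, $6, $2",   "$13", ["$6", "$2"]),
    ("add $14, $2, $2",  "$14", ["$2", "$2"]),
    ("sw $15, 100($2)",  None,  ["$15", "$2"]),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every
    read of a register that an earlier instruction writes."""
    deps = []
    last_writer = {}                      # register -> index of latest writer
    for i, (_, dest, sources) in enumerate(program):
        for reg in sorted(set(sources)):  # report each register once
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


for producer, consumer, reg in raw_dependencies(program):
    print(f"{program[consumer][0]!r} reads {reg} written by {program[producer][0]!r}")
```

Whether a dependence becomes a hazard depends on the distance between producer and consumer in the pipeline, which the sketch does not model.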
Data Hazards: Example
sub $2, $1, $3      IF  ID  EX  ME  WB
and $12, $2, $5         IF  ID  EX  ME  WB
or  $13, $6, $2             IF  ID  EX  ME  WB
add $14, $2, $2                 IF  ID  EX  ME  WB
sw  $15, 100($2)                    IF  ID  EX  ME  WB
The Solution of Data Hazards
Data Hazards: Possible Solutions

Compilation Techniques:
a) Insertion of nop (no operation) instructions
b) Instruction scheduling to keep dependent
   instructions from being too close
   • The compiler tries to insert independent instructions between
     dependent instructions
   • When the compiler does not find independent instructions, it
     inserts nops.
Hardware Techniques:
c) Insertion of stalls or “bubbles” in the pipeline
d) Data forwarding or bypassing
a) Insertion of nops: Example
sub $2, $1, $3      IF  ID  EX  ME  WB
nop                     IF  ID  EX  ME  WB
nop                         IF  ID  EX  ME  WB
nop                             IF  ID  EX  ME  WB
and $12, $2, $5                     IF  ID  EX  ME  WB
or  $13, $6, $2                         IF  ID  EX  ME  WB
add $14, $2, $2                             IF  ID  EX  ME  WB
sw  $15, 100($2)                                IF  ID  EX  ME  WB
b) Scheduling: Example

Example:

Original sequence:          Scheduled sequence:
sub $2, $1, $3              sub $2, $1, $3
and $12, $2, $5             add $4, $10, $11
or  $13, $6, $2             and $7, $8, $9
add $14, $2, $2             lw  $16, 100($18)
add $4, $10, $11            and $12, $2, $5
and $7, $8, $9              or  $13, $6, $2
lw  $16, 100($18)           add $14, $2, $2
sw  $15, 100($2)            sw  $15, 100($2)
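A compiler can decide how many nops are still needed by checking the distance between each producer and its consumers. The sketch below is only an illustration under stated assumptions: no forwarding, a consumer must be issued at least 4 slots after its producer (3 instructions in between, as in the nop example), and the scheduled order places the three independent instructions right after `sub`. The name `nops_needed` and the tuple representation are hypothetical.

```python
def nops_needed(program, min_distance=4):
    """Count the nops needed so that every consumer is issued at least
    `min_distance` slots after the instruction producing its operand.

    program: list of (dest, sources) tuples in program order.
    """
    last_write_slot = {}   # register -> issue slot of its latest writer
    slot = 0               # issue slot of the next instruction
    nops = 0
    for dest, sources in program:
        earliest = slot
        for reg in sources:
            if reg in last_write_slot:
                earliest = max(earliest, last_write_slot[reg] + min_distance)
        nops += earliest - slot        # padding inserted before this instruction
        slot = earliest
        if dest is not None:
            last_write_slot[dest] = slot
        slot += 1
    return nops


original = [
    ("$2",  ["$1", "$3"]),     # sub $2, $1, $3
    ("$12", ["$2", "$5"]),     # and $12, $2, $5
    ("$13", ["$6", "$2"]),     # or  $13, $6, $2
    ("$14", ["$2", "$2"]),     # add $14, $2, $2
    (None,  ["$15", "$2"]),    # sw  $15, 100($2)
]
scheduled = [
    ("$2",  ["$1", "$3"]),     # sub $2, $1, $3
    ("$4",  ["$10", "$11"]),   # add $4, $10, $11
    ("$7",  ["$8", "$9"]),     # and $7, $8, $9
    ("$16", ["$18"]),          # lw  $16, 100($18)
    ("$12", ["$2", "$5"]),     # and $12, $2, $5
    ("$13", ["$6", "$2"]),     # or  $13, $6, $2
    ("$14", ["$2", "$2"]),     # add $14, $2, $2
    (None,  ["$15", "$2"]),    # sw  $15, 100($2)
]

print(nops_needed(original), nops_needed(scheduled))
```

Under these assumptions the dependent sequence alone needs three nops, while the scheduled sequence needs none because the independent instructions fill the same slots with useful work.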
c) Insertion of Stalls: Example
sub $2, $1, $3      IF    ID    EX    ME    WB
and $12, $2, $5           IF    stall stall stall ID    EX    ME    WB
or  $13, $6, $2                 stall stall stall IF    ID    EX    ME    WB
add $14, $2, $2                                         IF    ID    EX    ME    WB
sw  $15, 100($2)                                              IF    ID    EX    ME    WB

(previous instructions, such as sub, continue their execution during the stalls)
d) Forwarding


Data forwarding uses temporary results stored in the
pipeline registers instead of waiting for the write back of
results in the RF.
We need to add multiplexers at the inputs of ALU to
fetch inputs from pipeline registers to avoid the insertion
of stalls in the pipeline.
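The selection performed by the multiplexers at one ALU input can be sketched as follows. This is a simplified model under assumptions: the conditions follow the usual textbook formulation (compare the source register with the destination registers held in the EX/MEM and MEM/WB pipeline registers), the dictionary fields `reg_write` and `rd` are hypothetical names, and the MEM/ID path is not modeled.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand comes from.

    ex_mem, mem_wb: state of the instructions one and two positions ahead,
    as dicts with 'reg_write' (bool) and 'rd' (destination register number).
    Register 0 is never forwarded because it always reads as zero.
    """
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/EX"     # result of the immediately preceding instruction
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/EX"    # result of the instruction two positions ahead
    return "RF"            # no hazard: use the value read from the Register File


# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' is in MEM
print(forward_select(2, {"reg_write": True, "rd": 2}, {"reg_write": False, "rd": 0}))
# 'or $13, $6, $2' in EX: 'and' is in MEM, 'sub' is in WB
print(forward_select(2, {"reg_write": True, "rd": 12}, {"reg_write": True, "rd": 2}))
```

The EX/MEM check comes first so that, when both earlier instructions write the same register, the most recent value is the one forwarded.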
Forwarding: Example
sub $2, $1, $3      IF  ID  EX  ME  WB
and $12, $2, $5         IF  ID  EX  ME  WB              # $2 through the EX/EX path
or  $13, $6, $2             IF  ID  EX  ME  WB          # $2 through the MEM/EX path
add $14, $2, $2                 IF  ID  EX  ME  WB      # $2 through the MEM/ID path
sw  $15, 100($2)                    IF  ID  EX  ME  WB
Forwarding Paths
[Figure: instructions I1 to I5 in the pipeline (IM, REG RD, ALU, DM, REG WR) with the EX/EX, MEM/EX and MEM/ID forwarding paths drawn between consecutive instructions]
Three data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/ID path
Implementation of MIPS with Forwarding Unit
[Figure: pipelined data path (PC, Instruction Memory, Register File, ALU, Data Memory, pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB) with multiplexers controlled by the Forwarding unit. The Forwarding unit receives IF/ID.RegisterRs (Rs), IF/ID.RegisterRt (Rt), IF/ID.RegisterRd (Rd), EX/MEM.RegisterRd and MEM/WB.RegisterRd. The EX/EX, MEM/EX and MEM/ID paths and the WB path are marked.]
Data Hazards: Load/Use Hazard
L1: lw  $s0, 4($t1)        # $s0 <- M[4 + $t1]
L2: add $s5, $s0, $s1      # 1st operand depends on L1

                    CK1 CK2 CK3 CK4 CK5 CK6 CK7
lw  $s0, 4($t1)     IF  ID  EX  MEM WB
add $s5, $s0, $s1       IF  ID  EX  MEM WB
With forwarding using the MEM/EX path: 1 stall needed

                    CK1   CK2   CK3   CK4   CK5   CK6   CK7
lw  $s0, 4($t1)     IF    ID    EX    MEM   WB
add $s5, $s0, $s1         IF    stall ID    EX    MEM   WB
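The load/use case can be detected by looking at adjacent instruction pairs. The sketch below is a simplification under assumptions: forwarding is available, each instruction lists only the registers it needs as ALU inputs in EX, and the name `load_use_stalls` and the tuple layout are hypothetical.

```python
def load_use_stalls(program):
    """Count the stalls caused by load/use hazards when forwarding is available.

    program: list of (opcode, dest, ex_sources), where ex_sources are the
    registers needed as ALU inputs in the EX stage.
    One stall is needed when the instruction right after a lw uses the
    loaded register in EX, because the data leaves memory only at the end
    of the MEM stage of the lw.
    """
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, curr_sources) in zip(program, program[1:]):
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


hazard = [
    ("lw",  "$s0", ["$t1"]),           # lw  $s0, 4($t1)
    ("add", "$s5", ["$s0", "$s1"]),    # add $s5, $s0, $s1
]
# Hypothetical variant with an independent instruction in between
separated = [
    ("lw",  "$s0", ["$t1"]),
    ("sub", "$t2", ["$t3", "$t4"]),
    ("add", "$s5", ["$s0", "$s1"]),
]

print(load_use_stalls(hazard), load_use_stalls(separated))
```

As with data hazards in general, placing one independent instruction between the load and its use removes the stall.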
Data Hazards: Load/Store Hazard
L1: lw $s0, VECTA($t1)     # $s0 <- M[VECTA + $t1]
L2: sw $s0, VECTB($t1)     # M[VECTB + $t1] <- $s0

                      CK1 CK2 CK3 CK4 CK5 CK6 CK7
lw $s0, VECTA($t1)    IF  ID  EX  MEM WB
sw $s0, VECTB($t1)        IF  ID  EX  MEM WB
With forwarding by introducing the MEM/MEM path: solved

                      CK1 CK2 CK3 CK4 CK5 CK6 CK7
lw $s0, VECTA($t1)    IF  ID  EX  MEM WB
sw $s0, VECTB($t1)        IF  ID  EX  MEM WB
Forwarding Paths
[Figure: instructions I1 to I5 in the pipeline (IM, REG RD, ALU, DM, REG WR), with instruction execution order on the vertical axis and time on the horizontal axis, and the EX/EX, MEM/EX, MEM/ID and MEM/MEM forwarding paths drawn between instructions]
Four data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/ID path
• MEM/MEM path (for LOAD/STOREs)
MIPS Optimized Pipeline


Register File used in 2 stages: read access during ID and
write access during WB.
What happens if a read and a write refer to the same
register in the same clock cycle?
• It is necessary to insert one stall.

Optimized Pipeline: we assume the RF write occurs in the
first half of the clock cycle and the RF read in the
second half of the clock cycle.
What happens if a read and a write refer to the same
register in the same clock cycle?
• It is not necessary to insert a stall.
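The difference between the two Register File behaviors can be shown with a small model. The class name `RegisterFile`, the `write_before_read` flag and the register values are assumptions made for illustration only.

```python
class RegisterFile:
    """Register file with one write and several reads per clock cycle."""

    def __init__(self, write_before_read):
        self.regs = [0] * 32
        # True: write in the first half of the cycle, read in the second half
        self.write_before_read = write_before_read

    def _write(self, reg, value):
        if reg != 0:                 # register 0 always holds zero
            self.regs[reg] = value

    def clock_cycle(self, write, read_regs):
        """write: (reg, value) or None; read_regs: registers read this cycle.
        Returns the values seen by the reads of this cycle."""
        if self.write_before_read and write is not None:
            self._write(*write)
        values = [self.regs[r] for r in read_regs]
        if not self.write_before_read and write is not None:
            self._write(*write)      # the new value is visible from the next cycle
        return values


optimized = RegisterFile(write_before_read=True)
basic = RegisterFile(write_before_read=False)

# Same cycle: one instruction writes $2 in WB, another reads $2 in ID
print(optimized.clock_cycle((2, 7), [2]))   # the read sees the new value
print(basic.clock_cycle((2, 7), [2]))       # the read sees the old value
```

In the basic model the reading instruction would need one more cycle to see the new value, which corresponds to the stall mentioned above.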
Resources Used in the Optimized Pipeline
Instruction execution order (top to bottom), time -> (2 ns per clock cycle)
I1    IM   REG  ALU  DM   REG
I2         IM   REG  ALU  DM   REG
I3              IM   REG  ALU  DM   REG
I4                   IM   REG  ALU  DM   REG
I5                        IM   REG  ALU  DM   REG

IM = Instruction Memory
REG = Register File
DM = Data Memory
Data Hazards in the Optimized Pipeline: Example

Instruction execution order (top to bottom), time -> (2 ns per clock cycle)
sub $2, $1, $3      IM   REG  ALU  DM   REG
and $12, $2, $5          IM   REG  ALU  DM   REG
or  $13, $6, $2               IM   REG  ALU  DM   REG
add $14, $2, $2                    IM   REG  ALU  DM   REG
sw  $15, 100($2)                        IM   REG  ALU  DM   REG

It is necessary to insert two stalls
Forwarding Paths in the Optimized Pipeline
[Figure: instructions I1 to I5 in the pipeline (IM, REG RD, ALU, DM, REG WR) with the EX/EX, MEM/EX and MEM/MEM forwarding paths drawn between instructions]
Only three data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/MEM path (for LOAD/STOREs)
Data hazards: RAW, WAW, WAR
Data Hazards

Data hazards analyzed up to now are:
1) RAW (READ AFTER WRITE) hazard: instruction n+1
tries to read a source register before the previous
instruction n has written it in the RF.
• Example:
add $r1, $r2, $r3
sub $r4, $r1, $r5
• By using forwarding, it is always possible to solve this
  conflict without introducing stalls, except for the
  load/use hazards, where it is necessary to add one
  stall.

Other types of data hazards in the pipeline:
2) WAW (WRITE AFTER WRITE) hazard
3) WAR (WRITE AFTER READ) hazard

WAW and WAR hazards occur more easily when
instructions are executed out-of-order such as in
multi-cycle operations to execute floating point
arithmetic or to access the data memory (load/store)
Data Hazards: WAW (WRITE AFTER WRITE)

WAW (WRITE AFTER WRITE) hazard: Instruction n+1
tries to write a destination operand before it has been
written by the previous instruction n => write operations
executed in the wrong order (out-of-order)
• WAW hazards cannot occur in the basic MIPS pipeline
  because all the register write operations occur in the
  WB stage.
• WAW hazards could occur in the MIPS pipeline when it is
  extended to handle multi-cycle execution or multi-cycle
  data memory accesses, because in this case instructions
  can complete in a different order than they were issued.

Example: If we assume the register write in the ALU
instructions occurs in the fourth stage and that load
instructions require two stages (MEM1 and MEM2) to
access the data memory, we can have:
                     CK1  CK2  CK3  CK4  CK5  CK6  CK7
lw  $r1, 0($r2)      IF   ID   EX   MEM1 MEM2 WB
add $r1, $r2, $r3         IF   ID   EX   WB
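The hazard in this example can be checked by computing the clock cycle in which each instruction writes its destination register. This is a minimal sketch: the stage lists mirror the assumptions of the example above, and the helper name `write_cycle` is hypothetical.

```python
def write_cycle(fetch_cycle, stages):
    """Clock cycle in which an instruction performs its register write,
    given the cycle of its IF stage and its list of stages."""
    return fetch_cycle + stages.index("WB")


lw_stages = ["IF", "ID", "EX", "MEM1", "MEM2", "WB"]   # two-stage memory access
add_stages = ["IF", "ID", "EX", "WB"]                  # write in the fourth stage

lw_write = write_cycle(1, lw_stages)     # lw  $r1, 0($r2) fetched in CK1
add_write = write_cycle(2, add_stages)   # add $r1, $r2, $r3 fetched in CK2

# WAW hazard: the later instruction writes $r1 before the earlier one
waw_hazard = add_write < lw_write
print(lw_write, add_write, waw_hazard)
```

Since `lw` writes `$r1` after `add`, the register ends up holding the loaded value instead of the sum that program order requires.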
Example: If we assume the floating point ALU operations
require a multi-cycle execution, we can have:

                     CK1  CK2  CK3  CK4  CK5  CK6  CK7  CK8
mul $f6, $f2, $f2    IF   ID   MUL1 MUL2 MUL3 MUL4 MEM  WB
add $f6, $f2, $f2         IF   ID   AD1  AD2  MEM  WB
Data Hazards: WAR (WRITE AFTER READ)

WAR (WRITE AFTER READ) hazard: Instruction n+1 tries
to write a destination operand before it has been read
by the previous instruction n
=> instruction n reads the wrong value. For example:
sw $y, 0($x)            # sw has to read $x
addi $x, $x, 4          # addi writes $x
• WAR hazards cannot occur in the basic MIPS pipeline because
  operand reads always occur in the ID stage and result
  writes in the WB stage.
• As before, if we assume the register write in the ALU
  instructions occurs in the fourth stage and that we need
  two stages to access the data memory, some instructions
  could read operands too late in the pipeline.
Performance Evaluation in Pipelining


Pipelining increases the CPU instruction throughput
(number of instructions completed per unit of time), but
it does not reduce the execution time (latency) of a single
instruction.
Pipelining usually slightly increases the latency of each
instruction due to the imbalance among the pipeline
stages and overhead in the control of the pipeline.
• Imbalance among pipeline stages reduces performance, since the
  clock can run no faster than the time needed for the slowest pipeline
  stage.
• Pipeline overhead arises from pipeline register delay and clock
  skew.
• All instructions should go through the same number of pipeline stages.
Performance Metrics
IC = Instruction Count
# Clock Cycles = IC + # Stall Cycles + 4
CPI = Clock cycles Per Instruction = # Clock Cycles / IC
    = (IC + # Stall Cycles + 4) / IC
MIPS = f_clock / (CPI * 10^6)
Example
IC = Instruction Count = 5
# Clock Cycles = IC + # Stall Cycles + 4 = 5 + 3 + 4 = 12
CPI = Clock cycles Per Instruction = # Clock Cycles / IC = 12 / 5 = 2.4
MIPS = f_clock / (CPI * 10^6) = 500 MHz / (2.4 * 10^6) = 208.3
                    C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12
sub $2, $1, $3      IF    ID    EX    ME    WB
and $12, $2, $5           IF    stall stall stall ID    EX    ME    WB
or  $13, $6, $2                 stall stall stall IF    ID    EX    ME    WB
add $14, $2, $2                                         IF    ID    EX    ME    WB
sw  $15, 100($2)                                              IF    ID    EX    ME    WB
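The metrics of this example can be reproduced with a few lines of Python. The function names are illustrative, and the constant 4 is the number of cycles needed to fill the 5-stage pipeline, as in the formulas above.

```python
def clock_cycles(ic, stall_cycles, fill_cycles=4):
    """Total clock cycles: one per instruction, plus stalls, plus pipeline fill."""
    return ic + stall_cycles + fill_cycles


def cpi(ic, stall_cycles):
    """Clock cycles per instruction."""
    return clock_cycles(ic, stall_cycles) / ic


def mips(f_clock_hz, cpi_value):
    """Millions of instructions per second."""
    return f_clock_hz / (cpi_value * 1e6)


cycles = clock_cycles(ic=5, stall_cycles=3)
cpi_value = cpi(ic=5, stall_cycles=3)
rate = mips(500e6, cpi_value)
print(cycles, cpi_value, round(rate, 1))
```

With 5 instructions, 3 stall cycles and a 500 MHz clock, the functions give the 12 cycles, the CPI of 2.4 and the MIPS value of about 208.3 computed in the example.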
Performance Metrics (2)

Let us consider n iterations of a loop composed of m
instructions per iteration requiring k stalls per iteration
IC_per_iter = m
# Clock Cycles_per_iter = IC_per_iter + # Stall Cycles_per_iter + 4
CPI_per_iter = (IC_per_iter + # Stall Cycles_per_iter + 4) / IC_per_iter
             = (m + k + 4) / m
MIPS_per_iter = f_clock / (CPI_per_iter * 10^6)
Asymptotic Performance Metrics

Let us consider n iterations of a loop composed of m
instructions per iteration requiring k stalls per iteration
IC_AS = Instruction Count_AS = m * n
# Clock Cycles_AS = IC_AS + # Stall Cycles_AS + 4
CPI_AS = lim (n -> ∞) (IC_AS + # Stall Cycles_AS + 4) / IC_AS
       = lim (n -> ∞) (m * n + k * n + 4) / (m * n)
       = (m + k) / m
MIPS_AS = f_clock / (CPI_AS * 10^6)
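The limit can be checked numerically: as the number of iterations grows, the 4 fill cycles become negligible. The values m = 5 and k = 3 below are hypothetical, chosen only for illustration.

```python
def cpi_loop(m, k, n):
    """CPI of n iterations of a loop with m instructions and k stalls per iteration."""
    return (m * n + k * n + 4) / (m * n)


def cpi_asymptotic(m, k):
    """Limit of cpi_loop for n -> infinity."""
    return (m + k) / m


m, k = 5, 3   # hypothetical loop: 5 instructions and 3 stalls per iteration
for n in (1, 10, 1000, 1000000):
    print(n, cpi_loop(m, k, n))
print("asymptotic:", cpi_asymptotic(m, k))
```

For n = 1 the result coincides with the per-iteration CPI, (m + k + 4) / m, and it decreases toward (m + k) / m as n increases.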
Performance Issues in Pipelining

The ideal CPI on a pipelined processor would be 1, but
stalls cause the pipeline performance to degrade from the
ideal performance, so we have:
Ave. CPI Pipe = Ideal CPI + Pipe Stall Cycles per Instruction
= 1 + Pipe Stall Cycles per Instruction

Pipeline Stall Cycles per Instruction are due to:
Structural Hazards + Data Hazards + Control Hazards +
Memory Stalls (we will see in the next lessons)
Reference

Appendix A of the textbook:
J. Hennessy, D. Patterson,
“Computer Architecture: A Quantitative Approach”
4th Edition, Morgan-Kaufmann Publishers.