Lecture Set 6

Computer Systems Organization:
Lecture 6
Ankur Srivastava
University of Maryland, College Park
Based on Slides from Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg431
[Adapted from Computer Organization and Design,
Patterson & Hennessy, © 2005, UCB]
ENEE350
Single Cycle Datapath with Control Unit
[Figure: single-cycle MIPS datapath with control unit. Blocks: PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), ALU (with zero and ovf outputs), ALU control, Data Memory, two Add units (PC + 4 and branch target), Shift left 2, and the Control Unit driven by Instr[31-26]. Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, PCSrc. Instruction fields: Instr[25-21] to Read Addr 1, Instr[20-16] to Read Addr 2, Instr[15-11] to Write Addr, Instr[15-0] to Sign Extend, Instr[5-0] to ALU control.]
What is Pipelining
[Figure: the five pipeline stages and the hardware each one uses: Instruction Fetch (Instr Mem), Decode (Reg), Exec (ALU), Mem (Data Mem), WB (Reg).]
5 Basic Steps in MIPS
1. Instruction Fetch: read the instruction from memory
2. Instruction Decode: the control unit decodes the instruction, the Rs and Rt registers are read, and sign extension is performed
3. Execute: perform the computation in the ALU and calculate the branch target address
4. Memory Access: read or write data from/into memory
5. Write Back: write the result from the ALU or from the memory read into the register file
What is Pipelining
[Figure: the five pipeline stages: Instruction Fetch (Instr Mem), Decode (Reg), Exec (ALU), Mem (Data Mem), WB (Reg).]
Pipelining: start the next instruction before the current one has completed.
While the first instruction is being decoded, the next instruction is fetched,
and so on down the pipeline.
A Pipelined MIPS Processor

Start the next instruction before the current one has
completed

improves throughput - total amount of work done in a given time

instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not
reduced
         Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw       IFetch   Dec      Exec     Mem      WB
sw                IFetch   Dec      Exec     Mem      WB
R-type                     IFetch   Dec      Exec     Mem      WB
- clock cycle (pipeline stage time) is limited by the slowest stage
- for some instructions, some stages are wasted cycles
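The "wasted cycles" remark can be made concrete with a small table of which stages do useful work for each instruction class. This is only a sketch: the USEFUL_STAGES table reflects the usual textbook description of lw, sw and R-type instructions, and the names STAGES, USEFUL_STAGES and wasted_stages are illustrative, not taken from the slides.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

# Assumed (textbook) view of which stages do useful work per instruction class.
USEFUL_STAGES = {
    "lw":     {"IFetch", "Dec", "Exec", "Mem", "WB"},
    "sw":     {"IFetch", "Dec", "Exec", "Mem"},   # nothing to write back
    "R-type": {"IFetch", "Dec", "Exec", "WB"},    # no data memory access
}

def wasted_stages(instr):
    """Stages the instruction still passes through without doing useful work."""
    return [s for s in STAGES if s not in USEFUL_STAGES[instr]]

for instr in USEFUL_STAGES:
    print(instr, "wasted:", wasted_stages(instr))
```

Every instruction still spends five cycles in the pipeline; the idle stage is the price of keeping all instructions in lockstep.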
Single Cycle, Multiple Cycle, vs. Pipeline
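Single Cycle Implementation:
[Figure: Clk waveform with lw in Cycle 1 and sw in Cycle 2. The cycle is sized for the slowest instruction, so the unused part of a faster instruction's cycle is marked Waste.]
Pipeline Implementation:
lw       IFetch  Dec     Exec    Mem     WB
sw               IFetch  Dec     Exec    Mem     WB
R-type                   IFetch  Dec     Exec    Mem     WB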
What is Pipelining
[Figure: the five pipeline stages: Instruction Fetch (Instr Mem), Decode (Reg), Exec (ALU), Mem (Data Mem), WB (Reg).]
To implement pipelining, we need to redesign the datapath so that the next
instruction can start executing while the current one is still being processed.
The first step is to design the datapath so that each of the 5 steps can
work independently.
What is Pipelining
[Figure: the five stages (Instr Mem, Reg, ALU, Data Mem, Reg) separated by pipeline registers.]
This is achieved by placing pipeline registers between the 5 stages.
The clock period is long enough for any one of these steps to finish.
What is Pipelining
[Figure: the five stages (Instr Mem, Reg, ALU, Data Mem, Reg) separated by pipeline registers.]
Suppose the stage delays are: Instruction Fetch: 200ps, Instruction Decode: 100ps,
ALU Operation: 200ps, Memory Access: 200ps, Register Write Back: 100ps.
Single Cycle Architecture: clock period = 200 + 100 + 200 + 200 + 100 = 800ps (the sum of all stages)
Pipeline: clock period = 200ps (the slowest stage)
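The two clock periods follow directly from the stage delays: the single-cycle clock must cover every stage in sequence, while the pipeline clock only has to cover the slowest stage. A minimal sketch, using the delays from this slide (the variable names are illustrative):

```python
# Stage delays in picoseconds, as given on the slide.
stage_delay_ps = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}

# Single cycle: one clock period must fit all five steps back to back.
single_cycle_period_ps = sum(stage_delay_ps.values())

# Pipeline: every stage gets one clock period, so the slowest stage sets it.
pipeline_period_ps = max(stage_delay_ps.values())

print(single_cycle_period_ps)  # 800
print(pipeline_period_ps)      # 200
```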
What is Pipelining
lw       IFetch  Dec     Exec    Mem     WB
sw               IFetch  Dec     Exec    Mem     WB
R-type                   IFetch  Dec     Exec    Mem     WB
Execution time on a single cycle processor: 800*3 = 2400ps
Execution time on a pipelined processor: 200*5 + 200 + 200 = 1400ps (5 cycles for the first instruction, then one more cycle for each of the other two)
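The same arithmetic generalizes to n instructions: the first instruction needs all 5 cycles and each later one completes one cycle after its predecessor. The sketch below assumes an ideal pipeline with no stalls; the function names are illustrative.

```python
def single_cycle_time_ps(n_instr, period_ps=800):
    """Each instruction takes one long clock period."""
    return n_instr * period_ps

def pipelined_time_ps(n_instr, n_stages=5, period_ps=200):
    """n_stages cycles for the first instruction, 1 more for each later one."""
    return (n_stages + n_instr - 1) * period_ps

print(single_cycle_time_ps(3))  # 2400
print(pipelined_time_ps(3))     # 1400

# For a long instruction stream the speedup approaches 800/200 = 4.
speedup = single_cycle_time_ps(1_000_000) / pipelined_time_ps(1_000_000)
print(speedup)
```

With these delays the speedup is bounded by 4 rather than 5 because the stages are unbalanced: the 100ps stages sit idle for half of each 200ps cycle.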
Single Cycle Datapath with Control Unit
[Figure: single-cycle MIPS datapath with control unit, repeated as the starting point for the pipelined design. Blocks: PC, Instruction Memory, Register File, Sign Extend, ALU, ALU control, Data Memory, two Add units, Shift left 2, Control Unit. Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, PCSrc.]
MIPS Pipeline Datapath Modifications

What do we need to add/modify in our MIPS datapath?
State registers between each pipeline stage to isolate them
For simplicity the entire datapath is not shown


[Figure: pipelined datapath divided into five stages, IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack, with the state registers IFetch/Dec, Dec/Exec, Exec/Mem and Mem/WB between them, all loaded by the System Clock. Blocks shown: PC, Instruction Memory, Add (PC + 4), Register File, Sign Extend (16 to 32 bits), Shift left 2, Add (branch target), ALU, Data Memory.]
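The role of the state registers can be sketched as a shift on every clock edge: each register hands its contents to the next stage while a new instruction enters IF. This is a toy model that tracks only which instruction occupies which stage, not the data or control values; simulate and the stage list are illustrative names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def simulate(program):
    """Return one snapshot per clock cycle: {stage: instruction or None}."""
    pending = list(program)
    in_flight = [None] * len(STAGES)   # in_flight[i] occupies STAGES[i]
    history = []
    while pending or any(instr is not None for instr in in_flight[:-1]):
        fetched = pending.pop(0) if pending else None
        # Clock edge: every state register passes its contents one stage on.
        in_flight = [fetched] + in_flight[:-1]
        history.append(dict(zip(STAGES, in_flight)))
    return history

for cycle, snapshot in enumerate(simulate(["lw", "sw", "add"]), start=1):
    print(cycle, snapshot)
```

For three instructions the model runs for 5 + 3 - 1 = 7 cycles, and in cycle 3 all of IF, ID and EX are busy with different instructions.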
Corrected Datapath to Save RegWrite Addr

Need to preserve the destination register address in
the pipeline state registers
[Figure: pipelined datapath with state registers IF/ID, ID/EX, EX/MEM and MEM/WB. The Write Addr (destination register number) is carried through the state registers along with the instruction's data, so that it is still available when the result is written to the Register File.]
MIPS Pipeline Control Path Modifications

All control signals can be determined during Decode

and held in the state registers between pipeline stages
[Figure: pipelined datapath with a Control block in the ID stage. The control signals it generates are stored in ID/EX and passed on through EX/MEM and MEM/WB.]
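One way to picture control signals being "held in the state registers" is as a bundle created in Decode that shrinks as each stage consumes its own group. The sketch below uses the signal names from the single-cycle datapath figure; the per-instruction values and the EX/MEM/WB grouping follow the usual textbook treatment and are assumptions here, as are the function names decode and advance.

```python
# Control values per instruction class (None marks a don't-care).
CONTROL = {
    "R-type": dict(RegDst=1, ALUOp=0b10, ALUSrc=0, Branch=0,
                   MemRead=0, MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw":     dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0,
                   MemRead=1, MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw":     dict(RegDst=None, ALUOp=0b00, ALUSrc=1, Branch=0,
                   MemRead=0, MemWrite=1, RegWrite=0, MemtoReg=None),
    "beq":    dict(RegDst=None, ALUOp=0b01, ALUSrc=0, Branch=1,
                   MemRead=0, MemWrite=0, RegWrite=0, MemtoReg=None),
}
GROUPS = {
    "EX":  ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB":  ("RegWrite", "MemtoReg"),
}

def decode(instr_class):
    """Control bundle written into ID/EX at the end of Decode."""
    signals = CONTROL[instr_class]
    return {g: {s: signals[s] for s in names} for g, names in GROUPS.items()}

def advance(bundle):
    """Drop the group used by the stage just finished; pass the rest on."""
    first = next(iter(bundle))
    return {g: v for g, v in bundle.items() if g != first}

id_ex = decode("lw")
ex_mem = advance(id_ex)    # EX signals consumed, MEM and WB remain
mem_wb = advance(ex_mem)   # only WB signals remain
print(list(id_ex), list(ex_mem), list(mem_wb))
```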
Pipelining the MIPS ISA

What makes it easy

all instructions are the same length (32 bits)
- can fetch in the 1st stage and decode in the 2nd stage

few instruction formats (three) with symmetry across formats
- can begin reading register file in 2nd stage

memory operations can occur only in loads and stores
- can use the execute stage to calculate memory addresses


each MIPS instruction writes at most one result (i.e.,
changes the machine state) and does so near the end of the
pipeline (MEM and WB)
What makes it hard



structural hazards: what if we had only one memory?
control hazards: what about branches?
data hazards: what if an instruction’s input operands depend
on the output of a previous instruction?
Why Pipeline? For Performance!
[Figure: pipeline diagram, Time (clock cycles) across and instruction order down, for Inst 0 through Inst 4. Each instruction passes through IM, Reg, ALU, DM, Reg and starts one cycle after the one before it. The first cycles are the time to fill the pipeline.]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
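The effect of the fill time on CPI can be checked numerically. The sketch assumes an ideal 5-stage pipeline with no stalls; cycles and effective_cpi are illustrative names.

```python
def cycles(n_instr, n_stages=5):
    """Total cycles: fill the pipeline once, then one completion per cycle."""
    return n_stages + n_instr - 1

def effective_cpi(n_instr, n_stages=5):
    return cycles(n_instr, n_stages) / n_instr

for n in (5, 100, 10_000):
    print(n, effective_cpi(n))
```

For 5 instructions the average is 1.8 cycles per instruction; for 10,000 it is 1.0004, which is why CPI = 1 is quoted for a full pipeline.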
Can Pipelining Get Us Into Trouble?

Yes: Pipeline Hazards

structural hazards: attempt to use the same resource by two
different instructions at the same time

data hazards: attempt to use data before it is ready
- An instruction’s source operand(s) are produced by a prior
instruction still in the pipeline

control hazards: attempt to make a decision about program
control flow before the condition has been evaluated and the
new PC target address calculated
- branch instructions

Can always resolve hazards by waiting

pipeline control must detect the hazard

and take action to resolve hazards
A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram, Time (clock cycles) across and instruction order down, for lw followed by Inst 1 through Inst 4, each passing through Mem, Reg, ALU, Mem, Reg with a single shared memory. In the same cycle lw is reading data from memory while a later instruction (Inst 3) is reading its instruction from memory.]
Fix with separate instr and data memories (I$ and D$)
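The conflict in the diagram can be found mechanically: with one memory, an instruction in MEM collides with the instruction being fetched in the same cycle, which is the one three positions later. A sketch assuming an ideal 5-stage pipeline with no stalls; memory_conflicts and MEM_OPS are illustrative names.

```python
MEM_OPS = {"lw", "sw"}   # only loads and stores access data memory

def memory_conflicts(program):
    """List (cycle, data_access_instr, fetched_instr) for a single shared memory.

    Instruction i (0-based) is in IF in cycle i + 1 and in MEM in cycle i + 4,
    so it shares a cycle with the fetch of instruction i + 3.
    """
    conflicts = []
    for i, op in enumerate(program):
        j = i + 3
        if op in MEM_OPS and j < len(program):
            conflicts.append((i + 4, op, program[j]))
    return conflicts

program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
print(memory_conflicts(program))  # [(4, 'lw', 'Inst 3')]
```

With separate instruction and data memories the two accesses go to different resources and the list would be empty by construction.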
How About Register File Access?
[Figure: pipeline diagram, Time (clock cycles) across and instruction order down, for add $1, / Inst 1 / Inst 2 / add $2,$1, in which the first add writes $1 in the same cycle in which the last add reads it. One clock edge controls register writing; the other clock edge controls loading of the pipeline state registers.]
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
Register Usage Can Cause Data Hazards

Dependencies backward in time cause hazards
add $1,$x,$y
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the five instructions above, each starting one cycle after the previous one.]
Read before write data hazard
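Which of the instructions after the add actually read $1 too early can be checked with a short scan. The sketch assumes no forwarding, register read in ID and register write in WB, and a register file that writes in the first half of the cycle and reads in the second, so a consumer three or more instructions behind its producer is safe. The tuple layout and raw_hazards are illustrative.

```python
# (opcode, dest, src1, src2) for the sequence on this slide
program = [
    ("add", "$1", "$x", "$y"),
    ("sub", "$4", "$1", "$5"),
    ("and", "$6", "$1", "$7"),
    ("or",  "$8", "$1", "$9"),
    ("xor", "$4", "$1", "$5"),
]

def raw_hazards(program, safe_distance=3):
    """Pairs (producer_index, consumer_index) where the read comes too early.

    Simplification: does not check whether an instruction in between
    overwrites the same register.
    """
    hazards = []
    for i, (_, dest, *_srcs) in enumerate(program):
        for j in range(i + 1, min(i + safe_distance, len(program))):
            if dest in program[j][2:]:
                hazards.append((i, j))
    return hazards

print(raw_hazards(program))  # [(0, 1), (0, 2)]
```

Under these assumptions only sub and and are affected; or reads $1 in the same cycle in which add writes it, and xor reads it one cycle later.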
Loads Can Cause Data Hazards

Dependencies backward in time cause hazards
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the five instructions above, each starting one cycle after the previous one.]
Load-use data hazard
One Way to “Fix” a Data Hazard
add $1,
stall
stall
sub $4,$1,$5
and $6,$1,$7
[Figure: pipeline diagram in which sub $4,$1,$5 is held back by two stall cycles so that it reads $1 only after add has written it.]
Can fix data hazard by waiting (stall), but this impacts CPI
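The two stall cycles in the diagram can be reproduced by scheduling each instruction's Decode cycle. The sketch assumes no forwarding, a register read in ID, a register write in WB three cycles after ID, and a register file that can write and then read within one cycle. The source operands of add are elided on the slide, so "$a" and "$b" below are placeholders, and schedule_with_stalls is an illustrative name.

```python
# (opcode, dest, src1, src2); "$a" and "$b" are placeholder operands.
program = [
    ("add", "$1", "$a", "$b"),
    ("sub", "$4", "$1", "$5"),
    ("and", "$6", "$1", "$7"),
]

def schedule_with_stalls(program):
    """Return (ID cycle of each instruction, total stall cycles)."""
    last_writer_id = {}    # register -> ID cycle of the latest writer
    id_cycles, stalls = [], 0
    for _, dest, *srcs in program:
        # Without a hazard, ID follows the previous instruction's ID by one cycle.
        earliest = id_cycles[-1] + 1 if id_cycles else 2
        # A source is readable in the cycle its producer is in WB (ID + 3).
        ready = max((last_writer_id[s] + 3 for s in srcs if s in last_writer_id),
                    default=0)
        cycle = max(earliest, ready)
        stalls += cycle - earliest
        id_cycles.append(cycle)
        last_writer_id[dest] = cycle
    return id_cycles, stalls

id_cycles, stalls = schedule_with_stalls(program)
total_cycles = id_cycles[-1] + 3       # last instruction's WB cycle
print(id_cycles, stalls, total_cycles)  # [2, 5, 6] 2 9
```

Three instructions take 9 cycles instead of the ideal 7, which is the CPI impact the slide refers to.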
Another Way to “Fix” a Data Hazard
add $1,
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in which the result of add is forwarded from inside the pipeline to the later instructions that need $1.]
Fix data hazards by forwarding results as soon as they are available to where they are needed
Forwarding with Load-use Data Hazards
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in which the value loaded by lw is forwarded from the DM stage to the later instructions that need $1.]
Will still need one stall cycle even with forwarding
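Why forwarding removes the stalls after an add but leaves one after a lw comes down to when the value exists. A sketch, assuming full forwarding to the ALU inputs: an ALU result can be used by an instruction whose EX stage is 1 cycle after the producer's, while a loaded value is only available 2 cycles after the producer's EX stage. The function names are illustrative.

```python
def stalls_without_forwarding(distance):
    """Consumer must wait until the producer reaches WB (see the stall slide)."""
    return max(0, 3 - distance)

def stalls_with_forwarding(producer_op, distance):
    """Stall cycles for a consumer `distance` instructions after its producer."""
    # Loads deliver their value at the end of MEM, one stage later than the ALU.
    ready_after = 2 if producer_op == "lw" else 1
    return max(0, ready_after - distance)

print(stalls_without_forwarding(1))        # 2
print(stalls_with_forwarding("add", 1))    # 0
print(stalls_with_forwarding("lw", 1))     # 1
print(stalls_with_forwarding("lw", 2))     # 0
```

Only the instruction immediately after the load pays the stall; and, or and xor in the sequence above are far enough behind.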
Branch Instructions Cause Control Hazards

Dependencies backward in time cause hazards
beq
lw
Inst 3
Inst 4
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for beq followed by lw, Inst 3 and Inst 4, which are fetched before the outcome of the branch is known.]
One Way to “Fix” a Control Hazard
beq
stall
stall
stall
lw
Inst 3
[Figure: pipeline diagram in which the instructions after beq are held back by three stall cycles until the branch is resolved.]
Fix branch hazard by waiting (stall), but this affects CPI
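How much stalling on branches costs depends on how often branches occur. The sketch below uses the three stall cycles shown in the diagram; the branch fraction of 0.2 is a made-up illustrative value, not a figure from the slides, and the model ignores all other hazards.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=3):
    """Ideal CPI of 1 plus the average stall cycles added per instruction."""
    return 1.0 + branch_fraction * stall_cycles

# Hypothetical mix: 1 instruction in 5 is a branch.
print(round(cpi_with_branch_stalls(0.2), 2))  # 1.6
```

Under this assumed mix the pipeline would run at about 1.6 cycles per instruction instead of 1.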
Summary

All modern day processors use pipelining

Pipelining doesn’t help the latency of a single task, it helps the
throughput of the entire workload

Potential speedup: a CPI of 1 and a fast CC (clock cycle)

Pipeline rate is limited by the slowest pipeline stage

Unbalanced pipe stages make for inefficiencies

The time to “fill” the pipeline and the time to “drain” it can impact
speedup for deep pipelines and short code runs

Must detect and resolve hazards
- Stalling negatively affects CPI (makes CPI greater than the ideal
of 1)