CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining

advertisement
CS 152 Computer Architecture and Engineering
Lecture 4 - Pipelining
Dr. George Michelogiannakis
EECS, University of California at Berkeley
CRD, Lawrence Berkeley National Laboratory
http://inst.eecs.berkeley.edu/~cs152
1/28/2016
CS152, Spring 2016
Last time in Lecture 3
 Microcoding became less attractive as gap between RAM
and ROM speeds reduced
 Complex instruction sets difficult to pipeline, so difficult to
increase performance as gate count grew
 Load-Store RISC ISAs designed for efficient pipelined
implementations
– Very similar to vertical microcode
– Inspired by earlier Cray machines (more on these later)
 Iron Law explains architecture design space
– Trade instructions/program, cycles/instruction, and time/cycle
1/28/2016
CS152, Spring 2016
2
Question of the Day
 Why a five stage pipeline?
1/28/2016
CS152, Spring 2016
3
An Ideal Pipeline
stage
1




stage
2
stage
3
stage
4
All objects go through the same stages
No sharing of resources between any two stages
Propagation delay through all pipeline stages is equal
The scheduling of an object entering the pipeline is not
affected by the objects in other stages
These conditions generally hold for industrial assembly
lines, but instructions depend on each other!
1/28/2016
CS152, Spring 2016
4
Pipelined RISC-V
 To pipeline RISC-V:
 First build RISC-V without pipelining with CPI=1
 Next, add pipeline registers to reduce cycle time while
maintaining CPI=1
1/28/2016
CS152, Spring 2016
5
Lecture 3: Unpipelined Datapath for RISC-V
PCSel
br
rind
jabs
pc+4
RegWriteEn
MemWrite
WBSel
0x4
Add
Add
clk
PC
clk
Br Logic Bcomp?
we
rs1
rs2
rd1
wa
wd rd2
addr
inst
Inst.
Memory
ALU
GPRs
clk
we
addr
rdata
Data
Memory
Imm
Select
wdata
ALU
Control
OpCode WASel
1/28/2016
ImmSel
CS152, Spring 2016
Op2Sel
6
Lecture 3: Hardwired Control Table
Opcode
ALU
ImmSel
Op2Sel
MemWr
RFWen
pc+4
pc+4
pc+4
pc+4
BEQtrue
BrType12
*
*
no
no
*
*
br
BEQfalse
BrType12
*
*
*
*
*
no
*
no
no
no
yes
yes
*
PC
PC
*
rd
rd
pc+4
jabs
rind
Op2Sel= Reg / Imm
ALU
ALU
Mem
*
rd
rd
rd
*
BsType12
JALR
yes
yes
yes
no
PCSel
SW
*
*
no
no
no
yes
WASel
LW
JAL
Func
Op
+
+
WBSel
*
IType12
IType12
ALUi
Reg
Imm
Imm
Imm
FuncSel
WBSel = ALU / Mem / PC
PCSel = pc+4 / br / rind / jabs
Correction since L3: “J” has been removed
1/28/2016
CS152, Spring 2016
7
Pipelined Datapath
0x4
Add
addr
PC
rdata
Inst.
Memory
fetch
phase
IR
we
rs1
rs2
rd1
wa
wd rd2
GPRs
ALU
Data
Memory
Imm
Select
decode & Reg-fetch
phase
we
addr
rdata
wdata
execute
phase
memory
phase
write
-back
phase
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined
1/28/2016
CS152, Spring 2016
8
“Iron Law” of Processor Performance
Time = Instructions
Cycles
Time
Program
Program * Instruction * Cycle
 Instructions per program depends on source code,
compiler technology, and ISA
 Cycles per instructions (CPI) depends on ISA and
µarchitecture
 Time per cycle depends upon the µarchitecture and base
technology
Lecture 2
Lecture 3
Lecture 4
1/28/2016
Microarchitecture
Microcoded
Single-cycle unpipelined
Pipelined
CS152, Spring 2016
CPI
>1
1
1
cycle time
short
long
short
9
CPI Examples
Microcoded machine
7 cycles
5 cycles
Inst 1
10 cycles
Inst 2
Time
Inst 3
3 instructions, 22 cycles, CPI=7.33
Unpipelined machine
Inst 1
Inst 2
Inst 3
3 instructions, 3 cycles, CPI=1
Pipelined machine
Inst 1
Inst 2
Inst 3
1/28/2016
3 instructions, 3 cycles, CPI=1
5-stage pipeline CPI≠5!!!
CS152, Spring 2016
10
Technology Assumptions
• A small amount of very fast memory (caches)
backed up by a large, slower memory
• Fast ALU (at least for integers)
• Multiported Register files (slower!)
Thus, the following timing assumption is reasonable
tIM ~= tRF ~= tALU ~= tDM ~= tRW
A 5-stage pipeline will be focus of our detailed design
- some commercial designs have over 30 pipeline
stages to do an integer add!
1/28/2016
CS152, Spring 2016
11
5-Stage Pipelined Execution
0x4
Add
addr
rdata
PC
we
rs1
rs2
rd1
wa
wd rd2
GPRs
IR
Inst.
Memory
I-Fetch
(IF)
ALU
Data
Memory
Imm
Select
wdata
Decode, Reg. Fetch Execute
(EX)
(ID)
time
instruction1
instruction2
instruction3
instruction4
instruction5
1/28/2016
we
addr
rdata
t0
IF1
t1
ID1
IF2
t2
EX1
ID2
IF3
t3
MA1
EX2
ID3
IF4
CS152, Spring 2016
t4
WB1
MA2
EX3
ID4
IF5
Write
-Back
(WB)
Memory
(MA)
t5
t6
t7
....
WB2
MA3 WB3
EX4 MA4 WB4
ID5 EX5 MA5 WB5
12
5-Stage Pipelined Execution
Resource Usage Diagram
0x4
Add
addr
rdata
PC
we
rs1
rs2
rd1
ws
wd rd2
GPRs
IR
Inst.
Memory
Resources
I-Fetch
(IF)
1/28/2016
we
addr
rdata
ALU
Data
Memory
Imm
Select
wdata
Decode, Reg. Fetch Execute
(ID)
(EX)
time
IF
ID
EX
MA
WB
t0
I1
t1
I2
I1
t2
I3
I2
I1
t3
I4
I3
I2
I1
CS152, Spring 2016
t4
I5
I4
I3
I2
I1
Write
-Back
(WB)
Memory
(MA)
t5
t6
t7
....
I5
I4
I3
I2
I5
I4
I3
I5
I4
I5
13
Pipelined Execution:
ALU Instructions
0x4
IR
Add
PC
addr
inst IR
Inst
Memory
we
rs1
rs2
rd1
wa
wd rd2
GPRs
IR
IR
A
ALU
Y
B
we
addr
rdata
Data
Memory
wdata
Imm
Select
R
wdata
MD1
MD2
Not quite correct!
We need an Instruction Reg (IR) for each stage
1/28/2016
CS152, Spring 2016
14
Pipelined RISC-V Datapath
without jumps
F
D
E
M
IR
IR
W
IR
0x4
Add
RegWriteEn
PC
addr
inst IR
Inst
Memory
we
rs1
rs2
rd1
wa
wd rd2
ALU
Control
MemWrite
A
ALU
GPRs
Y
we
addr
rdata
B
Data
Memory
wdata
Imm
Select
R
wdata
MD1
ImmSel
WBSel
MD2
Op2Sel
Control Points Need to
Be Connected
1/28/2016
CS152, Spring 2016
15
Instructions interact with each other in pipeline
 An instruction in the pipeline may need a resource
being used by another instruction in the pipeline
 structural hazard
 An instruction may depend on something
produced by an earlier instruction
– Dependence may be for a data value
 data hazard
– Dependence may be for the next instruction’s address
 control hazard (branches, exceptions)
1/28/2016
CS152, Spring 2016
16
Resolving Structural Hazards
 Structural hazard occurs when two instructions need same
hardware resource at same time
– Can resolve in hardware by stalling newer instruction till older
instruction finished with resource
 A structural hazard can always be avoided by adding more
hardware to design
– E.g., if two instructions both need a port to memory at same time, could
avoid hazard by adding second port to memory
 Our 5-stage pipeline has no structural hazards by design
– Thanks to RISC-V ISA, which was designed for pipelining
1/28/2016
CS152, Spring 2016
17
Data Hazards
x1 …
x4  x1 …
0x4
IR
Add
PC
addr
inst IR
Inst
Memory
we
rs1
rs2
rd1
wa
wd rd2
GPRs
A
ALU
Y
B
rdata
R
wdata
wdata
MD1
1/28/2016
we
addr
Data
Memory
Imm
Select
...
x1  x0 + 10
x4  x1 + 17
...
IR
IR
MD2
x1 is stale. Oops!
CS152, Spring 2016
18
How Would You Resolve This?
 What can you do if your project buddy has not completed
their part, which you need to start?
 Three options
–
–
–
–
1/28/2016
Wait (stall)
Speculate on the value of the deliverable
Bypass: ask them for what you need before his/her final deliverable
(ask him/her to work harder?)
CS152, Spring 2016
19
Resolving Data Hazards (1)
Strategy 1:
Wait for the result to be available by freezing
earlier pipeline stages  interlocks
1/28/2016
CS152, Spring 2016
20
Feedback to Resolve Hazards
FB1
FB2
stage
1
FB3
stage
2
FB4
stage
3
stage
4
 Later stages provide dependence information to
earlier stages which can stall (or kill) instructions
• Controlling a pipeline in this manner works provided
the instruction at stage i+1 can complete without
any interference from instructions in stages 1 to i
(otherwise deadlocks may occur)
1/28/2016
CS152, Spring 2016
21
Interlocks to resolve Data Hazards
Stall Condition
bubble
0x4
Add
PC
addr
inst IR
Inst
Memory
...
x1  x0 + 10
x4  x1 + 17
...
1/28/2016
we
rs1
rs2
rd1
wa
wd rd2
GPRs
IR
A
ALU
Y
B
we
addr
rdata
Data
Memory
Imm
Select
R
wdata
wdata
MD1
CS152, Spring 2016
IR
IR
MD2
22
Stalled Stages and Pipeline Bubbles
time
t0 t1 t2 t3 t4 t5
(I1) x1 (x0) + 10 IF1 ID1 EX1 MA1 WB1
(I2) x4 (x1) + 17
IF2 ID2 ID2 ID2 ID2
(I3)
IF3 IF3 IF3 IF3
(I4)
stalled stages
(I5)
Resource
Usage
IF
ID
EX
MA
WB
time
t0 t1
I1
I2
I1
t2
I3
I2
I1
t3
I3
I2
I1
t4
I3
I2
I1
t5
I3
I2
-
t6
CS152, Spring 2016
....
EX2 MA2 WB2
ID3 EX3 MA3 WB3
IF4 ID4 EX4 MA4 WB4
IF5 ID5 EX5 MA5 WB5
t6
I4
I3
I2
-
- 
1/28/2016
t7
t7
I5
I4
I3
I2
-
....
I5
I4
I3
I2
I5
I4
I3
I5
I4
I5
pipeline bubble
23
Interlock Control Logic
stall
wa
Cstall
rs2 ?
rs1
bubble
0x4
Add
PC
addr
inst IR
Inst
Memory
we
rs1
rs2
rd1
wa
wd rd2
GPRs
IR
IR
IR
A
ALU
Y
B
we
addr
rdata
Data
Memory
Imm
Select
R
wdata
wdata
MD1
MD2
Compare the source registers of the instruction in the decode
stage with the destination register of the uncommitted
instructions.
1/28/2016
CS152, Spring 2016
24
Interlock Control Logic
ignoring jumps & branches
stall
wa
we
Cstall
rs1 ?
rs2
re1
re2
Cre
0x4
Add
PC
addr
inst IR
Inst
Memory
Cdest
Cdest
bubble
we
rs1
rs2
rd1
wa
wd rd2
GPRs
wa
wa we
we
IR
IR
IR
A
ALU
Y
B
we
addr
rdata
Data
Memory
wdata
Imm
Select
R
wdata
MD1
MD2
Should we always stall if an rs field matches some rd?
not every instruction writes a register => we
not every instruction reads a register => re
1/28/2016
CS152, Spring 2016
25
Source & Destination Registers
func7
rs2
rs1
func3
rd
opcode
immediate12
rs1
func3
rd
opcode
imm
rs1
func3 imm opcode
rs2
Jump Offset[19:0]
ALU
ALUI
LW
SW
Bcond
JAL
JALR
1/28/2016
rd
rd <= rs1 func10 rs2
rd <= rs1 op imm
rd <= M [rs1 + imm]
M [rs1 + imm] <= rs2
rs1,rs2
true: PC <= PC + imm
false: PC <= PC + 4
x1 <= PC, PC <= PC + imm
rd <= PC, PC <= rs1 + imm
CS152, Spring 2016
ALU
ALUI/LW/JALR
SW/Bcond
opcode
0
source(s) destination
rs1, rs2
rd
rs1
rd
rs1
rd
rs1, rs2
rs1, rs2
-
rs1
rd
rd
26
Deriving the Stall Signal
Cre
Cdest
re1 = Case opcode
ALU, ALUi,
ws = rd
we = Case opcode
ALU, ALUi, LW, JALR =>on
...
=>off
LW, SW, Bcond,
=>on
JALR
=>off
JAL
re2 = Case opcode
ALU, SW, Bcond =>on
->off
...
Cstall
1/28/2016
stall = ((rs1D =wsE).weE +
(rs1D =wsM).weM +
(rs1D =wsW).weW) . re1D +
((rs2D =wsE).weE +
(rs2D =wsM).weM +
(rs2D =wsW).weW) . re2D
CS152, Spring 2016
27
Hazards due to Loads & Stores
Stall Condition
0x4
bubble
Add
PC
addr
inst IR
Inst
Memory
...
M[x1+7] <= x2
x4 <= M[x3+5]
...
1/28/2016
What if
x1+7 = x3+5 ?
we
rs1
rs2
rd1
wa
wd rd2
GPRs
IR
IR
IR
A
ALU
Y
B
we
addr
rdata
Data
Memory
wdata
Imm
Select
R
wdata
MD1
MD2
Is there any possible data hazard
in this instruction sequence?
CS152, Spring 2016
28
Load & Store Hazards
...
M[x1+7] <= x2
x4 <= M[x3+5]
...
x1+7 = x3+5 => data hazard
However, the hazard is avoided because our memory
system completes writes in one cycle !
Load/Store hazards are sometimes resolved in the
pipeline and sometimes in the memory system itself.
More on this later in the course.
1/28/2016
CS152, Spring 2016
29
CS152 Administrivia
 Quiz 1 on Feb 17 will cover PS1, Lab1, lectures 1-5, and
associated readings.
1/28/2016
CS152, Spring 2016
30
Resolving Data Hazards (2)
Strategy 2:
Route data as soon as possible after it is calculated
to the earlier pipeline stage  bypass
1/28/2016
CS152, Spring 2016
31
Bypassing
time
(I1) x1  x0 + 10
(I2) x4  x1 + 17
(I3)
(I4)
(I5)
t0
IF1
t1 t2 t3 t4 t5
ID1 EX1 MA1 WB1
IF2 ID2 ID2 ID2 ID2
IF3 IF3 IF3 IF3
stalled stages
t6
t7
....
EX2 MA2 WB2
ID3 EX3 MA3
IF4 ID4 EX4
IF5 ID5
Each stall or kill introduces a bubble in the pipeline
=> CPI > 1
A new datapath, i.e., a bypass, can get the data from
the output of the ALU to its input
time
t0
(I1) x1  x0 + 10
(I2) x4  x1 + 17
(I3)
(I4)
(I5)
1/28/2016
t1
IF1
t2 t3
ID1 EX1
IF2 ID2
IF3
CS152, Spring 2016
t4
MA1
EX2
ID3
IF4
t5
WB1
MA2
EX3
ID4
IF5
t6
t7
....
WB2
MA3 WB3
EX4 MA4 WB4
ID5 EX5 MA5 WB5
32
Adding a Bypass
stall
x4 <= x1...
0x4
x1 <= ...
E
bubble
M
IR
Add
IR
IR
W
ASrc
PC
addr
inst IR
Inst
Memory
D
we
rs1
rs2
rd1
wa
wd rd2
GPRs
A
ALU
B
1/28/2016
rdata
Data
Memory
wdata
Imm
Select
R
wdata
MD1
(I1)
(I2)
Y
we
addr
MD2
When does this bypass help?
...
x1 <= M[x0 + 10]
JAL 500
x1 <= x0 + 10
x4 <= x1 + 17
x4 <= x1 + 17
x4 <= x1 + 17
yes
no
no
CS152, Spring 2016
33
The Bypass Signal
Deriving it from the Stall Signal
stall = ( ((rs1D =wsE).weE + (rs1D =wsM).weM + (rs1D =wsW).weW).re1D
+((rs2D =wsE).weE + (rs2D =wsM).weM + (rs2D =wsW).weW).re2D )
we = Case opcode
ALU, ALUi, LW,, JAL JALR => on
... => off
ws = rd
ASrc = (rs1D=wsE).weE.re1D
Is this correct?
No because only ALU and ALUi instructions can benefit from this
bypass
Split weE into two components: we-bypass, we-stall
1/28/2016
CS152, Spring 2016
34
Bypass and Stall Signals
Split weE into two components: we-bypass, we-stall
we-bypassE = Case opcodeE
ALU, ALUi => on
...
=> off
ASrc
we-stallE = Case opcodeE
LW, JAL, JALR=> on
JAL
=> on
...
=> off
= (rs1D =wsE).we-bypassE . re1D
stall = ((rs1D =wsE).we-stallE +
(rs1D=wsM).weM + (rs1D=wsW).weW). re1D
+((rs2D = wsE).weE + (rs2D = wsM).weM + (rs2D = wsW).weW). re2D
1/28/2016
CS152, Spring 2016
35
Fully Bypassed Datapath
PC for JAL, ...
stall
0x4
bubble
Add
PC
addr
ASrc
inst IR
Inst
Memory
Is there still
a need for the
stall signal ?
1/28/2016
D
we
rs1
rs2
rd1
wa
wd rd2
M
IR
ALU
Y
B
BSrc
IR
IR
A
GPRs
Imm
Select
E
W
we
addr
rdata
Data
Memory
wdata
R
wdata
MD1
MD2
stall = (rs1D=wsE). (opcodeE=LWE).(wsE!=0 ).re1D
+ (rs2D=wsE). (opcodeE=LWE).(wsE!=0 ).re2D
CS152, Spring 2016
36
Pipeline CPI Examples
Measure from when first instruction finishes
to when last instruction in sequence finishes.
Time
Inst 1
Inst 2
Inst 3
3 instructions finish in 3 cycles
CPI = 3/3 =1
Inst 1
Inst 2
Bubble
Inst 3
3 instructions finish in 4 cycles
CPI = 4/3 = 1.33
Inst 1
Bubble 1
Inst 2
Inst
Bubble
3 2
Inst 3
1/28/2016
3 instructions finish in 5cycles
CPI = 5/3 = 1.67
CS152, Spring 2016
37
Resolving Data Hazards (3)
Strategy 3: Speculate on the dependence!
Two cases:
Guessed correctly  do nothing
Guessed incorrectly  kill and restart
…. We’ll later see examples of this approach in
more complex processors.
1/28/2016
CS152, Spring 2016
38
Speculation that load value=zero
PC for JAL, ...
stall
0x4
Add
PC
addr
E
bubble
IR
ASrc
inst IR
Inst
Memory
D
we
rs1
rs2
rd1
wa
wd rd2
IR
IR
0
A
ALU
Y
B
BSrc
we
addr
rdata
Data
Memory
wdata
CS152, Spring 2016
R
wdata
MD1
MD2
Guess_zero= (rs1D=wsE). (opcodeE=LWE).(wsE!=0 ).re1D
Also need to add circuitry to remember that this was a guess and
flush pipeline if load not zero!
Not worth doing in practice – why?
1/28/2016
W
Guess_zero
GPRs
Imm
Select
M
39
Control Hazards
What do we need to calculate next PC?
 For Jumps
– Opcode, PC and offset
 For Jump Register
– Opcode, Register value, and PC
 For Conditional Branches
– Opcode, Register (for condition), PC and offset
 For all other instructions
– Opcode and PC ( and have to know it’s not one of above )
1/28/2016
CS152, Spring 2016
40
PC Calculation Bubbles
(I1) x1  x0 + 10
(I2) x3  x2 + 17
(I3)
(I4)
Resource
Usage
IF
ID
EX
MA
WB
time
t0 t1 t2 t3
IF1 ID1 EX1 MA1
IF2 IF2 ID2
IF3
time
t0 t1
I1
I1
t2
I2
I1
t3
I2
I1
t4 t5
WB1
EX2 MA2
IF3 ID3
IF4
t6
t4
I3
t6
I4
I2
I1
t5
I3
I2
-
CS152, Spring 2016
....
WB2
EX3 MA3 WB3
IF4 ID4 EX4 MA4 WB4
I3
I2
- 
1/28/2016
t7
t7
....
I4
I3
-
I4
I3
I4
-
I4
pipeline bubble
41
Speculate next address is PC+4
PCSrc (pc+4 / jabs / rind/ br)
stall
Add
bubble
0x4
Add
Jump?
PC
104
I1
I2
I3
I4
1/28/2016
addr
inst
M
IR
IR
I1
IR
Inst
Memory
096
100
104
304
E
I2
ADD
J 304
ADD
ADD
kill
A jump instruction kills (not stalls)
the following instruction
CS152, Spring 2016
How?
42
Pipelining Jumps
PCSrc (pc+4 / jabs / rind/ br)
stall
To kill a fetched
instruction -- Insert
a mux before IR
Add
bubble
0x4
Add
Jump?
304
104
I1
I2
I3
I4
1/28/2016
addr
inst
bubble
Inst
Memory
096
100
104
304
IR
bubble
I2
ADD
J 304
ADD
ADD
kill
M
IR
IR
II21
I1
Any
interaction
between
stall and
jump?
IRSrcD
PC
E
IRSrcD = Case opcodeD
JAL
 bubble
...
IM
CS152, Spring 2016
43
Jump Pipeline Diagrams
(I1)
(I2)
(I3)
(I4)
096:
100:
104:
304:
ADD
J 304
ADD
ADD
Resource
Usage
IF
ID
EX
MA
WB
time
t0 t1 t2
IF1 ID1 EX1
IF2 ID2
IF3
time
t0 t1
I1
I2
I1
t2
I3
I2
I1
t3
MA1
EX2
IF4
t4
WB1
MA2
ID4
t5
t6
t3
I4
I2
I1
t4
I5
I4
I2
I1
t5
t6
t7
....
I5
I4
I2
I5
I4
-
I5
I4
I5
CS152, Spring 2016
....
WB2
EX4 MA4 WB4
- 
1/28/2016
t7
pipeline bubble
44
Pipelining Conditional Branches
PCSrc (pc+4 / jabs / rind / br)
stall
Add
0x4
bubble
Add
E
M
IR
IR
I1
BEQ?
Taken?
IRSrcD
PC
104
I1
I2
I3
I4
1/28/2016
096
100
104
304
addr
inst
bubble
A
IR
Inst
Memory
ALU
Y
I2
ADD
BEQ x1,x2 +200
ADD
ADD
Branch condition is not known until the
execute stage
what action should be taken in the
decode stage ?
CS152, Spring 2016
45
Pipelining Conditional Branches
PCSrc (pc+4 / jabs / rind / br)
stall
?
Add
bubble
0x4
Bcond?
E
Add
M
IR
IR
I2
I1
Taken?
IRSrcD
PC
108
I1
I2
I3
I4
1/28/2016
096
100
104
304
addr
inst
Inst
Memory
bubble
IR
A
ALU
Y
I3
If the branch is taken
ADD
- kill the two following instructions
BEQ x1,x2 +200
- the instruction at the decode stage is
ADD
not valid  stall signal is not valid
ADD
CS152, Spring 2016
46
Pipelining Conditional Branches
stall
Add
PCSrc (pc+4/jabs/rind/br)
bubble
0x4
Bcond?
E
IRSrcE
Add
Jump?
M
IR
IR
I2
I1
Taken?
PC
PC
108
I1:
I2:
I3:
I4:
1/28/2016
096
100
104
304
IRSrcD
addr
inst
Inst
Memory
bubble
IR
A
ALU
Y
I3
If the branch is taken
ADD
- kill the two following instructions
BEQ x1,x2 +200
- the instruction at the decode stage is
ADD
not valid  stall signal is not valid
ADD
CS152, Spring 2016
47
Branch Pipeline Diagrams
(resolved in execute stage)
(I1)
(I2)
(I3)
(I4)
(I5)
time
t0 t1 t2
096: ADD
IF1 ID1 EX1
100: BEQ +200
IF2 ID2
104: ADD
IF3
108:
304: ADD
Resource
Usage
IF
ID
EX
MA
WB
time
t0 t1
I1
I2
I1
t2
I3
I2
I1
t3
MA1
EX2
ID3
IF4
t4
WB1
MA2
IF5
t5
t6
t3
I4
I3
I2
I1
t4
I5
I2
I1
t5
t6
t7
....
I5
I2
I5
-
I5
-
I5
CS152, Spring 2016
....
WB2
ID5 EX5 MA5 WB5
- 
1/28/2016
t7
pipeline bubble
48
Question of the Day
 Why a five stage pipeline?
1/28/2016
CS152, Spring 2016
49
Acknowledgements
 These slides contain material developed and copyright by:
–
–
–
–
–
–
Arvind (MIT)
Krste Asanovic (MIT/UCB)
Joel Emer (Intel/MIT)
James Hoe (CMU)
John Kubiatowicz (UCB)
David Patterson (UCB)
 MIT material derived from course 6.823
 UCB material derived from course CS252
1/28/2016
CS152, Spring 2016
50
Download