RISC Pipeline
Kevin Walsh
CS 3410, Spring 2010
Computer Science
Cornell University
See: P&H Chapter 4.6
A Processor
[Figure: single-cycle processor datapath. Labels: PC, +4, instruction memory (inst), register file, control, imm extend, alu, cmp (=?), offset, target, new pc, data memory (addr, din, dout).]
A Processor
[Figure: the same datapath divided into five stages: Instruction Fetch, Instruction Decode, Execute (compute jump/branch targets), Memory, WriteBack.]
Basic Pipeline
Five stage “RISC” load-store architecture
1. Instruction Fetch (IF)
– get instruction from memory, increment PC
2. Instruction Decode (ID)
– translate opcode into control signals and read registers
3. Execute (EX)
– perform ALU operation, compute jump/branch targets
4. Memory (MEM)
– access memory if needed
5. Writeback (WB)
– update register file
Slides thanks to Sally McKee & Kavita Bala
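The five stages listed above can be sketched as an ordered enumeration. This is only an illustration: the names `Stage` and `stage_in_cycle` are not from the slides, and the sketch assumes an ideal pipeline with no stalls, so an instruction fetched in cycle c is in stage number k during cycle c + k.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    IF = 0   # instruction fetch
    ID = 1   # instruction decode
    EX = 2   # execute
    MEM = 3  # memory access
    WB = 4   # writeback


def stage_in_cycle(fetch_cycle: int, cycle: int) -> Optional[Stage]:
    """Return the stage an instruction occupies in `cycle`, or None
    if it has not been fetched yet or has already left the pipeline."""
    offset = cycle - fetch_cycle
    if 0 <= offset < len(Stage):
        return Stage(offset)
    return None
```

For example, an instruction fetched in cycle 1 is in `Stage.EX` during cycle 3 and has left the pipeline by cycle 6.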
Pipelined Implementation
Break instructions across multiple clock cycles (five, in this case)
Design a separate stage for the execution performed during each clock cycle
Add pipeline registers to isolate signals between different stages
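One way to picture how a pipeline register isolates two stages is a two-slot latch: the upstream stage writes a pending value during the cycle, and the downstream stage only sees it after the clock edge. The class and function names below (`PipelineRegister`, `tick`) are hypothetical, and real hardware does this with flip-flops rather than method calls.

```python
class PipelineRegister:
    """Holds the values passed from one stage to the next."""

    def __init__(self, name):
        self.name = name
        self.current = None  # value visible to the downstream stage
        self._next = None    # value written by the upstream stage this cycle

    def write(self, value):
        self._next = value

    def read(self):
        return self.current

    def clock(self):
        # On the clock edge the pending value becomes visible.
        # If nothing was written this cycle, the register holds None.
        self.current = self._next
        self._next = None


def tick(registers):
    """Advance every pipeline register by one clock edge."""
    for reg in registers:
        reg.clock()


if_id = PipelineRegister("IF/ID")
if_id.write({"inst": "add r3, r1, r2", "pc_plus_4": 4})
before = if_id.read()   # still None: the write is not visible yet
tick([if_id])
after = if_id.read()    # now the fetched values are visible to ID
```

Because reads always see the value latched at the previous edge, every stage can be evaluated in the same cycle without one stage's output leaking into the next stage early.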
Pipelined Processor
[Figure: the five-stage datapath (Instruction Fetch, Instruction Decode, Execute, Memory, WriteBack) separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Labels on the pipeline registers: ctrl, A, B, D, M.]
Stage 1: Instruction Fetch
Fetch a new instruction every cycle
• Current PC is index to instruction memory
• Increment the PC at end of cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)
• Instruction bits (for later decoding)
• PC+4 (for later computing branch targets)
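A minimal sketch of one IF cycle, assuming no branches (as the slide does for now). The function name `fetch`, the dictionary layout of IF/ID, and the use of a Python list as instruction memory are illustrative assumptions; the PC is treated as a byte address with 4-byte instructions.

```python
def fetch(pc, inst_mem):
    """One IF cycle: return (new_pc, if_id).

    inst_mem is a list of instructions indexed by word (assumption),
    so the byte-addressed PC is divided by 4 to index it.
    """
    inst = inst_mem[pc // 4]
    pc_plus_4 = pc + 4
    if_id = {
        "inst": inst,            # instruction bits, decoded in the next stage
        "pc_plus_4": pc_plus_4,  # kept for computing branch targets later
    }
    return pc_plus_4, if_id
```

Calling `fetch(0, ["add", "nand"])` returns a new PC of 4 together with an IF/ID record holding the first instruction.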
[Figure: IF stage datapath. Labels: PC, +4, PC+4, instruction memory (addr, inst; mc: 00 = read word), new pc, pcsel, pcrel, pcabs, pcreg, IF/ID.]
Stage 2: Instruction Decode
On every cycle:
• Read IF/ID pipeline register to get instruction bits
• Decode instruction, generate control signals
• Read from register file
Write values of interest to pipeline register (ID/EX)
• Control information, Rd index, immediates, offsets, …
• Contents of Ra, Rb
• PC+4 (for computing branch targets later)
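The decode steps above might look like the following sketch. It sidesteps binary encodings entirely: an instruction is assumed to arrive already split into fields (`op`, `rd`, `ra`, `rb`, `imm`), and the `CONTROL` table with its signal names is hypothetical, covering only the four opcodes used in this deck's example.

```python
# Hypothetical control signals per opcode (not taken from the slides).
CONTROL = {
    "add":  {"alu_op": "add",  "use_imm": False, "mem_read": False,
             "mem_write": False, "reg_write": True},
    "nand": {"alu_op": "nand", "use_imm": False, "mem_read": False,
             "mem_write": False, "reg_write": True},
    "lw":   {"alu_op": "add",  "use_imm": True,  "mem_read": True,
             "mem_write": False, "reg_write": True},
    "sw":   {"alu_op": "add",  "use_imm": True,  "mem_read": False,
             "mem_write": True,  "reg_write": False},
}


def decode(if_id, regs):
    """One ID cycle: read IF/ID, produce the ID/EX contents."""
    inst = if_id["inst"]
    return {
        "ctrl": CONTROL[inst["op"]],      # control information
        "rd": inst["rd"],                 # destination register index
        "imm": inst["imm"],               # immediate / offset
        "a": regs[inst["ra"]],            # contents of Ra
        "b": regs[inst["rb"]],            # contents of Rb
        "pc_plus_4": if_id["pc_plus_4"],  # passed along for branch targets
    }
```

Note that the register file is read unconditionally, even for instructions that do not use both operands; the control signals decide later which values matter.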
[Figure: ID stage datapath. Labels: IF/ID (inst, PC+4), decode, extend, register file (Ra, Rb, Rd, WE, D, A, B), result, dest, ID/EX (ctrl, PC+4, imm, A, B).]
Stage 3: Execute
On every cycle:
• Read ID/EX pipeline register to get values and control bits
• Perform ALU operation
• Compute targets (PC+4+offset, etc.) in case this is a branch
• Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)
• Control information, Rd index, …
• Result of ALU operation
• Value in case this is a memory store instruction
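A sketch of one EX cycle under several stated assumptions: a 32-bit word size, an ALU limited to `add` and `nand`, an offset that is already in bytes, and an equality comparison as the branch condition. The field and signal names (`use_imm`, `branch`, `d`, and so on) are illustrative, not from the slides.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size


def alu(op, x, y):
    """Tiny ALU covering only the operations in this deck's example."""
    if op == "add":
        return (x + y) & MASK32
    if op == "nand":
        return ~(x & y) & MASK32
    raise ValueError(f"unsupported ALU op: {op}")


def execute(id_ex):
    """One EX cycle: return (ex_mem, take_branch, branch_target)."""
    ctrl = id_ex["ctrl"]
    # The second ALU operand is either the immediate or the contents of Rb.
    operand_b = id_ex["imm"] if ctrl["use_imm"] else id_ex["b"]
    result = alu(ctrl["alu_op"], id_ex["a"], operand_b)
    # The target is computed every cycle, in case this is a branch.
    branch_target = id_ex["pc_plus_4"] + id_ex["imm"]
    # Assumed branch condition: taken when the two registers are equal.
    take_branch = ctrl.get("branch", False) and id_ex["a"] == id_ex["b"]
    ex_mem = {
        "ctrl": ctrl,
        "rd": id_ex["rd"],
        "d": result,        # result of the ALU operation
        "b": id_ex["b"],    # value to store, if this is a store
    }
    return ex_mem, take_branch, branch_target
```

For a load such as `lw r4, 20(r2)` with r2 holding 9, the ALU result `d` is the address 29, which the Memory stage then uses.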
[Figure: EX stage datapath. Labels: ID/EX (ctrl, PC+4, imm, A, B), alu, +, ||, branch?, pcrel, pcabs, pcreg, pcsel, EX/MEM (ctrl, D, B).]
Stage 4: Memory
On every cycle:
• Read EX/MEM pipeline register to get values and control bits
• Perform memory load/store if needed
– address is ALU result
Write values of interest to pipeline register (MEM/WB)
• Control information, Rd index, …
• Result of memory operation
• Pass result of ALU operation
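The Memory stage can be sketched as below. Data memory is modeled as a Python dict from address to word, with unwritten locations reading as 0; both choices, along with the function name `memory_stage` and the field names, are assumptions made for illustration.

```python
def memory_stage(ex_mem, data_mem):
    """One MEM cycle: perform a load or store if needed, return MEM/WB."""
    ctrl = ex_mem["ctrl"]
    addr = ex_mem["d"]  # the address is the ALU result
    loaded = None
    if ctrl["mem_read"]:
        loaded = data_mem.get(addr, 0)   # load
    elif ctrl["mem_write"]:
        data_mem[addr] = ex_mem["b"]     # store the value carried in B
    return {
        "ctrl": ctrl,
        "rd": ex_mem["rd"],
        "m": loaded,        # result of the memory operation, if any
        "d": ex_mem["d"],   # ALU result passed through unchanged
    }
```

Instructions that do not touch memory simply pass through this stage, carrying their ALU result in `d` to Writeback.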
[Figure: MEM stage datapath. Labels: EX/MEM (ctrl, D, B), data memory (addr, din, dout, mc), MEM/WB (ctrl, M, D).]
Stage 5: Writeback
On every cycle:
• Read MEM/WB pipeline register to get values and control bits
• Select value and write to register file
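A sketch of the Writeback selection, assuming MEM/WB carries a loaded value `m`, an ALU result `d`, a destination index `rd`, and hypothetical control bits `reg_write` and `mem_read`. None of these names come from the slides.

```python
def write_back(mem_wb, regs):
    """One WB cycle: select the value and write it to the register file."""
    ctrl = mem_wb["ctrl"]
    if not ctrl["reg_write"]:
        return  # e.g. a store: nothing to write
    # Loads write the memory result; other instructions write the ALU result.
    value = mem_wb["m"] if ctrl["mem_read"] else mem_wb["d"]
    regs[mem_wb["rd"]] = value
```

The register file is modified in place, which mirrors the fact that WB is the only stage in this design that updates architectural register state.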
[Figure: WB stage datapath. Labels: MEM/WB (ctrl, M, D), result, dest.]
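[Figure: full pipeline overview. Labels: PC, +4, inst mem, register file (Ra, Rb, Rd, D, A, B), data mem (addr, din, dout), and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying inst, PC+4, OP, Rd, imm, A, B, D, M.]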
Example
add  r3, r1, r2;
nand r6, r4, r5;
lw   r4, 20(r2);
add  r5, r2, r5;
sw   r7, 12(r3);
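As a reference for what the five instructions mean at the ISA level, here is a sketch that applies them one at a time to a register list and a memory dict. It assumes a 32-bit word size, destination-first operand order, and unwritten memory reading as 0; the function name `run_sequential` is hypothetical, and the register values in the usage note are made up rather than taken from the slide's figure.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size


def run_sequential(regs, mem):
    """Apply the example program one instruction at a time.

    Returns new copies of the register list and memory dict.
    """
    regs = list(regs)
    mem = dict(mem)
    regs[3] = (regs[1] + regs[2]) & MASK32    # add  r3, r1, r2
    regs[6] = ~(regs[4] & regs[5]) & MASK32   # nand r6, r4, r5
    regs[4] = mem.get(regs[2] + 20, 0)        # lw   r4, 20(r2)
    regs[5] = (regs[2] + regs[5]) & MASK32    # add  r5, r2, r5
    mem[regs[3] + 12] = regs[7]               # sw   r7, 12(r3)
    return regs, mem
```

With r1 = 1, r2 = 2, r4 = 0xF, r5 = 3, r7 = 99 and memory word 22 holding 7, the program leaves r3 = 3, r4 = 7, r5 = 5, and stores 99 at address 15. A pipelined implementation is expected to produce the same final state as this one-at-a-time version.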
[Figure: animation of the example program moving through the pipeline. Instruction memory holds 0: add, 1: nand, 2: lw, 3: add, 4: sw; the register file (r0 to r7) and the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB are shown at each step.]
Time Graphs
Clock cycle   1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
nand               IF   ID   EX   MEM  WB
lw                      IF   ID   EX   MEM  WB
add                          IF   ID   EX   MEM  WB
sw                                IF   ID   EX   MEM  WB

Latency:
Throughput:
Concurrency:
CPI =
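The quantities asked for above can be read off a generated version of the time graph. The sketch below assumes an ideal pipeline that issues one instruction per cycle with no stalls; the function names `timing_rows` and `render` are illustrative.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def timing_rows(program, stages=STAGES):
    """Map each instruction to {cycle: stage}, one new instruction per cycle."""
    rows = []
    for i, name in enumerate(program):
        first_cycle = i + 1  # cycles are numbered from 1, as on the slide
        rows.append((name, {first_cycle + k: s for k, s in enumerate(stages)}))
    return rows


def render(rows):
    """Format the rows as a fixed-width text table."""
    last = max(max(cycles) for _, cycles in rows)
    lines = ["cycle " + "".join(f"{c:<5}" for c in range(1, last + 1))]
    for name, cycles in rows:
        cells = "".join(f"{cycles.get(c, ''):<5}" for c in range(1, last + 1))
        lines.append(f"{name:<6}" + cells.rstrip())
    return "\n".join(lines)


program = ["add", "nand", "lw", "add", "sw"]
rows = timing_rows(program)

total_cycles = max(max(cycles) for _, cycles in rows)  # n + k - 1
latency = len(STAGES)                                  # cycles per instruction
in_flight = {c: sum(c in cycles for _, cycles in rows)
             for c in range(1, total_cycles + 1)}
max_concurrency = max(in_flight.values())
cpi_this_run = total_cycles / len(program)

print(render(rows))
print(total_cycles, latency, max_concurrency, cpi_this_run)  # 9 5 5 1.8
```

Under these assumptions each instruction has a latency of 5 cycles, up to 5 instructions are in flight at once, and the five-instruction run takes 5 + 5 - 1 = 9 cycles, a measured CPI of 1.8. As the number of instructions grows, the fill time is amortized and CPI approaches 1, which corresponds to a throughput of one instruction per cycle.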
Pipelining Recap
Powerful technique for masking latencies
• Logically, instructions execute one at a time
• Physically, instructions execute in parallel
– Instruction level parallelism
Abstraction promotes decoupling
• Interface (ISA) vs. implementation (Pipeline)