ECE369: Fundamentals of Computer Architecture

Pipelining
addm (rs), rt    # Memory[R[rs]] = R[rt] + Memory[R[rs]];
Assume that we can read and write the memory in the same cycle (like the register file, but this is
likely not efficient to do in a real machine). All instructions use the same format (shown below), but
not all instructions use all of the fields. Assume that each unused field is set to 0.
Field   op      rs      rt      rd      imm
Bits    31-26   25-21   20-16   15-11   10-0
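The field layout above can be sketched as bit-field extraction. This is a minimal illustration, not part of the original slides: the helper names (encode, decode, execute_addm) and the opcode value 0x2A are hypothetical, since the slides do not give an encoding for addm. Registers are modeled as a list and memory as a dict keyed by address.

```python
def encode(op, rs=0, rt=0, rd=0, imm=0):
    """Pack fields into a 32-bit word; unused fields default to 0."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | (imm & 0x7FF)


def decode(word):
    """Split a 32-bit instruction word into the fields of the format table."""
    return {
        "op": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,   # bits 25-21
        "rt": (word >> 16) & 0x1F,   # bits 20-16
        "rd": (word >> 11) & 0x1F,   # bits 15-11
        "imm": word & 0x7FF,         # bits 10-0
    }


def execute_addm(fields, regs, memory):
    """Memory[R[rs]] = R[rt] + Memory[R[rs]], wrapped to 32 bits."""
    addr = regs[fields["rs"]]
    memory[addr] = (regs[fields["rt"]] + memory[addr]) & 0xFFFFFFFF


HYPOTHETICAL_ADDM_OP = 0x2A          # assumed value, not from the slides
word = encode(HYPOTHETICAL_ADDM_OP, rs=4, rt=9)
fields = decode(word)

regs = [0] * 32
regs[4], regs[9] = 100, 7            # R[rs] holds the address, R[rt] the addend
memory = {100: 5}
execute_addm(fields, regs, memory)
print(fields)                        # rd and imm are 0 because addm does not use them
print(memory[100])                   # 12
```

Note that addm reads and writes the same memory location, which is why the problem assumes memory can be read and written in one cycle.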
Instr   RegDst  RegWrite  MemRead  MemWrite  ALUsrc  MemToALU  DataSrc  PCSrc  ALUOp
addm    x       0         1        1         0       0         1        0      Add
Pipelining
One CPU manufacturer has proposed the 10-stage pipeline shown below.
Here are the correspondences between this and the MIPS pipeline:
• Instructions are fetched in the FET stage.
• Register reading is performed in the REG stage.
• ALU operations and memory accesses are both done in the EXE stage.
• Branches are resolved in the DET stage.
• WRB is the writeback stage.
• A write and a read of memory or the register file can occur in the same cycle.
Without forwarding, how many stall cycles are needed for the following
code? Show your work to get credit.
lw   $t0, 0($a0)
add  $v1, $t0, $t0
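The stall count for such a dependent pair can be sketched as a function of stage positions. The function below is an illustration under stated assumptions, not the slide's solution: the name stalls_without_forwarding and its parameters are hypothetical, stage positions are 1-based, and the register file allows a write and a read in the same cycle. Because the 10-stage figure is not reproduced here, the example uses the classic 5-stage MIPS pipeline (register read in stage 2, writeback in stage 5); plugging in the positions of REG and WRB from the figure gives the answer for the proposed pipeline.

```python
def stalls_without_forwarding(read_stage, write_stage, distance=1):
    """Stall cycles needed when a consumer follows its producer by
    `distance` instructions and no forwarding hardware exists.

    The producer writes its result in cycle `write_stage` (counting its
    fetch as cycle 1). Without stalls the consumer would read registers
    in cycle `distance + read_stage`. Same-cycle write/read is allowed,
    so the consumer only has to be delayed until the two cycles match.
    """
    return max(0, write_stage - read_stage - distance)


# Classic 5-stage MIPS: IF=1, ID=2, EX=3, MEM=4, WB=5.
print(stalls_without_forwarding(read_stage=2, write_stage=5))              # 2
print(stalls_without_forwarding(read_stage=2, write_stage=5, distance=3))  # 0
```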
Solution
Assume that the initial value of R3 is R2+396.
How many cycles does this loop take to execute?
Loop:   LW    R1, 0(R2)
        ADDI  R1, R1, #1
        SW    R1, 0(R2)
        ADDI  R2, R2, #4
        SUB   R4, R3, R2
        BNEZ  R4, Loop
-No forwarding or bypassing hardware.
-All memory and register writes occur during the first half of the clock cycle and
reads occur during the second half (a register write and a register read in the
same cycle forward through the register file).
-Branching is handled by flushing the pipeline, and branches are resolved in the
Memory stage.
Branches are resolved in MEM. The second iteration starts 17
clock cycles after the first. The last iteration takes
18 cycles. The loop executes 99 times (396 / 4), so the total is
98*17 + 18 = 1684 cycles.
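The arithmetic behind this total can be written out as a short sketch. The function names are illustrative, not from the slides; the inputs (396 bytes, stride 4, 17-cycle start interval, 18-cycle last iteration) are the values stated above.

```python
def loop_iterations(byte_span, stride):
    """Iterations until R2 reaches R3 when R2 advances by `stride` bytes."""
    if byte_span % stride != 0:
        raise ValueError("R2 would step past R3 and the loop would not terminate")
    return byte_span // stride


def total_cycles(iterations, start_interval, last_iteration_cycles):
    """All but the last iteration cost one start interval each;
    the last iteration also has to drain through the pipeline."""
    return (iterations - 1) * start_interval + last_iteration_cycles


n = loop_iterations(396, 4)
print(n)                        # 99
print(total_cycles(n, 17, 18))  # 1684
```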
Assume that the initial value of R3 is R2+396.
How many cycles does this loop take to execute?
Loop:   LW    R1, 0(R2)
        ADDI  R1, R1, #1
        SW    R1, 0(R2)
        ADDI  R2, R2, #4
        SUB   R4, R3, R2
        BNEZ  R4, Loop
-With forwarding and bypassing hardware.
-All memory and register writes occur during the first half of the clock cycle and
reads occur during the second half (a register write and a register read in the
same cycle forward through the register file).
-Assume that the branch is resolved in the Memory stage and handled by predicting
it as not taken. {Use (m) for branch misprediction in the table.}
Branches are resolved in MEM. The second iteration starts 10
clock cycles after the first. The last iteration takes
11 cycles. The loop executes 99 times (396 / 4), so the total is
98*10 + 11 = 991 cycles.
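The per-iteration figures for both variants of this problem (17 and 18 cycles without forwarding, 10 and 11 with forwarding) can be cross-checked with a small timing model. This is a simplified sketch, not the timing diagram from the slides. It assumes a classic 5-stage IF/ID/EX/MEM/WB pipeline, in-order issue, a one-cycle load-use stall with forwarding, and that the next iteration is fetched in the cycle after the branch reaches MEM. The names LOOP, iteration_timing and loop_cycles are illustrative.

```python
# Each entry: (opcode, destination register or None, source registers).
LOOP = [
    ("LW",   "R1", ["R2"]),
    ("ADDI", "R1", ["R1"]),
    ("SW",   None, ["R1", "R2"]),
    ("ADDI", "R2", ["R2"]),
    ("SUB",  "R4", ["R3", "R2"]),
    ("BNEZ", None, ["R4"]),
]


def iteration_timing(loop, forwarding):
    """Return (cycle of the branch's MEM stage, cycle of its WB stage)
    for one iteration whose first instruction is fetched in cycle 1."""
    if forwarding:
        # Track each instruction's EX cycle; operands arrive via bypass paths.
        first, to_mem, to_wb = 3, 1, 2
    else:
        # Track each instruction's ID cycle; operands come from the register file.
        first, to_mem, to_wb = 2, 2, 3
    ready = {}        # register -> earliest cycle a consumer may use it
    slot = first - 1  # tracked-stage cycle of the previous instruction
    for op, dest, srcs in loop:
        # In-order: one cycle after the predecessor, and not before operands are ready.
        slot = max([slot + 1] + [ready.get(r, 0) for r in srcs])
        if dest is not None:
            if forwarding:
                # ALU results are usable in the next cycle, loads one cycle later.
                ready[dest] = slot + (2 if op == "LW" else 1)
            else:
                # Written in the first half of WB, readable in ID that same cycle.
                ready[dest] = slot + to_wb
    return slot + to_mem, slot + to_wb


def loop_cycles(loop, forwarding, iterations):
    branch_mem, last_wb = iteration_timing(loop, forwarding)
    # Iterations start branch_mem cycles apart; the last one runs to its final WB.
    return (iterations - 1) * branch_mem + last_wb


print(iteration_timing(LOOP, forwarding=False))  # (17, 18)
print(loop_cycles(LOOP, False, 99))              # 1684
print(iteration_timing(LOOP, forwarding=True))   # (10, 11)
print(loop_cycles(LOOP, True, 99))               # 991
```

The model ignores cross-iteration hazards, which is safe here because the flush after each taken branch leaves enough cycles for the R2 update to reach the register file.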
Assume that the initial value of R3 is R2+396.
How many cycles does this loop take to execute?
Loop:   LW    R1, 0(R2)
        ADDI  R1, R1, #1
        SW    R1, 0(R2)
        ADDI  R2, R2, #4
        SUB   R4, R3, R2
        BNEZ  R4, Loop
Assuming the MIPS pipeline with a single cycle delayed branch and normal
forwarding and bypassing hardware,
• Schedule the instructions in the loop including the branch delay slot.
• You may reorder the instructions and modify the individual instruction
operands, but do not undertake other loop transformations that change the
number or opcode of the instructions in the loop.
• Show a pipeline timing diagram and compute the number of cycles needed to
execute the entire loop.
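Loop:   LW    R1, 0(R2)
        ADDI  R1, R1, #1
        SW    R1, 0(R2)
        ADDI  R2, R2, #4
        SUB   R4, R3, R2
        BNEZ  R4, Loop
Once scheduled, the loop starts a new iteration every 6 cycles and the last
iteration takes 10 cycles, so 99 iterations take 98*6 + 10 = 598 clocks.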
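One way to check a candidate schedule is to run it functionally against the original order. The schedule below is one possible reordering that satisfies the stated rules (same six opcodes, only the SW offset changed, SW moved into the branch delay slot). It is an assumption for illustration and may differ from the schedule on the original slide. The function names and the base address 1000 are hypothetical.

```python
def run_original(mem, r2, r3):
    """Original order: LW, ADDI, SW, ADDI, SUB, BNEZ."""
    iterations = 0
    while True:
        r1 = mem[r2]       # LW   R1, 0(R2)
        r1 = r1 + 1        # ADDI R1, R1, #1
        mem[r2] = r1       # SW   R1, 0(R2)
        r2 = r2 + 4        # ADDI R2, R2, #4
        r4 = r3 - r2       # SUB  R4, R3, R2
        iterations += 1
        if r4 == 0:        # BNEZ R4, Loop (falls through when R4 is 0)
            return iterations


def run_scheduled(mem, r2, r3):
    """Candidate schedule with SW in the delay slot (always executed)."""
    iterations = 0
    while True:
        r1 = mem[r2]       # LW   R1, 0(R2)
        r2 = r2 + 4        # ADDI R2, R2, #4
        r4 = r3 - r2       # SUB  R4, R3, R2
        r1 = r1 + 1        # ADDI R1, R1, #1
        taken = r4 != 0    # BNEZ R4, Loop
        mem[r2 - 4] = r1   # SW   R1, -4(R2)  (delay slot; offset adjusted)
        iterations += 1
        if not taken:
            return iterations


base = 1000                                      # arbitrary word-aligned address
data = {base + 4 * i: i for i in range(99)}      # 99 words = 396 bytes
a, b = dict(data), dict(data)
n_a = run_original(a, base, base + 396)
n_b = run_scheduled(b, base, base + 396)
print(n_a, n_b, a == b)                          # 99 99 True
print((n_b - 1) * 6 + 10)                        # 598
```

The check covers only functional equivalence. The 6-cycle iteration time additionally assumes that the schedule incurs no stalls, which the timing diagram has to confirm.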