(15 pts) Exercise 4-81

advertisement
(15 pts) Exercise 4-81
•
•
1
IF
A.) Draw a pipeline stage diagram for the following sequence of instructions.
You don’t need fancy pictures – just text for each stage: ID, MEM, etc.
SHOW the cycle number for each column, starting at cycle #1.
lw
$v0, 0($a0)
lw
$v1, 0($v0)
add $a0, $a0, $v1
sub $t0, $t0, $a0
2
3
4
5
6
7
ID EX MEM WB
IF ID ** EX MEM WB
IF ** ID ** EX
IF ** ID
8
9
10
MEM WB
EX MEM WB
(wait for $v0, needed in EX)
(wait for $v1, needed in EX)
(no stall b/c forwarding)
B.) Now assume that the above sequence of instructions is repeated 100 times
(so the processor does a load, another load, an add, a sub, then back to a
load, another load, an add, …). There are no branches, just these 4
instructions repeated 100 times. Do NOT reorder the code to improve
efficiency.
HINT: this is more than a “30 second” question. Think carefully about
when each instruction will execute.
i.) What is the total number of cycles needed to execute those 400
instructions?
No additional stalls needed when going from the sub of one iteration to the first lw of the next.
The sequence shown (one iteration) takes a total of 10 cycles. Each additional iteration will add
6 cycles onto this (4 cycles to start the 4 instructions, plus 2 stalls).
So the total is 10 + 99 * 6 = 10 + 594 = 604
ii.) What is the average CPI for executing that sequence?
CPI = cycles / inst = 604 / 400 = approx 1.5
(15 pts) Exercise 4-82
•
A.) Do the same thing as with Exercise 4-81, except assume that there is NO
forwarding.
lw
lw
add
sub
1
IF
2
ID
IF
3
EX
**
$v0,
$v1,
$a0,
$t0,
0($a0)
0($v0)
$a0, $v1
$t0, $a0
4
5
MEM WB
** ID
IF
6
EX
**
7
8
MEM WB
** ID
IF
9
EX
**
10
11
12
13
14
(wait for $v0 to reach WB, need in ID)
MEM WB (wait for $v1 to reach WB, need in ID)
** ID EX MEM WB (wait for $t0)
You might also say that for instruction #2, the ID starts in
cycle 3 – at that point we realize that it needs to stall. We
write ID in cycle 5 because ID needs to repeat at that point to
read $v0 from the register file.
(this is UNLIKE when we have forwarding, when after a stall
normally we are proceeding and getting a needed value via
forwarding, rather than from the register file)
B.) Again assume that the above sequence of instructions is repeated 100
times (so the processor does a load, another load, an add, a sub, then
back to a load, another load, an add, …). There are no branches, just
these 4 instructions repeated 100 times. Do NOT reorder the code to
improve efficiency. As with immediately above, assume there is NO
forwarding.
HINT: again this is more than a “30 second” question. Think carefully
about when each instruction will execute.
i.) What is the total number of cycles needed to execute those 400
instructions?
No additional stalls needed when going from the sub of one iteration to the first of the next (the
first lw, which needs $a0, won’t start until $a0 has already been written back by the add
instruction).
The sequence shown (one iteration) takes a total of 14 cycles. Each additional iteration will add
10 cycles onto this (4 cycles to start the 4 instructions, plus 6 stalls).
So the total is 14 + 99 * 10 = 14 + 990 = 1004
ii.) What is the average CPI for executing that sequence?
CPI = cycles / inst = 1004 / 400 = approx 2.5
(5 pts) Exercise #4-86
•
Can you rewrite this code to eliminate stalls? Circle YES or NO
(as usual, assume we do have forwarding)
YES
1.
2.
3.
4.
•
lw
lw
sub
sw
$a0,
$v0,
$t1,
$v0,
0($t1)
0($a0)
$s2, $t3
4($s1)
If any improvement is possible (to reduce stalls), show the new version
here:
Move #3 to #2. The sw won’t stall b/c it doesn’t need $v0 until
the MEM stage – so all stalls removed.
NOTE – when re-ordering, must maintain program
correctness! For instance, load of $v0 must execute before
the store of $v0, because that is order of original program.
The subtract uses different registers, so can move freely.
(10 pts) Exercise 4-87: True or False? (circle for each)
1. A pipelined implementation will have a faster clock rate than a
comparable single cycle implementation ( TRUE or FALSE )
true
2. Pipelining increases performance by splitting up each instruction into
stages, thereby decreasing the time needed to execute each
instruction ( TRUE or FALSE )
False – individual instructions take longer to finish!
3. “Backwards” branches are likely to be taken ( TRUE or FALSE )
True– backwards is most likely a loop
4. Dynamic branch prediction is done by the compiler ( TRUE or FALSE )
False – done at runtime
5. Most modern processors use pipelining ( TRUE or FALSE )
true
Download