CSCI4250 Exam 1 Solution
Question 1 (4 points). Consider enhancing a computer by adding vector hardware.
When a computation is run in vector mode on the vector hardware, it is 100 times faster
than in the normal mode. We call the percentage of the original time that could be spent
using the vector mode the percentage of vectorization.
1a) What percentage of vectorization is necessary to achieve a speedup of 10? (Just set
up the equation to show how to get the answer.)
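Let f be the fraction of the original execution time that can use vector mode, with the original time normalized to 1.
$$\text{Speedup} = \frac{\text{Old Execution Time}}{\text{New Execution Time}}$$
$$10 = \frac{1}{\text{Time in vector mode} \times \frac{1}{100} + \text{Time in normal mode}} = \frac{1}{\frac{f}{100} + (1 - f)}$$
$$1 - \frac{99}{100} f = \frac{1}{10} \quad\Rightarrow\quad f = \frac{90}{99} = \frac{10}{11} \approx 90.9\%$$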
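A short Python sketch can check this equation with exact rational arithmetic. The helper names speedup and required_fraction are hypothetical; only the factor of 100 and the target speedup of 10 come from the question.

```python
from fractions import Fraction

def speedup(f: Fraction, vector_factor: int = 100) -> Fraction:
    """Overall speedup when a fraction f of the original time runs vector_factor times faster."""
    return 1 / ((1 - f) + f / vector_factor)

def required_fraction(target: Fraction, vector_factor: int = 100) -> Fraction:
    """Solve target = 1 / ((1 - f) + f / k) for f, i.e. f = (1 - 1/target) / (1 - 1/k)."""
    return (1 - 1 / target) / (1 - Fraction(1, vector_factor))

f = required_fraction(Fraction(10))
print(f, float(f))   # 10/11 0.9090909090909091
print(speedup(f))    # 10
```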
1b) What percentage of the enhanced computation time is spent in vector mode if a
speedup of 10 is attained? (Just set up the equation to show how to get the answer.)
$$\text{Percentage time} = \frac{\text{Time in vector mode}}{\text{Total time}} = \frac{f/100}{(1 - f) + f/100} = \frac{1/110}{11/110} = \frac{1}{11} \approx 9.1\%$$
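The same exact-fraction approach confirms 1b. In this Python sketch the function name vector_share_of_new_time is hypothetical; it takes f = 10/11 from 1a and normalizes the original run time to 1.

```python
from fractions import Fraction

def vector_share_of_new_time(f: Fraction, vector_factor: int = 100) -> Fraction:
    """Fraction of the enhanced run spent in vector mode (original time normalized to 1)."""
    vector_time = f / vector_factor   # vectorized work, now vector_factor times faster
    scalar_time = 1 - f               # unchanged scalar work
    return vector_time / (vector_time + scalar_time)

share = vector_share_of_new_time(Fraction(10, 11))
print(share, f"{float(share):.1%}")   # 1/11 9.1%
```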
Question 2 (6 points). (It is sufficient to derive the formulas, and not necessary to
calculate the final numerical results.) Suppose we build an optimizing compiler that
discards 50% of the ALU instructions (but cannot reduce other instructions). Assume that
the original total instruction count is $3 \times 10^9$ and that the original ALU instruction count is
$2 \times 10^9$. Let the clock rate be 1 GHz, let each ALU instruction take 1 clock cycle, and let
each non-ALU instruction take 2 clock cycles.
2a) Calculate the original MIPS rate and execution time.
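Execution Time = Instruction Count × Cycles per instruction × Clock cycle time. The $3 \times 10^9 - 2 \times 10^9 = 1 \times 10^9$ non-ALU instructions take 2 cycles each:
$$\text{Execution Time} = \frac{2 \times 10^9 \text{ instr} \times 1 \text{ cycle/instr} + 1 \times 10^9 \text{ instr} \times 2 \text{ cycles/instr}}{10^9 \text{ cycles/s}} = 4 \text{ s}$$
$$\text{MIPS rate} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6} = \frac{3 \times 10^9}{4 \times 10^6} = 750 \text{ MIPS}$$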
2b) Calculate the new MIPS rate and execution time when we use the optimizing
compiler.
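Execution Time = Instruction Count × Cycles per instruction × Clock cycle time. The compiler halves the ALU instructions to $1 \times 10^9$, while the $1 \times 10^9$ non-ALU instructions are unchanged:
$$\text{Execution Time} = \frac{1 \times 10^9 \text{ instr} \times 1 \text{ cycle/instr} + 1 \times 10^9 \text{ instr} \times 2 \text{ cycles/instr}}{10^9 \text{ cycles/s}} = 3 \text{ s}$$
$$\text{MIPS rate} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6} = \frac{2 \times 10^9}{3 \times 10^6} \approx 666.7 \text{ MIPS}$$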
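The arithmetic for 2a and 2b can be checked with a short Python sketch. The helper time_and_mips is hypothetical; the instruction counts, the cycles per instruction, and the 1 GHz clock are those given in the question.

```python
def time_and_mips(alu_count: int, other_count: int, clock_hz: float = 1e9,
                  alu_cpi: int = 1, other_cpi: int = 2) -> tuple[float, float, float]:
    """Return (execution time in s, MIPS rate, average CPI) for a two-class instruction mix."""
    instructions = alu_count + other_count
    cycles = alu_count * alu_cpi + other_count * other_cpi
    seconds = cycles / clock_hz
    mips = instructions / (seconds * 1e6)
    return seconds, mips, cycles / instructions

orig = time_and_mips(alu_count=2 * 10**9, other_count=1 * 10**9)
opt = time_and_mips(alu_count=1 * 10**9, other_count=1 * 10**9)
print(f"original:  {orig[0]:.1f} s, {orig[1]:.1f} MIPS, CPI {orig[2]:.2f}")  # 4.0 s, 750.0 MIPS, CPI 1.33
print(f"optimized: {opt[0]:.1f} s, {opt[1]:.1f} MIPS, CPI {opt[2]:.2f}")     # 3.0 s, 666.7 MIPS, CPI 1.50
```

Both the execution time and the MIPS rate fall, while the average CPI rises; this is the apparent contradiction discussed in 2c.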
2c) Discuss the results in (a) and (b). Are there any contradictions?
At first glance, a lower MIPS rate would seem to imply a longer execution time, yet here the
MIPS rate drops (750 to about 667) while the execution time also drops (4 s to 3 s). The
explanation is that the optimizing compiler removed only the lightest-weight (1-cycle) ALU
instructions, so the remaining mix is dominated by the 2-cycle non-ALU instructions: the
average CPI rises from 4/3 to 3/2, which lowers the MIPS rate even though fewer total cycles
are executed. This reinforces the idea that the MIPS rate is not a reliable indicator of performance.
Question 3 (6 points). Consider a new addressing mode that allows one source operand
to be in memory. To reduce complexity, you restrict all memory addressing to be register
indirect only (i.e., no displacement). So, “ADD R1, R2, (R3)” adds the contents of
register R2 to the contents stored at address R3 in memory and writes the sum to R1. However,
“ADD R1, (R2), (R3)” and “ADD R1, R2, 4(R3)” are illegal instructions.
3a) Give an advantage of this new addressing mode.
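This reduces the instruction count, and therefore potentially the execution time, when working
with arrays, since it is no longer necessary to load a value into a register before using it in a
computation. Loop bodies that process array elements need fewer instructions, and the register
that would have held the loaded value is freed. There are several instances where the new
addressing mode performs an operation in 1 instruction that used to take 2 instructions,
e.g.,
LD R1, 0(R2)
ADD R3, R3, R1
becomes
ADD R3, R3, (R2)
provided R1 is not used afterward (the one-instruction version no longer leaves the loaded value in R1).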
3b) Consider “ADD R1, (R2), (R3).” Suggest a simple code sequence of two instructions
to simulate this illegal instruction using the new addressing mode and only registers R1,
R2, and R3.
LD R1, (R2)
ADD R1, R1, (R3)
3c) Consider “ADD R1, R2, 4(R3).” Suggest a simple code sequence of two instructions
to simulate this illegal instruction using the new addressing mode and only registers R1,
R2 and R3.
ADDI R1, R3, #4
ADD R1, R2, (R1)
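The equivalences in 3a, 3b, and 3c can be checked with a toy interpreter. In this Python sketch, the opcode name ADDM is hypothetical shorthand for the new register-indirect form ADD Rd, Rs, (Rt), and the register and memory contents are made-up test values.

```python
def run(program, regs, mem):
    """Execute a list of (opcode, operands...) tuples on a copy of regs; mem is read-only."""
    regs = dict(regs)
    for op, *args in program:
        if op == "LD":        # LD Rd, (Rs)
            rd, rs = args
            regs[rd] = mem[regs[rs]]
        elif op == "ADDI":    # ADDI Rd, Rs, #imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "ADD":     # ADD Rd, Rs, Rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "ADDM":    # hypothetical name for ADD Rd, Rs, (Rt): one source in memory
            rd, rs, rt = args
            regs[rd] = regs[rs] + mem[regs[rt]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

mem = {100: 7, 104: 9, 200: 5}

# 3a: LD + ADD versus a single memory-operand ADD
start = {"R1": 0, "R2": 100, "R3": 1}
two_instr = run([("LD", "R1", "R2"), ("ADD", "R3", "R3", "R1")], start, mem)
one_instr = run([("ADDM", "R3", "R3", "R2")], start, mem)
print(two_instr["R3"], one_instr["R3"])   # 8 8

# 3b: simulate ADD R1, (R2), (R3)
regs_b = {"R1": 0, "R2": 100, "R3": 200}
out_b = run([("LD", "R1", "R2"), ("ADDM", "R1", "R1", "R3")], regs_b, mem)
print(out_b["R1"], mem[100] + mem[200])   # 12 12

# 3c: simulate ADD R1, R2, 4(R3)
regs_c = {"R1": 0, "R2": 100, "R3": 100}
out_c = run([("ADDI", "R1", "R3", 4), ("ADDM", "R1", "R2", "R1")], regs_c, mem)
print(out_c["R1"], regs_c["R2"] + mem[regs_c["R3"] + 4])   # 109 109
```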
Question 4 (6 points). Consider single precision IEEE 754 representation with a
“truncate” policy. This is a 32-bit representation with 1 sign bit, 8 exponent bits encoded
using a bias of 127 and the remaining 23 bits used to encode the fractional part of the
mantissa.
4a) In class, you learned that one-tenth has the binary representation
0.000110011001100… Give the IEEE representation of two-tenths.
$$0.2_{10} = 2 \times 0.1_{10}$$
$$0.1_{10} = 1.10011001100\ldots_2 \times 2^{-4}, \quad\text{thus}\quad 0.2_{10} = 1.10011001100\ldots_2 \times 2^{-3}$$
Thus the sign field is 0 and the exponent field is -3 + bias = -3 + 127 = 124. With the fraction truncated to 23 bits, our representation is:
Sign  Exponent  Fraction
0     01111100  10011001100110011001100
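The binary expansion quoted for one-tenth can be reproduced by repeated doubling: each doubling moves the next binary digit into the integer position. This Python sketch uses exact fractions; the function name binary_fraction_digits is hypothetical.

```python
from fractions import Fraction

def binary_fraction_digits(x: Fraction, n: int) -> str:
    """First n binary digits after the point of 0 <= x < 1, by repeated doubling."""
    digits = []
    for _ in range(n):
        x *= 2
        if x >= 1:            # the integer part produced by doubling is the next bit
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "0." + "".join(digits)

print(binary_fraction_digits(Fraction(1, 10), 16))   # 0.0001100110011001
print(binary_fraction_digits(Fraction(2, 10), 16))   # 0.0011001100110011
```

Doubling 0.1 to 0.2 shifts every bit one place left, which is why the exponent changes from -4 to -3 while the fraction bits stay the same.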
4b) Give the IEEE representation of two.
$$2_{10} = 1.0_2 \times 2^{1}$$
Thus the sign field is 0 and the exponent field is 1 + bias = 1 + 127 = 128. Giving our representation as:
Sign  Exponent  Fraction
0     10000000  00000000000000000000000
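4c) Give the IEEE representation of two and two-tenths, i.e., 2.2.
Align the exponents (normalize both to $2^1$) and add:
$$0.2_{10} = 1.100110011\ldots_2 \times 2^{-3} = 0.0001100110011\ldots_2 \times 2^{1}$$
$$2_{10} = 1.0_2 \times 2^{1}$$
Thus $2.2 = 2 + 0.2 = 1.0001100110011\ldots_2 \times 2^{1}$, giving our representation as:
Sign  Exponent  Fraction
0     10000000  00011001100110011001100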
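The three encodings can be computed directly. This Python sketch (the function name is hypothetical) finds the exponent exactly with fractions.Fraction, truncates the fraction to 23 bits as the question's policy requires, and, for comparison, shows the round-to-nearest result that the standard struct module produces. It handles normal, nonzero values only.

```python
import math
import struct
from fractions import Fraction

def ieee754_single_truncate(value: str) -> tuple[str, str, str]:
    """Sign, exponent, and fraction fields of IEEE 754 single precision,
    with the fraction truncated (not rounded) to 23 bits."""
    x = Fraction(value)                  # exact decimal value, e.g. "2.2" -> 11/5
    sign = 1 if x < 0 else 0
    x = abs(x)
    e = 0                                # unbiased exponent such that 1 <= x / 2**e < 2
    while x / Fraction(2) ** e >= 2:
        e += 1
    while x / Fraction(2) ** e < 1:
        e -= 1
    significand = x / Fraction(2) ** e   # 1.xxxx... in binary
    fraction = math.floor((significand - 1) * 2**23)   # keep 23 bits, drop the rest
    return str(sign), format(e + 127, "08b"), format(fraction, "023b")

for v in ("0.2", "2", "2.2"):
    s, exp, frac = ieee754_single_truncate(v)
    truncated = int(s + exp + frac, 2)
    # struct's "f" format converts the double to the nearest single-precision value
    rounded = struct.unpack(">I", struct.pack(">f", float(v)))[0]
    print(v, s, exp, frac, f"{truncated:#010x}", f"{rounded:#010x}")
# 0.2 0 01111100 10011001100110011001100 0x3e4ccccc 0x3e4ccccd
# 2 0 10000000 00000000000000000000000 0x40000000 0x40000000
# 2.2 0 10000000 00011001100110011001100 0x400ccccc 0x400ccccd
```

Truncation and round-to-nearest differ in the last fraction bit for 0.2 and 2.2, because the first discarded bit of each repeating pattern is 1.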
Question 5 (6 points). A distinguished computer architect suggested these two factors to
improve performance:
- How fast you can crank up the clock;
- How many instructions you need to perform a task.
5a) Suppose that you can double the clock rate of your processor. Explain why you may
not see a performance improvement, if other parts of the machine are not changed.
Other parts of the machine may become a bottleneck for execution. In particular, the
speed of transferring information from memory becomes critical: memory latency is fixed in
nanoseconds, so halving the clock period doubles the number of clock cycles the processor
must stall while waiting to retrieve data from memory.
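A rough model makes this concrete. In the Python sketch below, the workload (10^9 instructions, a base CPI of 1, 0.02 cache misses per instruction, 100 ns per miss) is entirely hypothetical; the point is that a miss penalty fixed in nanoseconds costs twice as many cycles at twice the clock rate.

```python
def cpu_time_s(instr_count: int, base_cpi: float, misses_per_instr: float,
               miss_latency_ns: float, clock_ghz: float) -> float:
    """CPU time when each cache miss costs a fixed latency in nanoseconds."""
    miss_penalty_cycles = miss_latency_ns * clock_ghz   # same ns -> more cycles at a faster clock
    cpi = base_cpi + misses_per_instr * miss_penalty_cycles
    return instr_count * cpi / (clock_ghz * 1e9)

# Hypothetical workload parameters, not taken from the exam
t1 = cpu_time_s(10**9, 1.0, 0.02, 100.0, clock_ghz=1.0)
t2 = cpu_time_s(10**9, 1.0, 0.02, 100.0, clock_ghz=2.0)
print(f"{t1:.2f} s -> {t2:.2f} s, speedup {t1 / t2:.2f}")   # 3.00 s -> 2.50 s, speedup 1.20
```

Doubling the clock rate yields only a 1.2x speedup here, because the memory stall cycles per instruction double from 2 to 4.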
5b) You are tempted to define a new instruction for a complex task that occurs
frequently. Suppose the new instruction requires many cycles for execution. Use
exceptions to explain why this new instruction may hurt performance.
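This becomes especially important on machines with out-of-order execution. If an instruction
that takes a large number of cycles to complete is in the pipeline while later instructions
finish and overwrite the operands it reads, then it becomes impossible to provide a precise
exception: if the long instruction faults, it can no longer be restarted and produce the correct
result. To keep exceptions precise, the machine must either hold back later instructions until
the long instruction can no longer fault or add hardware to preserve the overwritten state, and
either choice costs performance or hardware.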
5c) What is a precise exception for a pipeline?
An exception is precise if, when an instruction raises it, all instructions before the faulting
instruction have completed and do not need to be restarted, while the faulting instruction and
all instructions after it have not changed the machine state. The machine can then handle the
exception and restart execution from the faulting instruction.
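One common way to keep exceptions precise when instructions finish out of order is to let results update the machine state only in program order, for example through a reorder buffer; this is a standard technique, not something the exam specifies. The Python sketch below is a hypothetical, highly simplified model: every instruction starts in cycle 0, and the latencies and the faulting instruction are made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    name: str
    latency: int           # cycle in which the result is ready (all instructions start in cycle 0)
    faults: bool = False   # raises an exception when it completes

def commit_in_order(program: list[Instr]) -> tuple[list[str], Optional[str]]:
    """Instructions may finish out of order, but they update the machine state (commit)
    strictly in program order. Returns the committed names and the faulting instruction."""
    committed: list[str] = []
    head, cycle = 0, 0
    while head < len(program):
        cycle += 1
        # Commit from the head while the oldest uncommitted instruction has finished.
        while head < len(program) and program[head].latency <= cycle:
            if program[head].faults:
                # Precise: all earlier instructions have committed, no later one has.
                return committed, program[head].name
            committed.append(program[head].name)
            head += 1
    return committed, None

prog = [Instr("MUL.D", 10), Instr("ADD", 1), Instr("DIV.D", 20, faults=True), Instr("SUB", 1)]
print(commit_in_order(prog))   # (['MUL.D', 'ADD'], 'DIV.D')
```

ADD and SUB finish in cycle 1 but cannot commit until the long-latency instructions ahead of them do, which illustrates how multi-cycle instructions delay everything behind them when exceptions must be precise.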
Question 6 (6 points). Consider the simple 5-stage integer pipeline.
6a) What is data forwarding?
Forwarding (bypassing) makes a result available to subsequent instructions as soon as it has
been computed, letting them receive it at the beginning of their EX stage instead of reading it
from the register file in ID. Thus, the results held in the EX/MEM and MEM/WB pipeline
registers are fed back as possible source operands to the ALU.
6b) Write MIPS assembly code to explain how data forwarding may eliminate the data
hazard stall cycle.
ADDI R1, R1, #8
ADD R2, R3, R1
Without forwarding, R1 is not written into the register file until clock cycle 5 (the WB stage of
ADDI), so the ADD cannot read it in ID until cycle 5 and does not enter EX until cycle 6, a
two-cycle stall. With forwarding, the ALU result of ADDI is fed from the EX/MEM register back
to the ALU input, so the ADD enters EX in cycle 4 with no stalls.
6c) Write MIPS assembly code to show an example where forwarding cannot
completely eliminate the data hazard stall cycles.
LD R1, 0(R2)
ADD R1, R1, R1
Here the loaded data is not available until the end of the LD's MEM stage in cycle 4, so it
cannot be forwarded to the ADD's EX stage, which would otherwise also occur in cycle 4. The
ADD must stall for one cycle and enters EX in cycle 5, receiving R1 from the MEM/WB register.
Data cannot be forwarded backward in time.
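The cycle counts in 6b and 6c can be reproduced with a small timing model. This Python sketch is hypothetical: it assumes the classic 5-stage pipeline with the dependent instruction fetched right after its producer, a register file written in the first half of a cycle and read in the second half, and full forwarding from the EX/MEM and MEM/WB registers.

```python
def consumer_ex_cycle(producer_is_load: bool, forwarding: bool) -> int:
    """Cycle in which an instruction that immediately follows its producer enters EX.
    Producer: IF=1, ID=2, EX=3, MEM=4, WB=5; the consumer is fetched in cycle 2,
    so with no hazard it would enter EX in cycle 4."""
    no_hazard_ex = 4
    if not forwarding:
        # Register file is written in the first half of WB (cycle 5) and read in the
        # second half of ID, so the consumer's ID is cycle 5 and its EX is cycle 6.
        earliest = 5 + 1
    else:
        # ALU results are ready at the end of EX (cycle 3); loads at the end of MEM (cycle 4).
        ready_at_end_of = 4 if producer_is_load else 3
        earliest = ready_at_end_of + 1
    return max(no_hazard_ex, earliest)

for label, is_load in (("ALU op -> dependent ADD (6b)", False), ("LD -> dependent ADD (6c)", True)):
    for fwd in (False, True):
        ex = consumer_ex_cycle(is_load, fwd)
        print(f"{label}, forwarding={fwd}: EX in cycle {ex}, {ex - 4} stall(s)")
```

With forwarding, the ALU case needs no stalls and the load case still needs one, matching 6b and 6c.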
Question 7 (6 points). Consider the simple 5-stage integer pipeline, where a branch
instruction causes a one-cycle delay.
7a) Construct an example using MIPS assembly code to show how you may schedule the
branch delay slot to always eliminate the one-cycle branch delay.
This can be done if there is an independent instruction before the branch instruction.
Consider the following loop:
L1:
ADDI R1, R1, #-4
ADD R3, R3, R1
ADD R2, R2, R1
BNEZ R1, L1
Here either ADD instruction can be moved into the branch delay slot, because neither writes R1,
the register the branch tests, so the branch outcome is unchanged.
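The "from before" strategy used here can be expressed as a small dependence check. In the Python sketch below, the Ins record, the function name, and the hand-encoded register sets are hypothetical; only register dependences are modeled (no memory operands), and the loop counter update is written as ADDI R1, R1, #-4, a decrement of R1 by 4. A candidate may move into the delay slot only if no instruction after it, including the branch, reads its result, writes the same register, or overwrites one of its inputs.

```python
from typing import NamedTuple, Optional

class Ins(NamedTuple):
    text: str
    dest: Optional[str]     # register written, or None (e.g. a branch)
    srcs: tuple[str, ...]   # registers read

def fill_slot_from_before(block: list[Ins]) -> Optional[list[Ins]]:
    """Move the latest instruction that can legally pass everything after it
    (including the branch) into the delay slot; return None if none qualifies."""
    body, branch = block[:-1], block[-1]
    for i in range(len(body) - 1, -1, -1):
        cand, later = body[i], body[i + 1:] + [branch]
        conflict = any(
            (cand.dest is not None and (cand.dest in ins.srcs      # later reads its result (RAW)
                                        or cand.dest == ins.dest))  # both write one register (WAW)
            or (ins.dest is not None and ins.dest in cand.srcs)     # later overwrites an input (WAR)
            for ins in later
        )
        if not conflict:
            return body[:i] + body[i + 1:] + [branch, cand]
    return None

loop_7a = [
    Ins("ADDI R1, R1, #-4", "R1", ("R1",)),
    Ins("ADD R3, R3, R1", "R3", ("R3", "R1")),
    Ins("ADD R2, R2, R1", "R2", ("R2", "R1")),
    Ins("BNEZ R1, L1", None, ("R1",)),
]
for ins in fill_slot_from_before(loop_7a):
    print(ins.text)   # the last line printed is the delay-slot instruction: ADD R2, R2, R1

loop_7b = [
    Ins("SUB R1, R2, R3", "R1", ("R2", "R3")),
    Ins("ADDI R5, R5, #4", "R5", ("R5",)),
    Ins("ADD R2, R5, R3", "R2", ("R5", "R3")),
    Ins("BNEZ R2, L1", None, ("R2",)),
]
print(fill_slot_from_before(loop_7b))   # None
```

For the 7a loop it moves ADD R2, R2, R1 into the slot. For the 7b loop it finds no legal candidate, which is why 7b has to fill the slot from the branch target instead.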
7b) There are cases where scheduling the branch delay slot may not always eliminate the
one-cycle branch delay. Construct an example using MIPS assembly code to illustrate
one such case. What assumption must be made to ensure that the program will work
correctly?
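If no independent instruction is available before the branch, the instruction at the branch
target can be copied into the branch delay slot. The slot then does useful work only when the
branch is taken; when the branch falls through, the slot instruction is wasted, so the one-cycle
delay is not eliminated in that case. For this to be correct, executing the slot instruction when
the branch is not taken must be harmless (here, R1 must not be used after the loop exits), or it
must be possible to cancel the instruction or remove its effects on the state of the machine.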
For example, convert:
L1:
SUB R1, R2, R3
ADDI R5, R5, #4
ADD R2, R5, R3
BNEZ R2, L1
to
SUB R1, R2, R3
L1:
ADDI R5, R5, #4
ADD R2, R5, R3
BNEZ R2, L1
SUB R1, R2, R3
where the first SUB executes once before the loop and the last SUB is in the branch delay slot.
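The assumption in 7b can be checked by executing both versions of the loop. This Python sketch writes each loop as straight-line Python rather than simulating a pipeline; the initial values R2 = 5, R3 = -12, R5 = 0 are hypothetical and were chosen so that the loop terminates after three iterations.

```python
def original_loop(r2: int, r3: int, r5: int) -> dict:
    """L1: SUB R1,R2,R3 / ADDI R5,R5,#4 / ADD R2,R5,R3 / BNEZ R2,L1 (delay slot unused)."""
    while True:
        r1 = r2 - r3
        r5 = r5 + 4
        r2 = r5 + r3
        if r2 == 0:           # branch not taken: leave the loop
            return {"R1": r1, "R2": r2, "R5": r5}

def scheduled_loop(r2: int, r3: int, r5: int) -> dict:
    """Same loop with the target's SUB copied before the loop and into the delay slot."""
    r1 = r2 - r3              # copy of SUB for the first iteration
    while True:
        r5 = r5 + 4
        r2 = r5 + r3
        taken = r2 != 0       # BNEZ R2, L1 decides here...
        r1 = r2 - r3          # ...but the delay-slot SUB always executes
        if not taken:
            return {"R1": r1, "R2": r2, "R5": r5}

a = original_loop(r2=5, r3=-12, r5=0)
b = scheduled_loop(r2=5, r3=-12, r5=0)
print(a)   # {'R1': 8, 'R2': 0, 'R5': 12}
print(b)   # {'R1': 12, 'R2': 0, 'R5': 12}
```

R2 and R5 match, but R1 differs on exit because the delay-slot SUB also executes when the branch falls through. The scheduled loop is therefore correct only if R1 is not used after the loop, or if the slot instruction is cancelled on the not-taken path.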