cs470TakeHomeMidterm, part B

CEC470, Spring 2013
Homework #1, part B
Name: _______________________
Open book, open notes; but do your own work, please. You’re on your honor here.
[50 pts total] Answer all parts (a) through (j), inclusive, below.
I have defined a new computer program to be used as a benchmark for performance measurements. It's
called the "ERAU Run". It contains the mix of instructions in the table, below (the numbers are not at all
intended to be real world; this is a torture for undergraduates, the numbers don’t have to be sensible ;-)
The table also shows the cycles per instruction and reflects the fact that our hardware includes a floating
point co-processor.
Instruction                          Cycles Required    ERAU Run
                                     Per Instruction    Instruction Count
Floating point multiply or divide    8                  7,000,000
Floating point add or subtract       12                 2,000,000
All others (non floating point)      5                  10,000,000
The gcc compiler we use is intended for use on many different machines; some with floating point co-processors, some without; so the user can, via compile time option, request that the code be compiled for a
floating point architecture or a non-floating point architecture, in which case the compiler must translate
each floating point instruction into a set of equivalent integer instructions that run on the main CPU only.
Assuming that our hardware is clocked at 100MHz:
(a) [3 pts] What is the average CPI for the FP version of the ERAU Run?
(b) [3 pts] How long will it take to execute this FP version of ERAU Run on our (pretty slow)
hardware?
(c) [3 pts] What is the MIPS rate for our hardware with the FP co-processor?
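For parts (a) through (c), the three quantities all come from the same two totals: cycles executed and instructions executed. The sketch below is one way to set up that arithmetic in Python. The names (`MIX`, `average_cpi`, and so on) are made up for illustration and are not part of the assignment.

```python
# Instruction mix from the table: name -> (cycles per instruction, instruction count).
MIX = {
    "fp_mul_div": (8, 7_000_000),
    "fp_add_sub": (12, 2_000_000),
    "other": (5, 10_000_000),
}
CLOCK_HZ = 100_000_000  # 100 MHz, as stated in the problem


def total_cycles(mix):
    """Sum of (CPI x count) over every instruction class."""
    return sum(cpi * count for cpi, count in mix.values())


def total_instructions(mix):
    """Total dynamic instruction count."""
    return sum(count for _, count in mix.values())


def average_cpi(mix):
    """Weighted-average cycles per instruction."""
    return total_cycles(mix) / total_instructions(mix)


def execution_time_s(mix, clock_hz):
    """Run time in seconds = total cycles / clock rate."""
    return total_cycles(mix) / clock_hz


def mips(mix, clock_hz):
    """Millions of instructions per second."""
    return total_instructions(mix) / execution_time_s(mix, clock_hz) / 1e6


if __name__ == "__main__":
    print(f"average CPI : {average_cpi(MIX):.3f}")
    print(f"time (s)    : {execution_time_s(MIX, CLOCK_HZ):.3f}")
    print(f"MIPS        : {mips(MIX, CLOCK_HZ):.3f}")
```

Note that MIPS can equally be written as clock rate / (CPI x 10^6); both forms should agree.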
If we compile the ERAU Run with the no-FP-hardware option and then measure the running time of the
resultant (integer equivalent) program, we get a time of 4.25 seconds.
(d) [3 pts] How much faster is our program when we compile with the floating point co-processor
option than when we compile without it?
(e) [8 pts] What is the average number of integer instructions being produced by the compiler to
replace each floating point instruction? (Average over all FP instructions; you have no way of
telling how many integer instructions to replace an FP add versus how many for an FP divide, for
example.) As per the third row of the table above, continue to assume that the CPI for all integer
instructions is 5.
(f) [2 pts] What is the MIPS rate of our hardware executing this integer-equivalent version of the
ERAU Run?
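For parts (d) through (f), the measured 4.25 seconds can be worked backwards into an instruction count, since the clock rate and the integer CPI are both known. The sketch below shows one way to lay that out. It assumes the 10,000,000 non-FP instructions are unchanged by the compile option, and the function names are hypothetical.

```python
CLOCK_HZ = 100_000_000          # 100 MHz
INT_CPI = 5                     # CPI for all integer instructions (table, row 3)
FP_COUNT = 7_000_000 + 2_000_000
OTHER_COUNT = 10_000_000        # assumed unchanged in the integer-only build
FP_BUILD_CYCLES = 8 * 7_000_000 + 12 * 2_000_000 + 5 * 10_000_000
INT_BUILD_TIME_S = 4.25         # measured time given in the problem


def speedup(slow_time_s, fast_time_s):
    """How many times faster the fast run is than the slow run."""
    return slow_time_s / fast_time_s


def instructions_executed(time_s, clock_hz, cpi):
    """Instruction count implied by a run time at a fixed CPI."""
    return time_s * clock_hz / cpi


def avg_replacements_per_fp(time_s, clock_hz, cpi, other_count, fp_count):
    """Average integer instructions emitted per original FP instruction."""
    total = instructions_executed(time_s, clock_hz, cpi)
    return (total - other_count) / fp_count


def mips_from_time(instruction_count, time_s):
    """Millions of instructions per second for a measured run."""
    return instruction_count / time_s / 1e6


if __name__ == "__main__":
    fp_time = FP_BUILD_CYCLES / CLOCK_HZ
    n_int = instructions_executed(INT_BUILD_TIME_S, CLOCK_HZ, INT_CPI)
    print(f"speedup            : {speedup(INT_BUILD_TIME_S, fp_time):.3f}")
    print(f"int instr per FP   : "
          f"{avg_replacements_per_fp(INT_BUILD_TIME_S, CLOCK_HZ, INT_CPI, OTHER_COUNT, FP_COUNT):.3f}")
    print(f"MIPS (int version) : {mips_from_time(n_int, INT_BUILD_TIME_S):.3f}")
```

Comparing the MIPS figure here with the one for the FP build is a useful lead-in to part (g).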
You now have three commonly used quantitative performance indices (elapsed time, MIPS, CPI) to look at.
(g) [2 pts] Which one provides the most accurate picture here and why?
As a result of your brilliant ERAU education, you come up with two improvements to the design of the FP
co-processor of the chip that our hardware uses. One of them reduces the number of cycles it takes to do
the FP multiply or divide from 8 to 7; the other reduces the number of cycles for the FP add or subtract
from 12 to 10.
(h) [8 pts] Let’s assume you only want to make one of your two possible enhancements. Purely on a
performance basis – ignoring cost, that’s part (j), below – which one would you recommend and
why? (Justify your answer quantitatively.)
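For part (h), a quantitative justification can be as simple as recomputing the run time with each candidate CPI and comparing against the baseline. The sketch below assumes the instruction counts and the 100 MHz clock stay fixed; `run_time_s` is an illustrative name only.

```python
def run_time_s(cpi_mul_div, cpi_add_sub, cpi_other=5, clock_hz=100_000_000):
    """Run time of the FP build for a given set of per-class CPIs."""
    cycles = (cpi_mul_div * 7_000_000      # FP multiply/divide
              + cpi_add_sub * 2_000_000    # FP add/subtract
              + cpi_other * 10_000_000)    # everything else
    return cycles / clock_hz


baseline = run_time_s(8, 12)
faster_mul_div = run_time_s(7, 12)   # enhancement 1: mul/div 8 -> 7 cycles
faster_add_sub = run_time_s(8, 10)   # enhancement 2: add/sub 12 -> 10 cycles

if __name__ == "__main__":
    for label, t in [("baseline", baseline),
                     ("mul/div 8->7", faster_mul_div),
                     ("add/sub 12->10", faster_add_sub)]:
        print(f"{label:15s} {t:.3f} s  speedup {baseline / t:.4f}")
```

The point of the comparison is that the cycle reduction per instruction matters less than that reduction multiplied by how often the instruction runs.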
(i) [8 pts] The current FP co-processor chip is 1 in^2 and contains 10^6 transistors (yes, I know, that’s
unrealistically low these days). The chip is being manufactured on 10 inch diameter wafers. If the
manufacturing process results in a defect density of 0.3 defects per in^2, what is the net expected
number of usable dies from the wafer? Use my formula, not the book’s here. (You don’t know the
value for the book’s α factor anyway, I haven’t supplied one.)
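The instructor's formula is not reproduced in this handout, so the sketch below does NOT use it. As a stand-in it uses two loudly labeled assumptions: gross dies are estimated as wafer area divided by die area (no edge-loss correction), and yield follows a simple Poisson model, exp(-defect density x die area). Treat it only as an illustration of how the pieces fit together, and substitute the class formula where indicated.

```python
import math


def gross_dies(wafer_diameter_in, die_area_in2):
    """ASSUMPTION: area ratio only, ignoring partial dies lost at the wafer edge."""
    wafer_area = math.pi * (wafer_diameter_in / 2) ** 2
    return wafer_area / die_area_in2


def poisson_yield(defects_per_in2, die_area_in2):
    """ASSUMPTION: Poisson yield model, a stand-in for the class formula."""
    return math.exp(-defects_per_in2 * die_area_in2)


def good_dies(wafer_diameter_in, die_area_in2, defects_per_in2):
    """Expected usable dies = gross dies x yield fraction."""
    return (gross_dies(wafer_diameter_in, die_area_in2)
            * poisson_yield(defects_per_in2, die_area_in2))


if __name__ == "__main__":
    print(f"gross dies : {gross_dies(10, 1.0):.2f}")
    print(f"yield      : {poisson_yield(0.3, 1.0):.4f}")
    print(f"good dies  : {good_dies(10, 1.0, 0.3):.2f}")
```

Swapping in a different yield model only means replacing `poisson_yield`; the gross-die estimate and the final multiplication stay the same.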
(j) [10 pts] The potential improvement to the FP multiply/divide circuits would require 100,000 new
transistors; the improvement to the FP add/subtract circuits would take 150,000. Now what is
your final recommendation? Justify it quantitatively, of course. (Remember to consider that it
may not be cost effective to do either alternative.)