Summer CDA5155 Homework 1
Due: May 28th, 2010, 11:59pm

You are not allowed to give or receive help in completing this assignment. Submit a PDF version of your submission to the e-Learning website before the deadline. Please include the following sentence in bold at the top of your submission: “I have neither given nor received any unauthorized aid on this assignment”.

Total Points: 70 pts

1. [10 points] Using the following table, answer the questions below:

   Chip                  Num of Cores   Memory performance   Processor performance
   Athlon 64 X2 4800+    2              3423                 20178
   Pentium EE 840        2              3228                 18893
   Pentium D 820         2              3000                 15220
   Athlon 64 X2 3800+    2              2941                 17129
   Pentium 4             1              2731                 7621
   Athlon 64 3000+       1              2953                 7628
   Pentium 4 570         1              3501                 11210
   Processor X           1              7000                 5000

   a. Create a table similar to the given table, but express the results normalized to the Pentium 4 for both memory performance and processor performance.
   b. Calculate the arithmetic mean of the performance of each processor using both the original performance and your normalized performance from part (a).
   c. Given the answer from part (b), are there any conflicting conclusions you can draw?

2. [15 points] Your company’s internal studies show that a single-core system is sufficient for the demand on your processing power. You are exploring, however, whether you could save power by using two cores.

   a. Assume that your application is 90% parallelizable. By how much could you decrease the frequency and get the same performance?
   b. Assume that the voltage may be decreased linearly with the frequency. Using the equation in Section 1.5, how much dynamic power would the dual-core system require compared to the single-core system?
   c. Now assume that the voltage may not decrease below 30% of the original voltage. This voltage is referred to as the “voltage floor,” and any voltage below it will cause the state to be lost. Using the equation in Section 1.5, how much dynamic power would the dual-core system from part (a) require compared to the single-core system when taking the voltage floor into account?

3. [10 points] You are designing a 32-bit instruction-set architecture that must support 100 opcodes, three source operands, and two destination operands. All the source and destination operands are registers. Moreover, every operand should be able to access every register. What is the maximum size of the register file that this architecture can use (show your computations)?

4. [15 points] In the load-store architecture of MIPS, the operands of arithmetic and logical instructions must come from registers. For a typical integer program, the instruction distribution and CPI of four instruction groups are given in the following table.

   Type     Frequency   CPI
   ALU      50%         1
   Load     25%         2
   Store    15%         2
   Branch   10%         4

   a. Calculate the average CPI of the integer program.
   b. Now assume that a new set of memory-register arithmetic and logical instructions is added to the ISA. Each memory-register ALU instruction combines one Load and one original ALU instruction. It takes 4 cycles to execute this new type of instruction. Assume that 60% of the load instructions in the program can be combined; calculate the new CPI of the integer program.
   c. Assume the modification increases the overall cycle time by 5%. Is this modification really worthwhile?

5. [20 points] Assume that values A, B, C and D reside in memory. Also assume that instruction operation codes are represented in 8 bits, memory addresses are 64 bits, and register addresses are 8 bits. Assume all the data are 32 bits, and the instruction lengths are given in the table below.

   a. Write the code sequence for D=A+B*(A+C) for each of the following instruction set architectures: 1) Stack; 2) Accumulator; 3) Register (Register-memory); 4) Register (Load-Store). (You may refer to the class slides or to Figures B.1-B.2 on page B-4 of Appendix B.)
   b. Compute the total number of instructions and the code size for each sequence you wrote.
   c. Compute how many bytes are transferred to or from memory in executing the code sequences, including fetching instructions, reading data, and writing data.

   ISA                         Stack      Accumulator   Register-memory   Load-Store
   Instruction Length (bits)   8 or 72    72            32 or 80 or 88    32 or 80
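
As a reference for the formulas these problems rely on, the following is a minimal Python sketch of three standard relations: the frequency-weighted average CPI, Amdahl's law, and the dynamic power relation (power proportional to capacitive load × voltage² × frequency). The function names and all input values are hypothetical and chosen only for illustration; they are not the values from this assignment. The sketch also assumes that total dynamic power is the sum over identical cores.

```python
def weighted_cpi(mix):
    """Average CPI as the sum of (frequency * CPI) over instruction classes.

    `mix` maps a class name to a (frequency, cpi) pair; frequencies must sum to 1.
    """
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in mix.values())


def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def relative_dynamic_power(n_cores, freq_scale, volt_scale):
    """Dynamic power relative to one core at full voltage and frequency.

    Uses power ~ C * V^2 * f per core and assumes (illustrative assumption)
    that identical cores simply add up.
    """
    return n_cores * volt_scale ** 2 * freq_scale


# Hypothetical inputs, unrelated to the assignment's tables.
example_mix = {"alu": (0.6, 1), "memory": (0.3, 3), "branch": (0.1, 2)}
print(weighted_cpi(example_mix))
print(amdahl_speedup(parallel_fraction=0.5, n_cores=4))
print(relative_dynamic_power(n_cores=4, freq_scale=0.5, volt_scale=0.5))
```

Each function takes plain numbers, so the same helpers can be reused with different instruction mixes, core counts, or scaling factors.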