CDA3101 Assignment 1
Due 2/4

Submissions are due by the beginning of class on the specified due date. Handwritten or typed solutions are acceptable. If you do write your solutions by hand, be sure to write clearly. If the grader cannot read your answer, they cannot give you the points. Late submissions will be accepted with a 10% penalty for each day they are late (up to 48 hours). You must show how you arrived at the answer and circle your final answer!

1. Consider three different processors P1, P2, and P3 executing the same instruction set with the clock rates and CPIs given in the following table.

   Processor    P1        P2        P3
   Clock Rate   2.2 GHz   1.7 GHz   2.9 GHz
   CPI          1.6       1.2       2.4

   a. Which processor has the highest performance (as defined in class)?
   b. If the processors each execute a program in 12 seconds, find the number of instructions and the number of cycles for each processor.
   c. We are trying to reduce the current time of 12 s by 30%, but this leads to an increase of 20% in the CPI. For each processor, what clock rate should we have to achieve this time reduction?
   d. Using the results above, explain why it is inappropriate to compare the performance of each processor using the clock rate as a lone metric. What are the three key factors that affect performance?

2. The following table shows the number of instructions for a program.

   Arithmetic   Store   Load   Branch   Total
   600          40      120    40       800

   a. Assuming that arithmetic instructions take 1 cycle, load and store 5 cycles, and branch 2 cycles, what is the execution time of the program on a 2.2 GHz processor?
   b. What is the CPI for the program?
   c. If the number of load instructions can be reduced by one-half, what is the speedup and the new CPI?
   d. What is Amdahl's law? Explain how the solution to part c supports this observation.

3. Translate the following MIPS assembly fragment into C.
Assume the C-level integer i is held in register $t1, $s2 holds the C-level integer called result, and $s0 holds the base address of the integer array MemArray.

         addi $t1, $0, 0
   LOOP: sll  $t2, $t1, 2
         addu $t2, $t2, $s0
         lw   $t2, 0($t2)
         bne  $t2, $0, L1
         addi $s2, $t1, 0
   L1:   addi $t1, $t1, 1
         slti $t2, $t1, 100
         bne  $t2, $0, LOOP

4. Translate each of the following C statements into MIPS assembly. Assume that the variables f, g, and h are assigned to registers $s0, $s1, and $s2, respectively. Assume that the base addresses of the arrays A and B are in registers $s6 and $s7, respectively.

   a. f = g - h + B[4];
   b. f = g * A[B[3]];

5. For each of the following actions, indicate the translation phase during which the action takes place (preprocessing, compiling, assembling, linking, or loading).

   a. Translating i = i + 1 to addi $t0, $t0, 1.
   b. Including the contents of <stdio.h>.
   c. Placing the symbolic name printf in the symbol table and call printf in the relocation table.
   d. Allocating space for a.out in main memory.
   e. Detecting the syntax error a * b = c;.
   f. Creating a.out from main.o and frac.o.
   g. Expansion of #define PI 3.14159 in the program text.
   h. Detecting the semantic error a = b, where a is an int and b is a char array.
   i. Translating addi $t0, $t0, 1 to 00100001000010000000000000000001.
   j. Updating the symbol table entry for printf (patching external reference).

6. In class, we said that the logic equation for the result of an adder can be expressed in the following way:

   \[ \text{Sum} = (\bar{a} \cdot \bar{b} \cdot \text{CarryIn}) + (a \cdot \bar{b} \cdot \overline{\text{CarryIn}}) + (\bar{a} \cdot b \cdot \overline{\text{CarryIn}}) + (a \cdot b \cdot \text{CarryIn}) \]

   Using only AND, OR, and NOT gates, design the hardware that will implement Sum.

7. Using the truth table below, write a logic equation for D in terms of the input values A, B, and C. Your logic equation must be in canonical form as a sum-of-products.

   A   B   C   D
   0   0   0   1
   0   0   1   0
   0   1   0   0
   0   1   1   1
   1   0   0   0
   1   0   1   1
   1   1   0   1
   1   1   1   0

8.
(EXTRA CREDIT) The tables below show the instruction type breakdown of two applications A and B executed on 1, 2, 4, or 8 processors. Using this data, you will be exploring the speed-up of applications on parallel processors.

   Application A

                Instructions Per Processor           CPI
   Processors   Arithmetic   Load/Store   Branch     Arithmetic   Load/Store   Branch
   1            2800         1360         256        1            4            2
   2            1400         680          128        1            4            2
   4            700          340          64         1            4            2
   8            350          170          32         1            4            2

   Application B

                Instructions Per Processor           CPI
   Processors   Arithmetic   Load/Store   Branch     Arithmetic   Load/Store   Branch
   1            2560         1280         256        1            4            2
   2            1350         800          128        1            6            2
   4            800          600          64         1            9            2
   8            600          500          32         1            13           2

   a. The tables above show the number of instructions required per processor to complete a program on a multiprocessor with 1, 2, 4, and 8 processors. For each of the configurations of applications A and B: What is the total number of instructions executed per processor? What is the total number of instructions executed across all processors?
   b. Given the CPI values above, find the execution time of each application on 1, 2, 4, and 8 processors. Assume each processor has a 2.2 GHz clock frequency and keep in mind that the processors run in parallel.
   c. If the CPI of the arithmetic instructions were doubled, what would the impact be on the execution time of each program using 1, 2, 4, and 8 processors?
   d. Based on your solutions, is it always advantageous to further parallelize an application? What might account for the trends observed in parallelizing each application? Specifically, why might it be that when we increase the number of processors we observe continual execution time improvements for A, but not B?
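
The execution time questions in problems 2 and 8 rest on one relationship: total cycles are the sum of (instruction count × CPI) over each instruction class, and execution time is total cycles divided by the clock rate. The Python sketch below is only an illustration of that arithmetic. The function names, the instruction mix, the CPI values, and the 2 GHz clock are all hypothetical and are not taken from the assignment tables. When processors run in parallel, the per-processor counts would be the inputs.

```python
def total_cycles(counts, cpi):
    """Sum count * CPI over every instruction class."""
    return sum(counts[kind] * cpi[kind] for kind in counts)


def execution_time(counts, cpi, clock_hz):
    """CPU time in seconds: total cycles divided by the clock rate."""
    return total_cycles(counts, cpi) / clock_hz


def average_cpi(counts, cpi):
    """Weighted average CPI: total cycles over total instructions."""
    return total_cycles(counts, cpi) / sum(counts.values())


# Hypothetical instruction mix and CPIs (assumptions for illustration only).
counts = {"arithmetic": 1000, "load_store": 200, "branch": 100}
cpi = {"arithmetic": 1, "load_store": 3, "branch": 2}
clock_hz = 2.0e9  # assumed 2 GHz clock

print("cycles:     ", total_cycles(counts, cpi))
print("average CPI:", average_cpi(counts, cpi))
print("time (s):   ", execution_time(counts, cpi, clock_hz))
```

With these made-up inputs the cycle count is 1000 + 600 + 200 = 1800, so the time is 1800 cycles over 2 × 10^9 cycles per second.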