2019-2020-1 Principles of Computer Organization Midterm – A

I. Choose the one correct answer and fill in the brackets. (Total 10 points, 5 questions, 2 points per question)

1. ( C ) are composed of hundreds to thousands of processors and terabytes of memory, and have the highest performance and cost.
   (A) Desktop computers  (B) Low-end servers  (C) Supercomputers  (D) Embedded computers

2. ( C ) Assume a color display uses 8 bits for each of the primary colors (red, green, blue) per pixel and a frame size of 1024 × 768. What is the minimum size in bytes of the frame buffer needed to store one frame?
   (A) 768K  (B) 1024K  (C) 2304K  (D) 3072K

3. ( D ) Computer X is 2 times as fast as computer Y, which runs a given application in 10 seconds. How long will computer X take to run that application?
   (A) 8 seconds  (B) 7 seconds  (C) 6 seconds  (D) 5 seconds

4. ( A ) Assume that registers $s0 and $s1 hold the values 0x80000000 and 0xD0000000, respectively. What is the value of $t0 for the assembly code "add $t0, $s0, $s1"?
   (A) 0x50000000  (B) 0x80000000  (C) 0xD0000000  (D) 0xF0000000

5. ( C ) Suppose register $s0 holds the hexadecimal number 0xF1234567 and register $s1 holds the hexadecimal number 0x12345678. What is the value of register $t0 after this instruction?
       slt $t0, $s0, $s1
   (A) 3  (B) 2  (C) 1  (D) 0

II. Consider two different processors, P1 and P2, executing the same instruction set with the clock rates and CPIs given in the following table. (10 points)

Processor   Clock Rate   CPI
P1          2.1 GHz      1.5
P2          2.5 GHz      1.0

(1) Which processor has the highest performance expressed in instructions per second? (4 points)
(2) We are trying to reduce the execution time by 20%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?
(6 points)

(1) P2 has the highest performance.
    Performance of P1 (instructions/sec) = 2.1 × 10^9 / 1.5 = 1.4 × 10^9
    Performance of P2 (instructions/sec) = 2.5 × 10^9 / 1.0 = 2.5 × 10^9

(2) Execution time = Instruction Count × CPI / Clock Rate
    Execution time(new) / Execution time(old)
        = ( CPI(new) / Clock Rate(new) ) / ( CPI(old) / Clock Rate(old) )
        = CPI(new) × Clock Rate(old) / ( Clock Rate(new) × CPI(old) )
    Execution time(new) = 0.8 × Execution time(old)
    CPI(new) = 1.2 × CPI(old)
    Clock Rate(new) = 1.2 × Clock Rate(old) / 0.8
    (3 points) Clock Rate(P1 new) = 1.2 × 2.1 / 0.8 = 3.15 GHz
    (3 points) Clock Rate(P2 new) = 1.2 × 2.5 / 0.8 = 3.75 GHz

III. Suppose we have developed new versions of a processor with the following characteristics. (10 points)

Version     Voltage   Clock Rate
Version 1   5.0 V     0.5 GHz
Version 2   3.3 V     1.0 GHz

(1) How much has the capacitive load varied between versions if the dynamic power has been reduced by 10%? (2 points)
(2) How much has the dynamic power been reduced if the capacitive load does not change? (2 points)
(3) Assuming that the capacitive load of version 2 is 80% of the capacitive load of version 1, find the voltage for version 2 if the dynamic power of version 2 is reduced by 40% from version 1. (6 points)

(1) Dynamic power DP = V^2 × clock rate × C / 2, and DP2 = 0.9 × DP1, so
    C2/C1 = (0.9 × 5^2 × 0.5 × 10^9) / (3.3^2 × 1 × 10^9) = 1.033
(2) DP2/DP1 = (3.3^2 × 1 × 10^9) / (5^2 × 0.5 × 10^9) = 0.8712, a reduction of 12.88%.
(3) The common factor 1/2 appears on both sides and is omitted:
    DP1 = 5^2 × 0.5 × 10^9 × C1
    DP2 = V2^2 × 1 × 10^9 × 0.8 × C1 = 0.6 × DP1
    V2^2 × 1 × 10^9 × 0.8 × C1 = 0.6 × 5^2 × 0.5 × 10^9 × C1
    V2 = ( (0.6 × 5^2 × 0.5 × 10^9) / (1 × 10^9 × 0.8) )^(1/2) = 3.062 V

IV. The table below shows the instruction type breakdown of a given application executed on 1 or 2 processors. Using this data, you will explore the speedup of applications on parallel processors.

              Instructions per Processor            CPI
Processors    Arithmetic   Load/Store   Branch      Arithmetic   Load/Store   Branch
1             2560         1280         256         1            4            2
2             1350         800          128         1            6            2

(1) The table above shows the number of instructions required per processor to complete a program on a multiprocessor with 1 or 2 processors. What is the total number of instructions executed per processor? (4 points)
(2) Given the CPI values on the right of the table above, find the total execution time for this program on 1 and 2 processors. Assume that each processor has a 2 GHz clock frequency. (6 points)

(1) Processors   Instructions per processor
    1            2560 + 1280 + 256 = 4096
    2            1350 + 800 + 128 = 2278

(2) Processors   Execution time (µs)
    1            (2560 × 1 + 1280 × 4 + 256 × 2) cycles / 2 GHz = 4.096
    2            (1350 × 1 + 800 × 6 + 128 × 2) cycles / 2 GHz = 3.203

V. The following table shows manufacturing data for various processors.

Wafer Diameter   Dies per Wafer   Defects per Unit Area   Cost per Wafer
15 cm            90               0.018 defects/cm^2      10

(1) Find the yield. (3 points)
(2) Find the cost per die. (2 points)
(3) If the number of dies per wafer is increased by 10% and the defects per unit area increase by 15%, find the die area and yield. (5 points)

(1) Wafer area = π × 7.5^2 = 176.7 cm^2
    Die area = 176.7 / 90 = 1.96 cm^2
    Yield = 1 / (1 + (0.018 × 1.96 / 2))^2 = 0.966
(2) Cost per die = 10 / (0.966 × 90) = 0.115
(3) Dies per wafer = 1.1 × 90 = 99
    Defects per unit area = 1.15 × 0.018 = 0.0207 defects/cm^2
    Die area = wafer area / dies per wafer = 176.7 / 99 = 1.785 cm^2
    Yield = 1 / (1 + (0.0207 × 1.785 / 2))^2 = 0.964

VI. The following table shows results for SPEC CPU2006 benchmark programs running on an AMD Barcelona. (10 points)

Name      Instr. Count (10^9)   Execution Time (seconds)   Reference Time (seconds)
a. perl   2118                  500                        9770
b. mcf    336                   1200                       9120

(1) Find the CPI if the clock cycle time is 0.333 ns. (6 points)
(2) Find the SPECratio. (2 points)
(3) For these two benchmarks, find the geometric mean of the SPECratio. (2 points)

(1) CPI = clock rate × CPU time / instr. count
    clock rate = 1 / cycle time = 3 GHz
    CPI(perl) = 3 × 10^9 × 500 / (2118 × 10^9) = 0.7
    CPI(mcf) = 3 × 10^9 × 1200 / (336 × 10^9) = 10.7
(2) SPECratio(perl) = 9770 / 500 = 19.54
    SPECratio(mcf) = 9120 / 1200 = 7.6
(3) Geometric mean = (19.54 × 7.6)^(1/2) = 12.19

VII. Consider a computer running programs with CPU times shown in the following table. (9 points)

FP Instr.   INT Instr.   L/S Instr.   Branch Instr.   Total Time
35 s        85 s         50 s         30 s            200 s

(1) How much is the total time reduced if the time for FP operations is reduced by 20%? (3 points)
(2) How much is the time for INT operations reduced if the total time is reduced by 20%? (3 points)
(3) Can the total time be reduced by 20% by reducing only the time for branch instructions? (3 points)

(1) Tfp = 35 × 0.8 = 28 s, Tnew_total = 28 + 85 + 50 + 30 = 193 s. Reduction: (200 − 193) / 200 = 3.5%
(2) Tnew_total = 200 × 0.8 = 160 s, Tfp + Tl/s + Tbranch = 115 s, Tnew_int = 160 − 115 = 45 s. Reduction in INT time: (85 − 45) / 85 = 47%
(3) Tgoal = 200 × 0.8 = 160 s, Tfp + Tint + Tl/s = 170 s. No: even with the branch time reduced to zero, the total would still be 170 s, which exceeds 160 s.

VIII. In the following problem, we will be investigating memory operations in the context of a MIPS processor. The table below shows the values of an array stored in memory. Assume the base address of the array is stored in register $s6, and that each address shown is an offset from that base address. (9 points)

Address   Data
12        1
08        6
04        4
00        2

(1) For the memory locations in the table above, write C code to sort the data from lowest to highest, placing the lowest value in the smallest memory location shown in the table. Assume that the data shown represents the C variable called Array, which is an array of type int, and that the first number in the array shown is the first element in the array. Assume that this particular machine is a byte-addressable machine and a word consists of four bytes. (5 points)
(2) For the memory locations in the table above, write MIPS code to sort the data from lowest to highest, placing the lowest value in the smallest memory location. Use a minimum number of MIPS instructions.
Assume the base address of Array is stored in register $s6. (4 points)

(1) (5 points)
    temp = Array[3];
    Array[3] = Array[2];
    Array[2] = Array[1];
    Array[1] = Array[0];
    Array[0] = temp;

(2) (4 points)
    lw   $t0, 12($s6)       # temp = Array[3]
    lw   $t1, 8($s6)
    sw   $t1, 12($s6)       # Array[3] = Array[2]
    lw   $t1, 4($s6)
    sw   $t1, 8($s6)        # Array[2] = Array[1]
    lw   $t1, 0($s6)
    sw   $t1, 4($s6)        # Array[1] = Array[0]
    sw   $t0, 0($s6)        # Array[0] = temp

IX. For this problem, the table holds some C code. Translate these C statements into MIPS assembly code. Use a minimum number of instructions. Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. (6 points)

    for (i = 0; i < 10; i++)
        a += b;

          addi $t0, $0, 0        # i = 0
          beq  $0, $0, TEST      # jump to the loop test
    LOOP: add  $s0, $s0, $s1     # a += b
          addi $t0, $t0, 1       # i++
    TEST: slti $t2, $t0, 10      # $t2 = 1 if i < 10
          bne  $t2, $0, LOOP     # repeat while i < 10

X. For the following problem, the table holds C code functions. Assume that the first function listed in the table is called first. Translate these C routines into MIPS assembly. (16 points)

    int compare(int a, int b) {
        if (sub(a, b) >= 0)
            return 1;
        else
            return 0;
    }

    int sub(int a, int b) {
        return a - b;
    }

    compare: addi $sp, $sp, -4      # make room on the stack
             sw   $ra, 0($sp)       # save the return address
             add  $s0, $a0, $0      # $s0 = a
             add  $s1, $a1, $0      # $s1 = b
             jal  sub               # $v0 = a - b
             addi $t1, $0, 1        # result = 1
             beq  $v0, $0, exit     # a - b == 0: return 1
             slt  $t2, $0, $v0      # $t2 = 1 if a - b > 0
             bne  $t2, $0, exit     # a - b > 0: return 1
             addi $t1, $0, 0        # a - b < 0: result = 0
    exit:    add  $v0, $t1, $0      # return value
             lw   $ra, 0($sp)       # restore the return address
             addi $sp, $sp, 4
             jr   $ra
    sub:     sub  $v0, $a0, $a1     # $v0 = a - b
             jr   $ra
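
The control flow of the MIPS answer to Problem X can be cross-checked against a short C sketch. This is only an illustration, not part of the exam's answer key: the local names result and diff and the helper check_compare are hypothetical, and the sketch assumes a - b does not overflow (signed overflow is undefined in C, and the MIPS sub instruction raises an overflow exception in that case).

```c
#include <assert.h>

/* Same semantics as the exam's sub: returns a - b. */
int sub(int a, int b) {
    return a - b;
}

/* Follows the branch order of the MIPS answer:
 * call sub, default the result to 1, leave early when the
 * difference is zero or positive, otherwise clear the result. */
int compare(int a, int b) {
    int diff = sub(a, b);   /* jal sub; diff plays the role of $v0 */
    int result = 1;         /* addi $t1, $0, 1 */
    if (diff == 0)          /* beq $v0, $0, exit */
        return result;
    if (0 < diff)           /* slt $t2, $0, $v0 then bne $t2, $0, exit */
        return result;
    result = 0;             /* difference is negative */
    return result;
}

/* Hypothetical helper (not in the exam): one check per sign case. */
void check_compare(void) {
    assert(compare(5, 3) == 1);   /* a > b  */
    assert(compare(3, 3) == 1);   /* a == b */
    assert(compare(2, 7) == 0);   /* a < b  */
}
```

Calling check_compare() from a main function exercises the three cases that the two branches in the assembly distinguish: a zero difference, a positive difference, and a negative difference.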