Test 1
ECE 484/584 -- Fall 2004
Chapters 1 - 4.4

Name: ___________________________
Student Number: ______________________

1) Given below is a segment of MIPS assembly code and the absolute addresses in hexadecimal where each of the instructions is stored in memory. Assemble this code segment, producing the hexadecimal encoding for each instruction. (12 points total -- 4 points each)

    0x00100010   loop:  addi  $t0, $t1, 8
    0x00100014          slt   $s0, $t0, $t1
    0x00100018          bne   $s0, $zero, loop
    0x0010001C          …

    addi  $t0, $t1, 8         001000 01001 01000 0000 0000 0000 1000     0x21280008
    slt   $s0, $t0, $t1       000000 01000 01001 10000 00000 101010      0x0109802A
    bne   $s0, $zero, loop    000101 10000 00000 1111 1111 1111 1101     0x1600FFFD

    The bne offset is counted in words from the instruction following the branch: (0x00100010 - 0x0010001C)/4 = -3 = 0xFFFD.

2) A computer designer is interested in improving the performance of looping operations used on Machine A. Machine A executes a program in 10 seconds. This particular program spends 20% of its time performing a loop that adds elements of an array using an array index. The computer designer can use special instructions and addressing modes to accelerate the execution of loops. How much improvement in loop execution must be provided by Machine B in order to obtain an overall execution time of 5 seconds for this program? (10 points)

    new_exe_time = (old_exe_time_affected / amount_of_improvement) + old_exe_time_unaffected
    5 seconds = (2 seconds / amount_of_improvement) + 8 seconds
    amount_of_improvement = 2/(-3) = IMPOSSIBLE!

    The unaffected portion alone takes 8 seconds, so no speedup of the loop can bring the total down to 5 seconds.

3) Give three (3) reasons why the metric called Millions of Instructions Per Second (MIPS) does not make for a good evaluation metric for computer performance.
(9 points -- 3 points each)

    1) Instruction counts differ between machines (instruction capabilities differ between machines); therefore, it is hard to compare different machines.
    2) Instruction counts differ between programs on the same machine; therefore, the MIPS rating of a machine differs from program to program.
    3) MIPS can vary inversely with performance.
    4) MIPS is inversely proportional to execution time.

4) Suppose we have two different computers that we want to compare. The two computers have different implementations of the same instruction set architecture, will execute the same program, and use the same compiler. Machine A has a clock cycle time of 1.0 nanoseconds (ns) and a CPI of 2.4. Machine B has a clock cycle time of 1.5 ns and a CPI of 2.1. Which machine is faster and by how much? (10 points)

    Since both machines have the same ISA, the same program, and the same compiler, both will execute the same number of instructions.

    Machine A:  clock time = 1.0 ns   CPI = 2.4   IC = X
    Machine B:  clock time = 1.5 ns   CPI = 2.1   IC = X

    CPU_exe_time = (IC)(CPI)(clock_time)
    CPU_exe_A = (X)(2.4)(1.0 ns) = 2.4X ns
    CPU_exe_B = (X)(2.1)(1.5 ns) = 3.15X ns

    Perf_A/Perf_B = CPU_exe_time_B/CPU_exe_time_A = (3.15X ns)/(2.4X ns) = 1.31

    Machine A is 1.31 times faster than Machine B.

5) The following subroutine will return the total of all the values stored in an array, given as inputs the address of the first element of the array and the number of elements in the array. Write the correct MIPS assembly for the calling function and the subroutine. Assume that variable result is allocated to register $s1, that the base address of array A is assigned to register $s0, variable i is allocated to register $t0, variable total is allocated to register $s1, that all input and output parameters are allocated according to the conventions used in MIPS assembly, and finally that any temporary values needed will be allocated to a temporary register ($t0 - $t9).
Preserve only those registers that must be preserved across the subroutine call. (25 points)

    int array_total (int address, int number)
    {
        int i, total;
        i = 0;
        total = 0;
        do {
            total = total + A[i];
            i = i+1;
        } while (i < number);
        return (total);
    }

    void main (void)
    {
        int result, A[100];
        …
        result = array_total(&A[0], 100);
        …
    }

    array_total:  addi  $sp, $sp, -4
                  sw    $s1, 0($sp)
                  add   $t0, $zero, $zero
                  add   $s1, $zero, $zero
    loop:         add   $t1, $t0, $t0
                  add   $t1, $t1, $t1       #t1=i*4
                  add   $t1, $t1, $a0       #t1=&A[i]
                  lw    $t2, 0($t1)
                  add   $s1, $s1, $t2
                  addi  $t0, $t0, 1         #i=i+1
                  slt   $t3, $t0, $a1
                  bne   $t3, $zero, loop
                  add   $v0, $zero, $s1
                  lw    $s1, 0($sp)
                  addi  $sp, $sp, 4
                  jr    $ra

    main:         …
                  add   $a0, $s0, $zero
                  addi  $a1, $zero, 100
                  jal   array_total
                  add   $s1, $v0, $zero
                  …

    NOTE: You could also include a test to see if the number of elements is greater than 0 at the beginning of the subroutine. However, I did not require this.

6) Define benchmark and discuss the advantages and the disadvantages of using synthetic benchmarks in the evaluation of computer performance. (10 points)

    A benchmark is a program specifically chosen to measure the performance of a computer.

    Advantages:
    1) can represent a wide range of applications
    2) already exist
    3) give a common evaluation source
    4) easy to hand compile

    Disadvantages:
    1) not always representative of the real workload
    2) many architectures and compilers have been optimized for these programs, so the results can indicate false performance for real-world workloads
    3) evaluation is complex if the benchmark is a suite of programs

7) Given 16 bits to represent data in memory, what is the range of signed integers that can be represented if two's complement representation is used? Show the 16-bit hexadecimal representation for each of the following decimal numbers using this type of representation.
(10 points)

    Range (2 points):  -32768 to +32767
    0 (2 points):      0x0000
    24 (2 points):     0x0018
    -2 (2 points):     0xFFFE
    1112 (2 points):   0x0458

8) Given 16 bits to represent data in memory, what is the range of signed integers that can be represented if sign-and-magnitude representation is used? Show the 16-bit hexadecimal representation for each of the following decimal numbers using this type of representation. (10 points)

    Range (2 points):  -32767 to +32767
    0 (2 points):      0x0000 OR 0x8000
    24 (2 points):     0x0018
    -2 (2 points):     0x8002
    1112 (2 points):   0x0458

9) Sign-extend the 16-bit representations for decimal number -2 given in problems 7 and 8 to 32 bits. Show your result in hexadecimal. (6 points -- 3 points each)

    Two's complement:     0xFFFFFFFE
    Sign-and-magnitude:   0x80000002

10) Given the following data collected using a benchmark suite consisting of three different applications, which machine is faster and by how much? Application A represents 65% of the total workload, application B represents 20% of the entire workload, and application C represents 15% of the entire workload. (10 points)

                     Machine 1   Machine 2
    Application A    10 s        5 s
    Application B    7 s         9 s
    Application C    4 s         6 s

    Total execution time for Machine 1: (.65)(10) + (.2)(7) + (.15)(4) = 8.5 s
    Total execution time for Machine 2: (.65)(5) + (.2)(9) + (.15)(6) = 5.95 s

    Performance_2/Performance_1 = execution_time_1/execution_time_2 = 8.5/5.95 = 1.43

    Machine 2 is 1.43 times faster than Machine 1.

11) List the four (4) fundamental principles of hardware design and give one (1) example of how each is used in the design of the MIPS architecture.
(20 points)

    1) simplicity favors regularity -- all MIPS instructions are 32 bits long; all MIPS arithmetic instructions require 3 operands
    2) smaller is faster -- register set is limited to 32 registers; RISC architecture limits the instruction set to speed up execution
    3) good designs require good compromises -- introduction of I and J formats; cost versus performance tradeoffs
    4) make the common case fast -- support for immediate addressing

12) For each of the assembly statements given below, if it is a pseudo-instruction in MIPS assembly, rewrite it using actual MIPS instructions. If it is a real MIPS instruction, indicate that it is already a real instruction. (3 points each)

    A) move $t5, $t3            #$t5 = $t3
           add   $t5, $t3, $zero

    B) clear $t5                #$t5 = 0
           add   $t5, $zero, $zero

    C) subi $t0, $t1, 4         #$t0 = $t1 - 4
           addi  $t0, $t1, -4

    D) bgt $t5, $t3, L1         #if ($t5 > $t3) go to L1
           slt   $t0, $t3, $t5
           bne   $t0, $zero, L1

    E) bgt $t5, $t3, L1         #if ($t5 > $t3) go to L1
           NOTE: Everyone got credit for this one since I messed it up!
           slt   $t0, $t3, $t5
           bne   $t0, $zero, L1

    F) ble $t5, $t3, L1         #if ($t5 <= $t3) go to L1
           slt   $t0, $t5, $t3
           bne   $t0, $zero, L1
           beq   $t3, $t5, L1

       Another solution:
           slt   $t0, $t3, $t5
           beq   $t0, $zero, L1

13) REQUIRED FOR GRADUATE STUDENTS -- EXTRA CREDIT FOR UNDERGRADUATE STUDENTS

Given the following C code segment, write the corresponding assembly instructions assuming the instructions are to be executed on a machine having a stack architecture. Please elaborate on any assumptions you make and clearly define the operation of any instructions you use. (7 points)

    C = A + B;

    push  addressB
    push  addressA
    add
    pop   addressC

For the same C code segment, write the corresponding assembly instructions assuming the instructions are to be executed on a machine having an accumulator architecture. Please elaborate on any assumptions you make and clearly define the operation of any instructions you use. (7 points)

    load   addressB
    add    addressA
    store  addressC
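
The two instruction sequences given for problem 13 can be checked with a small simulation. The Python sketch below is only an illustration: the function names `run_stack` and `run_accumulator`, the dictionary used as memory, and the exact instruction semantics (push and pop take a memory address, add pops two values and pushes their sum, the accumulator add takes one memory operand) are assumptions made here and are not defined in the exam.

```python
# Hypothetical simulators for problem 13. The instruction semantics are
# assumptions for illustration, not definitions taken from the exam.

def run_stack(program, memory):
    """Run a stack-machine program over a dict-based memory."""
    stack = []
    for op, *args in program:
        if op == "push":       # push memory[addr] onto the stack
            stack.append(memory[args[0]])
        elif op == "add":      # pop two operands, push their sum
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":      # pop the top of stack into memory[addr]
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_accumulator(program, memory):
    """Run an accumulator-machine program over a dict-based memory."""
    acc = 0
    for op, addr in program:
        if op == "load":       # acc = memory[addr]
            acc = memory[addr]
        elif op == "add":      # acc = acc + memory[addr]
            acc += memory[addr]
        elif op == "store":    # memory[addr] = acc
            memory[addr] = acc
        else:
            raise ValueError(f"unknown accumulator op: {op}")
    return memory


# C = A + B, written as in the two answers above.
stack_program = [("push", "B"), ("push", "A"), ("add",), ("pop", "C")]
acc_program = [("load", "B"), ("add", "A"), ("store", "C")]

stack_result = run_stack(stack_program, {"A": 3, "B": 4, "C": 0})
acc_result = run_accumulator(acc_program, {"A": 3, "B": 4, "C": 0})
print(stack_result["C"], acc_result["C"])  # prints: 7 7
```

Under these assumptions both sequences leave A + B in C. The stack version needs four instructions, with an add that names no operands, while the accumulator version needs three, each naming one memory operand.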