Test #1

Test 1
ECE 484/584 -- Fall 2004
Chapters 1 - 4.4
Name: ___________________________
Student Number: ______________________
1) Given below is a segment of MIPS assembly code and the absolute addresses in hexadecimal where
each of the instructions is stored in memory. Assemble this code segment producing the hexadecimal
encoding for each instruction. (12 points total -- 4 points each)
0x00100010    loop: addi $t0, $t1, 8
0x00100014          slt  $s0, $t0, $t1
0x00100018          bne  $s0, $zero, loop
0x0010001C          …

addi $t0, $t1, 8         001000 01001 01000 0000 0000 0000 1000    = 0x21280008
slt  $s0, $t0, $t1       000000 01000 01001 10000 00000 101010     = 0x0109802A
bne  $s0, $zero, loop    000101 10000 00000 1111 1111 1111 1101    = 0x1600FFFD
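The field packing above can be checked mechanically. The following is a minimal Python sketch, not part of the original solution; the helper names (REGS, encode_i, encode_r, branch_offset) are made up for illustration, and the opcode, funct, and register numbers are the standard MIPS values for the three instructions used here.

```python
# Register numbers for the registers used in this problem (standard MIPS numbering).
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$s0": 16}

def encode_i(opcode, rs, rt, imm):
    """Pack an I-format word: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (REGS[rs] << 21) | (REGS[rt] << 16) | (imm & 0xFFFF)

def encode_r(rs, rt, rd, funct, shamt=0):
    """Pack an R-format word: opcode 0, rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (REGS[rs] << 21) | (REGS[rt] << 16) | (REGS[rd] << 11) | (shamt << 6) | funct

def branch_offset(branch_addr, target_addr):
    """Word offset measured from the instruction after the branch (PC + 4)."""
    return (target_addr - (branch_addr + 4)) // 4

addi = encode_i(0x08, "$t1", "$t0", 8)            # addi $t0, $t1, 8
slt = encode_r("$t0", "$t1", "$s0", 0x2A)         # slt  $s0, $t0, $t1
offset = branch_offset(0x00100018, 0x00100010)    # loop label is three words back
bne = encode_i(0x05, "$s0", "$zero", offset)      # bne  $s0, $zero, loop

for word in (addi, slt, bne):
    print(f"0x{word:08X}")
```

Note that the branch offset is counted in words from PC + 4, which is why the 16-bit immediate for bne is -3 (0xFFFD) rather than -2.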
2) A computer designer is interested in improving the performance of looping operations used on Machine
A. Machine A executes a program in 10 seconds. This particular program spends 20% of its time
performing a loop that adds elements of an array using an array index. The computer designer can use
special instructions and addressing modes to accelerate the execution of loops. How much improvement in
loop execution must be provided by Machine B in order to obtain an overall execution time of 5 seconds
for this program? (10 points)
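new_exe_time = (old_exe_time_affected / amount_of_improvement) + old_exe_time_unaffected
5 seconds = (2 seconds / amount_of_improvement) + 8 seconds
amount_of_improvement = 2 / (5 - 8) = 2 / -3 = IMPOSSIBLE!
The 80% of the program outside the loop already takes 8 seconds, so no amount of loop improvement can bring the total execution time down to 5 seconds.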
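The same reasoning can be written as a small Python sketch. The function name required_speedup and the 9-second target in the second call are assumptions added for illustration; only the 10-second, 20%, and 5-second figures come from the problem.

```python
def required_speedup(old_time, affected_fraction, target_time):
    """Return the speedup the affected part needs, or None if the target is unreachable."""
    affected = old_time * affected_fraction      # time spent in the loop
    unaffected = old_time - affected             # time no loop improvement can touch
    budget = target_time - unaffected            # time left over for the loop
    if budget <= 0:
        return None                              # unaffected part alone exceeds the target
    return affected / budget

print(required_speedup(10.0, 0.20, 5.0))   # the exam's case: unreachable
print(required_speedup(10.0, 0.20, 9.0))   # hypothetical 9-second target for contrast
```

Returning None for a non-positive budget mirrors the negative result in the hand calculation: a negative improvement factor has no physical meaning.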
3) Give three (3) reasons why the metric called Millions of Instructions Per Second (MIPS) does not make
for a good evaluation metric for computer performance. (9 points – 3 points each)
1) instruction counts differ between machines (instruction capabilities differ between machines);
therefore, it is hard to compare different machines
2) instruction counts differ between programs on the same machine; therefore, the MIPS rating of
a machine differs from program to program
3) can vary inversely to performance
4) inversely proportional to execution time
4) Suppose we have two different computers that we want to compare. The two computers have different
implementations of the same instruction set architecture, will execute the same program, and use the same
compiler. Machine A has a clock cycle time of 1.0 nanoseconds (ns) and a CPI of 2.4. Machine B has a
clock cycle time of 1.5 ns and a CPI of 2.1. Which machine is faster and by how much? (10 points)
Since both machines have the same ISA, the same program, and the same compiler, both will
execute the same number of instructions.
Machine A
clock time = 1.0 ns
CPI = 2.4
IC = X
Machine B
clock time = 1.5 ns
CPI = 2.1
IC = X
CPU_exe_time = (IC)(CPI)(clock_time)
CPU_exe_A = (X)(2.4)(1.0ns) = 2.4X ns
CPU_exe_B = (X)(2.1)(1.5ns) = 3.15X ns
Perf_A/Perf_B = CPU_exe_time_B/CPU_exe_time_A = (3.15X ns)/(2.4X ns) = 1.31
Machine A is 1.31 times faster than Machine B.
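As a quick check of the arithmetic, here is a minimal Python sketch. The function name time_per_instruction is hypothetical; because the instruction count X is the same on both machines, it cancels in the ratio and is left out.

```python
def time_per_instruction(cpi, clock_ns):
    """Average time per instruction in nanoseconds: CPI times clock cycle time."""
    return cpi * clock_ns

time_a = time_per_instruction(2.4, 1.0)   # Machine A
time_b = time_per_instruction(2.1, 1.5)   # Machine B

# Performance ratio is the inverse of the execution-time ratio.
ratio = time_b / time_a
print(f"Machine A is {ratio:.2f} times faster than Machine B")
```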
5) The following subroutine will return the total of all the values stored in an array given as inputs the
address of the first element of the array and the number of elements in the array. Write the correct MIPS
assembly for the calling function and the subroutine. Assume that variable result is allocated to register
$s1, that the base address of array A is assigned to register $s0, variable i is allocated to register $t0,
variable total is allocated to register $s1, that all input and output parameters are allocated according to the
conventions used in MIPS assembly, and finally that any temporary registers needed will be allocated to a
temporary register ($t0 - $t9). Preserve only those registers that must be preserved across the subroutine
call. (25 points)
int
array_total (int address, int number) {
    int i, total;
    i = 0;
    total = 0;
    do {
        total = total + A[i];
        i = i+1;
    } while (i < number);
    return (total);
}
void
main (void) {
    int result, A[100];
    …
    result = array_total(&A[0], 100);
    …
}
array_total:
        addi  $sp, $sp, -4
        sw    $s1, 0($sp)
        add   $t0, $zero, $zero
        add   $s1, $zero, $zero
loop:
        add   $t1, $t0, $t0
        add   $t1, $t1, $t1        #t1=i*4
        add   $t1, $t1, $a0        #t1=&A[i]
        lw    $t2, 0($t1)
        add   $s1, $s1, $t2
        addi  $t0, $t0, 1          #i=i+1
        slt   $t3, $t0, $a1
        bne   $t3, $zero, loop
        add   $v0, $zero, $s1
        lw    $s1, 0($sp)
        addi  $sp, $sp, 4
        jr    $ra
main:
        …
        add   $a0, $s0, $zero
        addi  $a1, $zero, 100
        jal   array_total
        add   $s1, $v0, $zero
        …
NOTE: You could also include a test to see if the number of elements is greater than 0 at the beginning of
the subroutine. However, I did not require this.
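The subroutine's loop can be traced at the register level with a short Python sketch. This is an illustration only: memory is modeled as a dictionary keyed by byte address, and the base address 0x1000 and the array contents 1..100 are made-up test values, not part of the exam.

```python
def array_total(memory, a0, a1):
    """Mirror the do-while loop: a0 is the array base address, a1 the element count."""
    t0 = 0                      # i = 0
    s1 = 0                      # total = 0
    while True:
        t1 = t0 * 4 + a0        # byte address of A[i] (4-byte words)
        s1 += memory[t1]        # total = total + A[i]
        t0 += 1                 # i = i + 1
        if not (t0 < a1):       # slt / bne: repeat while i < number
            break
    return s1                   # returned in $v0

# Hypothetical test data: A[i] = i + 1 stored at consecutive word addresses.
memory = {0x1000 + 4 * i: i + 1 for i in range(100)}
result = array_total(memory, 0x1000, 100)
print(result)
```

Because the loop tests its condition at the bottom, a call with a count of 0 would still read and add A[0], which is the case the optional greater-than-zero check would guard against.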
6) Define benchmark and discuss the advantages and the disadvantages of using synthetic benchmarks in
the evaluation of computer performance. (10 points)
A benchmark is a program specifically chosen to measure performance of a computer.
Advantages:
1) can represent a wide range of applications
2) already exist
3) give a common evaluation source
4) easy to hand compile
Disadvantages:
1) not always representative of real workload
2) many architectures and compilers have been optimized for these programs, so the results
can overstate performance on real-world workloads
3) evaluation is complex if benchmark is a suite of programs
7) Given 16 bits to represent data in memory, what is the range of signed integers that can be represented if
two’s complement representation is used? Show the 16-bit hexadecimal representation for each of the
following decimal numbers using this type of representation. (10 points)
Range (2 points):
-32768 to +32767
0 (2 points)
0x0000
24 (2 points)
0x0018
-2 (2 points)
0xFFFE
1112 (2 points)
0x0458
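These conversions can be reproduced with a minimal Python sketch. The function name to_twos_complement_hex is hypothetical; masking with 2^16 - 1 is one common way to obtain the two's complement bit pattern of a negative Python integer.

```python
def to_twos_complement_hex(value, bits=16):
    """Return the two's complement bit pattern of value as a hex string."""
    lowest = -(1 << (bits - 1))          # -32768 for 16 bits
    highest = (1 << (bits - 1)) - 1      # +32767 for 16 bits
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    width = bits // 4                    # hex digits to print
    pattern = value & ((1 << bits) - 1)  # wrap negatives into the unsigned range
    return f"0x{pattern:0{width}X}"

for n in (0, 24, -2, 1112):
    print(n, to_twos_complement_hex(n))
```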
8) Given 16 bits to represent data in memory, what is the range of signed integers that can be represented if
sign-and-magnitude representation is used? Show the 16-bit hexadecimal representation for each of the
following decimal numbers using this type of representation. (10 points)
Range (2 points):
-32767 to +32767
0 (2 points)
0x0000 OR 0x8000
24 (2 points)
0x0018
-2 (2 points)
0x8002
1112 (2 points)
0x0458
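A minimal Python sketch of the sign-and-magnitude encoding follows. The function name to_sign_magnitude_hex is hypothetical, and this sketch always encodes zero as 0x0000; the alternative pattern 0x8000 (negative zero) is equally valid in this representation.

```python
def to_sign_magnitude_hex(value, bits=16):
    """Return the sign-and-magnitude bit pattern of value as a hex string."""
    limit = (1 << (bits - 1)) - 1        # largest magnitude: 32767 for 16 bits
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {bits} bits")
    sign = 1 if value < 0 else 0         # top bit holds the sign
    pattern = (sign << (bits - 1)) | abs(value)
    width = bits // 4                    # hex digits to print
    return f"0x{pattern:0{width}X}"

for n in (0, 24, -2, 1112):
    print(n, to_sign_magnitude_hex(n))
```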
9) Sign-extend the 16-bit representations for decimal number -2 given in problems 7 and 8 to 32-bits.
Show your result in hexadecimal. (6 points – 3 points each)
Two’s complement:
0xFFFFFFFE
Sign-and-magnitude:
0x80000002
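The two extension rules differ: two's complement copies the sign bit into every new upper bit, while sign-and-magnitude moves the sign bit to the new top position and pads the magnitude with zeros. A minimal Python sketch, with hypothetical function names:

```python
def sign_extend_twos(word16):
    """Extend a 16-bit two's complement pattern to 32 bits by replicating the sign bit."""
    if word16 & 0x8000:
        return word16 | 0xFFFF0000
    return word16

def sign_extend_sign_magnitude(word16):
    """Extend a 16-bit sign-and-magnitude pattern: move the sign bit, zero-pad the magnitude."""
    sign = word16 & 0x8000
    magnitude = word16 & 0x7FFF
    return (sign << 16) | magnitude

print(f"0x{sign_extend_twos(0xFFFE):08X}")
print(f"0x{sign_extend_sign_magnitude(0x8002):08X}")
```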
10) Given the following data collected using a benchmark suite consisting of three different applications,
which machine is faster and by how much? Application A represents 65% of the total workload,
application B represents 20% of the entire workload, and application C represents 15% of the entire
workload. (10 points)
                 Machine 1    Machine 2
Application A    10 s         5 s
Application B    7 s          9 s
Application C    4 s          6 s
Total execution time for Machine 1:
(0.65)(10) + (0.20)(7) + (0.15)(4) = 8.5 s
Total execution time for Machine 2:
(0.65)(5) + (0.20)(9) + (0.15)(6) = 5.95 s
Performance_2/Performance_1 = execution_time_1/execution_time_2 = 8.5/5.95 = 1.43
Machine 2 is 1.43 times faster than Machine 1.
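The weighted totals can be recomputed with a minimal Python sketch. The dictionary and function names are made up for illustration; the weights and times are the ones given in the problem.

```python
# Workload share of each application.
weights = {"A": 0.65, "B": 0.20, "C": 0.15}

# Measured execution times in seconds.
machine1 = {"A": 10, "B": 7, "C": 4}
machine2 = {"A": 5, "B": 9, "C": 6}

def weighted_time(times):
    """Weighted arithmetic mean of execution times, using the workload shares."""
    return sum(weights[app] * times[app] for app in weights)

t1 = weighted_time(machine1)
t2 = weighted_time(machine2)
print(f"Machine 1: {t1:.2f} s, Machine 2: {t2:.2f} s, ratio: {t1 / t2:.2f}")
```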
11) List the four (4) fundamental principles of hardware design and give one (1) example of how each is
used in the design of the MIPS architecture. (20 points)
1) simplicity favors regularity – all MIPS instructions are 32 bits long;
   all MIPS arithmetic instructions require 3 operands
2) smaller is faster – register set is limited to 32 registers;
   RISC architecture limits instruction set to speed up execution
3) good designs require good compromises -- introduction of I and J formats;
   cost versus performance tradeoffs
4) make the common case fast – support for immediate addressing
12) For each of the assembly statements given below, if it is a pseudo-instruction in MIPS assembly,
rewrite it using actual MIPS instructions. If it is a real MIPS instruction indicate that it is already a real
instruction. (3 points each)
A) move  $t5, $t3              #$t5 = $t3
   add   $t5, $t3, $zero
B) clear $t5                   #$t5 = 0
   add   $t5, $zero, $zero
C) subi  $t0, $t1, 4           #$t0 = $t1 - 4
   addi  $t0, $t1, -4
D) bgt   $t5, $t3, L1          #if ($t5 > $t3) go to L1
   slt   $t0, $t3, $t5
   bne   $t0, $zero, L1
E) bgt   $t5, $t3, L1          #if ($t5 > $t3) go to L1
   NOTE: Everyone got credit for this one since I messed it up!!!!!!!
   slt   $t0, $t3, $t5
   bne   $t0, $zero, L1
F) ble   $t5, $t3, L1          #if ($t5 <= $t3) go to L1
   slt   $t0, $t5, $t3
   bne   $t0, $zero, L1
   beq   $t3, $t5, L1
   Another solution:
   slt   $t0, $t3, $t5
   beq   $t0, $zero, L1
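The branch expansions for bgt and ble can be sanity-checked by modeling slt and the branch conditions in Python and comparing against the intended comparison over a small range of values. This is a sketch with hypothetical function names, not a MIPS simulator.

```python
def slt(rs, rt):
    """Set-on-less-than: 1 if rs < rt, else 0."""
    return 1 if rs < rt else 0

def bgt_taken(t5, t3):
    t0 = slt(t3, t5)            # slt $t0, $t3, $t5
    return t0 != 0              # bne $t0, $zero, L1

def ble_taken_three_instr(t5, t3):
    t0 = slt(t5, t3)            # slt $t0, $t5, $t3
    if t0 != 0:                 # bne $t0, $zero, L1
        return True
    return t3 == t5             # beq $t3, $t5, L1

def ble_taken_two_instr(t5, t3):
    t0 = slt(t3, t5)            # slt $t0, $t3, $t5
    return t0 == 0              # beq $t0, $zero, L1

values = range(-3, 4)
all_match = all(
    bgt_taken(a, b) == (a > b)
    and ble_taken_three_instr(a, b) == (a <= b)
    and ble_taken_two_instr(a, b) == (a <= b)
    for a in values
    for b in values
)
print(all_match)
```

The two-instruction form of ble works because "not greater than" is the same condition as "less than or equal", so a single slt with swapped operands followed by beq is enough.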
13) REQUIRED FOR GRADUATE STUDENTS – EXTRA CREDIT FOR UNDERGRADUATE
STUDENTS
Given the following C code segment, write the corresponding assembly instructions assuming the
instructions are to be executed on a machine having a stack architecture. Please elaborate on any
assumptions you make and clearly define the operation of any instructions you use. (7 points)
C = A + B;
push   addressB
push   addressA
add
pop    addressC
For the same C code segment, write the corresponding assembly instructions assuming the instructions are
to be executed on a machine having an accumulator architecture. Please elaborate on any assumptions
you make and clearly define the operation of any instructions you use. (7 points)
load   addressB
add    addressA
store  addressC
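Both instruction sequences for C = A + B can be traced with a pair of toy interpreters in Python. Everything here is an assumption made for illustration: the instruction semantics (push and load read memory, pop and store write memory, the stack add replaces the top two entries with their sum), the use of variable names as addresses, and the sample values A = 3 and B = 4.

```python
def run_stack(program, memory):
    """Stack machine: operands live on a stack; add takes no address."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])      # push the value stored at the address
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)         # replace top two entries with their sum
        elif op == "pop":
            memory[args[0]] = stack.pop()      # store top of stack at the address
    return memory

def run_accumulator(program, memory):
    """Accumulator machine: one implicit register is the source and destination."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
    return memory

stack_program = [("push", "B"), ("push", "A"), ("add",), ("pop", "C")]
acc_program = [("load", "B"), ("add", "A"), ("store", "C")]

stack_result = run_stack(stack_program, {"A": 3, "B": 4, "C": 0})
acc_result = run_accumulator(acc_program, {"A": 3, "B": 4, "C": 0})
print(stack_result["C"], acc_result["C"])
```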