CMU 18-447 Introduction to Computer Architecture, Spring 2013
HW 1 Solutions: Instruction Set Architecture (ISA)

Instructor: Prof. Onur Mutlu
TAs: Justin Meza, Yoongu Kim, Jason Lin

1 The SPIM Simulator [5 points]

There isn’t a solution per se to this. The expectation is that you are familiar with SPIM/XSPIM, understand its strengths and limitations, and are using it to debug your labs and homeworks.

2 Big versus Little Endian Addressing [5 points]

1.  Address   0    1    2    3
    Byte      a0   d6   8f   34

2.  Address   0    1    2    3
    Byte      34   8f   d6   a0

3 Instruction Set Architecture (ISA) [25 points]

Code size: Each instruction has an opcode and a set of operands.
• The opcode is always 1 byte (8 bits).
• All register operands are 1 byte (8 bits).
• All memory addresses are 2 bytes (16 bits).
• All data operands are 4 bytes (32 bits).
• All instructions are an integral number of bytes in length.

Memory Bandwidth:
Memory bandwidth consumed = amount of code transferred (code size) + amount of data transferred
Amount of data transferred = number of data references * 4 bytes

We will call the amount of code transferred I-bytes and the amount of data transferred D-bytes.
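As a rough check of these rules, the cost of a single instruction can be computed mechanically. The Python sketch below is illustrative only: the helper name instruction_cost and its parameters are assumptions made for this example and are not part of the assignment.

```python
# Operand sizes taken from the encoding rules stated above.
OPCODE_BYTES = 1
REGISTER_BYTES = 1
ADDRESS_BYTES = 2
DATA_BYTES = 4


def instruction_cost(num_registers, num_addresses, num_data_refs):
    """Return (I-bytes, D-bytes) for one instruction.

    num_registers: register operands encoded in the instruction
    num_addresses: memory-address operands encoded in the instruction
    num_data_refs: data words the instruction reads or writes in memory
    """
    i_bytes = (OPCODE_BYTES
               + num_registers * REGISTER_BYTES
               + num_addresses * ADDRESS_BYTES)
    d_bytes = num_data_refs * DATA_BYTES
    return i_bytes, d_bytes


# A memory-to-memory ADD with three address operands (two reads, one write).
print(instruction_cost(0, 3, 3))  # (7, 12)

# A register-to-register ADD with three register operands, no memory traffic.
print(instruction_cost(3, 0, 0))  # (4, 0)
```

Summing these pairs over every instruction in a program gives the I-byte and D-byte totals, and their sum is the memory bandwidth consumed.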
(a), (b)

Instruction Set Architecture: Zero-address

Opcode   Operands      I-bytes   D-bytes   Total Bytes
PUSH     B             3         4
PUSH     C             3         4
ADD                    1         0
POP      A             3         4
PUSH     A             3         4
PUSH     C             3         4
ADD                    1         0
POP      B             3         4
PUSH     A             3         4
PUSH     B             3         4
SUB                    1         0
POP      D             3         4
Total                  30        36        66

Instruction Set Architecture: One-address

Opcode   Operands      I-bytes   D-bytes   Total Bytes
LOAD     B             3         4
ADD      C             3         4
STORE    A             3         4
ADD      C             3         4
STORE    B             3         4
LOAD     A             3         4
SUB      B             3         4
STORE    D             3         4
Total                  24        32        56

Instruction Set Architecture: Two-address

Opcode   Operands      I-bytes   D-bytes   Total Bytes
SUB      A, A          5         12
ADD      A, B          5         12
ADD      A, C          5         12
SUB      B, B          5         12
ADD      B, A          5         12
ADD      B, C          5         12
SUB      D, D          5         12
ADD      D, A          5         12
SUB      D, B          5         12
Total                  45        108       153

Instruction Set Architecture: Three-address Memory-Memory

Opcode   Operands      I-bytes   D-bytes   Total Bytes
ADD      A, B, C       7         12
ADD      B, A, C       7         12
SUB      D, A, B       7         12
Total                  21        36        57

Instruction Set Architecture: Three-address Load-Store

Opcode   Operands      I-bytes   D-bytes   Total Bytes
LD       R1, B         4         4
LD       R2, C         4         4
ADD      R1, R1, R2    4         0
ST       R1, A         4         4
ADD      R3, R1, R2    4         0
ST       R3, B         4         4
SUB      R3, R1, R3    4         0
ST       R3, D         4         4
Total                  32        20        52

(c) The three-address memory-memory machine is the most efficient as measured by code size: 21 bytes.

(d) The three-address load-store machine is the most efficient as measured by total memory bandwidth consumption (amount of code transferred + amount of data transferred): 52 bytes.

4 The MIPS ISA [40 points]

4.1 Warmup: Computing a Fibonacci Number [15 points]

NOTE: More than one correct solution exists; this is just one potential solution.
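Before reading the assembly, it may help to see the same iteration in a high-level form. The Python sketch below mirrors the loop of the MIPS routine that follows, with a, b, c, and n standing in for registers $17, $18, $2, and $16. The function name fib_model is an assumption made for this illustration; it is not part of the solution.

```python
def fib_model(n):
    """Reference model of the iterative loop used by the MIPS routine."""
    a, b = 0, 1      # r17 = a, r18 = b
    c = a + b        # initial value of the result register r2
    while n >= 2:    # slti/bne pair: leave the loop once n < 2
        c = a + b
        a = b
        b = c
        n -= 1
    return c


print(fib_model(5))   # 5
print(fib_model(10))  # 55
```

For n < 2 the loop is skipped entirely, so the model returns the initial c = 1.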
fib:
    addi  $sp, $sp, -16      # allocate stack space
    sw    $16, 0($sp)        # save r16
    add   $16, $4, $0        # r16 for arg n
    sw    $17, 4($sp)        # save r17
    add   $17, $0, $0        # r17 for a, init to 0
    sw    $18, 8($sp)        # save r18
    addi  $18, $0, 1         # r18 for b, init to 1
    sw    $31, 12($sp)       # save return address
    add   $2, $17, $18       # c = a + b
branch:
    slti  $3, $16, 2         # use r3 as temp
    bne   $3, $0, done
    add   $2, $17, $18       # c = a + b
    add   $17, $18, $0       # a = b
    add   $18, $2, $0        # b = c
    addi  $16, $16, -1       # n = n - 1
    j     branch
done:
    lw    $31, 12($sp)       # restore r31
    lw    $18, 8($sp)        # restore r18
    lw    $17, 4($sp)        # restore r17
    lw    $16, 0($sp)        # restore r16
    addi  $sp, $sp, 16       # restore stack pointer
    jr    $31                # return to caller

4.2 MIPS Assembly for REP MOVSB [25 points]

Assume: $1 = ECX, $2 = ESI, $3 = EDI

(a)
    beq   $1, $0, AfterLoop  # If counter is zero, skip
CopyLoop:
    lb    $4, 0($2)          # Load 1 byte
    sb    $4, 0($3)          # Store 1 byte
    addiu $2, $2, 1          # Increase source pointer by 1 byte
    addiu $3, $3, 1          # Increase destination pointer by 1 byte
    addiu $1, $1, -1         # Decrement counter
    bne   $1, $0, CopyLoop   # If not zero, repeat
AfterLoop:
    # Following instructions

(b) The size of the MIPS assembly code is 4 bytes × 7 = 28 bytes, as compared to 2 bytes for x86 REP MOVSB.

(c) The count (value in ECX) is 0xdeadbeef = 3735928559. Therefore, the loop body is executed 3735928559 times, and each iteration executes 6 instructions, for 3735928559 * 6 = 22415571354 instructions inside the loop. Total instructions executed = 22415571354 + 1 (beq instruction outside of the loop) = 22415571355.

(d) The count (value in ECX) is 0x01000000 = 16777216. Therefore, the loop body is executed 16777216 times, and each iteration executes 6 instructions, for 16777216 * 6 = 100663296 instructions inside the loop. Total instructions executed = 100663296 + 1 (beq instruction outside of the loop) = 100663297.

5 Data Flow Programs [15 points]

6 Performance Metrics [10 points]

• No, the lower frequency processor might have a much higher IPC (instructions per cycle).
  More detail: A processor with a lower frequency might be able to execute multiple instructions per cycle while a processor with a higher frequency might only execute one instruction per cycle.
• No, because the former processor may execute many more instructions.
  More detail: The total number of instructions required to execute the full program could be different on different processors.

7 Performance Evaluation [15 points]

• ISA A: 10 instructions/cycle * 500,000,000 cycles/second = 5000 MIPS
• ISA B: 2 instructions/cycle * 600,000,000 cycles/second = 1200 MIPS
• Don’t know. The best compiled code for each processor may have a different number of instructions.
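The two ratings above, and the reason they do not settle which machine is faster, can be checked with a short Python sketch. The function names are assumptions made for this example, and the instruction counts used in the run-time comparison are hypothetical values chosen only for illustration; the homework does not give them.

```python
def mips_rating(instructions_per_cycle, clock_hz):
    """Millions of instructions executed per second."""
    return instructions_per_cycle * clock_hz / 1_000_000


def execution_time(instruction_count, instructions_per_cycle, clock_hz):
    """Seconds needed to run a program of the given instruction count."""
    return instruction_count / (instructions_per_cycle * clock_hz)


print(mips_rating(10, 500_000_000))  # 5000.0 (ISA A)
print(mips_rating(2, 600_000_000))   # 1200.0 (ISA B)

# Hypothetical instruction counts for the same program compiled for each ISA.
time_a = execution_time(10_000_000_000, 10, 500_000_000)
time_b = execution_time(2_000_000_000, 2, 600_000_000)
print(time_a > time_b)  # True
```

With these made-up counts, the machine with the lower MIPS rating finishes first, which is why the rating alone cannot answer the question.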