CMU 18-447 Introduction to Computer Architecture, Spring 2013

advertisement
CMU 18-447 Introduction to Computer Architecture, Spring 2013
HW 1 Solutions: Instruction Set Architecture (ISA)
Instructor: Prof. Onur Mutlu
TAs: Justin Meza, Yoongu Kim, Jason Lin
1
The SPIM Simulator [5 points]
There isn’t a solution per se to this. The expectation is that you are familiar with SPIM/XSPIM, understand
its strengths and limitations, and are using it to debug your labs and homeworks.
2
3
Big versus Little Endian Addressing [5 points]
1.
a0
0
d6
1
8f
2
34
3
2.
34
0
8f
1
d6
2
a0
3
Instruction Set Architecture (ISA) [25 points]
Code size:
Each instruction has an opcode and a set of operands
• The opcode is always 1 byte (8 bits).
• All register operands are 1 byte (8 bits).
• All memory addresses are 2 bytes (16 bits).
• All data operands are 4 bytes (32 bits).
• All instructions are an integral number of bytes in length.
Memory Bandwidth:
Memory bandwidth consumed = amount of code transferred (code size) + amount of data transferred
Amount of data transferred = number of data references * 4 bytes
We will call the amount of code transferred as I-bytes and the amount of data transferred as D-bytes.
(a), (b)
Instruction Set Architecture
Zero-address
Opcode
Operands
I-bytes
D-bytes
PUSH
PUSH
ADD
POP
PUSH
PUSH
ADD
POP
PUSH
PUSH
SUB
POP
B
C
3
3
1
3
3
3
1
3
3
3
1
3
30
4
4
A
A
C
B
A
B
D
1/4
Total Bytes
4
4
4
4
4
4
4
36
66
Instruction Set Architecture
One-address
Opcode
Operands
I-bytes
D-bytes
Total Bytes
LOAD
ADD
STORE
ADD
STORE
LOAD
SUB
STORE
B
C
A
C
B
A
B
D
3
3
3
3
3
3
3
3
24
4
4
4
4
4
4
4
4
32
56
5
5
5
5
5
5
5
5
5
45
12
12
12
12
12
12
12
12
12
108
153
7
7
7
21
12
12
12
36
57
4
4
4
4
4
4
4
4
32
4
4
Two-address
SUB
ADD
ADD
SUB
ADD
ADD
SUB
ADD
SUB
Three-address
Memory-Memory
ADD
ADD
SUB
Three-address
Load-Store
LD
LD
ADD
ST
ADD
ST
SUB
ST
A, A
A, B
A, C
B, B
B, A
B, C
D, D
D, A
D, B
A, B, C
B, A, C
D, A, B
R1,
R2,
R1,
R1,
R3,
R3,
R3,
R3,
B
C
R1, R2
A
R1, R2
B
R1, R3
D
4
4
4
20
52
(c) The three-address memory-memory machine is the most efficient as measured by code size - 21 bytes.
(d) The three-address load-store machine is the most efficient as measured by total memory bandwidth
consumption (amount of code transferred + amount of data transferred) - 52 bytes.
4
The MIPS ISA [40 points]
4.1
Warmup: Computing a Fibonacci Number [15 points]
NOTE: More than one correct solution exists, this is just one potential solution.
fib:
addi
sw
add
sw
add
sw
$sp,
$16,
$16,
$17,
$17,
$18,
$sp, -16
0($sp)
$4, $0
4($sp)
$0, $0
8($sp)
//
//
//
//
//
//
allocate stack space
save r16
r16 for arg n
save r17
r17 for a, init to 0
save r18
2/4
addi $18, $0, 1
// r18 for b, init to 1
sw
$31, 12($sp) // save return address
add $2, $17, $18 // c = a + b
branch:
slti $3, $16,
bne $3, $0,
add $2, $17,
add $17, $18,
add $18, $2,
addi $16, $16,
j
branch
done:
lw
$31,
lw
$18,
lw
$17,
lw
$16,
addi $sp,
jr
$31
4.2
2
//
done
$18 //
$0 //
$0 //
-1 //
12($sp)
8($sp)
4($sp)
0($sp)
$sp, 16
//
//
//
//
//
//
use r3 as temp
c
a
b
n
=
=
=
=
a + b
b
c
n - 1
restore r31
restore r18
restore r17
restore r16
restore stack pointer
return to caller
MIPS Assembly for REP MOVSB [25 points]
Assume: $1 = ECX, $2 = ESI, $3 = EDI
(a) beq
$1, $0, AfterLoop
CopyLoop:
lb
$4, 0($2)
sb
$4, 0($3)
addiu $2, $2, 1
addiu $3, $3, 1
addiu $1, $1, -1
bne
$1, $0, CopyLoop
AfterLoop:
Following instructions
// If counter is zero, skip
//
//
//
//
//
//
Load 1 byte
Store 1 byte
Increase source pointer by 1 byte
Increase destination pointer by 1 byte
Decrement counter
If not zero, repeat
(b) The size of the MIPS assembly code is 4 bytes × 7 = 28 bytes, as compared to 2 bytes for x86 REP
MOVSB.
(c) The count (value in ECX) is 0xdeadbeef = 3735928559. Therefore, loop body is executed (3735928559
* 6) = 22415571354 times. Total instructions executed = 22415571354 + 1 (beq instruction outside of
the loop) = 22415571355.
(d) The count (value in ECX) is 0x01000000 = 16777216. Therefore, loop body is executed (16777216 * 6)
= 100663292 times. Total instructions executed = 100663292 + 1 (beq instruction outside of the loop)
= 100663293.
3/4
5
Data Flow Programs [15 points]
6
Performance Metrics [10 points]
• No, the lower frequency processor might have much higher IPC (instructions per cycle).
More detail: A processor with a lower frequency might be able to execute multiple instructions per cycle
while a processor with a higher frequency might only execute one instruction per cycle.
• No, because the former processor may execute many more instructions.
More detail: The total number of instructions required to execute the full program could be different on
different processors.
7
Performance Evaluation [15 points]
cycle
∗ 500, 000, 000 second
= 5000 MIPS
• ISA A: 10 instructions
cycle
cycle
• ISA B : 2 instructions
∗ 600, 000, 000 second
= 1200 MIPS
cycle
• Don’t know.
The best compiled code for each processor may have a different number of instructions.
4/4
Download