Here are the notes for lecture 2

THE INSTRUCTION SET
Summary: In this section we will examine common RISC instructions and
addressing modes, give examples of assembly-language usage, and show
how instructions can be encoded by bits in a way that will simplify the
hardware design in the following sections.
Instruction set design is perhaps the most central aspect of computer
design: decisions made here affect all other hardware and software
work. However, it’s a two-way road, since compiler, O.S., and hardware
designs each require certain instructions to be included in the set.
RISC – reduced instruction set computer – architectures are much simpler
than CISC (complex instruction set computers), and have largely taken the
lead in performance. Why?
• Simpler instructions = simpler hardware
• Faster clock speeds, since each instruction (usually) does only one thing
• One clock cycle per instruction (ideally!)
• Pipelining is simpler, making it easier to improve performance
• Smart compiler design is easier when instructions are simpler
MOST RISC MACHINES ARE FAIRLY SIMILAR (though recently some
CISC instructions are reappearing on RISC machines).
A few register quirks with our MIPS architecture:
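• Integer register 0 always reads as 0; writes to it are ignored
• Special registers HI and LO hold the results of multiply and divide
• Register 31 receives the return address of a subroutine call: jal writes it
automatically (interrupt/exception return addresses go to a separate EPC
register, not to register 31)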
Classes of instructions, with MIPS assembly syntax examples:
(abbreviated MIPS instruction set is in text’s back cover, full set p. A55++)
• Arithmetic/logic instructions – take data from one or two specified
registers, apply an arithmetic or logic function, and put the result in a
specified register (the same as or different from a source register).
1. These instructions use REGISTER ADDRESSING or
IMMEDIATE ADDRESSING.
2. add $3,$2,$27 (add contents of regs. 27 and 2, put sum in reg. 3)
NOTE that the destination register is the FIRST operand of the instruction;
add is the opcode (operation code).
3. Immediate example: ori $21,$22,128 (or reg. 22 with value 128=
0000 0000 1000 0000 binary, put result in reg. 21).
4. Opcodes: add, addi, addu, addiu, sub, subi, subu, mult, multu, div,
divu, mfhi, mflo, mfc0, and, andi, or, ori, sll, srl (on real MIPS, subi
is an assembler pseudoinstruction rather than a hardware instruction).
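A minimal C sketch of what register addressing and immediate addressing do, assuming a hypothetical RegFile struct for the 32 integer registers and helper names (reg_write, op_add, op_ori) invented for illustration. It also shows one way register 0 can act as a constant zero: writes to it are simply discarded.

```c
#include <stdint.h>

/* Hypothetical 32-entry integer register file; r[0] models $0. */
typedef struct {
    uint32_t r[32];
} RegFile;

/* Writes to $0 are silently discarded, so $0 always reads as 0. */
static void reg_write(RegFile *rf, unsigned rd, uint32_t value) {
    if (rd != 0u) {
        rf->r[rd] = value;
    }
}

/* add rd, rs, rt: register addressing (the overflow trap of real add is not modelled). */
static void op_add(RegFile *rf, unsigned rd, unsigned rs, unsigned rt) {
    reg_write(rf, rd, rf->r[rs] + rf->r[rt]);
}

/* ori rt, rs, imm: immediate addressing; the 16-bit immediate is zero-extended. */
static void op_ori(RegFile *rf, unsigned rt, unsigned rs, uint16_t imm) {
    reg_write(rf, rt, rf->r[rs] | (uint32_t)imm);
}
```

For example, with reg. 2 = 5 and reg. 27 = 7, op_add(&rf, 3, 2, 27) leaves 12 in reg. 3, while op_add(&rf, 0, 2, 27) leaves reg. 0 at zero.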
• Data transfer (or memory) instructions – move data between a specified
register and RAM memory, at a specified address.
1. Can move a byte, short (halfword, 16 bits), or word (32 bits) in the MIPS architecture
2. Always uses BASE/DISPLACEMENT ADDRESSING
3. Specified offset is added to register content to form mem address
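4. Write a byte to memory (store byte):
sb $13, 201($22) (store the least significant byte of reg. 13 to the memory
address formed by adding decimal 201 to the content of reg. 22). There is
no "store byte unsigned": a store just writes the low 8 bits, so sign does
not matter.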
5. Read a signed word (32 bits) from memory (load word):
lw $18, 0($22) (copy data in mem loc given by reg 22 to reg 18)
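6. Opcodes: lw, sw, lb, sb, lbu, lh, sh, lhu
7. Difference between lb and lbu? lb sign-extends: bits 31..8 of the
destination copy bit 7 of the loaded byte. Why? So that a negative
two's-complement byte (e.g. 0xFF = -1) is still negative as a 32-bit value.
lbu zero-extends: bits 31..8 are cleared to 0. (lh and lhu differ in the
same way for halfwords.)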
8. NOTE that lui $14, 1000 (load upper immediate) takes the given
immediate value, loads it into the upper 16 bits of the specified register,
and clears the lower 16 bits. This can be treated as an ALU operation for
our purposes.
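The difference between lb and lbu comes down to how the 8 loaded bits are widened to 32. A small C sketch, with hypothetical helper names lb_extend and lbu_extend; the sign extension is done with an explicit mask so it does not rely on implementation-defined signed conversions:

```c
#include <stdint.h>

/* lb: copy bit 7 of the byte into bits 31..8 (sign extension). */
static uint32_t lb_extend(uint8_t byte) {
    uint32_t v = byte;
    if (v & 0x80u) {
        v |= 0xFFFFFF00u;
    }
    return v;
}

/* lbu: bits 31..8 are cleared to 0 (zero extension). */
static uint32_t lbu_extend(uint8_t byte) {
    return (uint32_t)byte;
}
```

For example, the byte 0xFF becomes 0xFFFFFFFF (-1) under lb but 0x000000FF (255) under lbu, while 0x7F gives 0x0000007F either way.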
• Branch and jump-related instructions
1. Most RISC machines do not have dedicated flag bits. Instead, for
conditional tests, two registers are compared to see whether their
contents are identical. (In some architectures, one register’s contents
are compared with zero.)
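2. Conditional branch uses PC-RELATIVE ADDRESSING:
bne $0, $3, 24 (branch to PC + 24 if $0 and $3 aren’t same-valued; this is
simplified, since real MIPS counts the 16-bit offset in words from PC + 4)
3. Our subset has two conditional branch opcodes, beq and bne, but real
MIPS has more.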
4. To help in setting up a conditional branch, conditional test instructions
are provided:
slt $1, $2, $3 (set reg 1 to one if reg 2 is less than reg 3, otherwise to
zero). Note that the registers are treated as SIGNED; use sltu for
unsigned numbers.
Thus, a conditional test and branch on anything other than equality
requires two instructions.
Opcodes: slt, sltu, slti, sltiu.
5. Finally, a series of JUMP instructions is provided, using DIRECT
(more correctly, pseudodirect) ADDRESSING, where the jump
destination address is given in the instruction. There are several types:
• j 2354033
(jumps to the specified decimal address)
• jr $31
(jumps to the address held in the specified register; jr $31 is the usual
return from subroutine, like rts on other processors)
• jal 2354033
(jump and link = jump to subroutine; the return address is
put into R31).
Note that the real MIPS also has branch-and-link instructions (e.g. bgezal).
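To see why slt and sltu are separate opcodes, here is a C sketch of both comparisons (the function names are hypothetical). The signed version flips bit 31 of each operand, which maps two's-complement order onto unsigned order, so no implementation-defined signed conversion is needed.

```c
#include <stdint.h>

/* slt: 1 if rs < rt as two's-complement signed values, else 0.
   XOR with 0x80000000 turns signed ordering into unsigned ordering. */
static uint32_t slt(uint32_t rs, uint32_t rt) {
    return ((rs ^ 0x80000000u) < (rt ^ 0x80000000u)) ? 1u : 0u;
}

/* sltu: 1 if rs < rt as unsigned values, else 0. */
static uint32_t sltu(uint32_t rs, uint32_t rt) {
    return (rs < rt) ? 1u : 0u;
}
```

With $2 = 0xFFFFFFFF and $3 = 1, slt gives 1 (because -1 < 1) while sltu gives 0 (because 4294967295 > 1).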
So, let’s translate a simple C fragment into MIPS code:
int i;
int array[1000];
for (i = 0; i < 1000; i++)
    array[i] = 0;
Remember how for loop works?
1) initialization
2) test, exit loop if failure
3) body
4) goto 2
How to do this in MIPS assembly language?
Assign registers to i (let’s use $21) and pointer to array (use $22=8000)
      add  $21, $0, $0      # initialize i register R21
      addi $22, $0, 8000    # initialize array pointer R22
      addi $23, $0, 1000    # needed to count UP to 1000
loop: slt  $24, $21, $23    # use $24 as flag: 1 while i < 1000
      beq  $24, $0, exit    # if $24 = 0, i = 1000 and we’re done
      sw   $0, 0($22)       # zero next array element
      addi $22, $22, 4      # update pointer to next element - WHY 4??
      addi $21, $21, 1      # i++
      j    loop             # back to the test
exit:
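The assembly loop can be mirrored line by line in C, which also hints at the answer to "WHY 4??": the pointer advances by one 4-byte int per iteration because MIPS memory is byte-addressed. This sketch assumes 32-bit ints (modelled with int32_t) and writes the exit test as beq, which is what the comment "if $24 = 0 ... we’re done" calls for; clear_array is a hypothetical name.

```c
#include <stdint.h>

/* Mirrors the MIPS loop: i plays $21, p plays $22, limit plays $23, flag plays $24. */
static void clear_array(int32_t *array) {
    int32_t i = 0;              /* add  $21, $0, $0                 */
    int32_t *p = array;         /* addi $22, $0, 8000 (array base)  */
    int32_t limit = 1000;       /* addi $23, $0, 1000               */
    for (;;) {
        int flag = (i < limit); /* slt  $24, $21, $23               */
        if (flag == 0) {
            break;              /* beq  $24, $0, exit               */
        }
        *p = 0;                 /* sw   $0, 0($22)                  */
        p++;                    /* addi $22, $22, 4: one int32_t is 4 bytes */
        i++;                    /* addi $21, $21, 1                 */
    }
}
```

In C, p++ scales by sizeof(int32_t) automatically; in assembly the programmer must add the 4 explicitly.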
Let’s consider addressing in more detail:
Most big architectures encode data and instructions as 32-bit values. But
often information is in 8-bit ASCII (Amer. Standard Code for Information
Interchange), and it would be wasteful to use 32 bits for each 8-bit value.
Thus, memory is addressed by BYTES (8 bits), and 4 ASCII characters are
packed into each 32-bit word:
Addr.   Mem byte indices   Word number read from memory
0012    12 13 14 15        3
0008    08 09 10 11        2
0004    04 05 06 07        1
0000    00 01 02 03        0
What would happen if the processor did an lb for address 6? The CPU reads 4
bytes at a time, so it will do a memory read at address 4, and will grab the 3rd
byte in that word (byte 6) to load into the least significant byte of the destination register!
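A sketch of the lb-at-address-6 case, assuming memory is modelled as an array of 32-bit words (hypothetical lb_from_words) and big-endian byte numbering, so byte 0 of each word is its most significant byte:

```c
#include <stdint.h>

/* mem[k] holds bytes 4k..4k+3; byte 4k is assumed to be the MOST significant (big-endian). */
static uint32_t lb_from_words(const uint32_t *mem, uint32_t addr) {
    uint32_t word = mem[addr >> 2];       /* aligned read at address addr & ~3 */
    unsigned offset = addr & 3u;          /* position of the byte inside the word */
    uint32_t byte = (word >> (24u - 8u * offset)) & 0xFFu;
    if (byte & 0x80u) {
        byte |= 0xFFFFFF00u;              /* lb sign-extends into bits 31..8 */
    }
    return byte;
}
```

With mem[1] = 0x04058607, address 6 selects offset 2 in word 1, the byte 0x86, which lb sign-extends to 0xFFFFFF86.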
Suppose that these bytes are parts of instructions. Instructions are 32 bits
wide. So after a reset, the CPU may start executing instruction 0000
(comprising bytes 0, 1, 2, 3). Where will the next instruction be read from?
0004. Thus, the program counter normally increments by 4 each instruction.
It turns out that ALL instructions MUST START ON 4-byte BLOCK
BOUNDARIES (0,4,8,12,16…)! Thus, trying to fetch an instruction at 0009
will cause an ADDRESS ERROR exception (which the operating system may
report as a bus error or segmentation fault), because part of the instruction
would be in one word and part of the instruction would be in the next.
So: instructions and words must be loaded from 4-byte block boundaries,
shorts (16 bits) must be loaded from even addresses, and bytes can be
loaded from any address.
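The alignment rule reduces to a bit test: an access of 2^k bytes is legal only when the low k address bits are zero. A minimal C sketch (hypothetical is_aligned, assuming the size is a power of two):

```c
#include <stdbool.h>
#include <stdint.h>

/* size: 1 (byte), 2 (short/halfword), or 4 (word or instruction); must be a power of two. */
static bool is_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) == 0u;
}
```

So a word or instruction at 8 is fine but one at 9 is not, a short at 6 is fine but one at 7 is not, and a byte may come from any address.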
To add to the confusion, machines differ in the order of the bytes within a word. For
the word at address 0, if byte 0 (at address 0) is the least significant byte, the
machine is called “little-endian”, and if byte 0 is loaded as the most
significant byte in the word, the machine is “big-endian” (MIPS can be
either!). This difference is the reason that raw binary data files transferred between
different computers may be read incorrectly on the new computer.
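A short C sketch of the two byte orders, using hypothetical helpers store_big_endian and store_little_endian that lay out one 32-bit word as four bytes at increasing addresses:

```c
#include <stdint.h>

/* Big-endian: address 0 of the word receives the most significant byte. */
static void store_big_endian(uint32_t word, uint8_t out[4]) {
    for (int k = 0; k < 4; k++) {
        out[k] = (uint8_t)(word >> (24 - 8 * k));
    }
}

/* Little-endian: address 0 of the word receives the least significant byte. */
static void store_little_endian(uint32_t word, uint8_t out[4]) {
    for (int k = 0; k < 4; k++) {
        out[k] = (uint8_t)(word >> (8 * k));
    }
}
```

For 0x11223344, the big-endian layout is 11 22 33 44 and the little-endian layout is 44 33 22 11, so a file written by one kind of machine and read word-by-word by the other comes out byte-reversed.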
HOW CAN WE ENCODE THE MIPS INSTRUCTIONS AS BITS in a
consistent way? The more consistent we can be, the simpler (and faster) the
hardware. Remember, each instruction consists of a single 32-bit word.
The problem is that there are several addressing modes, with different (and
competing) requirements! So let’s consider them one by one.
REGISTER ADDRESSING (R-format) – needs 3 registers specified: rd
(destination), rs (source 1) and rt (source 2). It also needs bits to specify the
opcode, and bits that tell the ALU what function to carry out, funct. Thus,
we have the fields:
Field:  Op | rs | rt | rd | shamt | funct
Bits:    6 |  5 |  5 |  5 |   5   |   6
with the numbers giving the bits used to encode each field (the leftmost bit is
the most significant; the fields total 32 bits). To encode 32 registers requires
5 bits. Shamt is the number of bits to shift for the logical shift instructions
(sll, srl).
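Packing the six R-format fields is a matter of shifts and masks. A C sketch with a hypothetical encode_r helper; the funct value 0x20 for add comes from the standard MIPS encoding tables, not from these notes:

```c
#include <stdint.h>

/* Pack op(6) rs(5) rt(5) rd(5) shamt(5) funct(6), most significant field first. */
static uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) |
           ((rd & 0x1Fu) << 11) | ((shamt & 0x1Fu) << 6) | (funct & 0x3Fu);
}
```

For add $3,$2,$27 (op 0, rs 2, rt 27, rd 3, shamt 0, funct 0x20), encode_r returns 0x005B1820. The masks keep an out-of-range argument from spilling into a neighbouring field.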
IMMEDIATE ADDRESSING (I-format) – requires a 16-bit immediate
operand, in addition to source and destination registers. This does not leave
room for a function field, so each immediate instruction must use up one of
the 64 possible opcode patterns:
Field:  Op | rs | rd | immediate value
Bits:    6 |  5 |  5 | 16
Note that the op and rs fields are in the same place as in the R-format, but rd
is not: here the destination occupies bits 20..16 (the position of rt in the
R-format, which is why MIPS documentation calls this field rt), while in the
R-format rd occupies bits 15..11. This will add hardware to select which bits
of the instruction supply rd, as we will see shortly!
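A matching I-format sketch, plus the destination-register selection described above: if the opcode is 0 (R-format) the destination comes from bits 15..11, otherwise from bits 20..16. The helper names encode_i and dest_reg are hypothetical, and the ori opcode 13 (0x0D) comes from the standard MIPS tables.

```c
#include <stdint.h>

/* Pack op(6) rs(5) rd/rt(5) immediate(16). */
static uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rd, uint32_t imm) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) |
           ((rd & 0x1Fu) << 16) | (imm & 0xFFFFu);
}

/* Destination register number for R-format and ALU-immediate/load instructions. */
static unsigned dest_reg(uint32_t instr) {
    unsigned op = (instr >> 26) & 0x3Fu;
    return (op == 0u) ? (unsigned)((instr >> 11) & 0x1Fu)   /* R-format: bits 15..11 */
                      : (unsigned)((instr >> 16) & 0x1Fu);  /* I-format: bits 20..16 */
}
```

For ori $21,$22,128, encode_i(13, 22, 21, 128) gives 0x36D50080, and dest_reg picks register 21 from bits 20..16; for the R-format word 0x005B1820 (add $3,$2,$27) it picks 3 from bits 15..11. In hardware this choice is a 2-to-1 multiplexer on the register-number field.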
BASE ADDRESSING has the same form as immediate addressing, with 16
bits reserved for the offset. In this case, rs is the memory address pointer,
and rd/rt is the data source/destination.
PC-RELATIVE also has the same form as immediate addressing, with a 16-bit program-counter (PC) offset and two source registers to be compared.
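The notes write the branch target as PC + 24 for simplicity. On real MIPS hardware the 16-bit field counts instructions (words), is sign-extended, and is added to the address of the next instruction, PC + 4. A C sketch of that rule, with a hypothetical branch_target helper:

```c
#include <stdint.h>

/* Real MIPS rule: target = PC + 4 + (sign-extended offset * 4). */
static uint32_t branch_target(uint32_t pc, uint16_t offset) {
    uint32_t ext = offset;
    if (ext & 0x8000u) {
        ext |= 0xFFFF0000u;          /* negative offsets branch backwards */
    }
    return pc + 4u + (ext << 2);     /* unsigned wrap-around handles the negative case */
}
```

An offset of 6 at PC 0x00400000 lands at 0x0040001C; an offset of 0xFFFC (-4) at PC 0x00400010 branches back to 0x00400004.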
JUMP ADDRESSING (J-TYPE) – needs as many bits allocated to the jump
address as possible, to allow as large a jump range as possible.
Field:  Op | jump target address
Bits:    6 | 26
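A C sketch of the J-format and of what "pseudodirect" means: the 26-bit field stores a word address, and the hardware rebuilds the full address by appending two zero bits and taking the top 4 bits from PC + 4. encode_j and jump_target are hypothetical names; opcode 2 for j comes from the standard MIPS tables.

```c
#include <stdint.h>

/* Pack op(6) | target(26); the target field holds the byte address divided by 4. */
static uint32_t encode_j(uint32_t op, uint32_t byte_addr) {
    return ((op & 0x3Fu) << 26) | ((byte_addr >> 2) & 0x03FFFFFFu);
}

/* Pseudodirect: upper 4 bits from PC + 4, middle 26 bits from the instruction, low 2 bits 00. */
static uint32_t jump_target(uint32_t pc, uint32_t instr) {
    return ((pc + 4u) & 0xF0000000u) | ((instr & 0x03FFFFFFu) << 2);
}
```

Encoding j to address 0x00400018 yields 0x08100006, and decoding it at PC 0x00400000 recovers 0x00400018. Because the low 2 bits are never stored, a jump target must itself be a multiple of 4.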
And again, that’s all the complexity there is! We can do a quick example of
decoding an instruction (as a disassembler would do):
What instruction is given by 10101110000010000000000100000000?
1. Extract the format type by decoding the opcode (MSB 6 bits):
101011 = 43, which is sw, using the I-format
2. Extract the other fields, knowing the format:
101011 = 43 opcode
10000 = 16 address pointer (rs, the base register)
01000 = 8 data register (rt; for sw it is the source of the data being stored)
0000 0001 0000 0000 = 256 offset
So the instruction is:
sw $8, 256($16)
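The disassembly steps above can be sketched in C: turn the bit string into a 32-bit word, then slice out the I-format fields with shifts and masks. from_bits, IFields, and decode_i are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    unsigned op, rs, rt;
    uint16_t imm;
} IFields;

/* Convert a string of '0'/'1' characters (spaces ignored) to a 32-bit word. */
static uint32_t from_bits(const char *s) {
    uint32_t v = 0;
    for (; *s != '\0'; s++) {
        if (*s == '0' || *s == '1') {
            v = (v << 1) | (uint32_t)(*s - '0');
        }
    }
    return v;
}

/* Slice out the I-format fields. */
static IFields decode_i(uint32_t instr) {
    IFields f;
    f.op  = (instr >> 26) & 0x3Fu;          /* bits 31..26: opcode */
    f.rs  = (instr >> 21) & 0x1Fu;          /* bits 25..21: base register */
    f.rt  = (instr >> 16) & 0x1Fu;          /* bits 20..16: data register */
    f.imm = (uint16_t)(instr & 0xFFFFu);    /* bits 15..0: offset */
    return f;
}
```

Applied to the string above, this yields op 43, rs 16, rt 8, and offset 256, i.e. sw $8, 256($16).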
Example Problems
1) (3.1) Describe what the following program does:
begin:  addi $t0, $zero, 0
        addi $t1, $zero, 1
loop:   slt  $t2, $a0, $t1
        bne  $t2, $zero, finish
        add  $t0, $t0, $t1
        addi $t1, $t1, 2
        j    loop
finish: add  $v0, $t0, $zero
Solution: the program computes the sum of the odd integers from 1 up to the largest odd
number less than or equal to n (passed in $a0), and returns it in $v0. For n >= 0 the
result is (ceiling(n/2))^2.
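A C transcription of the program (hypothetical function sum_odds, with n arriving in $a0 and the result returned as $v0) makes the closed form easy to check:

```c
#include <stdint.h>

/* t0 is the running sum, t1 the current odd number. */
static int32_t sum_odds(int32_t a0) {
    int32_t t0 = 0;              /* addi $t0, $zero, 0                    */
    int32_t t1 = 1;              /* addi $t1, $zero, 1                    */
    while (!(a0 < t1)) {         /* slt $t2, $a0, $t1; bne $t2, $zero, finish */
        t0 += t1;                /* add  $t0, $t0, $t1                    */
        t1 += 2;                 /* addi $t1, $t1, 2; j loop              */
    }
    return t0;                   /* add  $v0, $t0, $zero                  */
}
```

For n = 5 it returns 1 + 3 + 5 = 9 = 3^2, and for n = 4 it returns 1 + 3 = 4 = 2^2.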
2) (3.4) Show the single MIPS instruction or minimal sequence of
instructions for this C statement:
a = b + 100;
Assume that a corresponds to register $t0, and b corresponds to register $t1
Solution:
addi $t0, $t1, 100
# register $t0 = $t1 + 100
3) (3.10) For each pseudoinstruction in the following table, produce a
minimal sequence of actual MIPS instructions to accomplish the same thing.
a) move $t5, $t3
   add  $t5, $t3, $zero
b) clear $t5
   add  $t5, $zero, $zero
c) li $t5, small
   addi $t5, $zero, small
d) li $t5, big
   lui  $t5, upper_half(big)
   ori  $t5, $t5, lower_half(big)
e) lw $t5, big($t3)
   lui  $at, upper_half(big)
   ori  $at, $at, lower_half(big)
   add  $at, $at, $t3
   lw   $t5, 0($at)
f) addi $t5, $t3, big
   lui  $at, upper_half(big)
   ori  $at, $at, lower_half(big)
   add  $t5, $t3, $at
g) beq $t5, small, L
   addi $at, $zero, small
   beq  $t5, $at, L
h) beq $t5, big, L
   lui  $at, upper_half(big)
   ori  $at, $at, lower_half(big)
   beq  $t5, $at, L
i) ble $t5, $t3, L
   slt  $at, $t3, $t5
   beq  $at, $zero, L
j) bgt $t5, $t3, L
   slt  $at, $t3, $t5
   bne  $at, $zero, L
k) bge $t5, $t3, L
   slt  $at, $t5, $t3
   beq  $at, $zero, L
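A C sketch of the lui/ori pair used in answers d), e), f), and h), treating upper_half and lower_half as simple bit slices (the C functions lui and ori are hypothetical stand-ins for the instructions):

```c
#include <stdint.h>

/* upper_half(big) and lower_half(big) as bit slices of the 32-bit constant. */
static uint16_t upper_half(uint32_t big) { return (uint16_t)(big >> 16); }
static uint16_t lower_half(uint32_t big) { return (uint16_t)(big & 0xFFFFu); }

/* lui: immediate into bits 31..16, bits 15..0 cleared. */
static uint32_t lui(uint16_t imm) { return (uint32_t)imm << 16; }

/* ori: the immediate is zero-extended, so bits 31..16 are left untouched. */
static uint32_t ori(uint32_t rs, uint16_t imm) { return rs | (uint32_t)imm; }
```

ori is used instead of addi for the low half because ori zero-extends its immediate. With big = 0x12348765, addi would sign-extend 0x8765 to 0xFFFF8765 and produce 0x12338765 instead of 0x12348765.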
Homework:
1) (3.5) Show the single MIPS instruction or minimal sequence of
instructions for this C statement:
x[10] = x[11] + c;
Assume that c corresponds to register $t0, and the array x has a base address
of 4,000,000.
2) (3.12) Given your understanding of PC-relative addressing, explain why
an assembler might have problems directly implementing the branch
instruction in the following code sequence:
here:   beq  $t1, $t2, there
        .
        .
there:  add  $t1, $t1, $t1
How might the code be rewritten to solve these problems?
3) (3.16) Suppose we have made the following measurements of average
CPI for instructions:
Instruction class     Average CPI
Arithmetic            1.0 clock cycles
Data transfer         1.4 clock cycles
Conditional branch    1.7 clock cycles
Jump                  1.2 clock cycles
Compute the effective CPI for MIPS for gcc and spice. Use the following
table:
Instruction class     gcc    spice
Arithmetic            48%    50%
Data transfer         33%    41%
Conditional branch    17%     8%
Jump                   2%     1%