THE INSTRUCTION SET

Summary: In this section we will examine common RISC instructions and addressing modes, give examples of assembly-language usage, and show how instructions can be encoded as bits in a way that will simplify the hardware design in the following sections.

Instruction set design is perhaps the most central aspect of computer design: decisions made here affect all other hardware and software work. However, it’s a two-way road, since compiler, O.S. and hardware design each require certain instructions to be included in the set.

RISC (reduced instruction set computer) architectures are much simpler than CISC (complex instruction set computer) architectures, and have largely taken the lead in performance. Why?

  - Simpler instructions = simpler hardware
  - Faster clock speeds, since each instruction (usually) does only one thing
  - One clock cycle per instruction (ideally!)
  - Pipelining is simpler, so performance is easier to improve
  - Smart compiler design is easier when instructions are simpler

MOST RISC MACHINES ARE FAIRLY SIMILAR (though recently some CISC instructions are reappearing on RISC machines).

A few register quirks with our MIPS architecture:

  - Integer register 0 is read-only with a fixed value of 0
  - Special registers HI and LO hold the results of multiply and divide
  - Register 31 is hardwired as the destination for the return address of a subroutine call (jal). (In real MIPS, the interrupt return address is kept in a separate coprocessor-0 register, EPC, not in register 31.)

Classes of instructions, with MIPS assembly syntax examples:
(abbreviated MIPS instruction set is in text’s back cover, full set p. A55++)

Arithmetic/logic instructions: take data from one or two specified registers, apply an arithmetic or logic function, and put the result in a specified register (the same as or different from a source register).

  1. These instructions use REGISTER ADDRESSING or IMMEDIATE ADDRESSING.
  2. add $3,$2,$27   (add contents of regs. 27 and 2, put sum in reg. 3)
     NOTE that the destination register is the FIRST operand of the instruction; add is the opcode (operation code).
  3. Immediate example: ori $21,$22,128   (OR reg. 22 with value 128 = 0000 0000 1000 0000 binary, put result in reg. 21).
  4. Opcodes: add, addi, addu, addiu, sub, subi, subu, mult, multu, div, divu, mfhi, mflo, mfc0, and, andi, or, ori, sll, srl
     (Real MIPS hardware has no subi; addi with a negative immediate does the same job.)

Data transfer (or memory) instructions: move the contents of a specified register to or from RAM, at a specified address.

  1. Can move a byte, short (16 bits), or word (32 bits) in the MIPS architecture
  2. Always uses BASE/DISPLACEMENT ADDRESSING
  3. The specified offset is added to the register contents to form the memory address
  4. Write a byte to memory (store byte): sb $13, 201($22)   (store the low byte of reg. 13 to the memory address formed by adding decimal 201 to the contents of reg. 22). Stores have no unsigned variant, because nothing needs to be extended when writing to memory.
  5. Read a word (32 bits) from memory (load word): lw $18, 0($22)   (copy the data in the memory location given by reg. 22 to reg. 18)
  6. Opcodes: lw, sw, lb, sb, lbu, lh, sh, lhu
  7. Difference between lb and lbu? lb copies bit 7 of the loaded byte into the upper bits (31..8) of the register. Why? This sign extension keeps a negative two's-complement byte negative when it becomes a 32-bit value. Unsigned loads fill the upper bits with zeros instead.
  8. NOTE that lui $14, 1000 (load upper immediate) takes the given immediate value and loads it into the upper 16 bits of the specified register, clearing the lower 16 bits. This can be treated as an ALU operation for our purposes.

Branch and jump-related instructions

  1. MIPS, like some other RISC machines, has no dedicated flag bits. Instead, for conditional tests, two registers are compared to see whether their contents are identical. (For some architectures, one register’s contents are compared with zero.)
  2. Conditional branch uses PC-RELATIVE ADDRESSING: bne $0, $3, 24   (branch to PC + 24 if $0 and $3 don’t hold the same value)
  3. Two conditional branch opcodes: beq, bne (real MIPS has more).
  4. To help in setting up a conditional branch, conditional test instructions are provided: slt $1, $2, $3   (set reg. 1 to 1 if reg. 2 is less than reg. 3, otherwise set it to 0). Note that the registers are treated as SIGNED; use sltu for unsigned numbers. Thus, a conditional test and branch operation requires two instructions. Opcodes: slt, sltu, slti, sltiu.
  5. Finally, a series of JUMP instructions is provided, using DIRECT (more correctly, pseudodirect) ADDRESSING, where the jump destination address is given in the instruction. There are several types:
       j 2354033     (jumps to the specified decimal address)
       jr $31        (jumps to the address held in the register; with $31 this acts as return from subroutine)
       jal 2354033   (jump and link = jump to subroutine; the return address is put into R31)
     Note that the real MIPS also has branch-and-link instructions.

So, let’s consider a simple ‘C’ translation into MIPS code:

    int i;
    int array[1000];
    for(i=0; i<1000; i++) array[i]=0;

Remember how a for loop works?
  1) initialization
  2) test, exit loop if failure
  3) body, then increment
  4) goto 2

How to do this in MIPS assembly language? Assign registers to i (let’s use $21) and to the pointer into the array (use $22 = 8000):

          add  $21, $0, $0      # initialize i register R21
          addi $22, $0, 8000    # initialize array pointer R22
          addi $23, $0, 1000    # needed to count UP to 1000
    loop: slt  $24, $21, $23    # use $24 as flag: 1 while i < 1000
          beq  $24, $0, exit    # $24 = 0 means i = 1000 and we’re done
          sw   $0, 0($22)       # zero next array element
          addi $22, $22, 4      # update pointer to next element - WHY 4??
          addi $21, $21, 1      # i++
          j    loop             # go back to the test
    exit:

Let’s consider addressing in more detail:

Most big architectures encode data and instructions as 32-bit values. But often information is in 8-bit ASCII (American Standard Code for Information Interchange), and it would be wasteful to use 32 bits for each 8-bit value. Thus, memory is addressed by BYTES (8 bits), and 4 ASCII characters are packed into each 32-bit word:

    Addr.    Mem byte indices    Word number read from memory
    0012     12  13  14  15      3
    0008     08  09  10  11      2
    0004     04  05  06  07      1
    0000     00  01  02  03      0

What would happen if the processor did a lb for addr 6? The CPU reads 4 bytes at a time, so it will do a memory read at address 4, and will grab the 3rd byte in the word to load into the least significant byte of the destination reg!

Suppose that these bytes are parts of instructions. Instructions are 32 bits wide.
So after a reset, the CPU may start executing instruction 0000 (comprising bytes 0, 1, 2, 3). Where will the next instruction be read from? 0004. Thus, the program counter normally increments by 4 each instruction.

It turns out that ALL instructions MUST START ON 4-byte BLOCK BOUNDARIES (0, 4, 8, 12, 16, ...)! Thus, trying to read an instruction at 0009 will result in an ADDRESS ERROR (an alignment fault), because part of the instruction would be in one word and part would be in the next. So: instructions and words must be loaded at 4-byte block boundaries, bytes can be loaded from any address, and shorts (16 bits) must be loaded from even addresses.

To add to the confusion, some machines reverse the bytes within a word. For the word at address 0, if byte 0 (at address 0) is the least significant byte, the machine is called “little-endian”, and if byte 0 is loaded as the most significant byte in the word, the machine is “big-endian” (MIPS can be either!). This difference is the reason that raw data files transferred between different computers may be read incorrectly on the new computer.

HOW CAN WE ENCODE THE MIPS INSTRUCTIONS AS BITS in a consistent way? The more consistent we can be, the simpler (and faster) the hardware. Remember, each instruction consists of a single 32-bit word. The problem is that there are several addressing modes, with different (and competing) requirements! So let’s consider them one by one.

REGISTER ADDRESSING (R-format): needs 3 registers specified: rd (destination), rs (source 1) and rt (source 2). It also needs bits to specify the opcode, and bits that tell the ALU what function to carry out, funct. Thus, we have the fields:

    Op    rs    rt    rd    shamt    funct
    6     5     5     5     5        6

with the numbers being the bits in each field (the leftmost bit is the most significant). To encode 32 registers requires 5 bits. Shamt is the number of bits to shift for the logical shift instructions (sll, srl).
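As a rough illustration of the R-format field layout above, the following Python sketch packs the six fields into one 32-bit word and unpacks them again. The names R_FIELDS, encode_r and decode_r are made up for this example, and the values op = 0 and funct = 32 for add are assumed from the standard MIPS encoding rather than taken from these notes.

```python
# Field names and widths of the R-format, most significant field first.
R_FIELDS = (("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6))


def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into a single 32-bit word."""
    values = (op, rs, rt, rd, shamt, funct)
    word = 0
    for (name, width), value in zip(R_FIELDS, values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # Make room for this field, then drop it into the low bits.
        word = (word << width) | value
    return word


def decode_r(word):
    """Split a 32-bit word back into its R-format fields."""
    fields = {}
    shift = 32
    for name, width in R_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# add $3,$2,$27 : rd = 3, rs = 2, rt = 27.
# Assumption: op = 0 and funct = 32 are the standard MIPS values for add.
ADD_WORD = encode_r(op=0, rs=2, rt=27, rd=3, shamt=0, funct=32)

if __name__ == "__main__":
    print(f"{ADD_WORD:032b}")
    print(decode_r(ADD_WORD))
```

Note that the field order inside the word (rs, rt, rd) is not the operand order written in assembly (rd, rs, rt), which is why the call above passes the registers by name.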
IMMEDIATE ADDRESSING (I-format): requires a 16-bit immediate operand, in addition to source and destination registers. This does not leave room for a function field, so each immediate instruction must use up one of the 64 possible opcode patterns:

    Op    rs    rd    immediate value
    6     5     5     16

Note that the op and rs fields are in the same place as in the R-format, but rd is not: it now occupies the bits that hold rt in the R-format. Hardware must therefore be added to select which bits of the instruction rd comes from, as we will see shortly!

BASE ADDRESSING has the same form as immediate addressing, with 16 bits reserved for the offset. In this case, rs is the memory address pointer, and rd/rt is the data source/destination.

PC-RELATIVE ADDRESSING also has the same form as immediate addressing, with a 16-bit program-counter (PC) offset, and two source registers to be compared.

JUMP ADDRESSING (J-type): needs as many bits allocated to the jump address as possible, to allow as large a jump range as possible.

    Op    jump target address
    6     26

And again, that’s all the complexity there is! We can do a quick example of decoding an instruction (as a disassembler would do): What instruction is given by 10101110000010000000000100000000?

  1. Extract the format type by decoding the opcode (MSB 6 bits): 101011 = 43, which is sw, using the I-format
  2. Extract the other fields, knowing the format:
       101011 = 43                    opcode
       10000 = 16                     address pointer (rs)
       01000 = 8                      data register (for a store, the register whose contents are written to memory)
       0000 0001 0000 0000 = 256      offset

So the instruction is: sw $8, 256($16)

Example Problems

1) (3.1) Describe what the following program does:

    begin:  addi $t0, $zero, 0
            addi $t1, $zero, 1
    loop:   slt  $t2, $a0, $t1
            bne  $t2, $zero, finish
            add  $t0, $t0, $t1
            addi $t1, $t1, 2
            j    loop
    finish: add  $v0, $t0, $zero

   Solution: with n the value in $a0, the program computes the sum of the odd integers up to the largest odd number smaller than or equal to n.
   In closed form (for n >= 0), this sum is (ceiling(n/2))^2.

2) (3.4) Show the single MIPS instruction or minimal sequence of instructions for this C statement:

    a = b + 100;

   Assume that a corresponds to register $t0, and b corresponds to register $t1.

   Solution:

    addi $t0, $t1, 100      # register $t0 = $t1 + 100

3) (3.10) For each pseudoinstruction in the following table, produce a minimal sequence of actual MIPS instructions to accomplish the same thing.

    Pseudoinstruction           Actual MIPS instructions
    a) move $t5, $t3            add  $t5, $t3, $zero
    b) clear $t5                add  $t5, $zero, $zero
    c) li $t5, small            addi $t5, $zero, small
    d) li $t5, big              lui  $t5, upper_half(big)
                                ori  $t5, $t5, lower_half(big)
    e) lw $t5, big($t3)         lui  $at, upper_half(big)
                                ori  $at, $at, lower_half(big)
                                add  $at, $at, $t3
                                lw   $t5, 0($at)
    f) addi $t5, $t3, big       lui  $at, upper_half(big)
                                ori  $at, $at, lower_half(big)
                                add  $t5, $t3, $at
    g) beq $t5, small, L        addi $at, $zero, small
                                beq  $t5, $at, L
    h) beq $t5, big, L          lui  $at, upper_half(big)
                                ori  $at, $at, lower_half(big)
                                beq  $t5, $at, L
    i) ble $t5, $t3, L          slt  $at, $t3, $t5
                                beq  $at, $zero, L
    j) bgt $t5, $t3, L          slt  $at, $t3, $t5
                                bne  $at, $zero, L
    k) bge $t5, $t3, L          slt  $at, $t5, $t3
                                beq  $at, $zero, L

Homework:

1) (3.5) Show the single MIPS instruction or minimal sequence of instructions for this C statement:

    x[10] = x[11] + c;

   Assume that c corresponds to register $t0, and the array x has a base address of 4,000,000.

2) (3.12) Given your understanding of PC-relative addressing, explain why an assembler might have problems directly implementing the branch instruction in the following code sequence:

    here:   .
            beq  $t1, $t2, there
            .
    there:  add  $t1, $t1, $t1

   How might the code be rewritten to solve these problems?

3) (3.16) Suppose we have made the following measurements of average CPI for instructions:

    Instruction class     Average CPI
    Arithmetic            1.0 clock cycles
    Data transfer         1.4 clock cycles
    Conditional branch    1.7 clock cycles
    Jump                  1.2 clock cycles

   Compute the effective CPI for MIPS for gcc and spice.
   Use the following table:

    Instruction class     gcc     spice
    Arithmetic            48%     50%
    Data transfer         33%     41%
    Conditional branch    17%     8%
    Jump                  2%      1%
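
The effective CPI asked for in problem 3 is a weighted average: multiply each class's CPI by the fraction of executed instructions in that class, then sum over all classes. A minimal Python sketch of that calculation follows. The function name effective_cpi and the 50/30/15/5 instruction mix are hypothetical, chosen only for illustration; they are not the gcc or spice figures.

```python
# Average CPI per instruction class, as measured in problem 3.
CPI = {
    "arithmetic": 1.0,
    "data transfer": 1.4,
    "conditional branch": 1.7,
    "jump": 1.2,
}


def effective_cpi(mix, cpi=CPI):
    """Weighted average of per-class CPI; mix maps class name to fraction."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"instruction mix sums to {total}, expected 1.0")
    return sum(fraction * cpi[name] for name, fraction in mix.items())


# Hypothetical instruction mix, for illustration only.
EXAMPLE_MIX = {
    "arithmetic": 0.50,
    "data transfer": 0.30,
    "conditional branch": 0.15,
    "jump": 0.05,
}

if __name__ == "__main__":
    print(round(effective_cpi(EXAMPLE_MIX), 3))
```

Passing the gcc or spice column (as fractions that sum to 1) in place of EXAMPLE_MIX gives the value the problem asks for.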