What You Will Learn in this Set of Lectures • What is Reduced Instruction Set Computer (RISC) and Why • Instruction Set Architecture of MIPS, a RISC Machine • Alternatives to RISC (e.g. Complex Instruction Set Computers (CISC)) • Preparation for Learning How to Design a Data Path that Executes the MIPS Instruction Set in Next Lecture Set 1 Savio Chau What is RISC and Why? • RISC is an architecture design concept based on the principle that simpler hardware runs faster (e.g. MIPS). It uses smaller and regular instruction set to achieve performance, while relying on compiler technology to achieve functions used to done by complex instructions. • Opposite to RISC is Complex Instruction Set Computer (CISC) (e.g. Intel x86). CISC believes complex instructions implemented in hardware can achieve higher performance. Language directed architecture such as Burroughs’ B5500 (Algol) or B4500 (Cobol) are extreme cases. 350 300 RISC Performance 250 200 150 Intel x86 100 RISC introduction 50 0 1982 Courtesy D. Patterson 1984 1986 1988 2 Year 1990 1992 1994 Savio Chau The MIPS Instruction Set MIPS is a Reduced Instruction Set Computer (RISC), Characterized By: • A Relatively Small Number Of Instructions and Data Types • All Instructions Are Of The Same Length • There Are A Very Small Number Of Instruction Formats (3) • It is a Load- Store Machine: Computation Is Done On Data In Registers i. e., Operands of Arithmetic And Logical Operations Do Not Reside In Memory. Data Is Moved Between Memory And Registers Before Being Used and Back To Memory After Computation Is Finished By Load and Store Instructions • There Are A Small Number Of Addressing Modes - Three For Accessing Operands (Register- Direct, Based, Immediate) and One For Computing Jump Addresses (PC- Relative) Courtesy M. Louie 3 Savio Chau Instruction Set Architecture Design Principles • Four things need to be considered – What operations will be performed by this instruction set? • Op code – What kind of data this instruction set will operate on? • Data type – Where can you store the data? • Memory and register architecture – How to access those data? • Address modes • Four design principles (that lead to reduced instruction set architectures) – – – – Simplicity favors regularity Smaller is faster Good design demands good compromises Make the common case fast 4 Savio Chau What Operations Performed by MIPS ISA? • Basic Functions – Arithmetic and Logic Operations • Need 2 source data and 1 destination to store result – Data transfer to and from the memory • Load a data from a memory address (+ offset) and put it in a register • Store a data in a register into a memory address (+ offset) – Conditional branches • Need a way to determine a condition • Need a target memory address to branch to if condition is met (or not met), go to next instruction otherwise – Jump and subroutine linkage (procedure call) • Need a large range of target memory address to branch • Procedure call needs a return address • Additional functions: Examples – Move data between registers, to I/O, or to co-processor – Exception and Interrupt Instructions 5 Savio Chau What Data will be Operated on by MIPS ISA? • Data Types – Signed: Numbers that can be either positive or negative • Integers • Floating Point Numbers • Relative Address – Unsigned: just positive • Absolute Address • Logical data • Data Sizes – Word (32 bits) – Half-word (16 bits) – Byte (8 bits) 6 Savio Chau Where the MIPS ISA Stores Data? • In Registers — used by R-Type instructions) • Within the instruction — used by I-Type instructions • In Memory — always transferred to/from a register by a load/store (also I-Type) instruction first – The memory address is in a register – The memory address in a constant in the instruction – The memory address is sum of a register and a constant in the instruction – The memory address is relative to the PC (i.e., sum of the PC and a constant in the instruction) 7 Savio Chau MIPS R2000 / R3000 Programmable Storage • Memory: – 232 Addresses (32-bit Address) – 230 Memory Words; 1 Word = 4 Bytes – Byte Addressable • Registers: – 31 32-Bit General Purpose Registers R1 - R31 General Purpose Register 0 = Constant Value 0 Num Name Use Num Name R0 r0 numeric 0 R8 t0 R1 at assembler R9 t1 R2 v0 results R10 t2 R3 v1 results R11 t3 R4 a0 arguments R12 t4 R5 a1 arguments R13 t5 R6 a2 arguments R14 t6 R7 a3 arguments R15 t7 Use temporary temporary temporary temporary temporary temporary temporary temporary Num Name R16 s0 R17 s1 R18 s2 R19 s3 R20 s4 R21 s5 R22 s6 R23 s7 Use saved saved saved saved saved saved saved saved Num Name R24 t8 R25 t9 R26 k0 R27 k1 R28 gp R29 sp R30 fp R31 ra Use temporary temporary OS Kernal OS Kernal global ptr stack ptr frame ptr return addr Note: The indicated register usage is by convention only, not restricted by the MIPS architecture 8 Savio Chau MIPS R2000 / R3000 Programmable Storage • Registers (Continued) – 32 Floating Point Registers (F0-F31) Num FP0 FP2 FP4 FP6 Num FP8 FP10 FP12 FP14 • Double Precision = 16 FP Register Pairs • Single Precision = 16 FP Registers (Even Addresses) Num FP16 FP18 FP20 FP22 Num FP24 FP26 FP28 FP30 • Other Registers – PC = Program Counter Register – HI and LO: for 64- Bit Integer Arithmetic Results • (Multiplication) HI, LO = 64- Bit Integer Product • (Division) LO= Quotient, HI= Remainder – Exception Registers • EPC (Execption Program Counter): Address of instruction causing exception • Cause: Exception type and pending interrupt bits • BadVaddr (Bad Value Address): Address of data causing exceptions • Status: Interrupt mask and enable bits 9 Savio Chau How Does MIPS ISA Access the Data? Register (Direct) OP RS=$2 RT=$3 RD=$1 E.g., add $1, $2, $3 $1$2+$3 Immediate E.g., addi $1, $2, 100 $1$2 +100 Base + Index E.g., lw $1, 100($2) $1Mem[$2+100] PC-Relative E.g., bne $1, $2, 100 Goto Mem[PC+100] if $1=$2 Psuedo-Direct E.g., J 1000 Goto Mem[PC(31:30):1000] Register OP RS RT Immediate=100 OP RS=$2 RT Immediate=100 Register OP RS PC OP RT Memory Immediate = 100 Memory Address = 1000 PC 10 Memory Savio Chau The Most Basic Four MIPS Instructions Category Instruction Example Arithmetic Add add $1,$2,$3 & Logic Subtract sub $1,$2,$3 Data Transfer M eaning $1 = $2 + $3 Type Comments R $1 = destination register; $2 & $3 = sources register $1 = $2 – $3 R $1 = destination register; $2 & $3 = sources register Load word lw $S1, 10 ($S2) $S1 = Memory[$S2+10] I $S1 = destination register, $S2 = Register containing base address of source data, 10 = offset Store word sw $S1, 10 ($S2) Memory[$S2+10] = $S1 I $S1 = source register, $S2 = Register containing base address of source data, 10 = offset You will learn other instructions later. Download the following files from the web or see text Page A-55 to A-75 for more MIPS instruction formats: CSM151B_MIPS_opcodes CSM151B_MIPS_opcodes_by_name CSM151B_MIPS_opcodes_by_number CSM151B_MIPS_opcodes_pseudo 11 Savio Chau MIPS Instruction Formats R- Type Instructions General Format: Example: op RD, RS, RT add $8,$17,$18 Meaning: RD RS op RT Meaning: $8 $17 + $18 Machine Code Format of the Instuction: Format: Example: • The Meaning of Each Name of The Fields in MIPS Instructions: – – – – – – OP: Operation of Instruction RS: The First Register Source Operand RT: The Second Register Source Operand RD: The Destination Register Operand; It Gets The Result of the Operation SHAMT: Shift Amount for shift left logical or shift right logical instructions Funct (Function Field): Selects the Variant of the Operation in the OP Field Example: the OP field for all R-Type instructions is 0, but Funct field for add is 3210 (1000002), subtract is 3410 (1000102), and is 3610 (1001002), etc. 12 Savio Chau MIPS Instruction Formats I- Type Instructions General Format: Example: Alternative Format: Example: op RT, RS, Const addi $1,$2,100 op RT, offset(RS) sw $1, 100($2) Meaning: RT RS op Const Meaning: $1 $2 + 100 Mem[RS+offset] Meaning: RT Meaning: $1 Mem[$2+100] Machine Code Format of the Instuction: Format: Example: OP RS RT Immediate 6 Bits 101011 5 Bits 00010 5 Bits 00001 16 Bits 0000000001100010 2 1 43 100 • Meaning of Each Name of The Fields in MIPS Instructions: – – – – OP: Operation of Instruction (including information about the instruction type) RS: The First Register Source Operand RT: The Destination Register Operand; It Gets The Result of the Operation Immediate: Constant • Applications of I-Type Instructions – Immediate Addressing Format (e.g., addi, ori, andi. Immediate = constant) – Data Transfer Instructions (e.g., lw, sw. Immediate = address) – Conditional Branch (e.g. beq, bne. Immediate = displacement from PC) 13 Savio Chau MIPS Machine Language Examples See text Page 153 for more machine codes 14 Savio Chau Assembly and Machine Language Exercise 15 Savio Chau C, Assembly, and Machine Language Example Register is assigned by compiler, which How to select registers? usually follows MIPS convention ($8 = $t0) A[ i] = h + A[ i]; is compiled into: lw add sw $8, Astart($ 19) $8,$ 18,$ 8 $8, Astart($ 19) # Temporary reg $8 gets A[ i], Astart is a const. # Temporary reg $8 gets h + A[ i] # Stores h + A[ i] back into A[ i], Astart is a const. Machine Code (in decimal value) lw add sw Machine Code (in binary value) lw add sw 16 Savio Chau Multily and Divide Instructions (Psuedo-Instructions) Psuedo-Instructions are commonly used instructions that are not directly implemented in hardware but translated into one or more of the basic set of instructions • Multiply, Divide – MULT rs, rt – MULTU rs, rt – DIV rs, rt – DIVU rs, rt # Multiply # Unsigned Multiply # Divide # Unsigned Divide • Move Result From Multiply, Divide – MFHI rd # Move Hi reg to RD – MFLO rd # Move Lo reg to RD • Move To HI or LO – MTHI rd – MTLO rd Multiply instruction is accomplished by a series of shift and add instructions and the results are accumulated in the HI and LO registers 17 Savio Chau Adding Conditional Branch & Jump Instructions Category Conditional Branch Instruction Example branch on equal beq $1,$2,100 branch on not bne eq. $1,$2,100 set on less than slt $1,$2,$3 Uncondition al Jump jump j 10000 jump register jr $31 jump and link jal 10000 Meaning Type Comments if ($1 == $2) I Equal test; PC go to PC+4+100 relative branch if ($1!= $2) I Not equal test; go to PC+4+100 PC relative if ($2 < $3) $1=1; R Compare less else $1=0 than; 2’s comp. go to 10000 J Jump to target address go to address in R For switch, $31 procedure return $31 = PC + 4; J For procedure go to 10000 call See backup slides at the end of this presentation for more MIPS instructions Question: What is $31? Answer: By MIPS convention, $31 is the return address register after procedure call 18 Savio Chau MIPS Instruction Formats J - Type Instructions (e.g., j 10000) OP Address 6 Bits 26 Bits • J-Type Instructions uses pseudodirect addressing, where the jump address is the 26 lower bits (i.e., the address bits) of the instruction, left shifted by 2 bits (i.e., word addressable only), and then concatenated with the upper 4 bits of the PC OP Address Program Counter unchange PC[31:28] Address (PC[27:0]+4) replaced by Address4 0 0 Memory What about branch instructions? Ans: Branch instruction is an I-type instruction using PC-relative addressing Example: Program Counter bne $t0 $s1 1000 If $t0 $s1, then New PC = Old PC + 4 + 41000 19 0 0 Memory Savio Chau Branches/Jump Machine Language • Branches: – The target address is in PC: PC PC + 4 + (16-bit Address Field) 4 • Jumps: – The target address = instruction bits <25:0> 4 (i.e., left shift 2 bits and fill the 2 LSB bits with 0) concatenate with PC bits <31:28> L=20 L=20 L=25 20 Savio Chau A Coding Example with Conditional Branches Example: – In the following C code segment, f, g, h, i, and j are variables: if (i == j) goto L1; f = g + h; L1: f = f - i; – Assuming the 5 variables correspond to 5 registers $16 through $20 i.e. f in $16; g in $17; h in $18; i in $19; j in $20 what is the compiled MIPS code? Answer: – The Compiled Program (Assembly Code) is: beq $19,$20, L1 # goto L1 if i equals j add $16,$17,$18 #f=g+h L1: sub $16,$ 16,$19 #f=f-i What would the machine code look like for this? 21 Savio Chau Another C Compilation Example with Loops Aaddr • C Code: Loop: Array A B0 B1 B2 B3 Word 1 B4 B5 B6 B7 Word 2 B8 B9 B10 B11 Word 3 B12 B13 B14 B15 Word 0 g = g + A[i]; i = i+ j; if (i != h) goto Loop; 32 bits Assuming the Following Register Assignments f in $16; g in $17; h in $18; i in $19; j in $20; and a constant 4 in $10 A[i] is an array in memory with starting address at Aaddr[0] Need to compute BYTE address from i, do this? which is wordWhy address (1 word = 4 bytes) • Assembly Code: Loop: mult mflo lw add add bne $19,$10 $9 $8, Aaddr($9) $17,$17,$8 $19,$19,$20 $19,$ 18, Loop # (HI, LO) regs = i * 4 # reg $9 least sig. 32 product bits # Temporary reg $8 = A[i*4] # g = g + A[ i] #i=i+j # goto Loop if i != h 22 Savio Chau An Example Using a Case Statement • C Code: Register Assignments: f in $16; g in $17; h in $18; i in $19; j in $20; k in $21 Constant 4 in $10 switch (k) { case 0: f = i + j; break; case 1: f = g + h; break; case 2: f = g - h; break; case 3; f = i - j; break; } The following MIPS assembly language will work, provided four words in memory, starting at location JumpTable, have addresses corresponding to the labels L0, L1,L2, and L3 respectively. Since we are using the variable k to index into this array of words, we must first multiply by 4 to turn k into its byte address equivalent.. Switch: mult $10, $ 21 mflo $9 lw $8, JumpTable($ 9) jr $8 L0: add $16,$ 19,$ 20 j Exit L1: add $16,$ 17,$ 18 j Exit L2: sub $16,$ 17,$ 18 j Exit L3: sub $16,$ 19,$ 20 Exit: # (HI, LO) regs = k * 4 # Temp reg $9 = least sig. 32 bits of product # Temp reg $8 = Jumptable[k] # Jump based on register $8 # k= 0 so f gets i+ j # k= 1 so f gets g+ h # k= 2 so f gets g- h # k= 3 so f gets i- j 23 JumpTable 0 0 e.g., k=2 1 4 2 8 jr lw Reg $8 Word Byte Addr Addr Memory 3 12 L0 L1 L2 L3 L2 Savio Chau Procedure Calls • Procedure call is used by programmers to structure programs, for easier to understand and reusuability. Example: main() { funct(100); } /* This is the calling procedure (caller) */ int funct(arg) { … } /* This is the called procedure (callee) */ /* procedure call */ • In order to execute procedure call – The calling procedure (caller) has to put parameters in a place where procedure can access – The calling procedure (caller) has to transfer control to the called procedure while saving the return address at the same time – The called procedure (callee) has to put return value in a place where the calling program can access – The called procedure (callee) has return control to the calling program at the point of origin 24 Savio Chau MIPS Software Convention for Registers 0 zero constant 0 1 at reserved for assembler 2 v0 expression evaluation & 3 v1 function results (return value) 23 s7 4 a0 arguments 24 t8 5 a1 (calling procedure uses these 25 t9 6 a2 registers to pass arguments 26 k0 7 a3 to the called procedure) 27 k1 8 t0 temporary: caller saves 28 gp Pointer to global area holding a program’s static data 29 sp Stack pointer 30 fp frame pointer 31 ra Return Address (HW) do not need to be preserved across procedure calls ... 15 (called procedure can clobber) t7 16 s0 callee saves need to be preserved across procedure calls ... (calling procedure can clobber) temporary (cont’d) reserved for OS kernel Stack frame -- A block of memory allocated on the stack for the subroutine call environment. Purpose: hold values passed as subroutine arguments save register values that the calling subroutine needs to use after the callee returns provide space for local variables since there are only a limited number of registers Savio Chau 25 An Overly Simplified Example Addr 1 2 3 addr 3 1 main main addr 2 PC funct addr main() { x = y + z; funct(arg); … } /* Caller */ $v0 /* procedure call */ $ra main addr3 ($31) Addr 1 2 3 ($2) $a0 w arg $t0 x w ($8) $t1 y v ($9) $t2 z ($10) ($4) int funct( arg arg ) /* Callee */ { w = arg – v; return (w); } But! • What if there are more than 4 arguments? • What if there are some register values need to be preserved across procedure call (e.g., if you want to preserve the value x)? • What if another procedure call happens before the current procedure is completed? 26 Savio Chau Call-Return Linkage: Stack Frames Solution: • Save the needed information (e.g., arguments, return address) onto a stack in memory • Information needed by the called procedure are grouped into a stack frame • Many variations on stacks possible (up/down, last pushed / next ) High Mem FP Stack Frame or Activation Record (frame pointer points to 1st word of frame) ARGS Callee Save Registers Reference Arguments and Local Variables at Fixed (negative) Offset From FP (old $fp, $ra, $s0,etc) Local Variables SP (stack pointer points to last word of frame) Low Mem Grows and shrinks during expression evaluation 27 Savio Chau Nested Procedure Call Using Stack Frames PC PC PC PC main() { … funct1(arg0 … argN); … } funct1(arg0 … argN) { … funct2(arg0 … argM); return(w); } FP Stack frame of caller of main SP FP SP Return address to main (pushed by funct1) main’s saved register $s0 … $s7, $fp (pushed by funct1) FP SP SP PC arg4 … argN for funct1 (pushed by main) funct1(arg0 … argM) { … y = x + z; return(y); } Other local variables needed by funct1 (pushed by funct1) arg4 … argM for funct2 (pushed by funct1) Return address to funct1 (pushed by funct2) funct1’s saved register $s0 … $s7, $fp (pushed by funct2) SP 28 Other local variables needed by funct2 (pushed by funct2) Savio Chau Return From Nested Procedure Call PC PC PC main() { … funct1(arg0 … argN); … } funct1(arg0 … argN) { … funct2(arg0 … argM); return(w); } FP Stack frame for main FP SP arg4 … argN for funct1 (popped by funct1 and trashed) Return address to main (popped by funct1 and put back to $ra) main’s saved register $s0 … $s7, $fp (popped by funct1 and put back to $s’s) FP SP Other local variables needed by funct1 (popped by funct1 and trashed) arg4 … argM for funct2 (popped by funct2 and trashed) Return address to funct1 (popped by funct2 and put back to $ra) funct2(arg0 … argM) { … y = x + z; return(y); } funct1’s saved register $s0 … $s7, $fp (popped by funct2 and put back to $s’s) SP 29 Other local variables needed by funct2 (popped by funct2 and trashed) Savio Chau MIPS Instructions for Procedure Call • MIPS uses a jump and link instruction for procedure calls – Jumps to the address specified in the lower 26 bits of the instruction – Simultaneously save the address of next instruction (i.e. PC+ 4) in the Return Address (RA) register (R31) 30 Savio Chau MIPS Procedure Call: Put Everything Together • Step 1: Before the caller makes the subroutine call, the caller: – 1.1 Pushes onto the stack those values in caller-saved registers ($s regs) that the caller wants to use after the callee returns – 1.2 Stores subroutine call arguments in registers 4-7 ($a regs) and pushes any remaining arguments onto the stack for the callee stack frame • Step 2: When the subroutine is invoked, the callee then: – 2.1 Allocates memory on the stack for its stack frame – 2.2 Saves environment registers (e.g., return addr.($31), old frame pointer($30)) and callee-saved registers ($s regs) onto the stack so that the callee can alter them and then restore them before returning (Note: Need to store the $a and $s registers before they are modified) – 2.3 Updates the frame pointer ($fp) to point to the callee stack frame • Step 3: Just before the callee returns, the callee: – 3.1 Puts return values in registers 2-3 ($v regs) – 3.2 Restores registers saved in Step 2.2 (above) and pops the stack frame 31 Savio Chau Procedure Call Coding Example main() { … funct1(10,11,12,13,14); … } funct1(arg1 … arg5) { w = arg1 + arg2 + arg3 + arg4 + arg5; return(w); } Register Assignments: $a0 … $a4: arguments $fp: frame pointer $sp: stack pointer $v0: return value $ra: return address $t0: temporary register main: ... 104 addi 108 addi 112 addi 116 addi 120 addi 124 sw 128 jal 132 … funct1: 500 504 508 512 516 520 524 528 532 536 540 544 548 552 556 subu sw sw addi add add add add add lw add lw lw addi jr 32 $a0, $0, 10 # save arg1 to $a0 $a1, $0, 11 # save arg2 to $a1 $a2, $0, 12 # save arg3 to $a2 $a3, $0, 13 # save arg4 to $a3 $t0, $0, 14 # save arg5 to $t0 $t0, 0($sp) # save (not push) arg5 to stack funct1 # jump to called procedure $sp, $sp, 32 # allocate 32 bytes to new frame $ra, 20($sp) # save return address to main $fp, 16($sp) # save old frame pointer $fp, $sp, 32 # new frame pointer = old sp $v0, $0, $0 # initialize w $v0, $v0, $a0 # calculate w from arguments $v0, $v0, $a1 $v0, $v0, $a2 $v0, $v0, $a3 $t0, 0($fp) # retrieve the 5th argument $v0, $v0, $t0 # add to w (already in $v0) $ra, 20($sp) # pop return address to $ra $fp, 16($sp) # pop old frame pointer $sp, $sp, 32 # pop stack $ra # jump back to main Savio Chau Procedure Call Coding Example (Animated) main() { … funct1(10,11,12,13,14); 104 … } fp 1000 968 1000 996 992 984 980 ra 132 v0 46 60 t0 14 PC Saved Reg & Local Var 976 972 sp 936 968 968 14 (not a stack push) Arguments 964 960 956 952 10 11 12 13 Return Addr Frame Ptr 988 funct1(arg1 … arg5) 500 { w = arg1 + arg2 + arg3 + arg4 + arg5; return(w); } a0 a1 a2 a3 Arguments 132 556 124 120 116 104 128 552 548 544 540 536 532 512 508 504 500 948 944 940 936 132 1000 Return Addr Frame Ptr Saved Reg & Local Var This approach can be modified to allow arbitrary number of arguments to be passed. How? See Example 33 Savio Chau Stacking of Procedure Call/Return Environments 34 Savio Chau Additional Details of the MIPS Instruction Set • Register Zero always has the value Zero (even if you try to write to it) • Jump/ Link instr. puts the Return Addr PC+ 4 into the Link Register • All instructions change entire 32- bits of the destination register (including lui, lb, lh) and all read entire 32- bits of sources (add, sub, and, or,...) • Immediate Arithmetic and Logical Instructions are Extended as Follows: – Logical Immediates are Zero Extended to 32 Bits – Arithmetic Immediates are Sign Extended to 32 Bits • The data loaded by the instruction lb and lh are extended as follows: – lbu, lhu are Zero Extended – lb, lh are Sign Extended • Overflow can occur in these Arithmetic instructions: – add, sub, addl – It cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu 35 Savio Chau Other Forms of Instruction Set Architecture • What Variations an Instruction Set Architecture Can Have? – Data Path Architecture – Instruction Types – Instruction Lengths – Register and Memory Architecture – Address Modes – Data Types – Order of Bits and Bytes (Endians) 36 Savio Chau Other Forms of Data Path Architecture Stack Architecture Accumulator Architecture Register Push(A) Push(B) Add Accumulator A+B A 0 ALU B A C B A A ALU Stack C=A+B General Purpose Register Architecture Load/Store Architecture Register ALU C Store C A+B ALU A A B B B Memory A Load Register Memory Memory or Register 37 Registers Memory Savio Chau Instruction Examples of Various Path Architectures Number of Addresses Examples Meaning Accumulator (1 register): Example: EDSAC, 8008 1 address add A acc acc + mem[A] 1 address add X acc acc + mem[A+X] Stack: Example: Burroughs machines 0 address add Comments A is an address or displacement in the instruction, X is index tos tos + next tos = top of stack, next = next entry in stack General Purpose Register: (typically 16 or 32 registers) Example: IBM 360, DAC VAX, TI 9900 2 address add A B EA(A) EA(A) + EA(B) EA may be a memory location 3 address add A B C EA(A) EA(B) + EA(C) or register, often many types of addressing modes for memory operands Load/Store: MIPS, SPARC, Power PC 3 address add Ra Rb Rc simplified form of GPR Ra Rb + Rc architecture above load Ra Rb Ra mem[Rb] store Ra Rb mem[Rb] Ra 38 Savio Chau Comparing Number of Instructions • Code sequence for C = A + B for four classes of instruction sets: Stack Push A Push B Add Pop C Accumulator Register (register-memory) Load R1,A Add R1,B Store C, R1 Load A Add B Store C Register (load-store) Load R1,A Load R2,B Add R3,R1,R2 Store C,R3 • Load/store requires more instructions but has better overall performance because: – Registers are Faster than Memory – Registers are Easier for a Compiler to Use • e. g., (A* B) - (C* D) - (E* F) can do Multiplies In Any Order vs. Stack – Registers can Hold Variables • Memory Traffic is Reduced, so Program Is Sped Up (Since Registers Are Faster Than Memory) – Code Density Improves • (Since Register Named With Fewer Bits Than Memory Location) 39 Savio Chau Variations in Instruction Types (Examples) • Data Transfers – Move: • Reg–to–Reg, Reg–to–Mem, Mem–to–Mem, or with Immediate – Explicit push and pop of stack • Control Operations – Explicit return instruction (e.g., RET) – Explicit loop instruction • Arithmetic and Logical Operations – Rotate – Test by non-destruction AND • String Operations – Copy string – Load a byte / word of a string into a register 40 Savio Chau Variations in Instruction Length • If Code Size is Most Important, Use Variable Length Instructions • If Performance is Most Important, Use Fixed Length Instructions 41 Savio Chau Variations in Register & Memory Architecture Intel 8086 220 x 8- bit Bytes AX, BX, CX, DX SP, BP, SI, DI CS, SS, DS, IP, Flags acc, index, count, quot stack, string code, stack, data segment VAX- 11 232 x 8- bit Bytes 16 x 32- bit GPRs r15 - program counter r14 - stack pointer r13 - frame pointer r12 - argument ptr MC68000 224 x 8- bit Bytes 8 x 32- bit GPRs 7 x 32- bit addr reg 1 x 32- bit SP 1 x 32- bit PC MIPS 232 x 8- bit bytes 32 x 32- bit GPRs 32 x 32- bit FPRs HI, LO, PC 42 Savio Chau Variations in Addressing Modes (Examples) Addressing mode Register Immediate Example Add R4,R3 Add R4,R3, R2 Add R4,#3 Meaning R4 R4+R3 R4 R3+R2 R4 R4+3 Base + Index lw R4,n(R3) R4 Mem[R3+n] PC Relative beq R4,R3,100 Goto Mem[PC+100] Pseudo-Direct J 100 Goto Mem[PC<31:28>:100] Displacement Add R4,100(R1) R4 R4+Mem[100+R1] Register indirect Add R4,(R1) R4 R4+Mem[R1] Indexed / Base Add R3,(R1+R2) R3 R3+Mem[R1+R2] Direct or absolute Add R1,(1001) R1 R1+Mem[1001] Memory indirect Add R1,@(R3) R1 R1+Mem[Mem[R3]] Auto-increment Add R1,(R2)+ R1 R1+Mem[R2]; R2 R2+d Auto-decrement Add R1,–(R2) R2 R2–d; R1 R1+Mem[R2] Scaled Add R1,100(R2)[R3] R1 R1+Mem[100+R2+R3*d] 43 Savio Chau Variations in Data Type 44 Savio Chau Variations in Data Type (continued) 45 Savio Chau Variations in Order of Bits and Bytes: Endian • Given: Byte addressable system • Big Endian: Address of Most Significant Byte = Word Address • Little Endian: Address of Least Significant Byte = Word Address • Example: 4 Bytes per Word (xx... 00 = Word Address) 46 Savio Chau Byte Swap Problem with Endians • Big Endian Processors: – MIPS, SPARC, IBM 360/ 370, Motorola 68000, HP PA • Little Endian Processors: – Intel 80x86, DEC Alpha • When words are transferred between Big Endian and Little Endian machines, you must permute the bytes to successfully copy the data • Each system is self- consistent, but causes problems when they need to communicate! 47 Savio Chau Complex Instruction Set Computers (CISC) • General Characteristics of CISC – Multiple Length Instruction Formats – More Addressing Modes – Typically Memory Operands can be used in Arithmetic and Logical operations – An instruction may do several things (e. g., Test, Decrement, and Branch) • The Reasons for this are Historic – There were few registers in early machines, so a Load- Store Architecture would be Inefficient (Why?) – Memory was Expensive and Slow - Making special hardware instructions reduces memory bandwidth (fewer instruction accesses) – Each instruction took several clock cycles including fetch cycles. Compound instructions has lower overhead in clock cycles. • Modern Technology caused the switch to RISC machines: – More registers in the processor – Pipelining to execute one instruction every clock cycle – Cheaper Faster Memory 48 Savio Chau CISC Example: 8086 Family 8080 - Straightforward Accumulator Machine (8-bit) • 1978: 8086 - Additional Registers – • • 1980: 80186 – 16-bit Extensions to the 8086 Architecture 1982: 80286 - Address space extended to 24- bits – – • Super scalar architecture, multiple instructions per clock 1995: Pentium Pro – • A few new instructions, substantial performance increases (e.g. pipelining) 1992: Pentium – • Real Addressing Mode for 8086 Compatibility Virutal 8086 Mode for Multiple 20- bit Partitions New Addressing Modes and Operations Most Operations can use any Register as an Operand Memory Paging 1989: 80486 – • Elaborate Memory Mapping and Protection Scheme Real Addressing Mode to look like 8086 1985: 80386 - 32- bit Machine – – – – – • Segments for 20- bit Address Space in 64KB fragments Use micro instructions to achieve very high clock rates 1997: Pentium Pro MMX – Special instructions for graphics support 49 Savio Chau 8086 Instruction Set • Data Transfer Type – 14 instructions with a total number of 27 varients (e.g, MOV has 7 varients, PUSH has 3 varients etc.) • Arithmetic Type – 20 instructions with a total number of 32 varients • Logic Type – 12 instructions with a total number of 20 varients • String Manipulation Type – 6 instructions • Control Transfer Type (e.g., jump, branch etc.) – 26 instructions with a total number of 36 varients • Processor Control Type (e.g., halt, clear interrupt etc.) – 11 instructions A total of 132 instructions and variants! (MIPS has 66) 50 Savio Chau 8086 Instruction Format • Instruction format: 1 to 6 bytes in length • Opcode is basically 8-bits, may use last 1 or 2 bits for indicators – W Bit: Indicating this is a byte (W=0: half register) or word (W=1: full register) operation – D Bit: Indicating direction (D=1: from mem, D=0: to mem) for instructions such as MOV – V Bit: Indicating 1-bit shift or variable-length shift (V=0: count=1, V=1: count = Reg[CL]) – S Bit: Indicating if the 8-bit immediate operand should be sign-extended (S=1: sign extend) Note: the D, V, and S bits have the same location, before the W bit • Some instructions has Postbyte to encoding addressing mode Example formats: DW 8 VW 8 51 Savio Chau 8086 Registers • 14 16-bit Registers, some registers (AX, BX, CX, DX) can be used either as a full register or a half register, depending on the W bit General registers with special purposes in some instructions Special purposes registers 52 Savio Chau Addressing modes • 2 Operand Arithmetic, Logical, and Data Transfer Instructions – Register- Register, Register- Immediate, Register- Memory, Memory- Register, Memory- Immediate • Memory Addressing Modes – – – – 16- Bit Absolute, Register Indirect Based (Base Register + Displacement) Indexed (Index Register + Displacement) Base Indexed with Displacement (Base + Index + Displacement) • Register Used in Addressing Modes – – – – Register Indirect: BX, SI, DI Based with 8- bit or 16- bit Displacement: BP, BX, SI, DI Based Indexed: BX+ SI, BX+ DI, BP+ SI, BP+ DI Based Indexed with 8- bit or 16- bit Displacement 53 Savio Chau Postbyte Encoding • Postbyte: Use a Byte to Indicate Addressing Modes Source/Destination Effective Addr Example Direction depends on D bit Destination/Source 8 or 16 bits Exercise: what is the machine code for moving 8-bit word from memory address (SI)+(DISP=110011002) to register AL? (op code for MOV from memory to register is 100010, d=1 for memory to register, w=0 for 8-bit op) Answer:[100010 1 0][01 000 100][1100 1100] Register Code assignments (dest / source): Code 16 bit mode 8 bit mode 000 AX AL 001 CX CL 010 DX DL 011 BX BL 100 SP AH 101 BP CH 110 SI DH 111 DI BH 54 Mem Address Code assignments (source): Code Effective Address (EA) 000 (BX)+(SI)+DISP 001 (BX)+(DI)+DISP 010 (BP)+(SI)+DISP 011 (BP)+(DI)+DISP 100 (SI)+DISP 101 (DI)+DISP 110 (BP)+DISP 111 (BX)+DISP Savio Chau 8086 Memory Architecture • The memory has 220 bytes (20-bit address) and is divided into 16 segments. Each segment has 64KBytes of Contiguous Memory. • Separate segments may be Adjacent, Disjoint, Partially Overlapped or Fully Overlapped. • All Segments fall on 16-byte boundary (i.e., last address byte = 0000) • Up to 4 segments are simultaneously active. The addresses of active segments are stored in the Segment Registers. The active segments are: – – – – Current Code Segment (CS) Current Stack Segment (SS) Current Data Segment (DS) Current Extra Data Segment (ES) 64 KB • Address Calculation: Segment register Segment address 0000 offset CS FFFFFh Code Segment XXXX0h Stack Segment SS offset + Physical address Data Segment DS ES Extra Data Segment Offset = Effective Address 55 00000h Savio Chau Is CISC Worthwhile? Instruction Usage • Designed versus actually used operations Typical Instructions Provided by CISC Top 10 80X86 Instructions Rank 1 2 3 4 5 6 7 8 9 10 Instruction Average % total executed load conditional branch compare store add and sub move registerregister call return Total 22% 20% 16% 12% 8% 6% 5% 4% 1% 1% 96% Simple instructions dominate instruction frequency and most of the instructions in CISC are not used 56 Savio Chau SPEC Program Analysis Results • Analysis of 5 Programs From SPECint92 and 5 Programs From SPECfp92Assist in MIPS Instruction Format Design • Results: – Instructions Using Immediate Type Operands 50% - 60% of Immediate Values Fit in 8 Bits 75% - 80% of Immediate Values Fit in 16 Bits – Displacement Addressing 99% of Addresses <= 16 Bits 30% 25% 20% 15% 10% Int. Avg. 5% FP Avg. 0% 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Address bits – Conditional Branches Most Branches Are Close to the Current PC Address Use at Least 8- Bits for PC- Relative Addresses Equal / Not Equal Comparison Most Important for Integer Programs 57 Savio Chau Operand Size Usage Doubleword 69% 74% Word 31% 19% Halfword Int Avg. 7% Byte FP Avg. 0% 20% 40% 60% 80% Frequency of reference by size • Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers 58 Savio Chau Summary of ISA Design • Use General Purpose Register with a Load- Store Architecture • Support These Addressing Modes: Displacement (with an Address Offset Size of 12 to 16 Bits), Immediate (Size 8 to 16 Bits), and Register Direct • Support These Instructions, Since They Will Dominate the Number of Instructions Executed: Load, Store, Add, Subtract, Move RegisterRegister, and, Shift, Compare Equal, Compare Not Equal, Branch (with a PC- relative Address at Least 8- Bits Long), Jump, Call, and Return • Support These Data Sizes and Types: 8- bit, 16- bit, 32- bit Integers and 64-Bit IEEE 754 Floating Point Numbers • Use Fixed instruction Encoding If Interested in Performance and Use Variable Instruction Encoding If Interested in Code Size • Provide at Least 16 General Purpose Registers Plus Separate Floating Point Registers, Be Sure All Addressing Modes Apply to All Data Transfer Instructions, and Aim for a Minimalist Instruction Set. 59 Savio Chau Backup Slides MIPS Instructions 60 Savio Chau MIPS Arithmetic Instructions Instruction add subtract add immediate add unsigned subtract unsigned add imm. unsign. multiply multiply unsigned divide Example add $1,$2,$3 sub $1,$2,$3 addi $1,$2,100 addu $1,$2,$3 subu $1,$2,$3 addiu $1,$2,100 mult $2,$3 multu$2,$3 div $2,$3 divide unsigned divu $2,$3 Move from Hi Move from Lo mfhi $1 mflo $1 Meaning $1 = $2 + $3 $1 = $2 – $3 $1 = $2 + 100 $1 = $2 + $3 $1 = $2 – $3 $1 = $2 + 100 Hi, Lo = $2 x $3 Hi, Lo = $2 x $3 Lo = $2 ÷ $3, Hi = $2 mod $3 Lo = $2 ÷ $3, Hi = $2 mod $3 $1 = Hi $1 = Lo Type R R I R R I R R R R R R R R Comments $1 = destination; $2 & $3 = sources $1 = destination; $2 & $3 = sources $1=destination; $2=source, 100=data $1 = destination; $2 & $3 = sources $1 = destination; $2 & $3 = sources $1=destination; $2=source, 100=data 64-bit signed, product in Hi, Lo 64-bit unsigned, product in Hi, Lo Lo = quotient, Hi = remainder Unsigned quotient & remainder Used to get copy of Hi Used to get copy of Lo Highlighted instructions are described in Chapter 3 61 Savio Chau MIPS Logical Instructions Instruction and or xor nor and immediate or immediate xor immediate shift left logical shift right logical shift right arithm. shift left logical shift right logical shift right arithm. Example and $1,$2,$3 or $1,$2,$3 xor $1,$2,$3 nor $1,$2,$3 andi $1,$2,10 ori $1,$2,10 xori $1, $2,10 sll $1,$2,10 srl $1,$2,10 sra $1,$2,10 sllv $1,$2,$3 srlv $1,$2, $3 srav $1,$2, $3 Meaning $1 = $2 & $3 $1 = $2 | $3 $1 = $2 $3 $1 = ~($2 |$3) $1 = $2 & 10 $1 = $2 | 10 $1 = ~$2 &~10 $1 = $2 << 10 $1 = $2 >> 10 $1 = $2 >> 10 $1 = $2 << $3 $1 = $2 >> $3 $1 = $2 >> $3 62 Type R R R R I I I R R R R R R Comment $1 = destination; $2 & $3 = sources $1 = destination; $2 & $3 = sources $1 = destination; $2 & $3 = sources $1 = destination; $2 & $3 = sources $1=destination; $2=source, 10=data $1=destination; $2=source, 10=data $1=destination; $2=source, 10=data Shift left by constant Shift right by constant Shift right (sign extend) Shift left by variable Shift right by variable Shift right arith. by variable Savio Chau MIPS data transfer instructions Instruction Store word Example SW $S1, 100($S2) Meaning $S1 = Memory[$S2+100] Type I Store byte SB $S1, 100($S2) $S1 = Memory[$S2+100] I Load word LW $S1, 100($S2) Memory[$S2+100] = $S1 I Load byte LB $S1, 100($S2) Memory[$S2+100] = $S1 I Load byte LBU $S1, 100($S2) Memory[$S2+100] = $S1 unsigned Load Upper LUI $S1, 100 $S1 = 100 * 216 Immediate (16 bits shifted left by 16) I I Comment $S1 = destination register, $S2 = base address of source data, 100 = offset $S1 = destination register, $S2 = base address of source data, 100 = offset $S1 = destination register, $S2 = base address of source data, 100 = offset $S1 = destination register, $S2 = base address of source data, 100 = offset $S1 = destination register, $S2 = base address of source data, 100 = offset Load constant in upper 16 bits Highlighted instructions are described in Chapter 3 63 Savio Chau MIPS jump, branch, compare instructions Instruction branch on equal Example beq $1,$2,100 Meaning if ($1 == $2) go to PC+4+100 branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100 set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 set less than unsign. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 set l. t. imm. unsign. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 jump j 10000 go to 10000 jump register jr $31 go to $31 jump and link jal 10000 $31 = PC + 4; go to 10000 Type Comments I Equal test; PC relative branch I Not equal test; PC relative R Compare less than; 2’s comp. I Compare < constant; 2’s comp. R Compare less than; natural numbers I Compare < constant; natural numbers J Jump to target address R For switch, procedure return J For procedure call Highlighted instructions are described in Chapter 3 64 Savio Chau