Instruction Set Architecture Pradondet Nilagupta Spring 2005 (original notes from Prof. Baniasadi, Prof. Shaaban, Prof. Katz) 204521 Digital System Architecture March 22, 2016 Outline ISA Introduction ISA Classifying Memory Addressing Addressing Modes Operands Encoding ISA 204521 Digital System Architecture March 22, 2016 2 Hot Topics in Computer Architecture 1950s and 1960s: – Computer Arithmetic 1970 and 1980s: – Instruction Set Design – ISA Appropriate for Compilers 1990s: – – – – – Design of CPU Design of memory system Design of I/O system Multiprocessors Instruction Set Extensions 204521 Digital System Architecture March 22, 2016 3 Instruction Set Architecture Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine. The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer. 204521 Digital System Architecture March 22, 2016 4 Evolution of Instruction Sets Single Accumulator (EDSAC 1950) Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953) Separation of Programming Model from Implementation High-level Language Based (B5000 1963) Concept of a Family (IBM 360 1964) General Purpose Register Machines Complex Instruction Sets (Vax, Intel 432 1977-80) Load/Store Architecture (CDC 6600, Cray 1 1963-76) RISC (Mips,SPARC,HP-PA,IBM RS6000, . . .1987) 204521 Digital System Architecture March 22, 2016 5 Instruction Set Architecture The instruction set architecture serves as the interface between software and hardware software instruction set hardware 204521 Digital System Architecture March 22, 2016 6 Interface Design A good interface: – Lasts through many implementations (portability, compatibility) – Is used in many different ways (generality) – Provides convenient functionality to higher levels – Permits an efficient implementation at lower levels 204521 Digital System Architecture March 22, 2016 7 What Are the Components of an ISA? (1/2) Sometimes known as The Programmer’s Model of the machine Storage cells – General and special purpose registers in the CPU – Many general purpose cells of same size in memory – Storage associated with I/O devices The machine instruction set – The instruction set is the entire repertoire of machine operations – Makes use of storage cells, formats, and results of the fetch/execute cycle – i.e., register transfers 204521 Digital System Architecture March 22, 2016 8 What Are the Components of an ISA? (2/2) The instruction format – Size and meaning of fields within the instruction The nature of the fetch-execute cycle – Things that are done before the operation code is known 204521 Digital System Architecture March 22, 2016 9 Instruction Set Architecture Computer Program (Instructions) Programmer's View ADD SUBTRACT AND OR COMPARE . . . 01010 01110 10011 10001 11010 . . . CPU Memory I/O Computer's View Princeton (Von Neumann) Architecture --- Data and Instructions mixed in same memory ("stored program computer") Harvard Architecture --- Data & Instructions in separate memories --- Program as data (dubious advantage) --- Storage utilization --- Single memory interface --- Has advantages in certain high performance implementations 204521 Digital System Architecture March 22, 2016 10 Basic Issues in Instruction Set Design --- What operations (and how many) should be provided LD/ST/INC/BRN sufficient to encode any computation But not useful because programs too long! --- How (and how many) operands are specified Most operations are dyadic (eg, A <- B + C) Some are monadic (eg, A <- ~B) --- How to encode these into consistent instruction formats Instructions should be multiples of basic data/address widths Typical instruction set: ° 32 bit word ° basic operand addresses are 32 bits long ° basic operands, like integers, are 32 bits long ° in general case, instruction could reference 3 operands (A := B + C) challenge: encode operations in a small number of bits! 204521 Digital System Architecture March 22, 2016 11 Execution Cycle Instruction Fetch Obtain instruction from program storage Instruction Decode Determine required actions and instruction size Operand Fetch Locate and obtain operand data Execute Compute result value or status Result Store Deposit results in storage for later use Next Instruction Determine successor instruction 204521 Digital System Architecture March 22, 2016 12 What Must be Specified? Instruction Fetch ° Instruction – how is it decoded? Instruction Decode Format or Encoding ° Location of operands and result – where other than memory? Operand – how many explicit operands? Fetch – how are memory operands located? – which can or cannot be in memory? Execute ° Data type and Size Result ° Operations Store Next – what are supported ° Successor instruction Instruction 204521 Digital System Architecture – jumps, conditions, branches March 22, 2016 13 ISA What are the important questions? SIZE ? Opcode SIZE ? How many operations? 204521 Digital System Architecture Data SIZE ? How many kinds of data? March 22, 2016 14 Operand Locations in Four ISA Classes GPR 204521 Digital System Architecture March 22, 2016 15 ISA Classes Input1 ISA Classes? Stack Accumulator Register memory Register register/load store 204521 Digital System Architecture March 22, 2016 Input2 Operation Output 16 ISA Classes Accumulator: 1 address add A tos acc ¬ acc + mem[A] 1+x address addx A acc ¬ acc + mem[A + x] Stack: 0 address add tos ¬ tos + next stack General Purpose Register: 2 address add A B EA(A) ¬ EA(A) + EA(B) 3 address add A B C EA(A) ¬ EA(B) + EA(C) add Ra Rb Rc Ra ¬ Rb + Rc Load/Store: 3 address load Ra Rb Ra ¬ mem[Rb] store Ra Rb mem[Rb] ¬ Ra Comparison: Bytes per instruction? Number of Instructions? Cycles per instruction? 204521 Digital System Architecture March 22, 2016 17 ISA Classes: Stack Operate on TOS, put result TOS TOP OF STACK C= A+B? PUSH A PUSH B ADD POP C Operation MEMORY Memory not touched 204521 Digital System Architecture March 22, 2016 18 ISA Classes: Accumulator Accumulator : Implicit input & output. Accumulator (AC) C= A+B? LOAD A - Put A in Accumulator ADD B - Add B with AC put result in Operation AC STORE C- Put AC in C MEMORY 204521 Digital System Architecture March 22, 2016 19 ISA Classes: Register-Memory Input, Output: Register or Memory Register File C= A+B? LOAD R1, A ADD R3, R1, B STORE R3, C Operation MEMORY 204521 Digital System Architecture March 22, 2016 20 ISA Classes: Register-Register LOAD/STORE ARCH. C= A+B? Register File LOAD R1, A LOAD R2, B ADD R3, R1, R2 STORE R3, C Operation MEMORY 204521 Digital System Architecture March 22, 2016 21 Operand Location Example: – How do we compute C=A+B using four classes of ISAs: Stack Push A Push B Add Pop C Register Accumulator (Reg-Mem) Register (Load/Store) Load A Add B Store C Load R1,A Load R2,B Add R3,R1,R2 Store R3,C Load R1,A Add R3,R1,B Store R3,C Comparing Number of Instruction? 204521 Digital System Architecture March 22, 2016 22 General Purpose Registers Dominate ° Since 1975 all machines use machines use general purpose registers ° Advantages of registers • registers are faster than memory • registers are easier for a compiler to use - e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order vs. stack • registers can hold variables - memory traffic is reduced, so program is sped up (since registers are faster than memory) - code density improves (since register named with fewer bits than memory location) 204521 Digital System Architecture March 22, 2016 23 General-Purpose Register (GPR) Machines Every ISA designed after 1980 uses a loadstore GPR architecture (i.e RISC, to simplify CPU design). Registers, like any other storage form internal to the CPU, are faster than memory. Registers are easier for a compiler to use. GPR architectures are divided into several types depending on two factors: – Whether an ALU instruction has two or three operands. – How many of the operands in ALU instructions may be memory addresses. 204521 Digital System Architecture March 22, 2016 24 ISA Examples Machine Number of General Purpose Registers EDSAC IBM 701 CDC 6600 IBM 360 DEC PDP-8 DEC PDP-11 Intel 8008 Motorola 6800 DEC VAX 1 1 8 16 1 8 1 1 16 Intel 8086 Motorola 68000 Intel 80386 MIPS HP PA-RISC SPARC PowerPC DEC Alpha HP/Intel IA-64 AMD64 (EMT64) 1 16 8 32 32 32 32 32 128 16 204521 Digital System Architecture Architecture accumulator accumulator load-store register-memory accumulator register-memory accumulator accumulator register-memory memory-memory extended accumulator register-memory register-memory load-store load-store load-store load-store load-store load-store register-memory March 22, 2016 year 1949 1953 1963 1964 1965 1970 1972 1974 1977 1978 1980 1985 1985 1986 1987 1992 1992 2001 2003 25 Pros and cons for each ISA type Machine Type Advantages Disadvantages Stack Accumulat or Register 204521 Digital System Architecture March 22, 2016 26 Pros and cons for each ISA type Machine Type Advantages Stack Short instructions Good code density Simple to decode instruction Lack of random access Efficient code hard to get Stack if often a bottleneck Accumulat or Minimal internal state Short instruction Simple to decode instruction Very high memory traffic Register Disadvantages Lots of code generation option Efficient code (compiler options) 204521 Digital System Architecture March 22, 2016 Longer instructions Complex instructions 27 Examples of GPR Machines For Arithmetic/Logic Instructions Number of memory addresses Maximum number of operands allowed SPARK, MIPS PowerPC, ALPHA 0 3 1 2 Intel 80x86, Motorola 68000 2 3 2 3 VAX VAX 204521 Digital System Architecture March 22, 2016 28 Pros/Cons of # Mem. Operands/Operands (1/3) Register-register: 0 memory operands/instr, 3 (register) operands/instr – Pro: Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute – Con: Higher instruction count than architectures with memory references in instructions. Some instructions are short and bit encoding may be wasteful. 204521 Digital System Architecture March 22, 2016 29 Pros/Cons of # Mem. Operands/Operands (2/3) Register–memory (1,2) – Pro: Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density. – Con: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction varies by operand location. 204521 Digital System Architecture March 22, 2016 30 Pros/Cons of # Mem. Operands/Operands (3/3) Memory–memory (3,3) – Pro: Most compact. Doesn’t waste registers for temporaries. – Con: Large variation in instruction size, especially for three-operand instructions. Also, large variation in work per instruction. Memory accesses create memory bottleneck. 204521 Digital System Architecture March 22, 2016 31 Memory Addressing How do we specify memory addresses? – This issue is independent of type of ISA (they all need to address memory) We need to specify – – – – (1) Operand sizes (2) Address alignment (3) Byte ordering for multi-byte operands (4) Addressing Modes 204521 Digital System Architecture March 22, 2016 32 Operand Sizes (1) Byte (8 bits), half-word (16 bits), word (32 bits), double word (64 bits) An ISA may (and typically does) support multiple operand sizes Instruction must specify the operand size – E.g. LOAD.b R1,A vs. LOAD.w R1,A • Why? Make sure there’s no “garbage data” But usually there is a “default” size – Most commonly “word” on 32-bit machines – On x86, different register names for different sizes – (And think about Amdahl’s Law too…) 204521 Digital System Architecture March 22, 2016 33 Alignment (2) For multi-byte memory operands An aligned address for an n-byte operand is an address that is a multiple of n – Word-aligned: 0, 4, 8, 12, etc. An ISA can require alignment of operands – Assume it is required unless otherwise specified – MIPS: all memory operands must be aligned (special two-instruction sequences for accessing unaligned data in memory) – x86: no alignment required (but unaligned accesses are slower) 204521 Digital System Architecture March 22, 2016 34 Byte Ordering (“Endianness”) (3) Layout of multi-byte operands in memory Little endian (x86) – Least significant byte at lowest address in memory Big endian (most other ISAs) – Most significant byte at lowest address in memory – Assume this ordering unless otherwise specified Some ISAs support both byte ordering – E.g. MIPS has a little-endian mode 204521 Digital System Architecture March 22, 2016 35 Another view of Endianness No, we’re not making this up. at word address 100 (assume a 4-byte word) long a = 0x11223344; – big-endian (MSB at word address) layout 100 101 102 103 100 11 22 33 44 +0 +1 +2 +3 – little-endian (LSB at word address) layout 103 102 101 100 11 22 33 44 100 +3 +2 +1 +0 204521 Digital System Architecture March 22, 2016 36 “Endianness” Continued Usually instruction sets are byte addressed Provide access for bytes (8 bits), half-words (16 bits), words (32 bits), & double words (64 bits) Two different ordering types: big/little endian Little Endian: Big Endian: 31 23 15 7 0 Puts byte w/addr. “x…x00” at least significant position in the word 31 23 15 7 0 Puts byte w/addr. “x…x00” at most significant position in the word 204521 Digital System Architecture March 22, 2016 37 Addressing Modes (4) What is the location of an operand? Three basic possibilities – Register: operand is in a register • Register number encoded in the instruction – Immediate: operand is a constant • Constant encoded in the instruction – Memory: operand is in memory • Many address modes possibilities 204521 Digital System Architecture March 22, 2016 38 Immediate Addressing Mode Operand is a constant encoded in instruction Can we have any value as an immediate? – x86: yes. # of bytes used to encode the instruction will change to accommodate. – RISC: no, instruction size is fixed (e.g. 32 bits) Immediates in RISC – – – – – Typically a “load immediate” instruction Some bits used to specify the instruction opcode Remaining bits encode the immediate value This is OK: most-frequently needed constants have few bits MIPS also has a special two-instruction sequence to put a full 32-bit immediate into a register 204521 Digital System Architecture March 22, 2016 39 Size of Immediate Operand (comments?) 204521 Digital System Architecture March 22, 2016 40 Memory Addressing Modes (A) Register Indirect – Address is in a register • LD R1, (R2) – Use: access via pointer or computed address Direct (Absolute) – Address is a constant • LD R1, (100) – Use: access to static data – Note: constant encoded in the instruction 204521 Digital System Architecture March 22, 2016 41 Memory Addressing Modes (B) Displacement – Address is register+immediate • LD R1, 100(R2) – Local variables, fields in a structure – Can simulate register-indirect and direct modes • LD R1,0(R2) and LD R1, 100(R0) – Note: displacement encoded in instruction 204521 Digital System Architecture March 22, 2016 42 Size of Displacement 204521 Digital System Architecture March 22, 2016 43 Memory Addressing Modes (C) Autoincrement – Address is in a register • LD R1, (R2)+ – Register value increased by d after access (d is the data size, e.g. 4 for word accesses) – Some flavors useful for iterating through arrays, implementing a stack, etc. Indexed, Memory Indirect, Scaled, Autodecrement – See textbook (Page 98) 204521 Digital System Architecture March 22, 2016 44 FYI: More Addressing Modes Addressing Mode Example Instruction Register Add R4, R3 R4 R4 + R3 When a value is in a register Immediate Add R4, #3 R4 R4 + 3 For constants Displacement Add R4, 100(R1) R4 R4 + Mem[100+R1] Accessing local variables Register deferred or Indirect Add R4, (R1) R4 R4 + Mem[R1] Accessing using pointer or computed address Indexed Add R3, (R1+R2) R3 R3 + Mem[R1+R2] Array addressing; R1 = base of array, R2 = index amount Direct or Absolute Add R1, (1001) R1 R1 + Mem[1001] Accessing static data; addr. constant may need to be big 204521 Digital System Architecture Meaning March 22, 2016 When Used 45 FYI: Addressing modes continued Addressing Mode Example Instruction Memory indirect or Memory deferred Add R1, @(R3) R1 R1 + Mem[Mem[R3] If R3 is the address of a pointer p, then mode yields *p Autoincrement Add R1, (R2)+ R1 R1+Mem[R2]; R2 R2 + d Useful for stepping through arrays within a loop; R2 points to start of array; each ref. increments R2 by d Autodecrement Add R1, -(R2) R1 R1-Mem[R2]; R2 R2 + d Same as autoincrement; can be used for push/pop on stack Scaled Add R1, 100(R2), [R3] R1 R1 + Mem[100+R2+R3*d] Used to index arrays 204521 Digital System Architecture Meaning March 22, 2016 When Used 46 Use of Addressing Modes (DSP) Hand-coded – Uses more powerful modes when possible Figure 2.11 in textbook – Goes through mix of addressing modes and the % of time each is used for a TI DSP – 70%: • Immediate, displacement, register immediate, direct – 25%: • Auto increment, Auto decrement • Thoughts? – (random comment: make these 6 addressing – out of 17 total – the fastest…) 204521 Digital System Architecture March 22, 2016 47 Conrol Flow Instructions Up until now – implicitly have discussed memory and arithmetic instructions… Now, control flow…4 basic types – Procedure Call and Return – Jumps – Conditional branches 204521 Digital System Architecture March 22, 2016 48 Addr. Modes for CF Instructions PC-relative – Most commonly used for branches and jumps – Position-independent code – Target known at compile time Register indirect – Used when target not known at compile time (procedure returns, virtual functions and function pointers, dynamically loaded libraries, case/switch statements, etc.) 204521 Digital System Architecture March 22, 2016 49 Size of Branch Displacement (comments?) 204521 Digital System Architecture March 22, 2016 50 Branch Conditions (thoughts on %s, what constructs they apply to, etc.?) 204521 Digital System Architecture March 22, 2016 51 Call/Return Instructions Call – Minimum: save return address to the stack (x86) or in a register (MIPS) – Can create a stack frame, save registers, etc. Return – Jumps to return address – Can pop the stack frame, restore registers, etc. Simpler typically turns out to be better – E.g. many functions do not need a stack frame 204521 Digital System Architecture March 22, 2016 52 MIPS Call/Return Call: Jump-And-Link (JAL <function>) – Puts return address into R31, then jumps to target address Return: Register-Indirect Jump (JR R31) – Jumps to address in R31 (no special RET instruction) Stack frame create/pop via ordinary add/sub instrs (stack-pointer register is R29) Register save/restore via ordinary load/store instrs 204521 Digital System Architecture March 22, 2016 53 Procedure call essentials (1): Caller/Callee Mechanics who does what when? Four places foo() { bar(int a) { 2. callee at entry int temp = 3; 1. caller at call time bar(42); ... 4.... caller after return return(temp + a); 3. callee at exit } 204521 Digital System Architecture } March 22, 2016 54 Procedure call essentials (2): MIPS Registers Name R# Usage Preserved on Call $zero 0 The constant value 0 n.a. $at 1 Reserved for assembler n.a. $v0-$v1 2-3 Values for results & expr. eval. no $a0-$a3 4-7 Arguments no $t0-$t7 8-15 Temporaries no $s0-$s7 16-23 Saved yes $t8-$t9 24-25 More temporaries no $k0-$k1 26-27 Reserved for use by OS n.a. $gp 28 Global pointer yes $sp 29 Stack pointer yes $fp 30 Frame pointer yes $ra 31 Return address yes 204521 Digital System Architecture March 22, 2016 55 Procedure call essentials (3): Good Strategy do most work at callee entry/exit Caller at call time – put arguments in $a0..$a4 – save any caller-save temporaries – jalr ..., $ra Callee at entry – allocate all stack space – save $ra + $s0..$s3 if necessary Callee at exit most of the work – restore $ra + $s0..$s3 if used – deallocate all stack space – put return value in $v0 Caller after return – retrieve return value from $v0 – restore any caller-save temporaries 204521 Digital System Architecture March 22, 2016 56 Procedure call essentials (4): Summary… Summary – Caller saves registers (outside the agreed upon convention i.e. $ax) at point of call – Callee saves registers (per convention i.e. $sx) at point of entry – Callee restores saved registers, and re-adjusts stack before return – Caller restores saved registers, and re-adjusts stack before resuming from the call Big ?: – Is this clear? I can work through an example if needed… 204521 Digital System Architecture March 22, 2016 57 Instruction Encoding Instruction must specify – What is supposed to be done (opcode) – What are the operands Three popular formats Variable format (VAX, x86) – Opcode specifies how many operands, operands are listed after opcode – Each operand has an address specifier and an address field – Address specifier describes addressing mode for that operand Fixed format (RISC) – All instructions of the same size – Opcode specifies addressing mode for load/store operations – All other operations use register operands Hybrid Format (IBM 360, some DSP processors) – Several (but few) fixed size instruction formats – Some formats have address specifier fields 204521 Digital System Architecture March 22, 2016 58 How is the operation specified? Typically in a bit field called the opcode Also must encode addressing modes, etc. Variable Some options: Operation & Address # of operands Specifier 1 Address Field 1 …. Operation Address Field 1 Address Field 2 Address Field 3 Operation Address Specifier Address Field Operation Address Specifier 1 Address Specifier 2 Address Field Address Field 1 Address Field 2 Operation & Address # of operands Specifier 204521 Digital System Architecture March 22, 2016 Address Specifier n Address Field n Fixed Hybrid 59 Some random comments Variable addressing mode – allows virtually all addressing modes with all operations – Best when many addressing modes & operations Fixed addressing mode – combines operation & addressing mode into opcode – Best when few addressing modes and operations – Good for RISC Hybrid approach is 3rd alternative – Usually need a separate address specifier per operand When encoding instructions, # of registers and addressing modes can affect instruction size 204521 Digital System Architecture March 22, 2016 60 Instruction Encoding Tradeoffs Decoding vs. Programming – Fixed formats easy to decode – Variable formats easier to program (assembler) – But we mostly use compilers now… Number of registers – More registers help optimization (a lot) – Operand fields smaller with few registers – In general, we want many (e.g. 32) registers, but do we want even more is still an issue… 204521 Digital System Architecture March 22, 2016 61 Helping the Compiler Writers Regularity and Orthogonality – General-Purpose Registers – If an operation works with one data type, is should work with all supported data types – If an operation works with one addr. mode… Primitives, not solutions – E.g. JAL vs. elaborate function call instruction Simplify tradeoffs Bind constants at compile time 204521 Digital System Architecture March 22, 2016 62 Today’s compilers work like this: Dependencies: • Language dependent Function Pass • Machine independent Front-end per language Transform language to common, intermediate form Intermediate representation • Somewhat language dependent • Largely machine independent • Small language dependencies • Machine dependencies slight • (I.e. register counts/types) • Highly machine dependent • Language independent 204521 Digital System Architecture High-level optimizations For example, procedure inlining and loop transformations Global optimizer Including global and local optimization + register allocation Code generator Detailed instruction selection and machinedependent optimizations (assembler next?) March 22, 2016 63 How the architect can help the compiler writer Keep in mind: – Most programs are locally simple! – Simple translations work just fine – Complexity arises b/c program require lots of instructions and they must interact globally – Also b/c of the whole multiple pass thing The compiler writer’s corollary/rule/manifesto: – Make the frequent cases fast and the rare cases correct! 204521 Digital System Architecture March 22, 2016 64 Reading Assignment Read Section 2.12 (MIPS Architecture) – Especially Figure 2.27 Not required, but good to read anyway – Section 2.14 (Fallacies and Pitfalls) – Section 2.16 (Historical Perspective) 204521 Digital System Architecture March 22, 2016 65 The DLX mProcessor A generic mP that we’ll use from time-to-time Very similar to a MIPS machine Compiled by taking the average of a # of recent experimental and commercial machines Has 32 general purpose registers (R0, R1, … R31) and floating point registers Data types include: – – – – 8-bit bytes 16-bit half words 32-bit words for integer data words 32 & 64-bit double precision words 204521 Digital System Architecture March 22, 2016 66 DLX addressing modes The only data addressing modes are immediate and displacement Add R1, (1001) # R1 R1 + M(1001) Possible to implement register deferred and absolute Add R4, (R1) # R4 R4 + Mem(R1) DLX memory is byte addressable in the Big Endian mode with a 32-bit address DLX uses a load/store architecture so: – All memory references are through loads or stores between memory and either GPRs and FPRs 204521 Digital System Architecture March 22, 2016 67 DLX instruction format DLX has 2 addressing modes which are encoded in the opcode I-type instruction 6 5 Opcode rs1 5 16 rd Immediate Encodes: Loads and Stores of bytes, words, half words • All immediates (rd rs op immediate) • Conditional branch instructions (rs1 is register, rd is unused) • Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0) R-type instruction 6 5 Opcode rs1 5 rs2 5 rd 11 func Register-register ALU operations: rd rs1 func rs2 •Function encodes the data path operation: Add, Sub, … •Read/write special registers and moves 204521 Digital System Architecture March 22, 2016 68 DLX instruction format J-type instruction 6 26 Opcode Offset added to PC Jump and jump and link Trap and return from exception 204521 Digital System Architecture March 22, 2016 69 An example MIPS machine 0 M u x ALU Add result Add 4 Instruction [31 26] Control Instruction [25 21] PC Read address Instruction memory Instruction [15 11] Shift left 2 RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite PCSrc Read register 1 Instruction [20 16] Instruction [31– 0] 1 0 M u x 1 Read data 1 Read register 2 Registers Read Write data 2 register 0 M u x 1 Write data Zero ALU ALU result Address Write data Instruction [15 0] 16 Sign extend Read data Data memory 1 M u x 0 32 ALU control Instruction [5 0] 204521 Digital System Architecture March 22, 2016 70 Memory Addressing ° Since 1980 almost every machine uses addresses to level of 8-bits (byte) ° 2 questions for design of ISA: • Since could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address, how do byte addresses map onto words? • Can a word be placed on any byte boundary? 204521 Digital System Architecture March 22, 2016 71 Displacement Address Size FP Avg. Int. Avg. 30% 25% 20% 15% 10% 5% 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0% ° Average of 5 programs from SPECint92 and Average of 5 programs from SPECfp92 ° X-axis is in powers of 2: 4 => addresses > 23 (8) and Š 24 (16) ° 1% of addresses > 16-bits 12 - 16 bits of displacement needed 204521 Digital System Architecture March 22, 2016 72 Addressing Objects: Endianess and Alignment Big Endian: address of most significant bit – IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA C O M P U T E R Little Endian: address of least significant bit – Intel 80x86, DEC Vax, DEC Alpha (Windows NT) R E T U PMO C little endian byte 0 3 2 1 0 msb lsb 0 0 big endian byte 0 1 2 1 2 3 3 Aligned Alignment: require that objects fall on address that is multiple of their size. Not Aligned 204521 Digital System Architecture March 22, 2016 73 Addressing Objects – Big Endian: address of most significant msb lsb 3 2 1 0 little endian word 0: 0 1 2 3 big endian word 0: Alignment: require that objects fall on address that is multiple of their size. (e.g., 2-bye word should be at 0,2,4,..) 204521 Digital System Architecture March 22, 2016 74 Byte Swap Problem D 3 A 3 C 2 B 2 B 1 C 1 A 0 D 0 increasing byte address Little Endian Big Endian When words are transferred 204521 Digital System Architecture March 22, 2016 75 Typical Memory Addressing Modes (Again!!) Addressing mode Example Meaning Register Add R4,R3 R4 R4+R3 Immediate Add R4,#3 R4 R4+3 Displacement Add R4,100(R1) R4 R4+Mem[100+R1] Register indirect Add R4,(R1) R4 R4+Mem[R1] Indexed Add R3,(R1+R2) R3 R3+Mem[R1+R2] Direct or absolute Add R1,(1001) R1 R1+Mem[1001] Memory indirect Add R1,@(R3) R1 R1+Mem[Mem[R3]] Auto-increment Add R1,(R2)+ R1 R1+Mem[R2]; R2 R2+d Auto-decrement Add R1,–(R2) R2 R2–d; R1 R1+Mem[R2] Scaled Add R1,100(R2)[R3] 204521 Digital System Architecture March 22, 2016 R1 R1+Mem[100+R2+R3*d] 76 Addressing Modes Usage Example For 3 programs running on VAX ignoring direct register mode: Displacement 42% avg, 32% to 55% Immediate: 33% avg, 17% to 43% Register deferred (indirect): 13% avg, 3% to 24% Scaled: 7% avg, 0% to 16% Memory indirect: 3% avg, 1% to 6% Misc: 2% avg, 0% to 3% 75% 88% 75% displacement & immediate 88% displacement, immediate & register indirect. Observation: In addition Register direct, Displacement, Immediate, Register Indirect addressing modes are important. CISC to RISC observation (fewer addressing modes simplify CPU design) 204521 Digital System Architecture March 22, 2016 77 Immediate Size • 50% to 60% fit within 8 bits • 75% to 80% fit within 16 bits (size of the “immediate no” used in an instruction) 204521 Digital System Architecture March 22, 2016 78 Addressing Summary •Data Addressing modes that are important: Displacement, Immediate, Register Indirect •Displacement size should be 12 to 16 bits •Immediate size should be 8 to 16 bits 204521 Digital System Architecture March 22, 2016 79 Utilization of Memory Addressing Modes Most Common 204521 Digital System Architecture March 22, 2016 80 Displacement Address Size Example Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs Int. Avg. FP Avg. 30% 20% 10% 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0% Displacement Address Bits Needed 1% of addresses > 16-bits 12 - 16 bits of displacement needed CISC to RISC observation 204521 Digital System Architecture March 22, 2016 81 Operation Types in The Instruction Set Operator Type Arithmetic and logical Examples Integer arithmetic and logical operations: add, or Data transfer Loads-stores (move on machines with memory addressing) Control Branch, jump, procedure call, and return, traps. System Operating system call, virtual memory management instructions Floating point Floating point operations: add, multiply. Decimal Decimal add, decimal multiply, decimal to character conversion String String move, string compare, string search Media The same operation performed on multiple data (e.g Intel MMX, SSE) 204521 Digital System Architecture March 22, 2016 82 Instruction Usage Example: Top 10 Intel X86 Instructions Rank instruction Integer Average Percent total executed 1 load 22% 2 conditional branch 20% 3 compare 16% 4 store 12% 5 add 8% 6 and 6% 7 sub 5% 8 move register-register 4% 9 call 1% 10 return 1% Total 96% Observation: Simple instructions dominate instruction usage frequency. CISC to RISC observation 204521 Digital System Architecture March 22, 2016 83 Instruction Set Encoding Considerations affecting instruction set encoding: – To have as many registers and addressing modes as possible. – The Impact of of the size of the register and addressing mode fields on the average instruction size and on the average program. – To encode instructions into lengths that will be easy to handle in the implementation. On a minimum to be a multiple of bytes. • Fixed length encoding: Faster and easiest to implement in hardware. • Variable length encoding: Produces smaller instructions. • Hybrid encoding. CISC to RISC observation 204521 Digital System Architecture March 22, 2016 84 Three Examples of Instruction Set Encoding Operations & no of operands Address specifier 1 Address field 1 Address specifier n Address field n Variable: VAX (1-53 bytes) Operation Address field 1 Fixed: Operation Operation Operation Address field 2 Address field3 MIPS, PowerPC, SPARC (Each instruction is 4 bytes) Address Specifier Address Specifier 1 Address Specifier Address field Address Specifier 2 Address field 1 Address field Address field 2 Hybrid : IBM 360/370, Intel 80x86 204521 Digital System Architecture March 22, 2016 85 Complex Instruction Set Computer (CISC) Emphasizes doing more with each instruction: – Thus fewer instructions per program (more compact code). Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed – – When M6800 was introduced: 16K RAM = $500, 40M hard disk = $ 55, 000 When MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000 Original CISC architectures evolved with faster more complex CPU designs but backward instruction set compatibility had to be maintained. Wide variety of addressing modes: • 14 in MC68000, 25 in MC68020 A number instruction modes for the location and number of operands: • The VAX has 0- through 3-address instructions. Variable-length instruction encoding. 204521 Digital System Architecture March 22, 2016 86 Example CISC ISA: Motorola 680X0 18 addressing modes: Data register direct. GPR ISA (Register-Memory) Address register direct. Immediate. Absolute short. Range from 1 to 32 bits, 1, Absolute long. 2, 4, 8, 10, or 16 bytes. Address register indirect. Address register indirect with postincrement. Address register indirect with predecrement. Address register indirect with displacement. Address register indirect with index (8-bit). Address register indirect with index (base). Memory inderect postindexed. Instructions are stored in Memory indirect preindexed. 16-bit words. Program counter indirect with index (8-bit). Program counter indirect with index (base). the smallest instruction is 2- bytes (one word). Program counter indirect with displacement. Program counter memory indirect The longest instruction is 5 postindexed. words (10 bytes) in length. Program counter memory indirect preindexed. Operand size: Instruction Encoding: 204521 Digital System Architecture March 22, 2016 87 Example CISC ISA:Intel IA-32, X86 (80386) 12 addressing modes: Register. Immediate. Direct. Base. Base + Displacement. Index + Displacement. Scaled Index + Displacement. Based Index. Based Scaled Index. Based Index + Displacement. Based Scaled Index + Displacement. Relative. 204521 Digital System Architecture GPR ISA (Register-Memory) Operand sizes: Can be 8, 16, 32, 48, 64, or 80 bits long. Also supports string operations. Instruction Encoding: The smallest instruction is one byte. The longest instruction is 12 bytes long. The first bytes generally contain the opcode, mode specifiers, and register fields. The remainder bytes are for address displacement and immediate data. March 22, 2016 88 Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the machine. Reduced CPI. Goal: At least one instruction per clock cycle. Designed with pipelining in mind. (CPI = 1 or less) Fixed-length instruction encoding. Only load and store instructions access memory. (Thus more instructions executed than CISC) Simplified addressing modes. – Usually limited to immediate, register indirect, register displacement, indexed. Delayed loads and branches. Instruction pre-fetch and speculative execution. Examples: MIPS, SPARC, PowerPC, Alpha 204521 Digital System Architecture March 22, 2016 89 Example RISC ISA: HP Precision Architecture, HP PA-RISC Operand sizes: 7 addressing modes: Register Immediate Base with displacement Base with scaled index and displacement Predecrement Postincrement PC-relative Five operand sizes ranging in powers of two from 1 to 16 bytes. Instruction Encoding: Instruction set has 12 different formats. All are 32 bits in length. 204521 Digital System Architecture March 22, 2016 90 Example RISC ISA: DEC/Compaq/Intel? Alpha AXP Operand sizes: 4 addressing modes: Register direct. Immediate. Register indirect with displacement. PC-relative. Four operand sizes: 1, 2, 4 or 8 bytes. Instruction Encoding: Instruction set has 7 different formats. All are 32 bits in length. 204521 Digital System Architecture March 22, 2016 91 RISC ISA Example: MIPS R3000 (32bits) 5 Addressing Modes: Instruction Categories: Registers Register direct (arithmetic). Immedate (arithmetic). Base register + immediate offset (loads and stores). PC relative (branches). Pseudodirect (jumps) Load/Store. Computational. Jump and Branch. Floating Point (using coprocessor). Memory Management. Special. R0 - R31 PC HI Operand Sizes: Memory accesses in any multiple between 1 and 4 bytes. LO Instruction Encoding: 3 Instruction Formats, all 32 bits wide. OP rs rt OP rs rt OP 204521 Digital System Architecture rd sa funct immediate jump target March 22, 2016 92 A RISC ISA Example: MIPS Register-Register 31 26 25 Op 21 20 rs rt 6 5 11 10 16 15 rd sa 0 funct Register-Immediate 31 26 25 Op 21 20 rs 16 15 0 immediate rt Branch 31 26 25 Op 21 20 rs 16 15 0 displacement rt Jump / Call 31 26 25 Op 204521 Digital System Architecture 0 target March 22, 2016 93 An Instruction Set Example: MIPS64 A RISC-type 64-bit instruction set architecture based on instruction set design considerations of chapter 2: – Use general-purpose registers with a load/store architecture to access memory. – Reduced number of addressing modes: displacement (offset size of 16 bits), immediate (16 bits). – Data sizes: 8 (byte), 16 (half word) , 32 (word), 64 (double word) bit integers and 32-bit or 64-bit IEEE 754 floating-point numbers. – Use fixed instruction encoding (32 bits) for performance. – 32, 64-bit general-purpose integer registers GPRs, R0, …., R31. R0 always has a value of zero. – Separate 32, 64-bit floating point registers FPRs: F0, F1 … F31 When holding a 32-bit single-precision number the upper half of the FPR is not used. 204521 Digital System Architecture March 22, 2016 94 MIPS64 Instruction Format (1/2) I - type instruction 6 Opcode 5 5 16 rs rt Immediate Encodes: Loads and stores of bytes, words, half words. All immediates (rt ¬ rs op immediate) Conditional branch instructions Jump register, jump and link register ( rs = destination, immediate = 0) R - type instruction 6 Opcode 5 rs 5 rt 5 rd 5 shamt 6 func Register-register ALU operations: rd ¬ rs func rt Function encodes the data path operation: Add, Sub .. Read/write special registers and moves. 204521 Digital System Architecture March 22, 2016 95 MIPS64 Instruction Format (2/2) J - Type instruction 6 Opcode 26 Offset added to PC Jump and jump and link. Trap and return from exception 204521 Digital System Architecture March 22, 2016 96 MIPS Addressing Modes/Instruction Formats • All instructions 32 bits wide First Operand Register (direct) op Second Operand rs rt Destination rd R-Type register Immediate ALU Displacement: Base+index op rs rt immed op rs rt immed Memory (loads/stores) register PC-relative op Branches rs rt immed PC Memory + Pseudodirect Addressing for jumps (J-Type) not shown here 204521 Digital System Architecture + March 22, 2016 97 MIPS64 Instructions: Load and Store LD R1,30(R2) LW R1, 60(R2) Load double word Load word LB R1, 40(R3) Load byte LBU R1, 40(R3) LH R1, 40(R3) L.S F0, 50(R3) L.D F0, 50(R2) SD R3,500(R4) SW R3,500(R4) S.S F0, 40(R3) S.D F0,40(R3) SH R3, 502(R2) SB R2, 41(R3) Regs[R1] ¬64 Mem[30+Regs[R2]] Regs[R1] ¬64 (Mem[60+Regs[R2]]0)32 ## Mem[60+Regs[R2]] Regs[R1] ¬64 (Mem[40+Regs[R3]]0)56 ## Mem[40+Regs[R3]] Load byte unsigned Regs[R1] ¬64 056 ## Mem[40+Regs[R3]] Load half word Regs[R1] ¬64 (Mem[40+Regs[R3]]0)48 ## Mem[40 + Regs[R3] ] # # Mem [41+Regs[R3]] Load FP single Regs[F0] ¬64 Mem[50+Regs[R3]] ## 032 Load FP double Regs[F0] ¬64 Mem[50+Regs[R2]] Store double word Mem [500+Regs[R4]] ¬64 Reg[R3] Store word Mem [500+Regs[R4]] ¬32 Reg[R3] Store FP single Mem [40, Regs[R3]] ¬ 32 Regs[F0] 0…31 Store FP double Mem[40+Regs[R3]] ¬-64 Regs[F0] Store half Mem[502+Regs[R2]] ¬16 Regs[R3]48…63 Store byte Mem[41 + Regs[R3]] ¬8 Regs[R2] 56…63 204521 Digital System Architecture March 22, 2016 98 MIPS64 Instructions: Arithmetic/Logical DADDU R1, R2, R3 Add unsigned Regs[R1] ¬ Regs[R2] + Regs[R3] DADDI R1, R2, #3 Add immediate Regs[R1] ¬ Regs[R2] + 3 LUI R1, #42 Regs[R1] ¬ 032 ##42 ## 016 Load upper immediate DSLL R1, R2, #5 Shift left logical DSLT R1, R2, R3 Set less than 204521 Digital System Architecture Regs[R1] ¬ Regs [R2] <<5 if (regs[R2] < Regs[R3] ) Regs [R1] ¬ 1 else Regs[R1] ¬ 0 March 22, 2016 99 MIPS64 Instructions: Control-Flow J name Jump JAL name Jump and link Regs[31] ¬ PC+4; PC 36..63 ¬ name; ((PC+4)- 227) £ name < ((PC + 4) + 227) JALR R2 Jump and link register Regs[R31] ¬ PC+4; PC ¬ Regs[R2] JR R3 Jump register PC ¬ Regs[R3] BEQZ R4, name Branch equal zero if (Regs[R4] ==0) PC ¬ name; ((PC+4) -217) £ name < ((PC+4) + 217 BNEZ R4, Name Branch not equal zero if (Regs[R4] != 0) PC ¬ name ((PC+4) - 217) £ name < ((PC +4) + 217 MOVZ R1,R2,R3 PC 36..63 ¬ name Conditional move if zero if (Regs[R3] ==0) Regs[R1] ¬ Regs[R2] 204521 Digital System Architecture March 22, 2016 100 The Role of Compilers The Structure of Recent Compilers: Dependencies Language dependent machine dependent Somewhat Language dependent largely machine independent Function: Front-end per Language Transform Language to Common intermediate form High-level Optimizations For example procedure inlining and loop transformations Small language dependencies machine dependencies slight Global Optimizer (e.g. register counts/types) Highly machine dependent language independent 204521 Digital System Architecture Code generator March 22, 2016 Include global and local optimizations + register allocation Detailed instruction selection and machinedependent optimizations; may include or be followed by assembler 101 The Role of Compilers 204521 Digital System Architecture March 22, 2016 102 Compiler Optimization and Instruction Count Change in instruction count for the programs lucas and mcf from SPEC2000 as compiler optimizations vary. 204521 Digital System Architecture March 22, 2016 103 Typical Operations Data Movement Arithmetic Load (from memory) Store (to memory) memory-to-memory move register-to-register move input (from I/O device) output (to I/O device) push, pop (to/from stack) integer (binary + decimal) or FP Add, Subtract, Multiply, Divide Logical not, and, or, set, clear Shift shift left/right, rotate left/right Control (Jump/Branch) unconditional, conditional Subroutine Linkage call, return Interrupt trap, return Synchronization test & set (atomic r-m-w) String search, translate (e.g., char to int) 204521 Digital System Architecture March 22, 2016 104 Top 10 80x86 Instructions ° Rank instruction Integer Average Percent total executed 1 load 22% 2 conditional branch 20% 3 compare 16% 4 store 12% 5 add 8% 6 and 6% 7 sub 5% 8 move register-register 4% 9 call 1% 10 return 1% Total 96% ° Simple instructions dominate instruction frequency 204521 Digital System Architecture March 22, 2016 105 Methods of Testing Condition Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. ex: add r1, r2, r3 bz label ° Condition Register Ex: cmp r1, r2, r3 bgt r1, label ° Compare and Branch Ex: bgt r1, r2, label 204521 Digital System Architecture March 22, 2016 106 Condition Codes Setting CC as side effect can reduce the # of instructions X: . X: . . vs. . . . SUB r0, #1, r0 SUB r0, #1, r0 CMP r0, #0 BRP X BRP X But also has disadvantages: --- not all instructions set the condition codes which do and which do not often confusing! e.g., shift instruction sets the carry bit --- dependency between the instruction that sets the CC and the one that tests it: to overlap their execution, may need to separate them with an instruction that does not change the CC ifetch read compute New CC computed Old CC read ifetch 204521 Digital System Architecture write read March 22, 2016 compute write 107 Branches --- Conditional control transfers Four basic conditions: N -- negative Z -- zero V -- overflow C -- carry Sixteen combinations of the basic four conditions: Always Unconditional Never NOP Not Equal ~Z Equal Z Greater ~[Z + (N + V)] Less or Equal Z + (N + V) Greater or Equal ~(N + V) Less N+V Greater Unsigned ~(C + Z) Less or Equal Unsigned C+Z Carry Clear ~C Carry Set C Positive ~N Negative N Overflow Clear ~V Overflow Set V 204521 Digital System Architecture March 22, 2016 108 Conditional Branch Distance Int. Avg. FP Avg. 40% 30% 20% 10% 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0% Bits of Branch Dispalcement • Distance from branch in instructions 2i => Š ±2i-1 & > 2i-2 • 25% of integer branches are > 2 to 4 204521 Digital System Architecture March 22, 2016 109 Conditional Branch Addressing • PC-relative since most branches At least 8 bits suggested (± 128 instructions) • Compare Equal/Not Equal most important for integer programs (86%) 7% LT/GE 40% Int Avg. 7% GT/LE 23% FP Avg. 86% EQ/NE 37% 0% 50% 100% Frequency of comparison types in branches 204521 Digital System Architecture March 22, 2016 110 Operation Summary Support these simple instructions, since they will dominate the number of instructions executed: – – – – – – – – – load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8-bits long), jump, – call, – return; 204521 Digital System Architecture March 22, 2016 111 Data Types Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word (VAX: word) 32 bits is a word (VAX: long word) Character: ASCII 7 bit code EBCDIC 8 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: Sign & Magnitude: 0X vs. 1X 1's Complement: 0X vs. 1(~X) 2's Complement: 0X vs. (1's comp) + 1 Floating Point: Single Precision Double Precision Extended Precision Positive #'s same in all First 2 have two zeros Last one usually chosen exponent MxR E base mantissa 204521 Digital System Architecture March 22, 2016 How many +/- #'s? Where is decimal pt? How are +/- exponents represented? 112 Operand Size Usage Doubleword 0% 69% 74% Word Halfword Byte Int Avg. 31% 19% FP Avg. 0% 7% 0% 0% 20% 40% 60% 80% Frequency of reference by size •Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers 204521 Digital System Architecture March 22, 2016 113 Instruction Format If have many memory operands per instructions and many addressing modes, need an Address Specifier per operand If have load-store machine with 1 address per instr. and one or two addressing modes, then just encode addressing mode in the opcode 204521 Digital System Architecture March 22, 2016 114 Generic Examples of Instruction Formats … Variable: Fixed: Hybrid: 204521 Digital System Architecture March 22, 2016 115 Summary of Instruction Formats If code size is most important, use variable length instructions If performance is most important, use fixed length instructions 204521 Digital System Architecture March 22, 2016 116 Lecture Summary: ISA ° Use general purpose registers with a load-store architecture; ° Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred; ° Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move registerregister, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8-bits long), jump, call, and return; ° Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 64bit IEEE 754 floating point numbers; ° Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size; ° Provide at least 16 general purpose registers plus separate floatingpoint registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set. 204521 Digital System Architecture March 22, 2016 117