Computer Architecture
Chapter 2

Lecture (Class A): Tue(1A-2A), Wed(7A-8A)
Lecture (Class B): Tue(2B-3B), Wed(5B-6B)
Office Hours: Tue(4A-4B), Wed(8B-9A)

This material is for educational uses only. Slides adapted from D. Patterson and M. Irwin, and some contents are based on the material provided by other paper/book authors and may be copyrighted by them.

Heads Up
- Last week's material
  - Review: number representation and combinational logic circuits
- This week's material
  - Introduction to MIPS assembler, adds/loads/stores
- Next week's material
  - MIPS control flow operations

Review: Execute Cycle
- The datapath executes the instructions as directed by control
- Memory stores both instructions and data
[Figure: processor (control and datapath), memory, and devices (input, output, network). Example instruction 000000 00100 00010 0001000000100000: contents of Reg #4 ADD contents of Reg #2, results put in Reg #2.]

Review: Processor Organization
- Control needs to have circuitry to
  - Decide which is the next instruction and input it from memory
  - Decode the instruction
  - Issue signals that control the way information flows between datapath components
  - Control what operations the datapath's functional units perform
- Datapath needs to have circuitry to
  - Execute instructions: functional units (e.g., adder) and storage locations (e.g., register file)
  - Interconnect the functional units so that the instructions can be executed as required
  - Load data from and store data to memory
[Figure: the Fetch, Decode, Exec cycle.]

Assembly Language Instructions
- The language of the machine
  - Want an ISA that makes it easy to build the hardware and the compiler while maximizing performance and minimizing cost
- Stored program (von Neumann) concept
  - Instructions are stored in memory (as is the data)
- Our target: the MIPS ISA
  - Similar to other ISAs developed since the 1980s
  - Used by Broadcom, Cisco, NEC, Nintendo, Sony, …
- Design goals: maximize performance, minimize cost, reduce design time (time-to-market), minimize memory space (embedded systems), minimize power consumption (mobile systems)

RISC - Reduced Instruction Set Computer
- RISC philosophy
  - Fixed instruction lengths
  - Load-store instruction sets
  - Limited number of addressing modes
  - Limited number of operations
- MIPS, Sun SPARC, HP PA-RISC, IBM PowerPC, …
- Instruction sets are measured by how well compilers use them, as opposed to how well assembly language programmers use them
- CISC (C for complex), e.g., Intel x86

MIPS Instructions: Addition
- add: mnemonic indicates what operation to perform
- b, c: source operands on which the operation is performed
- a: destination operand to which the result is written

MIPS Instructions: Subtraction
- Subtraction is similar to addition; only the mnemonic changes
- sub: mnemonic indicates what operation to perform
- b, c: source operands on which the operation is performed
- a: destination operand to which the result is written

Design Principle 1
Simplicity favors regularity
- Consistent instruction format
- Same number of operands (two sources and one destination)
  - Easier to encode and handle in hardware

Instructions: More Complex Code
- More complex code is handled by multiple MIPS instructions

Design Principle 2
Make the common case fast
- MIPS includes only simple, commonly used instructions
- Hardware to decode and execute the instruction can be simple, small, and fast
- More complex instructions (that are less common) can be performed using multiple simple instructions

Operands: Registers
- A computer needs a physical location from which to retrieve binary operands
- A computer retrieves operands from:
  - Registers
  - Memory
  - Constants (also called immediates)
- Main memory is slow
- Most architectures have a small set of (fast) registers (MIPS has thirty-two 32-bit registers)
- MIPS is called a 32-bit architecture because it operates on 32-bit data
  - A 64-bit version of MIPS also exists, but we will consider only the 32-bit version

Design Principle 3
Smaller is faster
- MIPS includes only a small number of registers
- Just as retrieving data from a few books on your table is faster than sorting through 1000 books, retrieving data from 32 registers is faster than retrieving it from 1000 registers or a large memory

MIPS Arithmetic Instruction
- MIPS assembly language arithmetic statements

    add $t0, $s1, $s2
    sub $t0, $s1, $s2

- Each arithmetic instruction performs only one operation
- Each arithmetic instruction specifies exactly three operands

    destination ← source1 op source2

  - Operand order is fixed (the destination is specified first)
- The operands are contained in the datapath's register file ($t0, $s1, $s2)

Compiling More Complex Statements
- Assuming that variable b is stored in register $s1, c is stored in $s2, d is stored in $s3, and the result is to be left in $s0, what is the assembler equivalent to the C statement

    h = (b - c) + d

MIPS Register File
- Operands of arithmetic instructions must be from a limited number of special locations contained in the datapath's register file
  - Thirty-two 32-bit registers
    - Two read ports
    - One write port
- Registers are
  - Fast
    - Smaller is faster & Make the common case fast
  - Easy for a compiler to use
    - e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order
  - Good for code density
    - Since registers are named with fewer bits than a memory location
- Register addresses are indicated by using $

Naming Conventions for Registers

    Number  Name       Usage
    0       $zero      constant 0 (hardware)
    1       $at        reserved for assembler
    2-3     $v0-$v1    expression evaluation & function results
    4-7     $a0-$a3    arguments
    8-15    $t0-$t7    temporary: caller saves (callee can clobber)
    16-23   $s0-$s7    callee saves (caller can clobber)
    24-25   $t8-$t9    temporary (cont'd)
    26-27   $k0-$k1    reserved for OS kernel
    28      $gp        pointer to global area
    29      $sp        stack pointer
    30      $fp        frame pointer
    31      $ra        return address (hardware)

Operand: Registers
- Written with a dollar sign ($) before their name
  - For example, register 0 is written "$0", pronounced "register zero" or "dollar zero"
- Certain registers are used for specific purposes:
  - $0 always holds the constant value 0
  - The saved registers, $s0-$s7, are used to hold variables
  - The temporary registers, $t0-$t9, are used to hold intermediate values during a larger computation
- For now, we only use the temporary registers ($t0-$t9) and the saved registers ($s0-$s7)
- We will use the other registers in later slides

Example
- How to do the following C statement?

    f = (g + h) - (i + j);

  - f, g, h, i, j are in $s0, $s1, $s2, $s3, $s4
  - Use intermediate temporary registers $t0, $t1

    add $t0, $s1, $s2    # t0 = g + h
    add $t1, $s3, $s4    # t1 = i + j
    sub $s0, $t0, $t1    # f = (g + h) - (i + j)

Registers vs. Memory
- Operands of arithmetic instructions must be in registers
  - Only thirty-two registers are provided
- Compiler associates variables with registers
  - What about programs with lots of variables?
[Figure: processor (control and datapath), memory, and devices (input, output, network).]

Register Allocation
- Compiler tries to keep as many variables in registers as possible: graph coloring
- Some variables cannot be allocated to registers
  - Large arrays (too few registers)
  - Aliased variables (variables accessible through pointers in C)
  - Dynamically allocated variables
    - Heap
    - Stack
- Compiler may run out of registers ⇒ spilling

Register Allocation Using Graph Coloring
- Example
[Figure: graph coloring example.]

Register Allocation: Spilling
- Spill/reload code is needed when there are not enough colors (registers) to color the interference graph
- Example: what if only two registers are available?

Fundamental Data Types

Logical Memory Organization
- Memory is a large, single-dimension array, with an address.
- A memory address is an index into the array
- "Byte addressing" means that the index points to a byte of memory.

Physical Memory Organization

Processor - Memory Interconnections
- Memory is a large, single-dimensional array
- An address acts as the index into the memory array
[Figure: processor connected to memory by read addr/write addr, read data, and write data lines, each 32 bits wide; the number of memory locations is left as a question (?).]

Data Transfer: Memory to Register (1/3)
- To transfer a word of data, need to specify two things:
  - Register: specify this by number (0 - 31)
  - Memory address: more difficult
    - Think of memory as a 1D array
    - Address it by supplying a pointer to a memory address
    - Offset (in bytes) from this pointer
    - The desired memory address is the sum of these two values, e.g., 8($t0)
    - Specifies the memory address pointed to by the value in $t0, plus 8 bytes (why "bytes", not "words"?)
    - Each address is 32 bits

Data Transfer: Memory to Register (2/3)
- Load instruction syntax:

    lw $t0,12($s0)

  1) operation name (lw)
  2) register that will receive the value ($t0)
  3) numerical offset in bytes (12)
  4) register containing the pointer to memory ($s0)
- Example: lw $t0,12($s0)
  - lw (Load Word, so a word (32 bits) is loaded at a time)
  - Take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0

Data Transfer: Memory to Register (3/3)
- Load instruction syntax:

    lw $t0,12($s0)

- Notes:
  - $s0 is called the base register, 12 is called the offset
  - The offset is generally used in accessing elements of an array: the base register points to the beginning of the array

Data Transfer: Register to Memory
- Also want to store a value from a register into memory
- Store instruction syntax is identical to load instruction syntax
- Example: sw $t0,12($s0)
  - sw (Store Word, so 32 bits or one word are stored at a time)
  - This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum

Memory Operand Example 1
- Compile by hand using registers: $s1: g, $s2: h, $s3: base address of A

    g = h + A[8];

- What offset in lw selects the array element A[8] of a C program?
  - 4 x 8 = 32 bytes to select A[8]
- First transfer from memory to register:

    lw $t0, 32($s3)      # $t0 gets A[8]

  - Add 32 to $s3 to select A[8], put into $t0
- Next add it to h and place in g

    add $s1, $s2, $t0    # $s1 = h + A[8]

Memory Operand Example 2
- C code:

    A[12] = h + A[8];

  - h in $s2, base address of A in $s3
- Compiled MIPS code:
  - Index 8 requires an offset of 32

    lw  $t0, 32($s3)     # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)     # store word

Accessing Memory
- MIPS has two basic data transfer instructions for accessing memory (assume $s3 holds 24 in decimal)

    lw $t0, 4($s3)       # load word from memory
    sw $t0, 8($s3)       # store word to memory

- The data transfer instruction must specify
  - Where in memory to read from (load) or write to (store): the memory address
  - Where in the register file to write to (load) or read from (store): the register destination (source)
- The memory address is formed by summing the constant portion of the instruction and the contents of the second register

MIPS Memory Addressing
- The memory address is formed by summing the constant portion of the instruction and the contents of the second (base) register
- $s3 holds 8

    lw $t0, 4($s3)       # what is loaded into $t0?
    sw $t0, 8($s3)       # $t0 is stored where?

    Memory
    Data      Word address
    ...0110   24
    ...0101   20
    ...1100   16
    ...0001   12
    ...0010   8
    ...1000   4
    ...0100   0

Compiling with Loads and Stores
- Assuming that variable b is stored in $s2 and the base address of array A is in $s3,
- What is the MIPS assembly code for the C statement

    A[8] = A[2] - b

    Array element   Address
    ...             ...
    A[3]            $s3+12
    A[2]            $s3+8
    A[1]            $s3+4
    A[0]            $s3

Compiling with a Variable Array Index

    Array element   Address
    ...             ...
    A[3]            $s4+12
    A[2]            $s4+8
    A[1]            $s4+4
    A[0]            $s4

- Assuming that the base address of array A is in register $s4 and variables b, c, and i are in $s1, $s2, and $s3, respectively,
- What is the MIPS assembly code for the C statement

    c = A[i] - b

    add $t1, $s3, $s3    # array index i is in $s3
    add $t1, $t1, $t1    # temp reg $t1 holds 4*i

Registers vs. Memory
- Registers are faster to access than memory
- Operating on memory data requires loads and stores
  - More instructions to be executed
- Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!

Dealing with Constants
- Small constants are used quite frequently (50% of operands in many common programs)
  - e.g., A = A + 5; B = B + 1; C = C - 18;
- Solutions? Why not?
  - Put "typical constants" in memory and load them
  - Create hard-wired registers (like $zero) for constants like 1, 2, 4, 10, …
- How do we make this work?
- How do we make the common case fast?

Constant (or Immediate) Operands
- Include constants inside arithmetic instructions
  - Much faster than if they have to be loaded from memory (they come in from memory with the instruction itself)
- MIPS immediate instructions

    addi $s3, $s3, 4     # $s3 = $s3 + 4

  - There is no subi instruction; can you guess why not?
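
The base-plus-offset addressing that lw and sw use in the memory operand examples can be sketched as a small Python model. This is only an illustration, not part of the slides: the helper names (effective_address, lw, sw, add), the dictionary-based register file and memory, the base address 0x1000, and the values h = 7 and A[8] = 35 are all assumptions chosen for the demo. It traces A[12] = h + A[8] with h in $s2 and the base address of A in $s3.

```python
WORD_BYTES = 4  # one MIPS word is 32 bits = 4 bytes; memory is byte addressed


def effective_address(base, offset):
    """Address used by lw/sw: contents of the base register plus the byte offset."""
    return (base + offset) & 0xFFFFFFFF  # addresses are 32 bits wide


def lw(regs, mem, rt, offset, base):
    """lw rt, offset(base): load one word from memory into register rt."""
    regs[rt] = mem.get(effective_address(regs[base], offset), 0)


def sw(regs, mem, rt, offset, base):
    """sw rt, offset(base): store the word held in register rt to memory."""
    mem[effective_address(regs[base], offset)] = regs[rt]


def add(regs, rd, rs, rt):
    """add rd, rs, rt: rd = rs + rt, kept to 32 bits."""
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF


def run_memory_operand_example_2():
    """Trace A[12] = h + A[8] with h in $s2 and the base address of A in $s3."""
    base = 0x1000                              # hypothetical base address of A
    regs = {"$s2": 7, "$s3": base, "$t0": 0}   # hypothetical value h = 7
    mem = {base + WORD_BYTES * 8: 35}          # hypothetical value A[8] = 35

    lw(regs, mem, "$t0", 32, "$s3")    # $t0 = A[8]       (offset 4 * 8 = 32)
    add(regs, "$t0", "$s2", "$t0")     # $t0 = h + A[8]
    sw(regs, mem, "$t0", 48, "$s3")    # A[12] = $t0      (offset 4 * 12 = 48)
    return mem[base + WORD_BYTES * 12]


print(run_memory_operand_example_2())  # prints 42
```

The offsets are in bytes because memory is byte addressed, so element i of a word array sits at base + 4*i; that is why index 8 needs offset 32 and index 12 needs offset 48.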
Immediate Operands
- No subtract immediate instruction in MIPS
  - ISA design principle: limit the types of operation that can be done to a minimum
  - If an operation can be decomposed into a simpler operation, do not include it
  - addi ..., -X is equivalent to subi ..., X, so there is no subi
- Example
  - C code: f = g - 10
  - MIPS code:

    addi $s0, $s1, -10

The Constant Zero
- The number zero (0) appears very often in code, so we define register zero
- MIPS register 0 ($zero) is the constant 0
  - Cannot be overwritten
  - This is defined in hardware, so an instruction like addi $0,$0,5 will not do anything
- Useful for common operations
  - E.g., move between registers

    add $t2, $s1, $zero

MIPS Instructions, so far

    Category       Instr          Example               Meaning
    Arithmetic     add            add  $s1, $s2, $s3    $s1 = $s2 + $s3
                   subtract       sub  $s1, $s2, $s3    $s1 = $s2 - $s3
                   add immediate  addi $s1, $s2, 4      $s1 = $s2 + 4
    Data transfer  load word      lw   $s1, 32($s2)     $s1 = Memory($s2+32)
                   store word     sw   $s1, 32($s2)     Memory($s2+32) = $s1

Review: MIPS Organization
- Arithmetic instructions: to/from the register file
- Load/store instructions: to/from memory
[Figure: processor and memory. The register file holds 32 registers ($zero - $ra) of 32 bits each, with 5-bit src1 addr, src2 addr, and dst addr inputs, 32-bit src1 data and src2 data outputs, and a 32-bit write data input, feeding a 32-bit ALU. Memory holds 2^30 words and is reached through a 32-bit read/write addr, 32-bit read data, and 32-bit write data. Word addresses (binary) run 0…0000, 0…0100, 0…1000, 0…1100, …, 1…1100; bytes within a word are numbered by byte address (big Endian): 0 1 2 3, then 4 5 6 7.]

Review: Unsigned Binary Representation

    Hex          Binary   Decimal
    0x00000000   0…0000   0
    0x00000001   0…0001   1
    0x00000002   0…0010   2
    0x00000003   0…0011   3
    0x00000004   0…0100   4
    0x00000005   0…0101   5
    0x00000006   0…0110   6
    0x00000007   0…0111   7
    0x00000008   0…1000   8
    0x00000009   0…1001   9
    …
    0xFFFFFFFC   1…1100   2^32 - 4
    0xFFFFFFFD   1…1101   2^32 - 3
    0xFFFFFFFE   1…1110   2^32 - 2
    0xFFFFFFFF   1…1111   2^32 - 1

    bit weight     2^31  2^30  2^29  ...  2^3  2^2  2^1  2^0
    bit position   31    30    29    ...  3    2    1    0
    bit            1     1     1     ...  1    1    1    1

    1 1 1 ... 1 1 1 1 = 1 0 0 0 ... 0 0 0 0 - 1 = 2^32 - 1

Review: Signed Binary Representation

    2'sc binary   decimal
    1000          -8          (-2^3)
    1001          -7          (-(2^3 - 1))
    1010          -6
    1011          -5
    1100          -4
    1101          -3
    1110          -2
    1111          -1
    0000          0
    0001          1
    0010          2
    0011          3
    0100          4
    0101          5
    0110          6
    0111          7           (2^3 - 1)

- To negate: complement all the bits and add a 1, e.g., 0101 (5) and 1011 (-5), 0110 (6) and 1010 (-6)

Machine Language - Arithmetic Instruction
- Instructions, like registers and words of data, are also 32 bits long
  - Example: add $t0, $s1, $s2
  - Registers have numbers: $t0=$8, $s1=$17, $s2=$18
- Instruction format:

    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01000  00000  100000

- Can you guess what the field names stand for?

MIPS Instruction Fields

    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits   = 32 bits

- op
- rs
- rt
- rd
- shamt
- funct

Machine Language - Load Instruction
- Consider the load-word and store-word instructions
  - What would the regularity principle have us do?
    - But . . . Good design demands compromise
- Introduce a new type of instruction format
  - I-type for data transfer instructions (the previous format was R-type, for register)
- Example: lw $t0, 24($s2)

    op          rs     rt     16 bit number
    35 (23hex)  18     8      24
    100011      10010  01000  0000000000011000

- Where's the compromise?

Memory Address Location
- Example: lw $t0, 24($s2)
  - $s2 = 0x12004094
  - 24 (decimal) + $s2 = 0x120040ac, the address of the data
- Note that the offset can be positive or negative
[Figure: memory with word addresses (hex) 0x00000000, 0x00000004, 0x00000008, 0x0000000c, …, 0xffffffff; $s2 points to 0x12004094 and the loaded data sits at 0x120040ac. The figure also shows the value 0x00000002.]

Machine Language - Store Instruction
- Example: sw $t0, 24($s2)

    op      rs     rt     16 bit number
    43      18     8      24
    101011  10010  01000  0000000000011000

- A 16-bit offset means access is limited to memory locations within a range of +2^13 - 1 to -2^13 (~8,192) words (+2^15 - 1 to -2^15 (~32,768) bytes) of the address in the base register $s2
  - 2's complement (1 sign bit + 15 magnitude bits)

Machine Language - Immediate Instructions
- What instruction format is used for addi?

    addi $s3, $s3, 4     # $s3 = $s3 + 4

- Machine format:

Instruction Format Encoding
- Can reduce the complexity of multiple formats by keeping them as similar as possible
  - The first three fields are the same in R-type and I-type
- Each format has a distinct set of values in the op field

    Instr  Frmt  op     rs   rt   rd   shamt  funct  address
    add    R     0      reg  reg  reg  0      32ten  NA
    sub    R     0      reg  reg  reg  0      34ten  NA
    addi   I     8ten   reg  reg  NA   NA     NA     constant
    lw     I     35ten  reg  reg  NA   NA     NA     address
    sw     I     43ten  reg  reg  NA   NA     NA     address

Assembling Code
- Remember the assembler code we compiled last lecture for the C statement

    A[8] = A[2] - b

    lw  $t0, 8($s3)      # load A[2] into $t0
    sub $t0, $t0, $s2    # subtract b from A[2]
    sw  $t0, 32($s3)     # store result in A[8]

- Assemble the MIPS object code for these three instructions (decimal is fine)

    lw
    sub
    sw

Review: MIPS Instructions, so far

    Category                  Instr          Op Code  Example                Meaning
    Arithmetic (R format)     add            0 & 32   add  $s1, $s2, $s3     $s1 = $s2 + $s3
                              subtract       0 & 34   sub  $s1, $s2, $s3     $s1 = $s2 - $s3
    Arithmetic (I format)     add immediate  8        addi $s1, $s2, 4       $s1 = $s2 + 4
    Data transfer (I format)  load word      35       lw   $s1, 100($s2)     $s1 = Memory($s2+100)
                              store word     43       sw   $s1, 100($s2)     Memory($s2+100) = $s1

Review: MIPS R3000 ISA
- Instruction categories
  - Load/Store
  - Computational
  - Jump and Branch
  - Floating Point
    - coprocessor
  - Memory Management
  - Special
- Registers: R0 - R31, PC, HI, LO
- 3 instruction formats: all 32 bits wide

    R format:  OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
    I format:  OP | rs | rt | 16 bit number
    OP | 26 bit jump target
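
The field layouts in the R-format and I-format rows above can be checked by packing the fields into a 32-bit word. The Python below is a minimal sketch under stated assumptions, not the course's assembler: the helper names (pack, encode_r, encode_i, show) and the small REG table are hypothetical, while the op, funct, and register numbers are the ones listed in these slides.

```python
FIELD_WIDTHS_R = (6, 5, 5, 5, 5, 6)   # op, rs, rt, rd, shamt, funct
FIELD_WIDTHS_I = (6, 5, 5, 16)        # op, rs, rt, 16-bit number

# Register numbers taken from the naming-convention table (subset only).
REG = {"$zero": 0, "$t0": 8, "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19}


def pack(values, widths):
    """Concatenate the fields, most significant first, into one 32-bit word."""
    word = 0
    for value, width in zip(values, widths):
        # Masking keeps a negative constant as its two's complement bit pattern.
        word = (word << width) | (value & ((1 << width) - 1))
    return word


def encode_r(funct, rd, rs, rt, shamt=0):
    """R format, e.g. add rd, rs, rt (op is 0; funct selects the operation)."""
    return pack((0, REG[rs], REG[rt], REG[rd], shamt, funct), FIELD_WIDTHS_R)


def encode_i(op, rt, rs, imm):
    """I format, e.g. lw rt, imm(rs) or addi rt, rs, imm."""
    return pack((op, REG[rs], REG[rt], imm), FIELD_WIDTHS_I)


def show(word, widths):
    """Render a word as space-separated binary fields."""
    bits = f"{word:032b}"
    fields, pos = [], 0
    for width in widths:
        fields.append(bits[pos:pos + width])
        pos += width
    return " ".join(fields)


add_word = encode_r(32, "$t0", "$s1", "$s2")   # add $t0, $s1, $s2
lw_word = encode_i(35, "$t0", "$s2", 24)       # lw  $t0, 24($s2)
sw_word = encode_i(43, "$t0", "$s2", 24)       # sw  $t0, 24($s2)

print(show(add_word, FIELD_WIDTHS_R))  # 000000 10001 10010 01000 00000 100000
print(show(lw_word, FIELD_WIDTHS_I))   # 100011 10010 01000 0000000000011000
print(show(sw_word, FIELD_WIDTHS_I))   # 101011 10010 01000 0000000000011000
```

The printed fields match the bit patterns shown on the machine language slides for add, lw, and sw. Note that in assembly the destination comes first, while in the encoded word rs and rt come before rd.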