ECE 15B Computer Organization
Spring 2010
Dmitri Strukov

Lecture 4: Arithmetic / Data Transfer Instructions

Partially adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy, and classes taught by Ryan Kastner at UCSB

Agenda
• Review of last lecture
• Load/store operations
• Multiply and divide instructions

ECE 15B Spring 2010

Last Lecture

Assembly Language
• Basic job of a CPU: execute lots of instructions
• Instructions are the primitive operations that the CPU may execute
• Different CPUs implement different sets of instructions
• Instruction Set Architecture (ISA) is the set of instructions a particular CPU implements
• Examples: Intel 80x86 (Pentium 4), IBM/Motorola PowerPC (Macintosh), MIPS, Intel IA-64, ARM

Assembly Variables: Registers
• Unlike high-level languages (HLLs) such as C or Java, assembly cannot use variables
  – Why not? Keep hardware simple
• Assembly operands are registers
  – Limited number of special locations built directly into the hardware
  – Operations can only be performed on these
  – Benefit: since the register file is small, it is very fast

Assembly Variables: Registers
• By convention, each register also has a name to make it easier to code
• For now:
    $16 - $23 → $s0 - $s7 (correspond to C variables)
    $8 - $15 → $t0 - $t7 (correspond to temporary variables)
  Will explain the other 16 register names later
• In general, use names to make your code more readable

MIPS Syntax
• Instruction syntax:
    [Label:] Op-code [oper. 1], [oper. 2], [oper. 3] [#comment]
      (0)      (1)      (2)        (3)        (4)       (5)
  – Where
    0) label field is optional, will discuss later
    1) operation name
    2, 3, 4) operands
    5) comments
  – For arithmetic and logic instructions
    2) operand getting result ("destination")
    3) 1st operand for operation ("source 1")
    4) 2nd operand for operation ("source 2")
• Syntax is rigid
  – 1 operator, 3 operands
  – Why?
  – Keep hardware simple via regularity

Addition and Subtraction of Integers
• Addition in assembly
  – Example: add $s0, $s1, $s2 (in MIPS)
    • Equivalent to: a = b + c (in C)
    • Where MIPS registers $s0, $s1, $s2 are associated with C variables a, b, c
• Subtraction in assembly
  – Example: sub $s3, $s4, $s5 (in MIPS)
    • Equivalent to: d = e - f (in C)
    • Where MIPS registers $s3, $s4, $s5 are associated with C variables d, e, f

Addition and Subtraction of Integers
• How do we do this? f = (g + h) – (i + j)
  – f, g, h, i, j are in $s0, $s1, $s2, $s3, $s4
• Use intermediate temporary registers
    add $t0, $s1, $s2   # temp = g + h
    add $t1, $s3, $s4   # temp = i + j
    sub $s0, $t0, $t1   # f = (g + h) - (i + j)

Immediates
• Immediates are numerical constants
• They appear often in code, so there are special instructions for them
• Add immediate:
    addi $s0, $s1, 10   # f = g + 10 (in C)
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
  – Syntax similar to the add instruction, except that the last argument is a number instead of a register

Load and Store Instructions

CPU Overview
[figure: CPU overview]

… with muxes
[figure: CPU datapath with multiplexers (annotations: Can't just join wires together; Use multiplexers)]

Memory Operands
• Main memory used for composite data
  – Arrays, structures, dynamic data
• To apply arithmetic operations
  – Load values from memory into registers
  – Store result from register to memory
• Memory is byte addressed
  – Each address identifies an 8-bit byte
• Words are aligned in memory
  – Address must be a multiple of 4
• MIPS is Big Endian
  – Most-significant byte at least address of a word
  – cf.
    Little Endian: least-significant byte at least address

Data Transfer: Memory to Register
• MIPS load instruction syntax
    lw register#, offset(register#)
    (1)   (2)      (3)      (4)
  Where
    1) operation name
    2) register that will receive value
    3) numerical offset in bytes
    4) register containing pointer to memory
• lw means Load Word
  – 32 bits, or one word, are loaded at a time

Data Transfer: Register to Memory
• MIPS store instruction syntax
    sw register#, offset(register#)
    (1)   (2)      (3)      (4)
  Where
    1) operation name
    2) register whose value will be written to memory
    3) numerical offset in bytes
    4) register containing pointer to memory
• sw means Store Word
  – 32 bits, or one word, are stored at a time

Memory Operand Example 1
• C code: g = h + A[8];
  – g in $s1, h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires offset of 32
    • 4 bytes per word
    lw  $t0, 32($s3)    # load word: offset 32, base register $s3
    add $s1, $s2, $t0

Memory Operand Example 2
• C code: A[12] = h + A[8];
  – h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires offset of 32; index 12 requires offset of 48
    lw  $t0, 32($s3)    # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)    # store word

Registers vs. Memory
• Registers are faster to access than memory
• Operating on memory data requires loads and stores
  – More instructions to be executed
• Compiler must use registers for variables as much as possible
  – Only spill to memory for less frequently used variables
  – Register optimization is important!

Byte/Halfword Operations
• MIPS byte/halfword load/store
  – String processing is a common case
    lb rt, offset(rs)     lh rt, offset(rs)
  – Sign extend to 32 bits in rt
    lbu rt, offset(rs)    lhu rt, offset(rs)
  – Zero extend to 32 bits in rt
    sb rt, offset(rs)     sh rt, offset(rs)
  – Store just rightmost byte/halfword
• Why do we need them?
  – Characters and multimedia data are expressed with fewer than 32 bits; dedicated 8-bit and 16-bit load and store instructions make operations on such data faster

Two's Complement Representation
Multiply and Divide

Unsigned Binary Integers
• Given an n-bit number
    $x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $0$ to $+2^n - 1$
• Example
    $0000\;0000\;0000\;0000\;0000\;0000\;0000\;1011_2 = 0 + \cdots + 1\times 2^3 + 0\times 2^2 + 1\times 2^1 + 1\times 2^0 = 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
• Using 32 bits: 0 to +4,294,967,295

2s-Complement Signed Integers
• Given an n-bit number
    $x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $-2^{n-1}$ to $+2^{n-1} - 1$
• Example
    $1111\;1111\;1111\;1111\;1111\;1111\;1111\;1100_2 = -1\times 2^{31} + 1\times 2^{30} + \cdots + 1\times 2^2 + 0\times 2^1 + 0\times 2^0 = -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$
• Using 32 bits: –2,147,483,648 to +2,147,483,647

2s-Complement Signed Integers
• Bit 31 is sign bit
  – 1 for negative numbers
  – 0 for non-negative numbers
• $-(-2^{n-1})$ can't be represented
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers
  – 0: 0000 0000 … 0000
  – –1: 1111 1111 … 1111
  – Most-negative: 1000 0000 … 0000
  – Most-positive: 0111 1111 … 1111

Signed Negation
• Complement and add 1
  – Complement means 1 → 0, 0 → 1
    $x + \bar{x} = 1111\ldots111_2 = -1$, so $\bar{x} + 1 = -x$
• Example: negate +2
  – $+2 = 0000\;0000\ldots0010_2$
  – $-2 = 1111\;1111\ldots1101_2 + 1 = 1111\;1111\ldots1110_2$

Sign Extension
• Representing a number using more bits
  – Preserve the numeric value
• In MIPS instruction set
  – addi: extend immediate value
  – lb, lh: extend loaded byte/halfword
  – beq, bne: extend the displacement
• Replicate the sign bit to the left
  – cf.
    unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  – +2: 0000 0010 => 0000 0000 0000 0010
  – –2: 1111 1110 => 1111 1111 1111 1110

Integer Addition
• Example: 7 + 6
    +7:  0000 0000 … 0000 0111
    +6:  0000 0000 … 0000 0110
    +13: 0000 0000 … 0000 1101

Integer Subtraction
• Add negation of second operand
• Example: 7 – 6 = 7 + (–6)
    +7: 0000 0000 … 0000 0111
    –6: 1111 1111 … 1111 1010
    +1: 0000 0000 … 0000 0001

Multiplication
• Start with long-multiplication approach
        1000      multiplicand
      × 1001      multiplier
        1000
       0000
      0000
     1000
     1001000      product
• Length of product is the sum of operand lengths

Multiplication Hardware
[figure: multiplication hardware (annotation: Initially 0)]

Stopped here… will start next lecture from here

Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
  – That's OK if the frequency of multiplications is low

Faster Multiplier
• Uses multiple adders
  – Cost/performance tradeoff
• Can be pipelined
  – Several multiplications performed in parallel

MIPS Multiplication
• Two 32-bit registers for product
  – HI: most-significant 32 bits
  – LO: least-significant 32 bits
• Instructions
  – mult rs, rt / multu rs, rt
    • 64-bit product in HI/LO
  – mfhi rd / mflo rd
    • Move from HI/LO to rd
    • Can test HI value to see if product overflows 32 bits
  – mul rd, rs, rt
    • Least-significant 32 bits of product → rd

Division
• Check for 0 divisor
• Long division approach
                     1001      quotient
    divisor  1000)1001010      dividend
                 -1000
                     10
                     101
                     1010
                    -1000
                       10      remainder
• n-bit operands yield n-bit quotient and remainder
• If divisor ≤ current dividend bits
  – 1 bit in quotient, subtract
• Otherwise
  – 0 bit in quotient, bring down next dividend bit
• Restoring division
  – Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
  – Divide using absolute values
  – Adjust sign of quotient and remainder as required

Division Hardware
[figure: division hardware (annotations: Initially divisor in left half; Initially dividend)]

Optimized Divider
• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  – Same hardware can be used for both

Faster Division
• Can't use parallel hardware as in multiplier
  – Subtraction is conditional on sign of remainder
• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  – Still require multiple steps

MIPS Division
• Use HI/LO registers for result
  – HI: 32-bit remainder
  – LO: 32-bit quotient
• Instructions
  – div rs, rt / divu rs, rt
  – No overflow or divide-by-0 checking
    • Software must perform checks if required
  – Use mfhi, mflo to access result

Conclusions
• In MIPS assembly language
  – Registers replace C variables
  – One instruction (simple operation) per line
  – Simpler is faster
• Memory is byte-addressable, but lw and sw access one word at a time
• A pointer (used by lw and sw) is just a memory address, so we can add to it or subtract from it (using offset)

Review
• Instructions so far:
    add, addi, sub
    mult, div, mfhi, mflo
    lw, sw, lb, lbu, lh, lhu
• Registers so far
    C variables: $s0 - $s7
    Temporary variables: $t0 - $t9
    Zero: $zero