EI 209 Computer Organization
Fall 2015
Chapter 3: Arithmetic for Computers
Haojin Zhu (http://tdt.sjtu.edu.cn/~hjzhu/)
[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]
EI 209 Chapter 3.1 CSE, 2015

Review: MIPS (RISC) Design Principles
• Simplicity favors regularity
  - fixed size instructions
  - small number of instruction formats
  - opcode always the first 6 bits
• Smaller is faster
  - limited instruction set
  - limited number of registers in register file
  - limited number of addressing modes
• Make the common case fast
  - arithmetic operands from the register file (load-store machine)
  - allow instructions to contain immediate operands
• Good design demands good compromises
  - three instruction formats

Specifying Branch Destinations
• Use a register (like in lw and sw) added to the 16-bit offset
  - which register? Instruction Address Register (the PC)
    - its use is automatically implied by the instruction
    - PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
  - limits the branch distance to -2^15 to +2^15 - 1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway
[Figure: branch destination datapath. The 16-bit offset from the low order 16 bits of the branch instruction is sign-extended to 32 bits, has 00 appended, and is added to PC + 4 to form the 32-bit branch dst address.]

Other Control Flow Instructions
• MIPS also has an unconditional branch instruction or jump instruction:
    j label    # go to label
• Instruction Format (J Format): 0x02 | 26-bit address
[Figure: jump destination address. The 26 bits from the low order 26 bits of the jump instruction have 00 appended and are combined with 4 bits of the PC to form a 32-bit address.]
• Why shift left by two bits?

Review: MIPS Addressing Modes Illustrated
1. Register addressing: op rs rt rd funct; the operand is a word in a register
2. Base (displacement) addressing: op rs rt offset; the operand is a memory word or byte at base register + offset
3. Immediate addressing: op rs rt operand
4. PC-relative addressing: op rs rt offset; the branch destination instruction in memory is at Program Counter (PC) + offset
5. Pseudo-direct addressing: op jump address; the jump destination instruction in memory is at Program Counter (PC) || jump address

Number Representations
• 32-bit signed numbers (2's complement), MSB on the left and LSB on the right:
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten   (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten   (minint)
    1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = -1ten
• Converting <32-bit values into 32-bit values
  - copy the most significant bit (the sign bit) into the "empty" bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  - sign extend versus zero extend (lb vs. lbu)

MIPS Arithmetic Logic Unit (ALU)
• Must support the Arithmetic/Logic operations of the ISA
    add, addi, addiu, addu
    sub, subu
    mult, multu, div, divu
    sqrt
    and, andi, nor, or, ori, xor, xori
    beq, bne, slt, slti, sltiu, sltu
[Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs.]
• With special handling for
  - sign extend: addi, addiu, slti, sltiu
  - zero extend: andi, ori, xori
  - overflow detection: add, addi, sub

Dealing with Overflow
• Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit
  - When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

    Operation   Operand A   Operand B   Result indicating overflow
    A + B       >= 0        >= 0        < 0
    A + B       < 0         < 0         >= 0
    A - B       >= 0        < 0         < 0
    A - B       < 0         >= 0        >= 0

• MIPS signals overflow with an exception (aka interrupt): an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception

Addition & Subtraction
• Just like in grade school (carry/borrow 1s)
      0111        0111        0110
    + 0110      - 0110      - 0101
      ----        ----        ----
      1101        0001        0001
• Two's complement operations are easy
  - do subtraction by negating and then adding
      0111          0111
    - 0110        + 1010
      ----        ------
      0001        1 0001
• Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
      ----
      1000

Building a 1-bit Binary Adder
[Figure: 1 bit Full Adder with inputs A, B, carry_in and outputs S, carry_out.]

    A  B  carry_in | carry_out  S
    0  0  0        | 0          0
    0  0  1        | 0          1
    0  1  0        | 0          1
    0  1  1        | 1          0
    1  0  0        | 0          1
    1  0  1        | 1          0
    1  1  0        | 1          0
    1  1  1        | 1          1

    S = A xor B xor carry_in
    carry_out = A&B | A&carry_in | B&carry_in   (majority function)

• How can we use it to build a 32-bit adder?
• How can we modify it easily to build an adder/subtractor?

Building 32-bit Adder
• Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .
• Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow and lots of glitching (so lots of energy consumption)
[Figure: chain of 32 1-bit FAs. Stage i takes Ai, Bi and carry ci and produces Si and c(i+1), from c0 = carry_in up to c32 = carry_out.]

A 32-bit Ripple Carry Adder/Subtractor
• Remember 2's complement is just
  - complement all the bits
      control (0 = add, 1 = sub) selects B0 if control = 0, !B0 if control = 1
  - add a 1 in the least significant bit

      A   0111          0111
      B - 0110        + 1001
                      +    1
          ----        ------
          0001        1 0001

[Figure: the 32-bit ripple carry adder with an add/sub control line that conditionally complements each Bi and feeds c0 = carry_in.]

Overflow Detection Logic
• Carry into MSB != Carry out of MSB
  - For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]

    X  Y | X XOR Y
    0  0 | 0
    0  1 | 1
    1  0 | 1
    1  1 | 0

[Figure: four 1-bit ALUs (bits 0 to 3) chained through CarryIn/CarryOut, producing Result0 to Result3; Overflow is formed at the bit 3 stage.]
• Why?

Multiply
• Binary multiplication is just a bunch of right shifts and adds
[Figure: an n-bit multiplicand and an n-bit multiplier form a partial product array that is summed into a 2n-bit double precision product.]
  - the partial product array can be formed in parallel and added in parallel for faster multiplication

Multiplication
• More complicated than addition
• Can be accomplished via shifting and adding

        0010    (multiplicand)
      x 1011    (multiplier)
      ------
        0010
       0010     (partial product
      0000       array)
     0010
    --------
    00010110    (product)

Multiplication Algorithm 1
• In every step
  - multiplicand is shifted
  - next bit of multiplier is examined (also a shifting step)
  - if this bit is 1, shifted multiplicand is added to the product

Comments on Multiplication Algorithm 1
• Performance
  - Three basic steps for each bit
  - If each step took a clock cycle, it would require about 100 clock cycles (3 steps x 32 bits = 96) to multiply two 32-bit numbers
• How to improve it?
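The three per-bit steps of Multiplication Algorithm 1 can be sketched in Python as below. This is a minimal sketch, not the slides' hardware: it assumes the multiplicand shifts left in a double-width register while the multiplier shifts right, as in the textbook's first version, and the name shift_add_multiply and the bits parameter are illustrative choices.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned shift-and-add multiply, one multiplier bit per iteration.

    Assumption: the multiplicand sits in a 2*bits-wide register and shifts
    left; the multiplier shifts right; the product register is 2*bits wide.
    """
    mask = (1 << bits) - 1
    multiplicand &= mask
    multiplier &= mask
    product = 0
    for _ in range(bits):
        # Step 1: examine the current least significant multiplier bit.
        if multiplier & 1:
            # Step 1a: add the (shifted) multiplicand to the product.
            product += multiplicand
        # Step 2: shift the multiplicand left by one bit.
        multiplicand <<= 1
        # Step 3: shift the multiplier right by one bit.
        multiplier >>= 1
    # Keep only the 2*bits-wide double precision product.
    return product & ((1 << (2 * bits)) - 1)
```

With bits=4, shift_add_multiply(0b0010, 0b1011, bits=4) reproduces the 0010 x 1011 = 00010110 example. The loop body makes the cost visible: three steps per bit, repeated once for each multiplier bit.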
• Motivation (performing the operations in parallel):
  - Put the multiplier and the product together
  - Shift them together

Refined Multiplication Algorithm 2
[Figure: a multiplicand register feeds a 32-bit ALU (add); the product and the multiplier sit side by side in one register that shifts right under Control.]
• the ALU is 32 bits wide and the multiplicand is left untouched
• the sum keeps shifting right
• at every step, number of bits in product + multiplier = 64, hence, they share a single 64-bit register

Add and Right Shift Multiplier Hardware
Example: multiplicand 0110 (= 6), multiplier 0101 (= 5)

    Step                       product | multiplier
    initial                    0000 0101
    add (multiplier bit = 1)   0110 0101
    shift right                0011 0010
    add (multiplier bit = 0)   0011 0010
    shift right                0001 1001
    add (multiplier bit = 1)   0111 1001
    shift right                0011 1100
    add (multiplier bit = 0)   0011 1100
    shift right                0001 1110   = 30

Exercise
• Using 4-bit numbers to save space, multiply 2ten * 3ten, or 0010two * 0011two

Division
• Division is just a bunch of quotient digit guesses and left shifts and subtracts
    dividend = quotient x divisor + remainder
[Figure: the dividend and the n-bit divisor form a partial remainder array, yielding an n-bit quotient and an n-bit remainder.]

Division

                          1001ten      Quotient
    Divisor  1000ten | 1001010ten      Dividend
                      -1000
                          10
                          101
                          1010
                         -1000
                            10ten      Remainder

At every step,
• shift divisor right and compare it with current dividend
• if divisor is larger, shift 0 as the next bit of the quotient
• if divisor is smaller (or equal), subtract to get new dividend and shift 1 as the next bit of the quotient

First Version of Hardware for Division
• A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back

Divide Algorithm
Start
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
   Test Remainder:
2a. Remainder >= 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
2b. Remainder < 0: Restore the original value by adding the Divisor register to the Remainder register and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
   33rd repetition? No (< 33 repetitions): go back to step 1. Yes (33 repetitions): Done.

Divide Example
• Divide 7ten (0000 0111two) by 2ten (0010two)

    Iter  Step                            Quot  Divisor    Remainder
    0     Initial values                  0000  0010 0000  0000 0111
    1     Rem = Rem - Div                 0000  0010 0000  1110 0111
          Rem < 0: +Div, shift 0 into Q   0000  0010 0000  0000 0111
          Shift Div right                 0000  0001 0000  0000 0111
    2     Same steps as 1                 0000  0001 0000  1111 0111
                                          0000  0001 0000  0000 0111
                                          0000  0000 1000  0000 0111
    3     Same steps as 1                 0000  0000 0100  0000 0111
    4     Rem = Rem - Div                 0000  0000 0100  0000 0011
          Rem >= 0: shift 1 into Q        0001  0000 0100  0000 0011
          Shift Div right                 0001  0000 0010  0000 0011
    5     Same steps as 4                 0011  0000 0001  0000 0001

Efficient Division
[Figure: first-version hardware with a 64-bit Divisor register (shift right), a 64-bit ALU, a 64-bit Remainder register (write) and a 32-bit Quotient register (shift left), driven by Control.]
[Figure: efficient hardware with a divisor register, a 32-bit ALU (subtract), and the dividend/remainder and quotient held together in a register that shifts left under Control.]

Left Shift and Subtract Division Hardware
Example: divisor 0010 (= 2), dividend 0110 (= 6)

    remainder | quotient
    0000 0110   initial (dividend)
    0000 1100   shift left
    1110 1100   sub: rem neg, so quotient bit = 0
    0000 1100   restore remainder
    0001 1000   shift left
    1111 1000   sub: rem neg, so quotient bit = 0
    0001 1000   restore remainder
    0011 0000   shift left
    0001 0001   sub: rem pos, so quotient bit = 1
    0010 0010   shift left
    0000 0011   sub: rem pos, so quotient bit = 1
    Result: quotient 0011 = 3 with 0 remainder

Restoring Unsigned Integer Division

    s(0) = z
    for j = 1 to k
        if 2 s(j-1) - 2^k d >= 0
            q_{k-j} = 1
            s(j) = 2 s(j-1) - 2^k d
        else
            q_{k-j} = 0
            s(j) = 2 s(j-1)

• 2 s(j-1) is the remainder shifted left by 1 bit
• 2^k d: with k = 32, the divisor is placed in the left 32 bits of the register
• No need to restore the remainder in the case of R - D >= 0
• Restore the remainder in the case of R - D < 0

Non-Restoring Unsigned Integer Division

    s(1) = 2 z - 2^k d
    for j = 2 to k
        if s(j-1) >= 0
            q_{k-(j-1)} = 1
            s(j) = 2 s(j-1) - 2^k d
        else
            q_{k-(j-1)} = 0
            s(j) = 2 s(j-1) + 2^k d
    end for
    if s(k) >= 0
        q_0 = 1
    else
        q_0 = 0
        Correction step

• If in the last step remainder - divisor >= 0, perform subtraction
• If in the last step remainder - divisor < 0, perform addition
• Why?

Restoring versus Non-Restoring Unsigned Integer Division
Why are the two equal? Use 2x - y = 2(x - y) + y and consider two consecutive steps j-1 and j, in particular when 2 s(j-2) - 2^k d < 0.
• Restoring algorithm
  - in step j-1 it computes q_{k-(j-1)} = 0 and s(j-1) = 2 s(j-2)
  - in the subsequent step j it computes 2 s(j-1) - 2^k d = 2*2 s(j-2) - 2^k d
• Non-restoring algorithm
  - in step j-1 it computes s(j-1) = 2 s(j-2) - 2^k d
  - in the subsequent step j it computes 2 s(j-1) + 2^k d = 2*2 s(j-2) - 2*2^k d + 2^k d = 2*2 s(j-2) - 2^k d

Non-restoring algorithm

    set subtract_bit to true
    1: If subtract_bit is true:
           subtract the Divisor register from the Remainder register and place the result in the Remainder register
       else:
           add the Divisor register to the Remainder register and place the result in the Remainder register
    2: If Remainder >= 0:
           shift the Quotient register to the left, setting the rightmost bit to 1, and set subtract_bit to true
       else:
           shift the Quotient register to the left, setting the rightmost bit to 0, and set subtract_bit to false
    3: Shift the Divisor register right 1 bit
       if < 33rd repetition: go to 1
       else: if Remainder < 0, add the Divisor (its value before this last shift) to the Remainder register; exit

Example: Perform n + 1 iterations for n bits

    Initial                   Rem 0000 1011                    Divisor 0011 0000
    Iteration 1 (subtract)    Rem 1101 1011   Quotient 0       Divisor 0001 1000
    Iteration 2 (add)         Rem 1111 0011   Quotient 00      Divisor 0000 1100
    Iteration 3 (add)         Rem 1111 1111   Quotient 000     Divisor 0000 0110
    Iteration 4 (add)         Rem 0000 0101   Quotient 0001    Divisor 0000 0011
    Iteration 5 (subtract)    Rem 0000 0010   Quotient 00011   Divisor 0000 0001

Since the remainder is positive, done. Q = 0011 and Rem = 0010

Exercise
• Calculate A divided by B using restoring and non-restoring division. A = 26, B = 5

MIPS Divide Instruction
• Divide (div and divu) generates the remainder in hi and the quotient in lo
    div $s0, $s1    # lo = $s0 / $s1
                    # hi = $s0 mod $s1
    Instruction fields: 0 | 16 | 17 | 0 | 0 | 0x1A
  - Instructions mfhi rd and mflo rd are provided to move the remainder and quotient to (user accessible) registers in the register file
• As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

Lecture 1
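The restoring and non-restoring recurrences above can be compared side by side in a short Python sketch. It assumes a k-bit unsigned dividend z and a nonzero k-bit divisor d, uses >= 0 as the test for a quotient bit of 1, and takes the correction step to mean adding 2^k d back to a negative final s(k). The function names are illustrative and not from the slides.

```python
def restoring_divide(z: int, d: int, k: int = 32) -> tuple[int, int]:
    """Restoring division following the s(j) recurrence; returns (q, r)."""
    if d == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    dd = d << k                  # 2^k * d: divisor aligned to the upper half
    s, q = z, 0
    for _ in range(k):
        trial = 2 * s - dd       # trial subtraction on the shifted remainder
        if trial >= 0:
            q = (q << 1) | 1     # quotient bit 1, keep the difference
            s = trial
        else:
            q = q << 1           # quotient bit 0, keep ("restore") 2*s
            s = 2 * s
    return q, s >> k             # the remainder sits in the upper half


def nonrestoring_divide(z: int, d: int, k: int = 32) -> tuple[int, int]:
    """Non-restoring division following the s(j) recurrence; returns (q, r)."""
    if d == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    dd = d << k
    s, q = 2 * z - dd, 0         # s(1) = 2z - 2^k d
    for _ in range(k - 1):       # j = 2 .. k
        if s >= 0:
            q = (q << 1) | 1
            s = 2 * s - dd       # previous result non-negative: subtract
        else:
            q = q << 1
            s = 2 * s + dd       # previous result negative: add instead
    if s >= 0:
        q = (q << 1) | 1         # q_0 = 1
    else:
        q = q << 1               # q_0 = 0
        s += dd                  # assumed correction step: add 2^k d back
    return q, s >> k
```

For the slide examples with k = 4, both functions give 7 / 2 = 3 remainder 1 and 11 / 3 = 3 remainder 2. The non-restoring version never undoes a subtraction inside the loop, which is the saving the 2x - y = 2(x - y) + y identity justifies.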