ECE 369 Chapter 3

Multiplication
• More complicated than addition
  – Accomplished via shifting and addition
• More time and more area

Multiplication: Implementation
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and control logic that tests the multiplier]
Start
1. Test Multiplier0.
   1a. Multiplier0 = 1: add the multiplicand to the product and place the result in the Product register. (Multiplier0 = 0: skip the add.)
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): return to step 1. Yes (32 repetitions): Done.

Example
[Figure: worked example on the first-version datapath]

Second version
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (shift right), a 32-bit ALU, and a 64-bit Product register (shift right, write) plus control]
Start
1. Test Multiplier0.
   1a. Multiplier0 = 1: add the multiplicand to the left half of the product and place the result in the left half of the Product register. (Multiplier0 = 0: skip the add.)
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): return to step 1. Yes (32 repetitions): Done.

Example
[Figure: worked example on the second-version datapath]

Final version
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, and a 64-bit Product register (shift right, write) plus control]
The multiplier is loaded into the right half of the Product register, so the separate Multiplier register disappears and its least significant bit is tested as Product0.
Start
1. Test Product0.
   1a. Product0 = 1: add the multiplicand to the left half of the product and place the result in the left half of the Product register. (Product0 = 0: skip the add.)
2. Shift the Product register right 1 bit.
32nd repetition? No (< 32 repetitions): return to step 1. Yes (32 repetitions): Done.
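A minimal Python sketch of the final-version algorithm, assuming unsigned 32-bit operands; the function name `multiply_final_version` is hypothetical. A Python int stands in for the 64-bit Product register, and one extra bit above it holds the ALU carry-out so that the right shift does not lose it (a detail the flowchart leaves implicit).

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def multiply_final_version(multiplicand: int, multiplier: int) -> int:
    """Unsigned 32 x 32 -> 64-bit product using the final-version algorithm."""
    multiplicand &= MASK
    # The multiplier starts in the right half of the Product register.
    product = multiplier & MASK
    for _ in range(WIDTH):                       # 32 repetitions
        if product & 1:                          # 1. Test Product0
            # 1a. Add the multiplicand to the left half of the product.
            # The 32-bit ALU sum can carry into a 33rd bit, kept above bit 63.
            left = (product >> WIDTH) + multiplicand
            product = (left << WIDTH) | (product & MASK)
        product >>= 1                            # 2. Shift Product right 1 bit
    return product
```

For example, `multiply_final_version(3, 5)` returns 15. Each bit of the multiplier is shifted out of Product0 exactly as the partial product shifts in from the left, which is why one 64-bit register suffices.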
Example
[Figure: worked example on the final-version datapath]

Division
• Even more complicated
  – Can be accomplished via shifting and addition/subtraction
• More time and more area
• Negative numbers: even more difficult
• There are better techniques; we won't look at them

Division (7 ÷ 2)

Improved Division

Number Systems
• Fixed point: the binary point of a real number is in a fixed position
  – Real numbers can be treated as integers, so addition and subtraction work normally
  – Example: convert 9.8125 to fixed point with 4 fraction bits
    • Integer part (9): use the addition rule ($9 = 2^{3} + 2^{0}$) or the division rule (divide by 2 repeatedly and read the remainders from last to first), giving 1001
    • Fraction part (0.8125): keep multiplying the fraction by 2; whenever there is a carry out of the binary point insert a 1, otherwise insert a 0, then continue with the remaining fraction: 0.8125 × 2 = 1.625 → 1; 0.625 × 2 = 1.25 → 1; 0.25 × 2 = 0.5 → 0; 0.5 × 2 = 1.0 → 1, giving .1101
    • Result: 9.8125 = 1001.1101
• Scientific notation:
  – $3.56 \times 10^{8}$ (not $35.6 \times 10^{7}$): normalized, with a single nonzero digit before the decimal point
  – May have any number of fraction digits (floating)

Floating point (a brief look)
• We need a way to represent
  – Numbers with fractions, e.g., 3.1416
  – Very small numbers, e.g., 0.000000001
  – Very large numbers, e.g., $3.15576 \times 10^{9}$
• Representation:
  – Sign, exponent, fraction: $(-1)^{\text{sign}} \times \text{fraction} \times 2^{\text{exponent}}$
  – More bits for the fraction give more accuracy
  – More bits for the exponent increase the range
• IEEE 754 floating-point standard:
  – Single precision: 8-bit exponent, 23-bit fraction
  – Double precision: 11-bit exponent, 52-bit fraction
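The 9.8125 conversion can be reproduced with a short Python sketch. `to_fixed_point` is a hypothetical helper that applies the division rule to the integer part and the multiply-by-2 rule to the fraction, truncating after `frac_bits` bits; it assumes a non-negative input.

```python
def to_fixed_point(value: float, frac_bits: int = 4) -> str:
    """Binary fixed-point string for a non-negative number (fraction truncated)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    integer = int(value)
    fraction = value - integer

    # Integer part, division rule: repeatedly divide by 2; the remainders
    # are the bits, least significant first.
    int_bits = ""
    while integer > 0:
        int_bits = str(integer % 2) + int_bits
        integer //= 2
    int_bits = int_bits or "0"

    # Fraction part: multiply by 2; a carry out of the binary point gives a 1.
    frac = ""
    for _ in range(frac_bits):
        fraction *= 2
        if fraction >= 1:
            frac += "1"
            fraction -= 1
        else:
            frac += "0"
    return f"{int_bits}.{frac}"
```

For example, `to_fixed_point(9.8125)` returns "1001.1101". Bits beyond `frac_bits` are simply dropped, which is one reason fixed point loses precision for fractions such as 0.1 that have no finite binary expansion.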
IEEE 754 floating-point standard
• Numbers are stored as $1.f \times 2^{e}$, i.e., $1.s_1 s_2 s_3 s_4 \ldots s_n \times 2^{e}$
• The leading "1" bit of the significand is implicit (not stored)
• The exponent is "biased" to make sorting easier
  – All 0s is the smallest exponent, all 1s is the largest
  – Bias of 127 for single precision and 1023 for double precision
• Special cases:
  – Exponent bits all 0s and mantissa bits all 0s: zero
  – Exponent bits all 1s and mantissa bits all 0s: +/- infinity

Single Precision
• Summary: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
• Example:
  – Decimal analogy: $11/100 = 11/10^{2} = 0.11 = 1.1 \times 10^{-1}$
  – Decimal: $-0.75 = -3/4 = -3/2^{2}$
  – Binary: $-0.11_2 = -1.1_2 \times 2^{-1}$
  – exponent - bias = -1, so exponent = -1 + 127 = 126 = 01111110
  – IEEE single precision (sign, exponent, fraction): 1 01111110 10000000000000000000000

Opposite Way
• Decode a single-precision pattern whose exponent field is 129 and whose fraction bits are 01 followed by zeros:
  – True exponent: 129 - 127 = 2
  – Fraction: $0 \times 2^{-1} + 1 \times 2^{-2} = 0.25$
  – Value: $(-1)^{\text{sign}} \times (1 + 0.25) \times 2^{2} = (-1)^{\text{sign}} \times 5$

Floating point addition (decimal example, significand holds four digits)
• $1.610 \times 10^{-1} + 9.999 \times 10^{1}$
• Align the exponents: $0.01610 \times 10^{1} + 9.999 \times 10^{1}$; keeping four digits, the smaller operand becomes $0.016 \times 10^{1}$
• Add: $0.016 \times 10^{1} + 9.999 \times 10^{1} = 10.015 \times 10^{1}$
• Normalize: $1.0015 \times 10^{2}$
• Round to four digits: $1.002 \times 10^{2}$

Floating point addition
[Figure: hardware with a small ALU that computes the exponent difference, a right shifter for the smaller significand, a big ALU that adds the significands, increment/decrement and shift-left-or-right logic for normalization, and rounding hardware, all sequenced by control]
Start
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent matches the larger exponent.
2. Add the significands.
3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
   Overflow or underflow? Yes: exception. No: continue.
4. Round the significand to the appropriate number of bits.
   Still normalized? No: return to step 3. Yes: Done.

Add $0.5_{10}$ and $-0.4375_{10}$
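A simplified Python sketch of steps 1 to 4 on IEEE single-precision bit patterns. It assumes normalized inputs, skips the overflow/underflow check, and truncates instead of applying IEEE rounding (so there are no guard and round bits). `fields`, `from_fields`, and `fp_add` are hypothetical helpers; the standard `struct` module is only used to reach the raw float32 bits.

```python
import struct

FRAC_BITS = 23                  # single-precision fraction field width
HIDDEN = 1 << FRAC_BITS         # the implicit leading 1


def fields(x: float) -> tuple[int, int, int]:
    """Split a float32 bit pattern into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & (HIDDEN - 1)


def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble (sign, biased exponent, fraction) into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_add(x: float, y: float) -> float:
    """Add two normalized float32 values (no exceptions, truncating)."""
    (sx, ex, fx), (sy, ey, fy) = fields(x), fields(y)
    mx, my = fx | HIDDEN, fy | HIDDEN            # restore the hidden 1

    # Step 1: make x the operand with the larger exponent, then shift the
    # other significand right by the exponent difference.
    if ex < ey:
        (sx, ex, mx), (sy, ey, my) = (sy, ey, my), (sx, ex, mx)
    my >>= ex - ey                                # shifted-out bits are dropped

    # Step 2: add the significands as signed magnitudes.
    total = (-mx if sx else mx) + (-my if sy else my)
    if total == 0:
        return 0.0
    sign, mag, exp = int(total < 0), abs(total), ex

    # Step 3: normalize so the leading 1 sits at bit 23.
    while mag >= 2 * HIDDEN:                      # 1x.xxx: shift right
        mag >>= 1
        exp += 1
    while mag < HIDDEN:                           # 0.xxx: shift left
        mag <<= 1
        exp -= 1

    # Step 4: truncating to 24 bits leaves the result normalized.
    return from_fields(sign, exp, mag - HIDDEN)
```

For example, `fields(-0.75)` gives (1, 126, 0x400000), matching the single-precision example; `from_fields(0, 129, 1 << 21)` decodes to 5.0; and `fp_add(0.5, -0.4375)` returns 0.0625, where step 3 shifts left three times.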
Multiplication

Floating point multiply
• To multiply two numbers:
  – Add the two exponents (remember excess-127 notation: the sum of two biased exponents contains the bias twice, so subtract 127 once)
  – Produce the result sign as the XOR of the two signs
  – Multiply the significands
  – The result has the form 1x.xxxx… or 01.xxxx…
  – In the first case, shift the result right and adjust (increment) the exponent
  – Round off the result
  – This may require another normalization step

Multiplication of $0.5_{10}$ and $-0.4375_{10}$

Floating point divide
• To divide two numbers:
  – Subtract the divisor's exponent from the dividend's exponent (remember excess-127 notation: the difference of two biased exponents cancels the bias, so add 127 back)
  – Produce the result sign as the XOR of the two signs
  – Divide the dividend's significand by the divisor's significand
  – The result has the form 1.xxxx… or 0.1xxx…
  – In the second case, shift the result left and adjust (decrement) the exponent
  – Round off the result
  – This may require another normalization step

Floating point complexities
• Operations are somewhat more complicated (see text)
• In addition to overflow we can have "underflow"
• Accuracy can be a big problem
  – IEEE 754 keeps two extra bits, guard and round
  – Four rounding modes
  – Positive divided by zero yields "infinity"
  – Zero divided by zero yields "not a number" (NaN)
  – Other complexities
• Implementing the standard can be tricky
• Not using the standard can be even worse
  – See text for a description of the 80x86 and the Pentium bug!

Let's Build a Processor: Introduction to Instruction Set Architecture
• First step into your project!
• How could we build a 1-bit ALU for add, and, or?
• Need to support the set-on-less-than instruction (slt)
  – slt is an arithmetic instruction
  – Produces 1 if a < b and 0 otherwise
  – Use subtraction: (a - b) < 0 implies a < b, provided the subtraction does not overflow
• Need to support the test for equality (beq $t5, $t6, Label)
  – Use subtraction: (a - b) = 0 implies a = b
• How could we build a 32-bit ALU?
[Figure: ALU with 32-bit inputs a and b and a 32-bit result]
• Must read the Appendix
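The slt and beq ideas can be sketched in Python on 32-bit two's complement patterns; `sub32`, `slt`, and `beq_taken` are hypothetical names. The sketch also shows why the plain sign test needs care: when a - b overflows, the sign bit of the wrapped difference is wrong, and a common fix, assumed here, is to XOR that sign with the overflow flag.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def sub32(a: int, b: int) -> tuple[int, bool]:
    """Return (a - b) as a 32-bit pattern and the signed-overflow flag."""
    a, b = a & MASK, b & MASK
    diff = (a + (~b & MASK) + 1) & MASK          # a + not(b) + 1
    sa, sb, sd = a >> 31, b >> 31, diff >> 31     # sign bits
    # Overflow: operands have different signs and the result's sign
    # differs from the sign of a.
    overflow = sa != sb and sd != sa
    return diff, overflow


def slt(a: int, b: int) -> int:
    """Set on less than (signed): 1 if a < b, else 0."""
    diff, overflow = sub32(a, b)
    # The sign of a - b is only trustworthy without overflow,
    # so flip it when the subtraction overflowed.
    return (diff >> 31) ^ int(overflow)


def beq_taken(a: int, b: int) -> bool:
    """beq condition: a - b == 0."""
    diff, _ = sub32(a, b)
    return diff == 0
```

For example, `slt(-2**31, 1)` returns 1, whereas the sign bit of the wrapped difference (0x7FFFFFFF) alone would give 0.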
One-bit adder
• Takes three input bits and generates two output bits
• Multiple bits can be cascaded
$c_{out} = a \cdot b + a \cdot c_{in} + b \cdot c_{in}$
$\text{sum} = a \oplus b \oplus c_{in}$

Building a 32-bit ALU
[Figure: a 1-bit ALU (inputs a, b, CarryIn, Operation; a multiplexor selects AND (0), OR (1), or the adder output (2) as Result; output CarryOut) and 32 copies, ALU0 to ALU31, chained so that each CarryOut feeds the next CarryIn]

What about subtraction (a - b)?
• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution: invert b with a Binvert multiplexor and set the CarryIn of ALU0 to 1, since $a - b = a + \bar{b} + 1$
[Figure: 1-bit ALU with Binvert, Operation, CarryIn, and CarryOut]
  – 000 = and
  – 001 = or
  – 010 = add
  – 110 = subtract

Supporting slt
• Can we figure out the idea?
  – 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt

Test for equality
• Notice the control lines: Bnegate (which combines Binvert and the CarryIn of ALU0) and Operation
  – 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt
[Figure: 32-bit ALU built from ALU0 to ALU31; each slice has a Less input (tied to 0 for ALU1 to ALU31); ALU31 produces Set, fed back to the Less input of ALU0, and Overflow; a NOR of Result0 to Result31 produces Zero]
• Note: Zero is 1 if the result is zero!

How about "a nor b"?
• 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt

Big Picture

Conclusion
• We can build an ALU to support an instruction set
  – Key idea: use a multiplexor to select the output we want
  – We can efficiently perform subtraction using two's complement
  – We can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  – All of the gates are always working
  – The speed of a gate is affected by the number of inputs to the gate
  – The speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
• Our primary focus is comprehension; however,
  – Clever changes to organization can improve performance (similar to using better algorithms in software)
• How about my instruction smt (set if more than)?
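A behavioral Python sketch of the ALU on these slides: 32 copies of a 1-bit slice, with the 3-bit control code split into Bnegate (which inverts b and supplies ALU0's CarryIn) and a 2-bit Operation that drives the output multiplexor (0 = and, 1 = or, 2 = add, 3 = Less). All names are hypothetical. As in the simple design on the slides, Set is taken straight from ALU31's adder output, so this slt ignores overflow. Because Set feeds back into ALU0, the sketch ripples the carries once to find Set and once more to produce the result.

```python
WIDTH = 32

# Control codes from the slides: Bnegate bit followed by a 2-bit Operation.
AND, OR, ADD, SUB, SLT = 0b000, 0b001, 0b010, 0b110, 0b111


def alu1(a: int, b: int, carry_in: int, less: int, binvert: int, op: int):
    """One 1-bit ALU slice; returns (result, carry_out, adder_sum)."""
    b ^= binvert                                            # Binvert multiplexor
    s = a ^ b ^ carry_in                                    # sum = a xor b xor cin
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)   # cout = ab + a cin + b cin
    result = (a & b, a | b, s, less)[op]                    # output multiplexor
    return result, carry_out, s


def alu32(a: int, b: int, control: int) -> tuple[int, int]:
    """Thirty-two chained slices; returns (32-bit result, Zero flag)."""
    bnegate, op = control >> 2, control & 0b11

    # Pass 1: ripple once to find Set, the adder output of ALU31.
    carry = bnegate                      # Bnegate also supplies ALU0's CarryIn
    for i in range(WIDTH):
        _, carry, set_bit = alu1((a >> i) & 1, (b >> i) & 1, carry, 0, bnegate, op)

    # Pass 2: ripple again with Set wired to the Less input of ALU0.
    carry, result = bnegate, 0
    for i in range(WIDTH):
        less = set_bit if i == 0 else 0
        r, carry, _ = alu1((a >> i) & 1, (b >> i) & 1, carry, less, bnegate, op)
        result |= r << i

    zero = int(result == 0)              # Zero is 1 if the result is zero
    return result, zero
```

For example, `alu32(5, 5, SUB)` returns (0, 1): the difference is zero, so the Zero flag that beq tests is set.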
ALU Summary
• We can build an ALU to support addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!

Optional Reading
• Overflow
• Formulation
• A Simpler Formula?

Problem: Ripple carry adder is slow!
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
• Can you see the ripple? How could you get rid of it?
  $c_1 = a_0b_0 + a_0c_0 + b_0c_0$
  $c_2 = a_1b_1 + a_1c_1 + b_1c_1$
  $c_3 = a_2b_2 + a_2c_2 + b_2c_2$
  $c_4 = a_3b_3 + a_3c_3 + b_3c_3$
• Writing $c_2$, $c_3$, $c_4$ in terms of the $a_i$, $b_i$, and $c_0$ alone: not feasible! Why?

Carry Bit
$c_{out} = ab + ac_{in} + bc_{in}$, so $c_{i+1} = a_ib_i + a_ic_i + b_ic_i$
$c_1 = a_0b_0 + a_0c_0 + b_0c_0$
$c_2 = a_1b_1 + a_1c_1 + b_1c_1 = a_1b_1 + a_1(a_0b_0 + a_0c_0 + b_0c_0) + b_1(a_0b_0 + a_0c_0 + b_0c_0)$
$\quad = a_1b_1 + a_1a_0b_0 + a_1a_0c_0 + a_1b_0c_0 + b_1a_0b_0 + b_1a_0c_0 + b_1b_0c_0$
$c_3 = a_2b_2 + a_2c_2 + b_2c_2$ (substituting $c_2$ makes the expression grow further)

Generate/Propagate
$c_1 = a_0b_0 + a_0c_0 + b_0c_0 = a_0b_0 + (a_0 + b_0)c_0$
$c_2 = a_1b_1 + a_1c_1 + b_1c_1 = a_1b_1 + (a_1 + b_1)c_1 = a_1b_1 + (a_1 + b_1)\left(a_0b_0 + (a_0 + b_0)c_0\right)$
Common terms: $\{a_ib_i,\ a_i + b_i\}$

    c_i = 0                 c_i = 1
    a_i  b_i | c_{i+1}      a_i  b_i | c_{i+1}
     0    0  |    0          0    0  |    0
     0    1  |    0          0    1  |    1
     1    0  |    0          1    0  |    1
     1    1  |    1          1    1  |    1

generate: $g_i = a_ib_i$ (carry out even when $c_i = 0$); propagate: $p_i = a_i + b_i$ (carry out when $c_i = 1$)

Generate/Propagate (cont.)
$g_i = a_ib_i$, $p_i = a_i + b_i$
$c_1 = a_0b_0 + (a_0 + b_0)c_0 = g_0 + p_0c_0$
$c_2 = a_1b_1 + (a_1 + b_1)c_1 = g_1 + p_1(g_0 + p_0c_0) = g_1 + p_1g_0 + p_1p_0c_0$
$c_{i+1} = g_i + p_ic_i$

Carry-look-ahead adder
• Motivation:
  – If we didn't know the value of carry-in, what could we do?
  – When would we always generate a carry? $g_i = a_i \cdot b_i$
  – When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple? (inputs $a_3a_2a_1a_0$ and $b_3b_2b_1b_0$)
  $c_1 = g_0 + p_0c_0$
  $c_2 = g_1 + p_1c_1 = g_1 + p_1g_0 + p_1p_0c_0$
  $c_3 = g_2 + p_2c_2 = g_2 + p_2g_1 + p_2p_1g_0 + p_2p_1p_0c_0$
  $c_4 = g_3 + p_3c_3 = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0 + p_3p_2p_1p_0c_0$
• Feasible! Why?
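An exhaustive Python check that the expanded lookahead equations give the same carries as the ripple equations for every pair of 4-bit operands; both function names are hypothetical. Only the structure differs: each lookahead carry depends on g, p, and $c_0$ alone, which is what lets hardware compute all four carries in parallel.

```python
from itertools import product


def ripple_carries(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """c_{i+1} = a_i b_i + a_i c_i + b_i c_i, computed one bit after another."""
    c = [c0]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c.append((ai & bi) | (ai & c[i]) | (bi & c[i]))
    return c


def lookahead_carries(a: int, b: int, c0: int) -> list[int]:
    """Each carry directly from g, p and c0 (two-level sum of products)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # g_i = a_i b_i
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]   # p_i = a_i + b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


# Exhaustive check over all 4-bit operands and both carry-in values.
for a, b, c0 in product(range(16), range(16), (0, 1)):
    assert lookahead_carries(a, b, c0) == ripple_carries(a, b, c0)
```

For example, `lookahead_carries(15, 1, 0)` returns [0, 1, 1, 1, 1]: $g_0 = 1$ and every $p_i = 1$, so the carry reaches $c_4$ without rippling.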
A 4-bit carry look-ahead adder
• Generate the g and p terms for each bit
• Use the g's, p's, and carry-in to generate all the c's
• Also use them to generate the block G and P
• The CLA principle can be used recursively

16-bit CLA

Gate Delay for 16-bit Adder
• $g_i = a_ib_i$ and $p_i = a_i + b_i$: 1 gate delay
• Next lookahead level: 1 + 2 gate delays
• Following lookahead level: 1 + 2 + 2 gate delays
(Each two-level AND-OR lookahead stage adds 2 gate delays.)

64-bit carry lookahead adder
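Applying the CLA principle recursively, a 16-bit adder can be sketched in Python as four 4-bit blocks whose block G and P feed a second lookahead unit built from the same equations. All names are hypothetical, and the block formulas $P = p_3p_2p_1p_0$ and $G = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0$ are the usual textbook ones, assumed here since the slides only name G and P. The sketch is behavioral: it computes the right values but does not model gate delay.

```python
def bit(x: int, i: int) -> int:
    return (x >> i) & 1


def lookahead(g: list[int], p: list[int], c0: int) -> list[int]:
    """Two-level carry lookahead: c0..c4 from g, p and c0 only."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


def block_gp(g: list[int], p: list[int]) -> tuple[int, int]:
    """Block generate G and block propagate P of one 4-bit group."""
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return G, P


def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """Add two 16-bit values; returns (16-bit sum, carry out)."""
    g = [bit(a, i) & bit(b, i) for i in range(16)]   # g_i = a_i b_i
    p = [bit(a, i) | bit(b, i) for i in range(16)]   # p_i = a_i + b_i
    groups = [(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]) for k in range(4)]
    blocks = [block_gp(gk, pk) for gk, pk in groups]
    # Second level: the same lookahead equations applied to block G and P
    # give the carry into each 4-bit block.
    C = lookahead([G for G, _ in blocks], [P for _, P in blocks], c0)
    total = 0
    for k, (gk, pk) in enumerate(groups):
        c = lookahead(gk, pk, C[k])                  # carries inside block k
        for j in range(4):
            i = 4 * k + j
            total |= (bit(a, i) ^ bit(b, i) ^ c[j]) << i
    return total, C[4]
```

For example, `cla16(0xFFFF, 1)` returns (0, 1): block 0 generates, the other three blocks propagate, and the second-level unit delivers the carry to every block at once instead of rippling it through 16 bits.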