ELECT 90X Programmable Logic Circuits: Computer Arithmetic: Introduction
Dr. Eng. Amr T. Abdel-Hamid
Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.
Fall 2009
Dr. Amr Talaat

What is Computer Arithmetic?
Pentium Division Bug (1994-95): the Pentium's radix-4 SRT division algorithm occasionally gave an incorrect quotient.
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:
$1/5 + 1/7 + 1/11 + 1/13 + \cdots + 1/p + 1/(p+2) + \cdots$
Worst-case example of the division error in the Pentium:
$c = 4\,195\,835 / 3\,145\,727$
Correct quotient: 1.333 820 44...
Circa-1994 Pentium double FLP value: 1.333 739 06..., accurate to only 14 bits (worse than single!)

A Motivating Example
Using a calculator with √, x², and x^y functions, compute:
$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ (ten square roots) = 1.000 677 131, the "1024th root of 2"
$v = 2^{1/1024}$ = 1.000 677 131
Save u and v; if you can't save them, recompute the values when needed.
$x = (((u^2)^2)\cdots)^2$ = 1.999 999 963      $x' = u^{1024}$ = 1.999 999 973
$y = (((v^2)^2)\cdots)^2$ = 1.999 999 983      $y' = v^{1024}$ = 1.999 999 994
Perhaps v and u are not really the same value:
$w = v - u = 1 \times 10^{-11}$, nonzero due to hidden digits
$(u - 1) \times 1000$ = 0.677 130 680 [hidden ... (0) 68]
$(v - 1) \times 1000$ = 0.677 130 690 [hidden ... (0) 69]

Finite Range Can Lead to Disaster
Example: Explosion of the Ariane rocket (June 4, 1996)
The unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m).
The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion.
Cause: "software error in the inertial reference system"
Problem specifics: a 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer. An SRI* software exception arose during the conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).
*SRI = Inertial Reference System

Encoding Numbers in 4 Bits
[Figure: number line from −16 to +16 showing the 16 values represented by each 4-bit format: unsigned integers; signed-magnitude; 3 + 1 fixed-point, xxx.x; signed fraction, ±.xxx; 2's-complement fraction, x.xxx; 2 + 2 floating-point, $s \times 2^e$ with $e \in [-2, 1]$ and $s \in [0, 3]$; 2 + 2 logarithmic (log x = xx.xx).]
Some of the possible ways of assigning 16 distinct codes to represent numbers.

The Binary Number System
In conventional digital computers, integers are represented as binary numbers of fixed length n: an ordered sequence $(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$ of binary digits, where each digit $x_i$ (bit) is 0 or 1.
The above sequence represents the integer value
$X = \sum_{i=0}^{n-1} x_i \cdot 2^i$
Upper-case letters represent numerical values or sequences of digits; lower-case letters, usually indexed, represent individual digits.

Radix of a Number System
The weight of the digit $x_i$ is the i-th power of 2; 2 is the radix of the binary number system.
Binary numbers are radix-2 numbers: the allowed digits are 0 and 1.
Decimal numbers are radix-10 numbers: the allowed digits are 0, 1, 2, ..., 9.
The radix is indicated as a subscript written as a decimal number.
Example: $(101)_{10}$ is the decimal value 101; $(101)_2$ is the decimal value 5.

Range of Representations
Operands and results are stored in registers of fixed length n, so only a finite number of distinct values can be represented within an arithmetic unit.
$X_{min}$, $X_{max}$: the smallest and largest representable values.
$[X_{min}, X_{max}]$: the range of the representable numbers.
A result larger than $X_{max}$ or smaller than $X_{min}$ is incorrectly represented. The arithmetic unit should indicate that the generated result is in error (an overflow indication).

Example: Overflow in the Binary System
Unsigned integers with 5 binary digits (bits):
$X_{max} = (31)_{10}$, represented by $(11111)_2$
$X_{min} = (0)_{10}$, represented by $(00000)_2$
Increasing $X_{max}$ by 1 gives $(32)_{10} = (100000)_2$. The 5-bit representation retains only the last five digits, yielding $(00000)_2 = (0)_{10}$.
In general, a number X not in the range $[X_{min}, X_{max}] = [0, 31]$ is represented by X mod 32. If X + Y exceeds $X_{max}$, the result is $S = (X + Y) \bmod 32$.
Example:
     X     10001    17
   + Y     10010    18
         1 00011     3 = 35 mod 32
The result has to be stored in a 5-bit register, so the most significant bit (with weight $2^5 = 32$) is discarded.

Fixed Radix Systems
r is the radix of the number system. Conventional number systems are also called fixed-radix systems.
With no redundancy, $0 \le x_i \le r - 1$.
Allowing $x_i \ge r$ introduces redundancy into the fixed-radix number system. How? If $x_i \ge r$ is allowed, the same value has two machine representations: $(\ldots, x_{i+1}, x_i, \ldots)$ and $(\ldots, x_{i+1} + 1, x_i - r, \ldots)$.

Representation of Mixed Numbers
A sequence of n digits in a register does not necessarily represent an integer; it can represent a mixed number with a fractional part and an integral part.
The n digits are partitioned into two parts: k in the integral part and m in the fractional part (k + m = n).
The value of an n-tuple with a radix point between the k most significant digits and the m least significant digits, $(x_{k-1} x_{k-2} \cdots x_1 x_0 . x_{-1} x_{-2} \cdots x_{-m})_r$, is
$X = \sum_{i=-m}^{k-1} x_i r^i$

Fixed Point Representations
The radix point is not stored in the register; it is understood to be in a fixed position between the k most significant digits and the m least significant digits. These are called fixed-point representations.
The programmer is not restricted to the predetermined position of the radix point: operands can be scaled, using the same scaling factor for all operands.
Add and subtract operations remain correct: $aX \pm aY = a(X \pm Y)$ (a is the scaling factor).
Corrections are required for multiplication and division: $aX \times aY = a^2 (X \times Y)$; $aX / aY = X / Y$.
Commonly used positions for the radix point:
• the rightmost side of the number (pure integers, m = 0)
• the leftmost side of the number (pure fractions, k = 0)

ULP: Unit in the Last Position
Given the length n of the operands, the weight $r^{-m}$ of the least significant digit indicates the position of the radix point.
Unit in the last position (ulp): the weight of the least significant digit, $ulp = r^{-m}$.
This notation simplifies the discussion: there is no need to distinguish between the different partitions of numbers into fractional and integral parts.

Representation of Negative Numbers
Fixed-point numbers in a radix-r system have two ways of representing negative numbers:
• Sign and magnitude representation (or signed-magnitude representation)
• Complement representation, with two alternatives:
  - Radix complement (two's complement in the binary system)
  - Diminished-radix complement (one's complement in the binary system)

Signed-Magnitude Representation
Sign and magnitude are represented separately: the first digit is the sign digit, and the remaining n − 1 digits represent the magnitude.
Binary case: the sign bit is 0 for positive and 1 for negative numbers.
Non-binary case: 0 and r − 1 indicate positive and negative numbers.
Only $2r^{n-1}$ out of the $r^n$ possible sequences are utilized.
There are two representations for zero, positive and negative. This is inconvenient when implementing an arithmetic unit: when testing for zero, both representations must be checked.

Disadvantage of the Signed-Magnitude Representation
The operation may depend on the signs of the operands.
Example: adding a positive number X and a negative number −Y, X + (−Y). If Y > X, the final result is −(Y − X), and the calculation must:
• switch the order of the operands
• perform subtraction rather than addition
• attach the minus sign
This sequence of decisions costs excess control logic and execution time. It is avoided in the complement representation methods.

Complement Representations of Negative Numbers
Two alternatives:
• Radix complement (called two's complement in the binary system)
• Diminished-radix complement (called one's complement in the binary system)
In both complement methods, positive numbers are represented as in the signed-magnitude method.
A negative number −Y is represented by R − Y, where R is a constant.
This representation satisfies −(−Y) = Y, since R − (R − Y) = Y.

Advantage of Complement Representation
No decisions need to be made before executing addition or subtraction.
Example: X − Y = X + (−Y), where −Y is represented by R − Y. Addition is performed by X + (R − Y) = R − (Y − X). If Y > X, −(Y − X) is already represented as R − (Y − X), so there is no need to interchange the order of the two operands.

Two's Complement
r = 2, k = n = 4, m = 0, ulp = $2^0$ = 1.
The radix complement (called two's complement in the binary case) of a number X is $2^4 - X$. It can instead be calculated as $\bar{X} + 1$ (complement every bit, then add 1).
The sequences 0000 to 0111 represent the numbers $0_{10}$ to $7_{10}$.
The two's complement of 0111 is 1000 + 1 = 1001; it represents the value $(-7)_{10}$.
The two's complement of 0000 is 1111 + 1 = 10000 = 0 mod $2^4$, so there is a single representation of zero.
Each positive number has a corresponding negative number that starts with a 1. The pattern 1000, representing $(-8)_{10}$, has no corresponding positive number.
The range of representable numbers is $-8 \le X \le 7$.

The Two's Complement Representation

Example: Addition in Two's Complement
Calculating X + (−Y) with Y > X, e.g. 3 + (−5):
      0011     3
    + 1011    −5
      1110    −2
The correct result is represented in the two's complement method, with no need for preliminary decisions or post-corrections.
Calculating X + (−Y) with X > Y, e.g. 5 + (−3):
      0101     5
    + 1101    −3
    1 0010     2
Only the four least significant digits are retained, yielding 0010.

One's Complement in the Binary System
r = 2, k = n = 4, m = 0, ulp = $2^0$ = 1.
The diminished-radix complement (called one's complement in the binary case) of a number X is $(2^4 - 1) - X = \bar{X}$.
As before, the sequences 0000 to 0111 represent the numbers $0_{10}$ to $7_{10}$.
The one's complement of 0111 is 1000, representing $(-7)_{10}$.
The one's complement of zero is 1111, so there are two representations of zero.
The range of representable numbers is $-7 \le X \le 7$.

Comparing the Three Representations in a Binary System
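A side-by-side comparison of the three 4-bit encodings can be generated with a short Python sketch. It is illustrative only: the helper names `encode`, `decode`, and `add_mod` are hypothetical, and n-bit patterns are assumed to be held in ordinary Python integers. The last lines also show that two's-complement addition is plain addition modulo $2^n$, as in the 3 + (−5) example, and the same rule gives the 5-bit wraparound 17 + 18 → 3.

```python
def encode(value: int, n: int, scheme: str) -> int:
    """Return the n-bit pattern (as an int) for `value` in the given scheme."""
    limit = 2 ** (n - 1)
    if scheme == "twos":
        if not -limit <= value <= limit - 1:
            raise ValueError("out of range")
        return value % 2 ** n                 # a negative -Y becomes 2^n - Y
    if not -(limit - 1) <= value <= limit - 1:
        raise ValueError("out of range")
    if value >= 0:
        return value                          # positives look the same in all schemes
    if scheme == "sign-magnitude":
        return limit | -value                 # sign bit 1, followed by the magnitude
    if scheme == "ones":
        return (2 ** n - 1) + value           # (2^n - 1) - Y, i.e. every bit flipped
    raise ValueError("unknown scheme")


def decode(bits: int, n: int, scheme: str) -> int:
    """Interpret an n-bit pattern; the sign bit is the most significant bit."""
    limit = 2 ** (n - 1)
    if bits < limit:
        return bits
    if scheme == "sign-magnitude":
        return -(bits - limit)                # 1000 decodes to 0 ("negative zero")
    if scheme == "ones":
        return -((2 ** n - 1) - bits)         # 1111 decodes to 0 ("negative zero")
    if scheme == "twos":
        return bits - 2 ** n
    raise ValueError("unknown scheme")


def add_mod(x_bits: int, y_bits: int, n: int) -> int:
    """n-bit addition: the carry out of the top position is discarded."""
    return (x_bits + y_bits) % 2 ** n


if __name__ == "__main__":
    n = 4
    print("value  sign-mag  ones  twos")
    for v in range(-7, 8):
        row = [format(encode(v, n, s), "04b") for s in ("sign-magnitude", "ones", "twos")]
        print(f"{v:5d}  {row[0]:>8}  {row[1]:>4}  {row[2]:>4}")
    s = add_mod(encode(3, n, "twos"), encode(-5, n, "twos"), n)
    print(format(s, "04b"), decode(s, n, "twos"))   # 1110 -2
    print(add_mod(17, 18, 5))                       # 3, i.e. 35 mod 32
```

Only two's complement has a code (1000) for −8; signed-magnitude and one's complement each spend one pattern on a second zero.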
5.1 Bit-Serial and Ripple-Carry Adders
Half-adder (HA): truth table and block diagram (inputs x, y; outputs c, s)
   x  y | c  s
   0  0 | 0  0
   0  1 | 0  1
   1  0 | 0  1
   1  1 | 1  0
Full-adder (FA): truth table and block diagram (inputs x, y, c_in; outputs c_out, s)
   x  y  c_in | c_out  s
   0  0  0    |   0    0
   0  0  1    |   0    1
   0  1  0    |   0    1
   0  1  1    |   1    0
   1  0  0    |   0    1
   1  0  1    |   1    0
   1  1  0    |   1    0
   1  1  1    |   1    1

Half-Adder Implementations
[Figure: three implementations of a half-adder: (a) AND/XOR half-adder; (b) NOR-gate half-adder; (c) NAND-gate half-adder with complemented carry.]

Full-Adder Implementations
[Figure: possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates: (a) built of half-adders; (b) built as an AND-OR circuit; (c) mux-based design suitable for CMOS realization.]

Full-Adder Details
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in} = x y c_{in} \lor x \bar{y} \bar{c}_{in} \lor \bar{x} y \bar{c}_{in} \lor \bar{x} \bar{y} c_{in}$ (odd parity function)
$c_{out} = x y \lor x c_{in} \lor y c_{in}$ (majority function)
[Figure: CMOS transmission gate and its use in a 2-to-1 mux: (a) CMOS transmission gate, circuit and symbol; (b) two-input mux built of two transmission gates.]

Simple Adders Built of Full-Adders
[Figure: using full-adders in building bit-serial and ripple-carry adders: (a) bit-serial adder, one FA with a carry flip-flop, operands shifted in one bit per clock and the sum shifted out; (b) 32-bit ripple-carry adder, a chain of FAs from bit 0 ($c_{in} = c_0$) to bit 31 ($c_{out} = c_{32}$).]

Critical Path Through a Ripple-Carry Adder
$T_{ripple\text{-}add} = T_{FA}(x, y \to c_{out}) + (k - 2)\, T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
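The full-adder equations and the ripple-carry chain can be modeled at the bit level as follows. This is a sketch, not a hardware description: `full_adder`, `ripple_carry_add`, and `ripple_delay` are hypothetical names, operands are Python integers, and the three FA delays passed to `ripple_delay` are placeholder parameters, not values from the slides. The loop handles one bit position per iteration, which is also how a bit-serial adder reuses its single FA and carry flip-flop over successive clock cycles.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One FA: returns (sum, carry-out) for single bits."""
    s = x ^ y ^ cin                          # odd parity function
    cout = (x & y) | (x & cin) | (y & cin)   # majority function
    return s, cout


def ripple_carry_add(a: int, b: int, k: int, cin: int = 0) -> tuple[int, int]:
    """k-bit ripple-carry adder built from k full adders; returns (sum, c_k)."""
    carry, total = cin, 0
    for i in range(k):                       # the carry ripples from bit 0 upward
        s_i, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s_i << i
    return total, carry


def ripple_delay(k: int, t_xy_cout: float, t_cin_cout: float, t_cin_s: float) -> float:
    """Critical-path formula for a k-bit ripple-carry adder (k >= 2)."""
    return t_xy_cout + (k - 2) * t_cin_cout + t_cin_s


print(ripple_carry_add(0b1011, 0b0110, 4))   # (1, 1): 11 + 6 = 17 = 16 + 1
```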
[Figure: critical path in a k-bit ripple-carry adder, from $x_0, y_0$ through the carry chain $c_1, c_2, \ldots, c_{k-1}$ to $s_{k-1}$.]

Binary Adders as Versatile Building Blocks
A full-adder can also serve as a logic gate:
• Set one input to 0: $c_{out}$ = AND of the other inputs.
• Set one input to 1: $c_{out}$ = OR of the other inputs.
• Set one input to 0 and another to 1: s = NOT of the third input.
[Figure: four-bit binary adder used to realize the logic function $f = w \lor xyz$ and its complement.]

Conditions and Exceptions
[Figure: two's-complement adder with provisions for detecting conditions and exceptions (Overflow, Negative, and Zero outputs).]
$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k \bar{c}_{k-1} \lor \bar{c}_k c_{k-1}$

Manchester Carry Chains and Adders
Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries $c_i$ is thus our central problem. For this, the actual operand digits are not important; what matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed). For binary addition:
$g_i = x_i y_i$ (generate), $p_i = x_i \oplus y_i$ (propagate), $a_i = \bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$ (annihilate)
It is also helpful to define a transfer signal:
$t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$

Carry Network is the Essence of a Fast Adder
   g_i  p_i | Carry is:
    0    0  | annihilated or killed
    0    1  | propagated
    1    0  | generated
    1    1  | (impossible)
with $g_i = x_i y_i$ and $p_i = x_i \oplus y_i$.
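The carry recurrence $c_{i+1} = g_i \lor c_i p_i$ and the overflow rule $c_k \oplus c_{k-1}$ from the Conditions and Exceptions slide can be exercised together in a small Python model. It assumes k-bit two's-complement operands stored as unsigned bit patterns; the function name `add_with_flags` is hypothetical, and taking the Negative flag directly from the sum's sign bit $s_{k-1}$ is an assumption of this sketch.

```python
def add_with_flags(a: int, b: int, k: int, cin: int = 0) -> dict:
    """Add two k-bit patterns via g/p signals; report sum, carry-out and flags."""
    c = [cin]                                # c[i] is the carry into position i
    total = 0
    for i in range(k):
        x, y = (a >> i) & 1, (b >> i) & 1
        g, p = x & y, x ^ y                  # generate and propagate signals
        c.append(g | (c[i] & p))             # c_{i+1} = g_i OR c_i p_i
        total |= (p ^ c[i]) << i             # s_i = p_i XOR c_i
    return {
        "sum": total,
        "cout": c[k],
        "overflow": c[k] ^ c[k - 1],         # two's-complement overflow
        "negative": (total >> (k - 1)) & 1,  # assumption: sign bit of the sum
        "zero": int(total == 0),
    }


print(add_with_flags(0b0011, 0b1011, 4))    # 3 + (-5): sum 0b1110 (-2), no overflow
print(add_with_flags(0b0111, 0b0001, 4))    # 7 + 1: sum 0b1000, overflow = 1
```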
[Figure: the $g_i$ and $p_i$ signals of all positions, together with $c_0$, feed a carry network (ripple, skip, lookahead, or parallel-prefix) that produces the carries $c_1, \ldots, c_k$, from which the sum bits $s_i$ are formed.]
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

Ripple-Carry Adder Revisited
The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$
The latency of a k-bit adder is roughly 2k gate delays:
• 1 gate delay for production of the p and g signals, plus
• 2(k − 1) gate delays for carry propagation, plus
• 1 XOR gate delay for generation of the sum bits.
[Figure: the carry propagation network of a ripple-carry adder.]

The Complete Design of a Ripple-Carry Adder
[Figure: the generic adder structure with the ripple-carry propagation network inserted as its carry network.]

Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
   Signal | Radix r                       | Binary
   g_i    | 1 iff $x_i + y_i \ge r$       | $x_i y_i$
   p_i    | 1 iff $x_i + y_i = r - 1$     | $x_i \oplus y_i$
   a_i    | 1 iff $x_i + y_i < r - 1$     | $\bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$
   t_i    | 1 iff $x_i + y_i \ge r - 1$   | $x_i \lor y_i$
   s_i    | $(x_i + y_i + c_i) \bmod r$   | $x_i \oplus y_i \oplus c_i$
The carry recurrence can be unrolled to obtain each carry signal directly from the inputs, rather than through propagation:
$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$\quad = g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2}) p_{i-1}$
$\quad = g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$\quad = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\quad = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\quad = \cdots$

Full Carry Lookahead
[Figure: full carry lookahead, with each sum bit $s_3, \ldots, s_0$ computed directly from $x_3, \ldots, x_0$, $y_3, \ldots, y_0$, and $c_{in}$.]
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it. Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits.

Four-Bit Carry-Lookahead Adder
Full carry lookahead is quite practical for a 4-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$
Complexity is reduced by deriving the carry-out indirectly.
[Figure: four-bit carry network with full lookahead.]

Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
...
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \cdots \lor c_0 p_0 p_1 p_2 p_3 \cdots p_{29} p_{30}$
The last term needs a 32-input AND, and the whole expression a 32-input OR. High fan-ins necessitate tree-structured circuits.

Solutions to the Fan-in Problem
• Multilevel lookahead
• Block adders
• High-radix addition (i.e., radix $2^h$): increases the latency for generating the g and p signals and the sum digits, but simplifies the carry network (optimal radix?)
Example: 16-bit addition, done either in radix 16 (four digits) or with two-level carry lookahead (four 4-bit blocks). Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first.

Block Ripple Adder

Larger Carry-Lookahead Adder Design
Block generate and propagate signals:
$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and that carry can be propagated to the MSB, the block generates a carry.
[Figure: 4-bit lookahead carry generator, with inputs $g_{i+3}, p_{i+3}, \ldots, g_i, p_i$ and $c_i$, and outputs $c_{i+1}$, $c_{i+2}$, $c_{i+3}$, $g_{[i,i+3]}$, and $p_{[i,i+3]}$.]

A Building Block for Carry-Lookahead Addition
[Figure: four-bit lookahead carry generator (block signal generation plus intermediate carries) and its use in a four-bit adder.]

Combining Block g and p Signals
[Figure: combining the g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block.]

A Two-Level Carry-Lookahead Adder
[Figure: building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators. In each 16-bit carry-lookahead adder, a 4-bit lookahead carry generator combines $g_{[0,3]}, p_{[0,3]}, \ldots, g_{[12,15]}, p_{[12,15]}$ to produce $c_4$, $c_8$, and $c_{12}$; a second-level generator combines $g_{[0,15]}, p_{[0,15]}, \ldots, g_{[48,63]}, p_{[48,63]}$ to produce $c_{16}$, $c_{32}$, $c_{48}$, and $g_{[0,63]}, p_{[0,63]}$.]

Ling Adder and Related Designs
Consider the carry recurrence and its unrolling by 4 steps:
$c_i = g_{i-1} \lor c_{i-1} t_{i-1} = g_{i-1} \lor g_{i-2} t_{i-1} \lor g_{i-3} t_{i-2} t_{i-1} \lor g_{i-4} t_{i-3} t_{i-2} t_{i-1} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$
Ling's modification: propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$. Since $h_i = g_{i-1} \lor c_{i-1}$ and $c_{i-1} = t_{i-2} h_{i-1}$:
$h_i = g_{i-1} \lor h_{i-1} t_{i-2} = g_{i-1} \lor g_{i-2} \lor g_{i-3} t_{i-2} \lor g_{i-4} t_{i-3} t_{i-2} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2}$
   CLA:  5 gates, max 5 inputs, 19 gate inputs
   Ling: 4 gates, max 5 inputs, 14 gate inputs
The advantage of $h_i$ over $c_i$ is even greater with wired-OR:
   CLA:  4 gates, max 5 inputs, 14 gate inputs
   Ling: 3 gates, max 4 inputs, 9 gate inputs
Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:
$s_i = (t_i \oplus h_{i+1}) \lor h_i g_i t_{i-1}$

Carry Determination as Prefix Computation
[Figure: a lower block B' with signals $(g', p')$ and the adjacent higher block B'' with signals $(g'', p'')$ are combined by the carry operator ¢ into block B with signals (g, p).]
$g = g'' \lor g' p''$
$p = p' p''$

Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given $(g_0, p_0), (g_1, p_1), \ldots, (g_{k-2}, p_{k-2}), (g_{k-1}, p_{k-1})$
Find $(g_{[0,0]}, p_{[0,0]}), (g_{[0,1]}, p_{[0,1]}), \ldots, (g_{[0,k-2]}, p_{[0,k-2]}), (g_{[0,k-1]}, p_{[0,k-1]})$, which give the carries $c_1, c_2, \ldots, c_{k-1}, c_k$.
Prefix sums analogy: given $x_0, x_1, x_2, \ldots, x_{k-1}$, find $x_0,\; x_0 + x_1,\; x_0 + x_1 + x_2,\; \ldots,\; x_0 + x_1 + \cdots + x_{k-1}$.
The desired pairs are found by evaluating all prefixes of
$(g_0, p_0) \,¢\, (g_1, p_1) \,¢\, \cdots \,¢\, (g_{k-2}, p_{k-2}) \,¢\, (g_{k-1}, p_{k-1})$
The carry operator ¢ is associative, but not commutative:
$[(g_1, p_1) \,¢\, (g_2, p_2)] \,¢\, (g_3, p_3) = (g_1, p_1) \,¢\, [(g_2, p_2) \,¢\, (g_3, p_3)]$

Example Prefix-Based Carry Network
[Figure: a four-input prefix sums network and the corresponding four-bit carry lookahead network, in which ¢ cells replace the adders and the outputs are $(g_{[0,0]}, p_{[0,0]}) = (c_1, -)$, $(g_{[0,1]}, p_{[0,1]}) = (c_2, -)$, $(g_{[0,2]}, p_{[0,2]}) = (c_3, -)$, and $(g_{[0,3]}, p_{[0,3]}) = (c_4, -)$.]

Alternative Parallel Prefix Networks
[Figure: parallel prefix sums network built of two k/2-input networks and k/2 adders (Ladner-Fischer).]

Brent-Kung Recursive Construction
[Figure: parallel prefix sums network built of one k/2-input network and k − 1 adders.]

Brent-Kung Carry Network (8-Bit Adder)
[Figure: Brent-Kung carry network for 8 bits: the inputs [7, 7], ..., [0, 0] are combined pairwise into [6, 7], [4, 5], [2, 3], [0, 1], then into [4, 7] and [0, 3], then into [0, 7], after which a reverse tree fills in the remaining prefixes [0, 6], [0, 5], [0, 4], and [0, 2].]

Brent-Kung Carry Network (16-Bit Adder)
[Figure: Brent-Kung parallel prefix graph for 16 inputs; its 6 levels determine the latency.]

Kogge-Stone Carry Network (16-Bit Adder)
[Figure: Kogge-Stone parallel prefix graph for 16 inputs, with $\log_2 k$ levels (the minimum possible).]

Speed-Cost Tradeoffs in Carry Networks
   Method           Delay              Cost
   Ladner-Fischer   $\log_2 k$         $(k/2) \log_2 k$
   Kogge-Stone      $\log_2 k$         $k \log_2 k - k + 1$
   Brent-Kung       $2 \log_2 k - 2$   $2k - 2 - \log_2 k$

Hybrid B-K/K-S Carry Network (16-Bit Adder)
[Figure: a hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs, shown next to the Brent-Kung and Kogge-Stone graphs.]
Brent-Kung: 6 levels, 26 cells. Kogge-Stone: 4 levels, 49 cells.
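The Brent-Kung and Kogge-Stone level and cell counts quoted above can be reproduced by building both prefix graphs and evaluating the carry operator ¢ on them. In this Python sketch the schedule builders `kogge_stone` and `brent_kung` and the evaluator `prefix_add` are hypothetical names; it assumes k is a power of two and there is no carry-in ($c_0 = 0$), and it counts levels as the longest chain of ¢ cells, i.e. each cell fires as soon as its inputs are ready.

```python
def kogge_stone(k: int) -> list[tuple[int, int]]:
    """Cells (target, source): at distance d, every node i >= d absorbs node i - d."""
    ops, d = [], 1
    while d < k:
        # descending i, so node i - d still holds its value from the previous level
        ops += [(i, i - d) for i in range(k - 1, d - 1, -1)]
        d *= 2
    return ops


def brent_kung(k: int) -> list[tuple[int, int]]:
    """Cells (target, source): an up-sweep tree followed by a reverse (down-sweep) tree."""
    ops, d = [], 1
    while d < k:                                   # up-sweep: [0,1], [0,3], [0,7], ...
        ops += [(i, i - d) for i in range(2 * d - 1, k, 2 * d)]
        d *= 2
    d = k // 4
    while d >= 1:                                  # down-sweep fills in the other prefixes
        ops += [(i, i - d) for i in range(3 * d - 1, k, 2 * d)]
        d //= 2
    return ops


def prefix_add(a: int, b: int, k: int, ops) -> tuple[int, int, int]:
    """Add two k-bit numbers with a prefix carry network; returns (sum, c_k, levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(k)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(k)]
    G, P, depth = g[:], p[:], [0] * k
    for tgt, src in ops:
        # (G, P)[tgt] <- (G, P)[src] ¢ (G, P)[tgt]; src covers the lower block
        G[tgt] = G[tgt] | (G[src] & P[tgt])
        P[tgt] = P[tgt] & P[src]
        depth[tgt] = max(depth[tgt], depth[src]) + 1
    c = [0] + G                                    # c_0 = 0 and c_{i+1} = g[0,i]
    s = sum((p[i] ^ c[i]) << i for i in range(k))
    return s, c[k], max(depth)


for name, build in (("Brent-Kung", brent_kung), ("Kogge-Stone", kogge_stone)):
    ops = build(16)
    _, _, levels = prefix_add(0, 0, 16, ops)
    print(f"{name}: {levels} levels, {len(ops)} cells")
# Brent-Kung: 6 levels, 26 cells
# Kogge-Stone: 4 levels, 49 cells
```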
The hybrid network needs 5 levels and 32 cells.

Simple Carry-Skip Adders
[Figure: converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks: (a) ripple-carry adder made of four 4-bit blocks; (b) simple carry-skip adder, in which each block's propagate signal (e.g., $p_{[4,7]}$) drives skip logic (2 gates) that lets the incoming carry bypass the block's ripple-carry stages.]

Another View of Carry-Skip Addition
[Figure: street/freeway analogy for a carry-skip adder: the carry either ripples through positions 4j to 4j + 3 along a one-way street or takes the freeway from $c_{4j}$ directly to $c_{4j+4}$.]

Multilevel Carry-Skip Adders
[Figures: a one-level carry-skip adder (skip blocks S1 from $c_{in}$ to $c_{out}$); an example of a two-level carry-skip adder (S1 blocks grouped under a second-level skip S2); and a two-level carry-skip adder optimized by removing the short-block skip circuits.]

Using Two-Operand Adders
[Figure: multioperand addition problems for multiplication or inner-product computation in dot notation: the partial products $x_0 a 2^0, \ldots, x_3 a 2^3$ of $p = a \times x$, and the terms $p^{(0)}, \ldots, p^{(6)}$ of an inner product s.]

Serial Implementation with One Adder
[Figure: serial implementation of multioperand addition with a single 2-operand adder: each k-bit operand $x^{(i)}$ is added to a (k + log2 n)-bit partial sum register holding $\sum_{j=0}^{i-1} x^{(j)}$.]

Pipelined Implementation for Higher Throughput
[Figure: serial multioperand addition when each adder is a 4-stage pipeline; delays are inserted so that partial sums such as $x^{(i)} + x^{(i-1)}$ and $x^{(i-4)} + x^{(i-5)}$ arrive together, and the output is $s^{(i-12)}$.]

Parallel Implementation as Tree of Adders
[Figure: adding 7 numbers in a binary tree of adders: n − 1 adders arranged in $\lceil \log_2 n \rceil$ adder levels, with adder outputs growing from k + 1 to k + 3 bits.]

Carry-Save Adders
A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
[Figure: a row of FAs with the carry chain linking $c_{in}$ to $c_{out}$ (carry-propagate adder), and the same row with the chain cut and the carries saved (carry-save adder).]
A carry-save adder (CSA) is also called a (3; 2)-counter or a 3-to-2 reduction circuit.
[Figure: carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation; full- and half-adder blocks, with their inputs and outputs, specified in dot notation.]

Multioperand Addition Using Carry-Save Adders
[Figures: serial carry-save addition using a single CSA, with sum and carry registers feeding a final CPA; and a tree of carry-save adders reducing seven numbers to two, followed by a carry-propagate adder.]

Example Reduction by a CSA Tree
Representing a seven-operand addition in tabular form (number of dots in each bit position):
Bit position      8  7  6  5  4  3  2  1  0
                           7  7  7  7  7  7   6 × 2 = 12 FAs
                        2  5  5  5  5  5  3   6 FAs
                        3  4  4  4  4  4  1   6 FAs
                     1  2  3  3  3  3  2  1   4 FAs + 1 HA
                     2  2  2  2  2  1  2  1   7-bit adder
                  --Carry-propagate adder--
                  1  1  1  1  1  1  1  1  1
Total cost = 7-bit adder + 28 FAs + 1 HA
[Figure: addition of seven 6-bit numbers in dot notation.]
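The reduction of seven operands to two by carry-save adders can be sketched at the word level in Python. Here `csa` applies the full-adder equations bitwise to whole operands (sum = parity, carry = majority moved up one bit position), and `csa_tree_sum` is a hypothetical helper that keeps applying CSAs level by level until two operands remain, then lets Python's `+` stand in for the final carry-propagate adder. It does not trim adder widths or count individual FAs.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save adder (3-to-2 reduction): x + y + z == s + c."""
    s = x ^ y ^ z                              # per-position sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-position carries, moved up one bit
    return s, c


def csa_tree_sum(operands: list[int]) -> tuple[int, int, int]:
    """Reduce operands to two with CSA levels, then add; returns (sum, csas, levels)."""
    ops, csas, levels = list(operands), 0, 0
    while len(ops) > 2:
        nxt = []
        while len(ops) >= 3:                   # each CSA turns three operands into two
            nxt.extend(csa(ops.pop(), ops.pop(), ops.pop()))
            csas += 1
        nxt.extend(ops)                        # 0 to 2 leftovers pass to the next level
        ops, levels = nxt, levels + 1
    return sum(ops), csas, levels              # final carry-propagate addition


print(csa_tree_sum([63, 17, 42, 5, 60, 33, 9]))  # (229, 5, 4)
```

With seven operands the sketch uses five CSAs over four levels, matching the four reduction steps in the tabular example.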
A full-adder compacts 3 dots into 2 (compression ratio of 1.5); a half-adder rearranges 2 dots (no compression, but still useful).

Width of Adders in a CSA Tree
[Figure: adding seven k-bit numbers and the CSA/CPA widths required; the index pair [i, j] means that bit positions from i up to j are involved.]
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.

Wallace Tree Multiplier

Dadda Tree Multiplier

Saturating Adders
Saturating (saturation) arithmetic: when a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format.
Example: in 8-bit 2's-complement format, we have
   120 + 26 → −110 (wraparound, since 146 − 256 = −110)
   120 +sat 26 → 127 (saturating)
Saturating arithmetic is desirable in many DSP applications.
Designing saturating adders:
• Unsigned (quite easy)
• Signed (slightly harder)
[Figure: the adder output and the saturation value feed a 2-to-1 multiplexer whose select input is the overflow signal (0: sum, 1: saturation value).]

Readings:
Main reference for the above slides: Chapters 5, 6, 7, & 8, B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
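The saturating adders described before the readings can be illustrated with a short Python model of the adder-plus-multiplexer arrangement: compute the wraparound sum, detect overflow, and substitute the saturation value when overflow occurs. The function names are hypothetical, and the sketch assumes the operands are Python integers already within the n-bit range.

```python
def wrap_signed(v: int, n: int) -> int:
    """Keep the low n bits of v and read them as an n-bit two's-complement value."""
    v &= (1 << n) - 1
    return v - (1 << n) if v >> (n - 1) else v


def add_sat_signed(x: int, y: int, n: int = 8) -> int:
    s = wrap_signed(x + y, n)                    # ordinary wraparound adder output
    overflow = (x < 0) == (y < 0) and (s < 0) != (x < 0)
    if not overflow:
        return s                                 # mux select = 0: pass the sum
    # mux select = 1: most negative or most positive value, by operand sign
    return -(1 << (n - 1)) if x < 0 else (1 << (n - 1)) - 1


def add_sat_unsigned(x: int, y: int, n: int = 8) -> int:
    total = x + y
    return (1 << n) - 1 if total >> n else total  # a carry-out selects 2^n - 1


print(wrap_signed(120 + 26, 8), add_sat_signed(120, 26))      # -110 127
print(add_sat_signed(-120, -26), add_sat_unsigned(200, 100))  # -128 255
```

The unsigned case needs only the carry-out as the mux select, which is why it is the easier of the two designs.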