Shift Operations
Source: David Harris
Aug 2007

Shifter Implementation
Regular layout, can be compact; use transmission gates to avoid threshold drop. Not amenable to synthesis; high capacitive loading for large arrays.
Source: David Harris

Shifter Implementation
Each level shifts by a power of two. Amenable to synthesis, fast.

Multiplication
Source: David Harris

Array Multiplier with CPAs
Array of carry-propagate adders (CPAs); multiple near-critical paths.
Source: Jan Rabaey

Array Multiplier with CSAs
Only one critical path.
Source: Jan Rabaey

How do CSAs work?
CSA: carry-save adder. The goal is to add four numbers together (the same problem as adding the partial products in a multiplier).
Source: David Harris

How do CSAs work? (cont)
A full adder network can add three numbers together if we view the carry-in inputs as a bus that carries the third number. The output is a sum vector and a carry vector, and these have to be added to produce the final result.
Source: David Harris

How do CSAs work? (cont)
The carry vector has to be shifted left by 1 before being added to the sum vector, because the COUT bit has twice the weight of the sum bit.
Source: David Harris

CSA Multiplier
The carry is shifted left before being added. This final addition is always N/2 bits wide if the product has N bits. For large multipliers, a fast adder structure is needed for this addition.
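
The carry-save idea above can be sketched in a few lines of Python. This is only an illustrative model, not a circuit description from the slides: the function names `carry_save_add` and `add_four` are made up here, and Python integers stand in for bit vectors of unlimited width.

```python
def carry_save_add(a, b, c):
    """Model one row of full adders: reduce three operands to two vectors.

    Each bit position acts as an independent full adder, so no carry
    ripples along the row.
    """
    sum_vec = a ^ b ^ c                       # per-bit sum outputs
    carry_vec = (a & b) | (a & c) | (b & c)   # per-bit carry outputs (majority)
    # COUT has twice the weight of the sum bit, so shift the carry vector left by 1.
    return sum_vec, carry_vec << 1


def add_four(w, x, y, z):
    """Add four non-negative integers with two CSA rows and one final CPA."""
    s, c = carry_save_add(w, x, y)   # first row: three operands in, two out
    s, c = carry_save_add(s, c, z)   # second row folds in the fourth operand
    return s + c                     # the only carry-propagate addition


print(add_four(0b0001, 0b0111, 0b1101, 0b0010))  # 1 + 7 + 13 + 2 -> 23
```

Only the last line of `add_four` propagates carries, which is why a CSA array has a single critical path through the final adder.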
Source: Jan Rabaey

Multiplier Layout
Layout can be made rectangular.
Source: David Harris

2’s Complement Multiply Definition
The MSb has negative weight.
4-bit 2’s complement example: $-5 = \mathtt{0xB} = 1011_2 = -1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = -8 + 0 + 2 + 1 = -5$
Source: David Harris

2’s Complement Multiplication
Source: David Harris

Modified Baugh-Wooley Multiplier (2’s complement)
Pre-compute sums of constant ‘1’, push some terms upwards.
Source: David Harris

Multiplier Layout For Two’s Complement
Shaded cells are the cells modified for Baugh-Wooley.
Source: David Harris

Booth Encoding
The previous multipliers use radix-2: one bit of the multiplier is examined at a time. In general, radix-$2^r$ multipliers produce N/r partial products (assuming an NxN multiplier). Fewer partial products lead to smaller/faster CSA arrays. A radix-4 (radix-$2^2$) multiplier produces N/2 partial products.
Each pair of multiplier bits X1X0 selects one of Y*0, Y*1, Y*2, Y*3.
Y*0, Y*1, Y*2 are easy/fast (Y*2 is a shift). Y*3 is hard: Y*3 = Y*(2+1) = 2Y + Y, which involves a carry propagate.

Radix-4 Partial Products
$Y \cdot X_{N-1}X_{N-2}\ldots X_3X_2X_1X_0$ is formed as the sum of the partial products $Y \cdot X_1X_0$, $Y \cdot X_3X_2$, ..., $Y \cdot X_{N-1}X_{N-2}$, each shifted two bit positions left of the previous one.
Number of partial products is reduced.
Source: David Harris

Booth Encoding (cont.)
Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y. 4Y is simply Y in the next row of the partial products, so just add Y to the next row. In both cases the current row contributes -2Y or -Y, and an extra Y has to be added to the following partial product.
Booth encoding looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product accordingly. If the MSB of the previous pair is ‘1’, add Y to the current value.
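
The recoding rule just described (current pair plus the MSB of the previous pair) can be checked with a small Python sketch. It is an assumption-laden model rather than the slides' hardware: the names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is taken to be unsigned with an even bit width `n`, and one extra digit is produced to absorb a pending +Y from the top pair.

```python
def booth_radix4_digits(x, n):
    """Recode an unsigned n-bit multiplier x (n even) into radix-4 Booth digits.

    Returns n/2 + 1 digits in {-2, -1, 0, 1, 2}, least significant first.
    """
    digits = []
    prev_msb = 0                       # MSB of the previous bit pair
    for i in range(0, n + 2, 2):       # one extra (all-zero) pair above the MSB
        pair = (x >> i) & 0b11
        # A pair worth 2 or 3 is rewritten as 4 - 2 or 4 - 1: this row takes
        # -2 or -1, and the 4 becomes a +1 handed to the next row.
        base = pair if pair < 2 else pair - 4
        digits.append(base + prev_msb)
        prev_msb = pair >> 1
    return digits


def booth_multiply(y, x, n):
    """Sum the Booth partial products d * Y, each weighted by 4**k."""
    return sum(d * y * 4**k for k, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(0b1110, 4))  # 14 = -2 + 0*4 + 1*16 -> [-2, 0, 1]
print(booth_multiply(13, 0b1110, 4))   # 13 * 14 -> 182
```

Every digit stays within -2..2, so each partial product is 0, ±Y, or ±2Y, and no row ever needs the hard 3Y multiple.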
Booth Encoding (cont)
Current pair   MSB of previous pair   Partial product
00             0                      PP = 0*Y
00             1                      PP = 0*Y + Y = Y
01             0                      PP = Y + 0 = Y
01             1                      PP = Y + Y = 2Y
10             0                      PP = -2Y + 0 = -2Y
10             1                      PP = -2Y + Y = -Y
11             0                      PP = -Y + 0 = -Y
11             1                      PP = -Y + Y = 0
Negative operations are done at the bit level as complements, with +1 added to the PP to complete the 2’s complement.
Figure labels: 1Y select, 2Y select, sign bit select.
Source: David Harris

Booth Selection Logic
Replaces the AND gates in the CSA array. When -Y is chosen, a ‘1’ has to be added to complete the two’s complement.
Source: David Harris

Unsigned R-4 Booth Array (16 x 16)
Sign extension: either all 1’s or all 0’s for -Y terms.
An extra PP is needed in case the last PP requires a ‘Y’ to be added in (the last two X bits were either 2 or 3).
A ‘1’ or ‘0’ is needed to complete the 2’s complement.
Source: David Harris

Optimized R-4 Booth Array (unsigned)
SSSS = 1111 + S', where S' is the complement of S, added at the least significant sign-extension position. Additional reduction produces this array.
Source: David Harris

Signed R-4 Booth Array (16 x 16)
e_i = M_i xor y_15
The last partial product, PP8, is not needed for signed multiply.
Source: David Harris

Booth Speedup
• Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster.
• Higher-radix multipliers are possible, but not worth it except for larger multipliers (at least 64 bits).

Wallace Trees
A CSA array just adds the PPs together one at a time. A (3,2) counter is another name for a full adder.
Source: David Harris

Wallace Trees (cont.)
A Wallace tree adds the partial products in parallel! Number of levels is: [formula not preserved]. Layout is not regular; long wires can cause delay.
Source: David Harris

4-2 Compressor
Used to reduce the number of levels in a Wallace tree. Number of levels is: [formula not preserved]. Layout is more regular. Logic is more complex than a full adder.
Source: David Harris

Multiplier Summary
• CSAs: simple, but many partial products
• Booth Encoding: reduces the number of required PPs, achieves speedup over CSAs
• Wallace Trees: add PPs in parallel
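
The parallel reduction summarized above can be illustrated with a short Python sketch. It is a simplified, word-level model and not the bit-level tree drawn in the slides: whole partial products are grouped in threes and passed through (3,2) counters layer by layer, and the names `csa`, `wallace_reduce`, and `wallace_multiply` are hypothetical.

```python
def csa(a, b, c):
    """(3,2) counter applied bitwise: three operands in, sum and shifted carry out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands):
    """Reduce non-negative ints to two values using parallel layers of (3,2) counters.

    Returns (sum_vec, carry_vec, levels), where levels counts the CSA layers used.
    """
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3         # operands that form complete triples
        nxt = []
        for i in range(0, full, 3):            # every triple is reduced in the same layer
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                 # leftovers pass through to the next layer
        ops = nxt
        levels += 1
    while len(ops) < 2:                        # pad so a pair is always returned
        ops.append(0)
    return ops[0], ops[1], levels


def wallace_multiply(y, x, n):
    """Unsigned n-bit multiply: n radix-2 partial products, tree reduction, one CPA."""
    pps = [(y << i) if (x >> i) & 1 else 0 for i in range(n)]
    s, c, levels = wallace_reduce(pps)
    return s + c, levels


print(wallace_multiply(13, 11, 8))  # 8 partial products reduce in 4 layers -> (143, 4)
```

In this model eight partial products need 4 layers, where adding them one at a time in a linear CSA array would take 6 rows; the gap widens as the operand count grows.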