A HIGH-SPEED MAGNITUDE COMPARATOR WITH SMALL TRANSISTOR COUNT

Shun-Wen Cheng
Tamkang Univ., Taipei, TAIWAN
swcheng@ieee.org

ABSTRACT

The comparator is a basic and useful arithmetic component of digital systems. A standalone, compact, high-performance comparator core with a good cost-benefit ratio plays an important role in almost all hardware sorters. This study proposes a comparator design with a fine cost-performance ratio. Based on a modified 1's complement principle and the conditional sum adder scheme, the proposed design has a small transistor count and a short propagation delay. Post-layout simulations based on the TSMC 0.6um 1P3M CMOS process have been completed. They show that a 64-b static CMOS comparator of the proposed architecture needs only 1,556 transistors and 4.2 ns of propagation delay.

Index Terms - magnitude comparator, digital comparator, sorter, 1's complement, conditional sum adder, CMOS, digital IC and VLSI.

1. INTRODUCTION

Sorting is one of the most important problems in computer science. Many fundamental processes in computing and communication systems require sorting of data. Sorting networks play a key role in parallel computing, multi-access memories and multiprocessing [1], [2], [5]. As depicted in Fig. 1, compare-and-swap elements are vital for sorting.

Figure 1. Compare & swap elements are vital for sorting. [Each element takes inputs a and b and delivers max(a, b) and min(a, b), according to whether a ≥ b or a < b.]

In conventional computer systems, the COMPARE and SUBTRACT instructions often share the same hardware, which reduces cost. The time complexity of software sorting, however, is bounded by the O(n) of radix sort or the O(n log n) of quick sort in the average case [5].

Figure 2. A three-level bitonic sorter. [Level 1: stage 1; Level 2: stages 1-2; Level 3: stages 1-3.]

Figure 2 displays an eight-number, three-level bitonic sorter. It uses 24 comparators to attain a higher performance target.
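As an illustration of how compare-and-swap elements compose into the network of Figure 2, the following Python sketch models an eight-input bitonic sorter in software and counts the comparators it uses. It is only a behavioral model and assumes a power-of-two number of inputs; the names `compare_swap` and `bitonic_sort` are illustrative and do not come from the paper.

```python
def compare_swap(values, i, j, ascending):
    """Compare-and-swap element: put values[i], values[j] in the requested order."""
    if (values[i] > values[j]) == ascending:
        values[i], values[j] = values[j], values[i]


def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns the sorted list and the number of compare-and-swap elements used.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    comparators = 0
    k = 2
    while k <= n:              # one level per merged block size k
        j = k // 2
        while j >= 1:          # one stage per comparison distance j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate direction so that merged halves are bitonic.
                    ascending = (i & k) == 0
                    compare_swap(data, i, partner, ascending)
                    comparators += 1
            j //= 2
        k *= 2
    return data, comparators


if __name__ == "__main__":
    result, used = bitonic_sort([84, 67, 3, 250, 17, 17, 0, 99])
    print(result, used)
```

For eight inputs the model runs three levels of one, two and three stages, each stage holding four comparators, which gives the 24 comparators quoted for Figure 2.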
The time complexity of a bitonic sorter built from n(log n)^2 comparators is O((log n)^2), far better than common software solutions [1], [2].

Classic magnitude comparators are shown in Fig. 3 [4]. The circuit compares two binary numbers A and B and produces three outputs: A>B, A=B, and A<B. In many applications, two outputs are enough: A ≥ B and A<B. Due to the limitations of CMOS logic [8], the 4-bit comparator is the basic building block. The example reveals that the circuit cost and complexity of a (2k)-bit comparator are often more than twice those of a k-bit comparator. Implementing a long bit-length comparator with the old scheme is therefore uneconomical. If a hardware sorter is designed directly for long-digit integer sorting, the comparator array becomes very large. In that case, a compact, high-performance comparator core is very important.

Figure 3. The classic circuit of magnitude comparators. [Panels (a) to (d) include a 1-bit magnitude comparator, a 4-bit comparator, and a 16-bit comparator.]

This paper is organized as follows. Section 2 shows the feasibility of the modified 1's complement scheme for comparator design. The proposed comparator architecture is presented in Section 3. Finally, Section 4 concludes the major findings and outlines future work.

0-7803-8163-7/03/$17.00 © 2003 IEEE. ICECS-2003

2. MODIFIED 1's COMPLEMENT FOR COMPARATOR DESIGN

Figure 4. Modified 1's complement for comparator design. [Worked 8-bit examples with X = 01010100 (84) and Y = 01000011 (67): A. X - Y (84 > 67); B. X - X (84 = 84); C. Y - X (67 < 84). Each example marks the 1's complement of the subtrahend, the fixed carry-in bit, and the resulting Comp bit.]

Figure 4 displays a modified 1's complement scheme. For comparator design, only the carry-out bit matters. In ordinary 1's complement subtraction, if X > Y, bit Cout = 1; if X ≤ Y, bit Cout = 0. After modification, the scheme always adds a fixed carry-in, so that if X ≥ Y, bit Comp = 1, and if X < Y, bit Comp = 0. Thus the status of bit Comp fits the convention of comparison. The classic designs in Fig. 3 need two bits to express the same information, which is inefficient.

The discussion so far assumes that both numbers are positive. If the two numbers have different signs, comparing the sign bits directly gives the answer. If both numbers are negative, the answer is just the opposite. In that case, make the fixed carry-in bit always 0; the output signal is then Comp = Comp ⊕ Sign-bit, and the condition is solved.

A standalone, high-performance comparator core with a good cost-benefit ratio plays an important role in almost all hardware sorters.

3. THE PROPOSED COMPARATOR ARCHITECTURE

To meet high-performance demands, the proposed architecture is derived from the Conditional Sum Adder [6]. Originally, Carry = AB + AC + BC = AB + (A + B)C. If C = 0, Carry = AB. If C = 1, Carry = AB + (A + B) = A + B. The number of MUX gates in an N-bit comparator is

$$\sum_{k=1}^{M} (2^k - 1),$$

where $M = \log_2 N$.

Figure 5. An 8-b brief schematic example of the proposed comparator architecture. [Inputs A7..A0 and B7..B0; Stage 0 to Stage 3; output Comp = 1 for A ≥ B, Comp = 0 for A < B.]

Figure 5 shows an 8-bit example of the proposed comparator architecture. The 8-b comparator needs 11 (= 1 + 3 + 7) 2-to-1 multiplexers. The schematic needs eight inverters to generate the complementary values of input B. The total component count is eight inverters, eight two-input OR gates, seven two-input AND gates, and eleven 2-to-1 multiplexers.
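The comparison rule of Section 2 and the carry selection described above can be checked with a small behavioral model. The Python sketch below is one possible reading of Figure 5, not the author's netlist: it assumes unsigned operands, a bit width that is a power of two, and a fixed carry-in of 1 at bit 0, and the function names and the `counts` dictionary are hypothetical helpers introduced for illustration.

```python
def comp_modified_ones_complement(x, y, width=8):
    """Comp = carry-out of X + (1's complement of Y) + fixed carry-in of 1."""
    mask = (1 << width) - 1
    total = (x & mask) + (~y & mask) + 1
    return (total >> width) & 1


def conditional_carry_compare(a, b, width=8):
    """Evaluate Comp with conditional carries selected by 2-to-1 multiplexers.

    Each block keeps (c0, c1): its carry-out if the carry-in is 0 or 1.
    The lowest block only keeps c1 because its carry-in is fixed to 1.
    Returns Comp and a count of the gates the model instantiates.
    """
    counts = {"INV": 0, "OR2": 0, "AND2": 0, "MUX2": 0}
    blocks = []
    for i in range(width):                     # stage 0: per-bit gates
        a_i = (a >> i) & 1
        nb_i = 1 - ((b >> i) & 1)              # inverter on input B
        counts["INV"] += 1
        c1 = a_i | nb_i                        # Carry = A + B' when C = 1
        counts["OR2"] += 1
        if i == 0:
            blocks.append((None, c1))
        else:
            c0 = a_i & nb_i                    # Carry = A B' when C = 0
            counts["AND2"] += 1
            blocks.append((c0, c1))
    while len(blocks) > 1:                     # multiplexer stages
        merged = []
        for k in range(0, len(blocks), 2):
            (l0, l1), (h0, h1) = blocks[k], blocks[k + 1]
            m1 = h1 if l1 else h0              # mux selected by the low block's c1
            counts["MUX2"] += 1
            if l0 is None:
                merged.append((None, m1))
            else:
                m0 = h1 if l0 else h0          # mux selected by the low block's c0
                counts["MUX2"] += 1
                merged.append((m0, m1))
        blocks = merged
    return blocks[0][1], counts


if __name__ == "__main__":
    for x, y in [(84, 67), (84, 84), (67, 84)]:
        print(x, y, comp_modified_ones_complement(x, y),
              conditional_carry_compare(x, y)[0])
    print(conditional_carry_compare(84, 67)[1])
```

With the operands of Figure 4, both functions give Comp = 1 for 84 versus 67 and for 84 versus 84, and Comp = 0 for 67 versus 84. For an 8-bit width the model instantiates eight inverters, eight OR gates, seven AND gates and eleven multiplexers, the same totals as listed for Figure 5.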
The static CMOS AND gate and OR gate internally generate their complementary signals [8]. These signals can drive the stage-1 multiplexers, which reduces the number of inverters required.

The total gate count of an N-bit comparator is

$$\mathrm{INV} \times N + \mathrm{OR2} \times N + \mathrm{AND2} \times (N-1) + \mathrm{MUX2to1}_{\text{stage 1}} \times (N-1) + \mathrm{MUX2to1}_{\text{other stages}} \times \left\{ \left[ \sum_{k=1}^{M} (2^k - 1) \right] - (N-1) \right\},$$

where $M = \log_2 N$.

So the total transistor count of the 8-b static CMOS comparator is

(1p+1n)×8 + (3p+3n)×8 + (3p+3n)×7 + (2p+2n)×7 + (3p+3n)×(11-7) = 79p + 79n.

The total transistor count of an N-bit static CMOS comparator is

$$(4p+4n)N + (5p+5n)(N-1) + (3p+3n)\left\{ \left[ \sum_{k=1}^{M} (2^k - 1) \right] - (N-1) \right\},$$

where $M = \log_2 N$. If N = 64, this gives 742p + 742n. With eighteen buffers added to increase the driving capability, the total transistor count is 742×2 + 18×4 = 1,556.

Under the proposed comparator architecture, implementing the AND and OR gates in NMOS logic and using pure NMOS multiplexer networks gives the fewest transistors, but at the cost of large power dissipation and slow operation. Complementary Pass-transistor Logic (CPL) can reduce the data skew problem and power dissipation, so CPL can replace the static CMOS logic gates in low-power, low-voltage applications. The circuit is also easily partitioned into several pipeline stages to increase hardware sharing and data throughput.

4. CONCLUDING REMARKS

The complexity information is listed in Table 1. The transistor count of the new design is lower than that of the conventional design, and the transistor count of the new design with static CMOS is only approximately half that of the conventional design.

Table 1. Complexity (transistor count) of comparator designs.

Bit number   Comparator design of [4]   Proposed design with pseudo-NMOS   Proposed design with static CMOS
2            38p+38n                    8p+13n                             13p+13n
...
64           1,522p+1,522n              375p+742n                          742p+742n

Post-layout simulation results are summarized in Table 2. The comparisons of comparator designs are based upon the TSMC 0.6um Single-layer Polysilicon Triple-layer Metal (1P3M) CMOS process technology. The transistor count and layout area of the proposed comparator are both smaller than those of Chua-Chin Wang's 1998 comparator [7] and Chung-Hsun Huang's 2003 comparator [3], and its worst-case propagation delay is shorter than that of either design.

Table 2. Transistor count and simulation comparisons of 64-b comparator designs. (All are based on TSMC 0.6um CMOS process.) [Surviving layout-area entries: a compared comparator, 400 × 380 (= 72,172 μm² as printed); the proposed comparator using static CMOS, 576 μm × 120 μm (= 69,120 μm²).]

ACKNOWLEDGEMENT

The author, Shun-Wen Cheng, would like to thank his advisor, Prof. Kuo-Hsing Cheng, for his previous teaching on IC design. Prof. Cheng has left the Dept. of EE, Tamkang University, Tamsui, TAIWAN, and has joined the Dept. of EE, National Central University, Chung-Li, TAIWAN. One of the stars of Tamkang has gone...

REFERENCES

[1] K. E. Batcher, "Sorting Networks and Their Applications," in Proc. AFIPS 1968 Spring Joint Computer Conference, pp. 307-314, Apr. 1968.
[2] Shun-Wen Cheng, "Arbitrary Long Digit Sorter HW/SW Co-Design," in Proc. Asia and South Pacific Design Automation Conf., ASP-DAC'03, pp. 538-543, Jan. 2003.
[3] Chung-Hsun Huang and Jinn-Shyan Wang, "High-Performance and Power-Efficient CMOS Comparators," IEEE J. Solid-State Circuits, Vol. 38, pp. 254-262, Feb. 2003.
[4] Kai Hwang, Computer Arithmetic: Principles, Architecture and Design. New York: John Wiley & Sons, 1979.
[5] D. E. Knuth, Sorting and Searching. Reading: Addison-Wesley, 1973.
[6] J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, Vol. EC-9, No. 2, pp. 226-231, June 1960.
[7] Chua-Chin Wang, C.-F. Wu, and K.-C. Tsai, "A 1.0 GHz 64-bit High-Speed Comparator Using ANT Dynamic Logic with Two-Phase Clocking," IEE Proceedings Computers and Digital Techniques, vol. 145, no. 6, pp. 433-436, Nov. 1998.
[8] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd Ed., Reading: Addison-Wesley, 1993.