Dhaka Univ. J. Eng. & Tech. Vol. 1(2) 1-5, 2011 (January)

Design of a Reversible Parallel Loading Shift Register

Ashis Kumer Biswas, Lafifa Jamal, M. A. Mottalib¹, Hafiz Md. Hasan Babu
Department of Computer Science & Engineering, University of Dhaka, Dhaka, Bangladesh
ashis@cse.univdhaka.edu, lafifa@univdhaka.edu, hafizbabu@hotmail.com
¹ Department of Computer Science & Information Technology, Islamic University of Technology, Gazipur, Bangladesh
mottalib@iut-dhaka.edu

Received on 24.11.2010. Accepted for publication on 20.01.2011

Abstract
Reversible flip-flops are key memory elements that ideally store data with zero power dissipation, and they are expected to be building blocks of memory in forthcoming quantum computing devices. Cost-effective designs of various flip-flops have therefore long been a major research objective. In this paper, we propose a new approach to designing RS and D Flip-Flops that are less costly than the existing designs. Cost is measured in terms of the number of gates, the number of garbage bits and the quantum cost. The proposed designs are then used to build a reversible shift register that can load inputs in parallel, which speeds up the operations that use it. Appropriate algorithms and theorems are presented to clarify the proposed design and to establish its efficiency.

Keywords: Parallel loading reversible shift registers, reversible flip-flop, garbage output, quantum cost

1. Introduction
Landauer [1] showed that irreversible circuits must consume power, and consequently dissipate heat, whenever they erase or discard information. Further, Bennett [2] showed that, to avoid this heat dissipation, circuits must be built using reversible gates only. Hence there are compelling reasons to consider circuits composed of reversible gates only and the synthesis of such networks. Moreover, reversible implementations have applications in quantum computing [3] and nanotechnology [4].
Despite the great potential of reversible logic and these endorsements from the leaders in the field, little work has been done in the area of sequential reversible logic. Toffoli [5] proved that a finite automaton is reversible if its transition function is invertible. In order to realize a finite automaton with a reversible sequential circuit, it suffices to build a reversible circuit realizing the transition function and use this as the combinational part of the sequential network.

This paper introduces new realizations of RS and D Flip-Flops. The proposed flip-flops are then used to build a reversible shift register that can load data in parallel. The rest of the paper is organized as follows: Section 2 provides the necessary background on reversible logic along with examples of popular reversible logic gates. Section 3 provides existing realizations of RS and D Flip-Flops as well as the new realizations. Section 4 contains the novel design of reversible memory circuits. Algorithmic procedures, theorems and lemmas are given to modularize the design approaches.

2. Basic Definitions
In this section, basic definitions related to reversible logic are presented. We formally define reversible gate and garbage output, and present the popular reversible gates along with their input-output specifications.

Definition 1. A Reversible Gate is a k-input, k-output (denoted by k*k) circuit that produces a unique output pattern for each possible input pattern [6][7].

Definition 2. Unwanted or unused outputs which are needed to maintain reversibility of a reversible gate (or circuit) are known as Garbage Outputs.

Definition 3. The delay of a logic circuit is the maximum number of gates in a path from any input line to any output line. This definition is based on the assumptions that: (i) each gate performs computation in one unit time and (ii) all inputs to the circuit are available before the computation begins [8].

Definition 4. The input vector, $I_v$, and output vector, $O_v$, for the 2*2 Feynman Gate (FG) [9] are defined as follows: $I_v = (A, B)$ and $O_v = (P = A,\ Q = A \oplus B)$.

Definition 5. The input vector, $I_v$, and output vector, $O_v$, for the 3*3 Toffoli gate (TG) [5] are defined as follows: $I_v = (A, B, C)$ and $O_v = (P = A,\ Q = B,\ R = AB \oplus C)$.

Definition 6. The input vector, $I_v$, and output vector, $O_v$, for the 3*3 Fredkin gate (FRG) [10] are defined as follows: $I_v = (A, B, C)$ and $O_v = (P = A,\ Q = \overline{A}B \oplus AC,\ R = \overline{A}C \oplus AB)$.

Definition 7. The input vector, $I_v$, and output vector, $O_v$, for the 3*3 Peres gate (PG) [11] are defined as follows: $I_v = (A, B, C)$ and $O_v = (P = A,\ Q = A \oplus B,\ R = AB \oplus C)$.

Definition 8. The quantum cost of a circuit is the minimum number of 2*2 unitary gates needed to represent the circuit while keeping the output unchanged. The quantum cost of a 1*1 gate is 0 and that of any 2*2 gate is 1 [12]. So, the quantum cost of a 2*2 Feynman gate is 1. Likewise, the quantum costs of the 3*3 Toffoli, Fredkin, Peres and New gates are 5, 5, 4 and 11, respectively [13][14][9].

3. Reversible Flip-Flop Design
A number of researchers, including Toffoli [5] and Frank [15], discussed the potential for sequential reversible logic but did not present any structure for its realization. Fredkin and Toffoli [10] appear to be the first to suggest a conservative logic sequential element, in the form of a JK Flip-Flop. This section discusses existing realizations of RS and D Flip-Flops along with the proposed realizations.

A. RS Flip-Flop Designs

Table 1: Comparison of clocked RS Flip-flop designs
Criteria                 Existing design [16]             Proposed design
                         With fanout    Without fanout
Number of gates          6              7                 5
Number of garbage bits   8              8                 6
Quantum cost             34             35                17

Thapliyal [16] proposed a reversible clocked RS Flip-Flop that is a direct realization of the conventional irreversible circuit.
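As an illustration of the gate definitions and quantum costs given in Section 2, the gates can be modelled as small truth functions. The Python sketch below is not part of the original design: the function names (`feynman`, `toffoli`, `fredkin`, `peres`, `is_reversible`, `circuit_quantum_cost`) and the `QUANTUM_COST` table are hypothetical helpers introduced here, and the Fredkin outputs assume the usual form with a complemented control. The cost values are those stated in Definition 8. The check in `is_reversible` follows Definition 1: a k*k gate is reversible if all 2^k input patterns give distinct output patterns.

```python
from itertools import product

# Gate models over bits 0/1, following Definitions 4 to 7 (names are illustrative).
def feynman(a, b):
    return a, a ^ b

def toffoli(a, b, c):
    return a, b, (a & b) ^ c

def fredkin(a, b, c):
    # Assumed form: Q = A'B xor AC, R = A'C xor AB (A' is the complement of A).
    na = 1 - a
    return a, (na & b) ^ (a & c), (na & c) ^ (a & b)

def peres(a, b, c):
    return a, a ^ b, (a & b) ^ c

def is_reversible(gate, k):
    """Definition 1: every input pattern must map to a unique output pattern."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=k)}
    return len(outputs) == 2 ** k

# Quantum costs as stated in Definition 8.
QUANTUM_COST = {"feynman": 1, "toffoli": 5, "fredkin": 5, "peres": 4, "new": 11}

def circuit_quantum_cost(gate_counts):
    """Sum the quantum cost of a circuit given a mapping gate name -> count."""
    return sum(QUANTUM_COST[name] * count for name, count in gate_counts.items())

if __name__ == "__main__":
    for gate, k in ((feynman, 2), (toffoli, 3), (fredkin, 3), (peres, 3)):
        print(gate.__name__, "reversible:", is_reversible(gate, k))
    print("cost of two New gates:", circuit_quantum_cost({"new": 2}))
```

Under these assumptions, two New gates alone contribute a quantum cost of 22, which is why designs built from New gates compare unfavourably with designs built from Peres or Fredkin gates.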
Thapliyal's design is costly because it incorporates two New gates, whose quantum cost is higher than that of the other reversible gates, as noted in Definition 8 of Section 2. Moreover, the design contains a single fan-out, which is forbidden in the strict reversible sense [17].

1) Proposed $\overline{R}\,\overline{S}$ Flip-Flop
The proposed realization of the $\overline{R}\,\overline{S}$ Flip-Flop is shown in Figure 1. As a NAND operation can be performed using a single 3*3 Peres gate instead of a Fredkin gate, a less costly reversible $\overline{R}\,\overline{S}$ Flip-Flop can be designed using the Peres gate.

Figure 1. Proposed Clocked $\overline{R}\,\overline{S}$ Flip-Flop

The two first-level Peres gates generate S.CLK and R.CLK, which are then fed to the S and R lines, respectively, of the second level of Peres gates. Thus the proposed clocked $\overline{R}\,\overline{S}$ Flip-Flop performs the same way as the conventional irreversible design using NAND gates.

2) Comparison of RS Flip-Flop Designs
Table 1 shows the comparison of the clocked reversible RS and $\overline{R}\,\overline{S}$ Flip-Flops. Since the design of Thapliyal [16] contains two New gates, each with a quantum cost of 11, its overall cost is high. Moreover, his realization contains two redundant Feynman gates. The proposed design (Figure 1) requires 5 gates, produces 6 garbage bits and has a quantum cost of 17, so it is less costly than the existing design in terms of number of gates, garbage bits and quantum cost.

B. D Flip-Flop Designs
Thapliyal [16] proposed a reversible clocked D Flip-Flop that is also costly because it contains four New gates. Moreover, a fanout can be seen in the design.

1) Proposed D Flip-Flop using Proposed $\overline{R}\,\overline{S}$ Flip-Flop
The proposed design of the reversible clocked D Flip-Flop employs the proposed $\overline{R}\,\overline{S}$ latch of Figure 1. The resultant flip-flop is shown in Figure 2. The first level of the design is simple to verify: $\overline{D} \cdot CLK$ and $D \cdot CLK$ are generated by the Feynman and Fredkin gates, and these are fed to the R and S inputs of the proposed $\overline{R}\,\overline{S}$ latch (shown in Figure 1). A total of five gates complete the clocked version of the reversible D Flip-Flop.

Figure 2.
Proposed reversible clocked D Flip-Flop using proposed $\overline{R}\,\overline{S}$ latch

Table 2: Comparison of reversible clocked D Flip-Flop
Criteria                 Existing design [16]             Proposed design    Proposed design generated
                         With fanout    Without fanout    (Figure 2)         using equation (i) (Figure 3)
Number of gates          7              8                 5                  2
Number of garbage bits   8              8                 5                  2
Quantum cost             47             48                15                 9

2) Proposed D Flip-Flop using Output Equation
Figure 3 shows the proposed clocked D Flip-Flop, which uses the same technique to map the output equation of the flip-flop, which is

$Q^{+} = D \cdot CLK + Q \cdot \overline{CLK}$ … (i)

Since Equation (i) can be generated using a single 3*3 Fredkin gate, there is a significant reduction in the number of reversible gates in the proposed D Flip-Flop.

Figure 3. Proposed Reversible clocked D Flip-Flop

3) Comparison of D Flip-Flop Designs
The proposed clocked D Flip-Flop is efficient to use in designing reversible memory circuits because of its superiority over the existing one [16] in terms of number of gates, garbage bits and quantum cost. The comparison is shown in Table 2.

4. Reversible Shift Register Design
A shift register stores binary data, which can be shifted left or right when a shift signal is applied. Bits shifted out of one end of the register may be either lost or shifted back in at the other end. In this section, the design of an efficient reversible shift register is proposed, and parallel loading of the proposed shift register is discussed.

Parallel Loading Reversible Shift Left/Right Registers
Parallel loading of reversible shift registers is another step in the evolution of shift registers. The circuit can load X bits (X is an input vector consisting of n bits: Xn-1 Xn-2 Xn-3 … X2 X1 X0) directly into the flip-flops, the same as a buffer register. This kind of entry is called parallel or broadside loading; it takes only one clock pulse to store an entire digital word.

Table 3: Truth table for the Parallel loading circuit of the Shift Left Register
LOAD   SHL   Final output (Qi+)
0      0     Qi (No Change)
0      1     Qi-1 (Left Shift)
1      0     Xi (LOAD from vector X)
1      1     * (Forbidden)

Table 3 shows the truth table of the parallel loading circuit. The characteristic function of Qi+ would be:

$Q_i^{+} = \overline{LOAD} \cdot \overline{SHL} \cdot Q_i + LOAD \cdot X_i + SHL \cdot Q_{i-1}$ … (ii)

Because at most one of the three product terms is 1 for any permitted input combination, the characteristic function can be modified to simplify the design approach without changing the output behavior, as follows:

$Q_i^{+} = \overline{LOAD} \cdot \overline{SHL} \cdot Q_i \oplus LOAD \cdot X_i \oplus SHL \cdot Q_{i-1}$ … (iii)

Algorithm 1. PARALLEL-LOADING-CIRCUIT-ALGORITHM (LOAD, SHL, Xi, Qi, Qi-1)
Input:
  LOAD and SHL: the control signals of the register.
  Xi: ith data input to the register.
  Qi-1 and Qi: outputs of the (i-1)st D Flip-Flop and the ith D Flip-Flop, respectively.
Output:
  Next output expression Qi+. The LOAD, SHL and Qi signals should also be regenerated to drive the next bit position's PARALLEL-LOADING-CIRCUIT.
BEGIN
  Step 1: The first block takes LOAD and Qi and computes $\overline{LOAD} \cdot Q_i$.
  Step 2: The second block takes $\overline{LOAD} \cdot Q_i$ and SHL as input and computes $\overline{LOAD} \cdot \overline{SHL} \cdot Q_i$.
  Step 3: The third block takes LOAD and Xi as input and computes $LOAD \cdot X_i$.
  Step 4: The next block takes $\overline{LOAD} \cdot \overline{SHL} \cdot Q_i$ and $LOAD \cdot X_i$ as inputs and computes $\overline{LOAD} \cdot \overline{SHL} \cdot Q_i \oplus LOAD \cdot X_i$.
  Step 5: The fifth block takes SHL and Qi-1 and computes $SHL \cdot Q_{i-1}$.
  Step 6: The sixth block takes $SHL \cdot Q_{i-1}$ and $\overline{LOAD} \cdot \overline{SHL} \cdot Q_i \oplus LOAD \cdot X_i$ and computes the final Qi+ expression: $Q_i^{+} = \overline{LOAD} \cdot \overline{SHL} \cdot Q_i \oplus LOAD \cdot X_i \oplus SHL \cdot Q_{i-1}$.
END

Example 1. Figure 4 shows the direct implementation of Algorithm 1 to construct the parallel loading circuit for the controlled shift register.

Figure 4. Reversible Parallel Loading circuit for the shift register

Theorem 1. The number of gates of the proposed reversible parallel loading circuit is at least 8.
Proof: As there are four AND operations in the final output of the proposed circuit, four Peres gates can be used to implement these operations. In addition, two Feynman gates are used to copy two signals, namely LOAD and SHL, in the circuit. Moreover, two more Feynman gates are required to implement the two XOR operations of the circuit. Therefore, the total number of gates in the circuit is = number of gates to implement the AND operations + number of gates to copy two signals + number of gates to implement two XOR operations = 4 + 2 + 2 = 8.

Lemma 1. The quantum cost of the proposed reversible parallel loading circuit is at least 20.
Proof: In Theorem 1, it was proved that the proposed parallel loading circuit requires at least four Peres gates and four Feynman gates. As stated in Section 2, the quantum cost of each 3*3 Peres gate is 4 and that of each 2*2 Feynman gate is 1. So the proposed parallel loading circuit has a total quantum cost of 4 × 4 + 4 × 1 = 20.

One parallel loading circuit can be used for each bit position in the controlled shift register. Each Qi+ signal generated from the corresponding parallel loading circuit drives the Di input of the ith D Flip-Flop. Algorithm 2 presents the design procedure of the n-bit parallel loading shift register.

Algorithm 2. PARALLEL-LOADING-CONTROLLED-SHIFT-REGISTER-CONSTRUCTION (n)
Input:
  n: number of bits handled by the register.
  LOAD and SHL: the control signals of the register.
  Xi: ith data input to the register.
  Qi-1 and Qi: outputs of the (i-1)st D Flip-Flop and the ith D Flip-Flop, respectively.
Output:
  Next output expression Qi+. The LOAD, SHL and Qi signals should also be regenerated to drive the next bit position's PARALLEL-LOADING-CIRCUIT.
BEGIN
  Q0+ = PARALLEL-LOADING-CIRCUIT-ALGORITHM (LOAD, SHL, X0, Q0, Din);
  Drive D0 using Q0+.
  FOR i ← 1 to n-1 DO
    Qi+ = PARALLEL-LOADING-CIRCUIT-ALGORITHM (LOAD, SHL, Xi, Qi, Qi-1);
    Drive Di using Qi+.
  END FOR
END

Example 2. Figure 5 shows the direct implementation of Algorithm 2 to construct the parallel loading controlled shift register, where the argument n of the function PARALLEL-LOADING-CONTROLLED-SHIFT-REGISTER-CONSTRUCTION (n) is 4.

Figure 5. Parallel loading of reversible Shift Register

Theorem 2. Let Tg be the total number of gates and Tb be the total number of garbage bits. Then, for the proposed n bit reversible parallel loading shift register, the relation between Tg and Tb is Tg + Tb ≥ 20n.
Proof: As each Qi+ signal generated from the corresponding parallel loading circuit of an n bit reversible shift register drives the Di input of the ith D Flip-Flop, a total of n reversible D Flip-Flops are required in this circuit. Each of the n Qi+ bits is to be used both in the parallel loading circuit and in the output buffer, so we need a Feynman gate for each of the n bits of the proposed reversible shift register. Therefore, the total number of gates required for each bit in the circuit is = total number of gates in a proposed D Flip-Flop + total number of gates in the proposed parallel loading circuit + a Feynman gate = 2 + 8 + 1 = 11. Moreover, each proposed parallel loading circuit generates seven garbage bits and each D Flip-Flop generates two garbage bits. Therefore, for each bit of the n bit parallel loading shift register, 7 + 2 = 9 garbage bits are generated. So, for an n bit proposed parallel loading shift register, Tg + Tb ≥ 11n + 9n = 20n.

Lemma 2. The proposed realization of the n bit parallel loading shift register has a total quantum cost of at least 30n.
Proof: From Lemma 1, the quantum cost of a parallel loading circuit is 20. From Figure 5 of the proposed design, a D Flip-Flop and a Feynman gate are required for each bit of the shift register. As the quantum costs of a 2*2 Feynman gate and of the proposed D Flip-Flop are 1 and 9 respectively, the total quantum cost of the n-bit proposed reversible parallel loading shift register is = the quantum cost of n proposed parallel loading circuits + the quantum cost of n D Flip-Flops + the quantum cost of n Feynman gates = 20n + 9n + n = 30n.

5. Conclusion
Parallel loading is more efficient than serial loading in terms of the time it takes to initialize a shift register. This paper demonstrated a novel approach to loading data in parallel into a reversible shift register. The design used a parallel loading circuit for each bit of the reversible register, together with reversible D Flip-Flops that were also proposed in this paper. The designs were compared with the existing designs to show that the proposed designs are less costly in terms of number of gates, number of garbage bits and quantum cost.

References
[1] Rolf Landauer, "Irreversibility and Heat Generation in the Computing Process," IBM Journal of Research and Development, vol. 5, pp. 183-191, 1961.
[2] Charles H. Bennett, "Logical Reversibility of computation," IBM Journal of Research and Development, vol. 17, no. 6, pp. 525-532, 1973.
[3] E. Knill, R. Laflamme, and G. J. Milburn, "A scheme for efficient quantum computation with linear optics," 2001, pp. 46-52.
[4] Ralph C. Merkle, "Reversible electronic logic using switches," Nanotechnology, vol. 4, pp. 21-40, 1993.
[5] Tommaso Toffoli, "Reversible Computing," Automata, Languages and Programming, 7th Colloquium of Lecture Notes in Computer Science, vol. 85, pp. 632-644, 1980.
[6] Hafiz Md. Hasan Babu, Md. Rafiqul Islam, Ahsan Raja Chowdhury, and Syed Mostahed Ali Chowdhury, "Synthesis of full-adder circuit using reversible logic," International Conference on VLSI Design, vol. 17, pp. 757-760, 2004.
[7] Hafiz Md. Hasan Babu, Md. Rafiqul Islam, Ahsan Raja Chowdhury, and Syed Mostahed Ali Chowdhury, "Reversible Logic Synthesis for minimization of full-adder circuit," Belek, Antalya, Turkey: 2003, pp. 50-54.
[8] Ashis Kumer Biswas, Md. Mahmudul Hasan, Ahsan Raja Chowdhury, and Hafiz Md. Hasan Babu, "Efficient approaches for designing reversible Binary Coded Decimal adders," Elsevier Journal of Microelectronics, vol. 39, pp. 1693-1703, 2008.
[9] Richard P. Feynman, "Quantum mechanical computers," Foundations of Physics, vol. 16, no. 6, pp. 507-531, 1986.
[10] Edward Fredkin and Tommaso Toffoli, "Conservative Logic," International Journal of Theoretical Physics, vol. 21, pp. 219-253, 1982.
[11] A. Peres, "Reversible Logic and Quantum Computers," Physical Review A, vol. 32, pp. 3266-3276, 1985.
[12] J. Smolin and David P. DiVincenzo, "Five two-qubit gates are sufficient to implement the quantum Fredkin gate," Physical Review A, vol. 53, no. 4, pp. 2855-2856, 1996.
[13] G. Yang, X. Song, W. N. N. Hung, and Marek Perkowski, "Bidirection Synthesis for Reversible circuits," IEEE Computer Society Annual Symposium on VLSI New Frontiers in VLSI Design, 2007.
[14] D. Maslov and G. W. Dueck, "Reversible Cascades with Minimal Garbage," IEEE Transactions on CAD, vol. 23, no. 11, pp. 1497-1509, November 2004.
[15] Michael P. Frank, "Physical Limits of Computing," CIS 4930. 1194X/6930. 1078X, 2000.
[16] Himanshu Thapliyal, M. B. Srinivas, and M. Zwolinski, "A Beginning in the Reversible Logic Synthesis of Sequential Circuits," MAPLD Conference (NASA office of Logic Design), vol. 8, 2005.
[17] D. Maslov, G. W. Dueck, and D. M. Miller, "Toffoli Network Synthesis with templates," IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 24, no. 6, pp. 807-817, June 2005.
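As a supplementary illustration of Section 4, the behaviour specified by Table 3 and Algorithms 1 and 2 can be sketched in Python. This is a functional (truth-value) model only, not a gate-level reversible circuit, and it is not the authors' implementation. It assumes that the modified characteristic function is the XOR form Qi+ = LOAD'.SHL'.Qi xor LOAD.Xi xor SHL.Qi-1 implied by Table 3, and that the D Flip-Flop follows Q+ = D.CLK + Q.CLK' (a prime denotes complement). All function names are hypothetical, and the per-bit figures in the cost helpers are the ones stated in Theorem 2 and Lemma 2.

```python
def parallel_loading_circuit(load, shl, x_i, q_i, q_prev):
    """Algorithm 1 (assumed XOR form): next value for bit position i."""
    hold = (1 - load) & (1 - shl) & q_i          # Steps 1-2: LOAD'.SHL'.Qi
    loaded = load & x_i                          # Step 3:   LOAD.Xi
    shifted = shl & q_prev                       # Step 5:   SHL.Qi-1
    return hold ^ loaded ^ shifted               # Steps 4 and 6

def d_flip_flop(d, clk, q):
    """Assumed output equation: Q+ = D.CLK + Q.CLK'."""
    return (d & clk) | (q & (1 - clk))

def register_step(load, shl, x, q, d_in, clk=1):
    """Algorithm 2: one clock step of an n-bit register; q[0] is bit 0."""
    if load and shl:
        raise ValueError("LOAD = SHL = 1 is the forbidden row of Table 3")
    next_q = []
    for i in range(len(q)):
        q_prev = d_in if i == 0 else q[i - 1]    # bit 0 shifts in Din
        d_i = parallel_loading_circuit(load, shl, x[i], q[i], q_prev)
        next_q.append(d_flip_flop(d_i, clk, q[i]))
    return next_q

def gates_plus_garbage(n):
    """Theorem 2 bound: 11 gates and 9 garbage bits per bit position."""
    return (11 + 9) * n

def register_quantum_cost(n):
    """Lemma 2 bound: 20 (loading circuit) + 9 (D Flip-Flop) + 1 (Feynman) per bit."""
    return (20 + 9 + 1) * n

if __name__ == "__main__":
    q = register_step(1, 0, [1, 0, 1, 1], [0, 0, 0, 0], d_in=0)   # parallel load
    print("after load :", q)
    print("after shift:", register_step(0, 1, [0, 0, 0, 0], q, d_in=0))
    print("Tg + Tb for n = 4:", gates_plus_garbage(4))
    print("quantum cost for n = 4:", register_quantum_cost(4))
```

In this model a whole word is stored in a single call to `register_step` with LOAD = 1, which mirrors the one-clock-pulse property of parallel loading, while a call with SHL = 1 moves each bit to the next higher position and takes Din into bit 0.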