International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 Design and Implementation of an Enhanced LUT System in Security Based Computation dama.dhanalakshmi1, K.Annapurna2 Vignan Universit, Guntur district ABSTRACT: In this project, the anti-symmetric process generations, and so do not take deep-submicron product coding (APC) and odd-multiple-storage (OMS) electrical effects into account. In the present work, we techniques for lookup-table (LUT) design for memory- perform detailed transistor-level design of circuits and based multipliers are presented to be used in digital perform appropriate buffer and transistor sizing for all the signal processing applications. All these techniques logic and routing elements. Field Programmable Gate results in the reduction of the LUT size by a factor of Arrays (FPGAs) are an attractive hardware design option, two. We present a different form of APC and a modified making technology mapping for FPGAs an important OMS scheme, in order to combine them for efficient EDA problem. For an excellent overview of the classical memory-based multiplication. The proposed combined and recent work on FPGA technology mapping, focusing approach provides a reduction in LUT size to one-fourth on area, delay, and power minimization, the reader is of the conventional LUT. It has also suggested a simple referred to [2]. The recent advanced algorithms for FPGA technique for selective sign reversal to be used in the mapping, such as [2][12][16][23], focus on area proposed design. It is shown that the proposed LUT minimization under delay constraints. If delay constraints design for small input sizes can be used for efficient are not given, first the optimum delay for the given logic implementation of high-precision multiplication by structure is found and then area is minimized without input operand decomposition. changing delay. In terms of the algorithms employed, the Keywords: anti-symmetric odd- mappers are divided into structural and functional. multiple-storage, lookup-table, Digital signal processing Structural mappers consider the circuit graph as a given INTRODUCTION: A look-up table (LUT) size of 4 is and find a covering of the graph with K-input subgraphs the most area efficient in a non clustered context. A corresponding to LUTs. Since functional mappers explore LUT size of 5 to 6 gave the best performance. The work a larger solution space, they tend to be time-consuming, in [12] has suggested that using a heterogeneous which limits their use to small designs. In practice, FPGA mixture of LUT sizes of 2 and 3 was equivalent in area mapping for large designs is done using structural efficiency to a LUT size of 4 and, hence, could be a mappers, whereas the functional mappers are used for good choice. In addition, [1] states that a logic structure resynthesis after technology mapping. In this paper, we using two three-input LUTs was most beneficial in consider terms of area and speed. However, it must be noted that representative of the advanced structural technology both these last two papers did not perform a full area or mapping for LUT-based FPGAs and refer to it as “the delay study where a range of LUT sizes was examined. previous work” and discuss several ways of improving it. First, prior work focused on non clustered logic blocks, LOOK UP TABLE: which are known to have a significant impact on the LUT means “Look Up Table.” It’s helpful to think of it area and delay [21]. Second, most prior studies tended like a math problem: R= S+L to look at area or delay, but not both as we will here. “R” being your result or what you want to attain.“S” Third, prior results were based on IC process being your source or what you start with. “L” being your generations that are several factors larger than current LUT or the difference needed to make up between your ISSN: 2231-5381 product coding, the http://www.ijettjournal.org recent work on DAOmap Page 3308 [2] as International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 source and your desired outcome. In all cases of LUT as the address for the LUT, then the corresponding use, the LUT is the means to make up the difference product value A · Xi is available as its output. Let input be between source and result.((All cases assume the “X” , and it should be multiplied with A. The products are colorist (or you) is grading through a correctly as shown in second column of above table. In our design calibrated monitor for evaluation and finishing. LUTs in product values are stored in LUT’S. Each product value is no way replace proper calibration or color correction. stored in separate row. For the selection of product value, In computer science, a lookup table is an array that input data is acts as a address. If the input size is of length replaces runtime computation with a simpler array 5 then 25 values are to be stored. If the input length indexing operation. The savings in terms of processing increases more number of data is to be stored and it time can be significant, since retrieving a value from requires more memory. memory is often faster than undergoing an 'expensive' PROPOSED TECHNIQUE: computation or input/output operation. [1] The tables may Present technique: be precalculated and stored in static program storage, LUT optimization is the main key factor in our project, in calculated (or "pre-fetched") as part of a program's order to reduce power and area. The following techniques initialization phase (memoization), or even stored in have to be implemented in LUT to get exact optimized hardware in application-specific platforms. Lookup results. tables are also used extensively to validate input values 1. Anti symmetric Product coding (A.P.C) by matching against a list of valid (or invalid) items in 2. Modified Odd multiple storage (O.M.S) an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. A conventional lookup-table (LUT)-based multiplier is shown in Fig. 1, where A is a fixed coefficient, and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be 2L possible values of X, and accordingly, there can be 2L possible values of product C = A · X. Therefore, for memory-based multiplication, an LUT of 2L words, consisting of precomputed. Fig 1: Conventional LUTbased multiplier product values corresponding to all possible values of X, is conventionally used. The product word A · Xi is stored at the location Xi for 0 ≤ Xi ≤ 2L − 1, such that if an L-bit binary value of Xi is used In this project, for the reduction of look-up-table (LUT) size of memory-based multipliers to be used in digital signal processing applications. It is shown that by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel antisymmetric product coding (APC) scheme is proposed to ISSN: 2231-5381 http://www.ijettjournal.org Page 3309 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 reduce the LUT size by further half, where the LUT found to involve less area and less multiplication time output is added with or subtracted from a fixed value. It than the existing LUT-multipliers. is shown that the optimized LUTs for small input width could be used for efficient implementation of high-precision LUT- multipliers, where the total contribution of all such fixed offsets could be added to the final result or could be initialized for successive accumulations. The proposed LUT-multiplier and the existing ones are coded in VHDL and synthesized by Synopsys Design Compiler using TSMC 90 nanometer library. The proposed optimized LUT-multiplier is Table 1.1: General LUT table The proposed APC–OMS combined design of control circuit for generating the RESET signal the LUT for L = 5 and for any coefficient and control word (s1s0) for the barrel shifter. width W is shown in Fig. 3. It consists of an The precomputed values of A × (2i + 1) are LUT of nine words of (W + 4)-bit width, a stored as Pi, for i = 0, 1, 2, . . . , 7, at the eight four-to-nine-line address decoder, a barrel consecutive locations of the memory array, as shifter, an address generation circuit, and a specified in Table II, while 2A is stored for ISSN: 2231-5381 http://www.ijettjournal.org Page 3310 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013 input X = (00000) at LUT address “1000,” as simple modification of 3-to-8-line decoder, as specified in Table III. The decoder takes the 4- shown in Fig. 4(a). The control bits s0 and s1 bit address from the address generator and to be used by the barrel shifter to produce the generates nine word-select signals, i.e., {wi, desired number of shifts of the LUT output are for 0 ≤ i ≤ 8}, to select the referenced word generated by the control circuit, according to from the LUT. The 4-to-9-line decoder is a the relations ASM chart of LUT optimization Step2: Calculate APC word of X Step3: If X(4)=1 then ALGORITHM: output <= 16A - APC word(X) Step1: Load input multiplicand value into X Else register Output <= 16A + APC word(X) Step2: Deside whether APC or OMS technique OMS: Step3: If X(4)=1 then select APC technique Step1:Takes last four bits of X Step4: Else select OMS technique Step2: Calculate s0, s1 and address APC: Step3: Depends on s0, s1 output is shifted and Step1: Take 2’s complement of X and pass to stored into final output next block Fig: Flow chart of proposed technique ISSN: 2231-5381 http://www.ijettjournal.org Page 3311 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013 SIMULATION RESULT OF LUT OPTIMIZATION: RTL Internal block: APPLICATIONS: less area and less multiplication time than the The applications of LUT optimization for existing LUT-multipliers. memory based computation are: 1. Communications: Finally, combined approach The future wireless provides a reduction in LUT size to one-fourth mutually conflicting of the conventional LUT. We will design a demands, e.g., high computational-bandwith, simple technique for selective sign reversal to be low-power consumption and reconfigurability. used in the proposed design. In future, we are Such a set of demands will continue to be a further going to reduce the power consumption challenge to the designers of computing circuits that has been consumed by the proposed LUT. and systems for the next generation wireless REFERENCES: communication. The lookup-table (LUT)-based [1] International Technology Roadmap for Semiconductors. arithmetic circuits have significant potential to [Online]. systems have three Available: http://public.itrs.net/ satisfy these requirements to a great extent. 2. 3. [2] J.-I. Guo, C.-M. Liu, and C.-W. Jen, “The efficient This is also applicable in the DSP memory-based VLSI array design for DFT and DCT,” IEEE processors. Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. This project is also useful in FIR, 39, no. 10, pp. 723–733, Oct. 1992. [3] H.-R. Lee, C.-W. Jen, and C.-M. Liu, “On the design FFT processors. automation of the memory-based VLSI architectures for FIR CONCLUSION & FUTURE SCOPE: The proposed LUT-multiplier and the existing ones are coded in VHDL and filters,” IEEE Trans. Consum.Electron., vol. 39, no. 3, pp. 619–629, Aug. 1993. [4] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, “A systolic array architecture for the discrete sine synthesized by Synopsys Design Compiler using transform,” IEEE Trans. Signal Process., vol. 50, no. 9, pp. TSMC 90 nanometer library. The proposed 2347–2354, Sep. 2002. optimized LUT-multiplier is found to involve [5] H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, “A memory-efficient realization of cyclic convolution and its application to discrete cosine transform,” IEEE Trans. ISSN: 2231-5381 http://www.ijettjournal.org Page 3312 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013 Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445–453, Mar. 2005. [6] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, “Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 6, pp. 1125–1137, Jun. 2005. [7] P. K. Meher, “Systolic designs for DCT using a lowcomplexity concurrent convolutional formulation,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041–1050, Sep. 2006. [8] P. K. Meher, “Memory-based hardware for resourceconstrained digital signal processing systems,” in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1–4. [9] P. K. Meher, “New approach to LUT implementation and accumulation for memory-based multiplication,” in Proc. IEEE ISCAS, May 2009,pp. 453–456. [10] P. K. Meher, “New look-up-table optimizations for memory-based multiplication,” in Proc. ISIC, Dec. 2009, pp. 663–666. [11] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, andApplications. Piscataway, NJ: IEEE Press, 2003. [12] TSC4000 0.35m CMOS Standard Cell, Macro Library Summary, Texas Instmments, Application Specific Integrated Circuits, 1995. ISSN: 2231-5381 http://www.ijettjournal.org Page 3313