International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 7 - Nov 2013
ISSN: 2231-5381

Designing an Efficient and Secured LUT Approach for Area Based Occupations

D. Jahnavi (1), Y. Ravikiran varma (2)
(1) M.Tech scholar, E.C.E, Sreenivasa institute of technology and management studies, Chittoor
(2) Assistant Professor, E.C.E, Sreenivasa institute of technology and management studies, Chittoor

ABSTRACT: This project presents the implementation of a multiplier with an enhanced LUT technique. Anti-symmetric product coding (APC) and odd-multiple-storage (OMS) are techniques for lookup-table (LUT) design for memory-based multipliers used in digital signal processing applications. Each of these two techniques separately reduces the LUT size by half. This project presents a different form of APC and a modified OMS scheme so that the two can be combined for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of that of the conventional LUT. A simple technique for selective sign reversal is also suggested for use in the proposed design. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.

Keywords: Memory-based computation, anti-symmetric product coding, odd-multiple-storage, lookup table, digital signal processing

INTRODUCTION:
Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, making technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2][12][16][23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first, and then area is minimized without changing the delay.

In terms of the algorithms employed, the mappers are divided into structural and functional. Structural mappers consider the circuit graph as a given and find a covering of the graph with K-input subgraphs corresponding to LUTs. The functional approaches perform Boolean decomposition of the logic functions of the nodes into sub-functions of limited support size realizable by individual LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs.

In practice, FPGA mapping for large designs is done using structural mappers, whereas the functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as "the previous work", and discuss several ways of improving it.
LUT for Multipliers:
Multiplications can be computationally expensive in most hardware and software implementations. Various approaches have been proposed in the literature to alleviate this overhead, usually at the cost of multiplication accuracy. One such example is the conversion of multiplication coefficients to dyadic fractions, which can be computed with a minimal sequence of bit shifts and additions. However, such approaches have proved limiting, requiring a lot of hand-tweaking to simultaneously minimize the complexity of the calculation and the deviation from the desired result. Instead, a table-based lookup scheme to implement the multiplication steps is proposed. Whenever a multiplication result is needed, the system can simply look up the correct result in a precomputed table, without performing any computation. This greatly simplifies the transform and inverse calculations.

It is possible to store binary data within solid-state devices. The storage "cells" within solid-state memory devices are easily addressed by driving the "address" lines of the device with the proper binary values. A ROM memory circuit can be written, or programmed, with certain data such that the address lines of the ROM serve as inputs and the data lines of the ROM serve as outputs, generating the characteristic response of a particular logic function.

http://www.ijettjournal.org

Table lookup can replace any coefficient multiplication or unary operation. Although table lookup is often simpler than the actual calculation, the table size grows exponentially with the input word length. However, for image and video applications, most signals are unsigned 8-bit values, which give only 256 possible cases, so the table-based approach can be implemented at a reasonable cost. As an example, consider a coefficient multiplication in which the coefficient is 0.6834.
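The table-based scheme can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in this work: the names `TABLE`, `multiply_by_lookup`, `multiply_by_dyadic`, and `max_abs_error` are hypothetical, products are assumed to be rounded to the nearest integer, and the dyadic fraction 3/4 = 1/2 + 1/4 is chosen here purely as an example of a shift-and-add approximation.

```python
COEFF = 0.6834  # coefficient used as the example in the text

# Precompute the rounded product for every unsigned 8-bit input (256 entries).
# Rounding to the nearest integer is an assumption of this sketch.
TABLE = [round(COEFF * x) for x in range(256)]


def multiply_by_lookup(x: int) -> int:
    """Return the precomputed product COEFF * x for an unsigned 8-bit x."""
    return TABLE[x]


def multiply_by_dyadic(x: int) -> int:
    """Approximate COEFF by 3/4 = 1/2 + 1/4 using two shifts and one addition."""
    return (x >> 1) + (x >> 2)


def max_abs_error(fn) -> float:
    """Largest deviation of fn(x) from the exact product over all 8-bit inputs."""
    return max(abs(fn(x) - COEFF * x) for x in range(256))
```

For x = 100, the table returns 68 while the shift-and-add approximation returns 75. Over all 256 inputs the table is never more than half a unit away from the exact product, whereas the error of the 3/4 approximation grows with the input.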
To avoid using a multiplier, traditional lossless transforms approximate the given coefficient with a dyadic fraction (for example, 3/4). The coefficient multiplication can then be implemented using shifts and additions, as shown. Table lookup is also depicted. Unlike the dyadic-fraction case, table-based multiplication yields a much more accurate approximation of the original coefficient.

Literature Survey:
a. Efficient memory-based VLSI array designs for DFT and DCT. Guo, J.-I.; Liu, C.-M.; Jen, C.-W., Nat. Chiao Tung Univ., Hsinchu: Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform (DFT) and discrete cosine transform (DCT) are presented. The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays that are characterized by small numbers of I/O channels and low I/O bandwidth.

b. On the design automation of the memory-based VLSI architectures for FIR filters. Lee, H.-R.; Jen, C.-W.; Liu, C.-M., Dept. of Electron. Eng., Nat. Chiao Tung Univ., Hsinchu: An approach to automating the design of memory-based VLSI architectures for FIR filters has been developed. The automation is based on the exploration of the design space and schemes for efficient memory replacement, algorithm formulation, architecture design, and evaluation method.

c. A memory-efficient realization of cyclic convolution and its application to discrete cosine transform: This work presents a memory-efficient design for realizing cyclic convolution and its application to the discrete cosine transform. It adopts the method of distributed arithmetic computation and exploits the symmetry property of the DCT coefficients to merge the elements in the matrix of the DCT kernel and then separate the kernel into two perfect cyclic forms. This facilitates an efficient realization of the 1-D N-point DCT using (N-1)/2 adders or subtractors, one small ROM module, a barrel shifter, and (N-1)/2+1 accumulators.

PROPOSED TECHNIQUE:
LUT optimization is the key factor in this project for reducing power and area. The following techniques are implemented in the LUT to obtain the required qualities:
1. Anti-symmetric product coding (APC)
2. Modified odd-multiple-storage (OMS)

This project addresses the reduction of the look-up-table (LUT) size of memory-based multipliers used in digital signal processing applications. It is shown that, by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel anti-symmetric product coding (APC) scheme is proposed to reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input widths can be used for efficient implementation of high-precision LUT-multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers.

The proposed APC–OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word (s1 s0) for the barrel shifter. The precomputed values of A × (2i + 1) are stored as Pi, for i = 0, 1, 2, ..., 7, at the eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for input X = (00000) at LUT address "1000," as specified in Table III. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, i.e., {wi, for 0 ≤ i ≤ 8}, to select the referenced word from the LUT.
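The nine-word storage scheme described above can be modelled behaviourally in Python. This is a hedged sketch rather than the hardware design itself, and the names `build_lut`, `apc_oms_multiply`, and `multiply_wide` are hypothetical. It assumes that X is an unsigned 5-bit value, that the 4-bit APC address is the low four bits of X when the most significant bit is 1 and their two's complement otherwise, that the number of left shifts equals the number of trailing zeros of that address, that the RESET case corresponds to X = (10000), and that the APC word is added to 16A when the most significant bit of X is 1 and subtracted from 16A otherwise (so that X = 00001 gives 16A - 15A = A).

```python
from typing import List


def build_lut(a: int) -> List[int]:
    """Nine stored words: A*(2i+1) at addresses 0..7 and 2A at address 8 ('1000')."""
    return [a * (2 * i + 1) for i in range(8)] + [2 * a]


def apc_oms_multiply(lut: List[int], x: int) -> int:
    """Return A*X for an unsigned 5-bit X using only the nine stored words."""
    if not 0 <= x < 32:
        raise ValueError("x must be an unsigned 5-bit value")
    a16 = lut[0] << 4              # fixed offset 16A (lut[0] holds A)
    x4 = (x >> 4) & 1              # most significant bit selects the sign
    low = x & 0xF                  # last four bits of X
    # APC address: low bits if x4 = 1, else their 4-bit two's complement.
    addr = low if x4 else (-low) & 0xF

    if addr == 0:
        if x4:
            apc_word = 0           # X = 10000: assumed RESET case, output forced to 0
        else:
            apc_word = lut[8] << 3  # X = 00000: 2A at address '1000', shifted to 16A
    else:
        shifts = (addr & -addr).bit_length() - 1  # trailing zeros, 0..3
        odd = addr >> shifts                      # odd multiple that is stored
        apc_word = lut[odd >> 1] << shifts        # barrel shift restores even multiples

    # Selective sign reversal around the fixed value 16A.
    return a16 + apc_word if x4 else a16 - apc_word


def multiply_wide(lut: List[int], x: int, width: int = 10) -> int:
    """Input operand decomposition: split unsigned X into 5-bit digits and add
    the shifted partial products obtained from the same nine-word LUT."""
    total = 0
    for shift in range(0, width, 5):
        total += apc_oms_multiply(lut, (x >> shift) & 0x1F) << shift
    return total
```

With `lut = build_lut(37)`, calling `apc_oms_multiply(lut, x)` for every x from 0 to 31 reproduces 37 * x, although only nine words are stored instead of 32. `multiply_wide` shows how two such 5-bit lookups can be combined for a 10-bit operand.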
The 4-to-9-line decoder is a simple modification of a 3-to-8-line decoder, as shown in Fig. 4(a). The control bits s0 and s1, which the barrel shifter uses to produce the desired number of shifts of the LUT output, are generated by the control circuit from the input bits.

APC:
Step 2: Calculate the APC word of X.
Step 3: If X(4) = 1, then Output <= 16A + APC word(X); else Output <= 16A - APC word(X).

OMS:
Step 1: Take the last four bits of X.
Step 2: Calculate s0, s1, and the address.
Step 3: Depending on s0 and s1, shift the LUT output and store it in the final output.

Proposed System Architecture
A new approach to LUT design is presented in which only the odd multiples of the fixed coefficient need to be stored; this is referred to as the odd-multiple-storage scheme in this brief. In addition, we have shown that, with the anti-symmetric product coding approach, the LUT size can also be reduced to half, where the product words are recoded as anti-symmetric pairs.

Fig: Architecture of present method

If the input bit size is 5, the memory stores 2^5/2 = 16 locations, which is a reduction in LUT size by a factor of 2.

Hardware Environment:
FPGA Implementation
FPGA stands for field programmable gate array, a device that can be configured by the customer or designer after manufacturing. FPGAs are programmed using a logic circuit diagram or source code in an HDL to specify how the chip will work. FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together". The programmable logic blocks are called configurable logic blocks, and the reconfigurable interconnects are called switch boxes. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

Field programmable gate arrays are so called because, rather than having a structure similar to a PAL or other programmable device, they are structured very much like a gate array ASIC. This makes FPGAs well suited to prototyping ASICs, or to places where an ASIC will eventually be used. For example, an FPGA may be used in a design that needs to get to market quickly regardless of cost. Later, an ASIC can be used in place of the FPGA when the production volume increases, in order to reduce cost.

Fig: Flow chart of proposed technique

SIMULATION RESULT OF LUT OPTIMIZATION:

APPLICATIONS:
The applications of LUT optimization for memory-based computation are:
1. Bio-medical: Total-body wireless operation systems have nano components such as nano cameras and CROs. Nano cameras have to be designed with low area occupancy in order to be embedded into the human body, so LUTs play a vital role in the design of such nano devices.

CONCLUSION:
An advanced and efficient LUT-based multiplier has been designed with a reduction in area and barrel shifters. This improves throughput and makes the design suitable for a wide range of applications. Implementation of this type of LUT plays a vital role in applications such as biomedical, telecommunications, and military systems.

REFERENCES:
[1] A. V. Oppenheim and R. W. Schafer, Discrete Time Signal Processing, Prentice Hall, 1989
[2] S. A. White, "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review", IEEE ASSP Magazine, July 1989, pp. 4-19
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Area-Delay Tradeoff in Distributed Arithmetic based Implementation of FIR Filters", VLSI Design 97, pp. 134-129
[4] S. Wolter, A. Schubert, H. Matz, R.
Laur, "On the Comparison between Architectures for the Implementation of Distributed Arithmetic", ISCAS 93, pp. 1829-1832
[5] K. Nourji and N. Demassieux, "Optimization of Real-Time VLSI Architectures for Distributed Arithmetic based Algorithms: Application to HDTV Filters", ISCAS 94, vol. 4, pp. 223-226
[6] E. M. Sentovich et al., "SIS: A System for Sequential Circuit Synthesis", Memorandum No. UCB/ERL M92/41
[8] V. S. Rosa, E. Costa, S. Bampi, "A High Performance Parallel FIR Filters Generation Tool", in Iberchip, San Jose, Costa Rica, 2006.
[9] Altera Corporation, 101 Innovation Drive, San Jose, California 95134, USA. http://www.altera.com
[10] Xilinx, Inc. http://www.xilinx.com
[11] R. W. Hamming, "Digital Filters", Prentice Hall, 3rd ed., 1989.
[12] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, and Applications. Piscataway, NJ: IEEE Press, 2003.
[13] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009
[14] P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1–4.
[15] International Technology Roadmap for Semiconductors. [Online].
[16] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. ISIC, Dec. 2009, pp. 663–666.
[17] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347–2354, Sep. 2002.
[18] H.-C. Chen, J.-I. Guo, T.-S. Chang and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445–453, Mar. 2005
[19] P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041–1050, Sep. 2006.
[20] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009, pp. 453–456.