Designing an Efficient and Secured LUT Approach for Area Based Occupations

International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 7- Nov 2013
D. Jahnavi (1), Y. Ravikiran Varma (2)
(1) M.Tech scholar, E.C.E., Sreenivasa Institute of Technology and Management Studies, Chittoor
(2) Assistant Professor, E.C.E., Sreenivasa Institute of Technology and Management Studies, Chittoor
ABSTRACT: In this project, the implementation of a multiplier with an enhanced LUT technique is presented. Anti-symmetric product coding (APC) and odd-multiple-storage (OMS) are techniques for lookup-table (LUT) design in memory-based multipliers used in digital signal processing applications. Each of these techniques separately reduces the LUT size to half. This project presents a different form of APC and a modified OMS scheme, and combines them for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of that of the conventional LUT. A simple technique for selective sign reversal is also suggested for use in the proposed design. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.
Keywords: Memory-based computation, anti-symmetric product coding, odd-multiple-storage, lookup-table, digital signal processing
INTRODUCTION:
Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, which makes technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2], [12], [16], [23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first, and then area is minimized without changing delay.
In terms of the algorithms employed, mappers are divided into structural and functional. Structural mappers take the circuit graph as given and find a covering of the graph with K-input subgraphs corresponding to LUTs. Functional approaches perform Boolean decomposition of the logic functions of the nodes into sub-functions of limited support size, realizable by individual LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs. In practice, FPGA mapping for large designs is done using structural mappers, whereas functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as "the previous work", and discuss several ways of improving it.
ISSN: 2231-5381
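Structural mapping, as described above, starts from the K-input subgraphs (cuts) that a single K-input LUT can implement. The following Python sketch enumerates K-feasible cuts on a small netlist to illustrate that idea. It is a generic cut-enumeration sketch, not DAOmap's actual algorithm, and the function and netlist names are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """Enumerate K-feasible cuts of a combinational DAG (illustrative sketch).

    fanins maps each node to its list of fanin nodes (primary inputs map to [])
    and must list the nodes in topological order. Returns node -> set of cuts,
    where each cut is a frozenset of at most k leaf nodes."""
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}            # the trivial cut {node}
        if ins:
            # Merge one cut from each fanin; keep merges with at most k leaves.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical netlist: d = f(a, b), e = g(d, c)
netlist = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["d", "c"]}
print(sorted(sorted(c) for c in enumerate_cuts(netlist, 3)["e"]))
# [['a', 'b', 'c'], ['c', 'd'], ['e']]
```

Each cut of node e is a candidate 3-input LUT rooted at e; a structural mapper then chooses one cut per node so that the chosen LUTs cover the whole graph.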
LUT for Multipliers:
Multiplications can be computationally expensive in most hardware and software implementations. Various approaches in the literature have been proposed to alleviate this overhead, usually at the cost of multiplication accuracy. One such example is the conversion of multiplication coefficients to dyadic fractions, which can be computed with a minimal sequence of bit shifts and additions. However, such approaches have proved to be limiting, requiring a lot of hand-tweaking to simultaneously minimize the complexity of the calculation and the deviation from the desired result. Instead, a table-based lookup scheme is proposed to implement the multiplication steps. Whenever a multiplication result is needed, the system can simply look up the correct result in a precomputed table, without needing any computation whatsoever. This greatly simplifies the forward and inverse transform calculations. Binary data can be stored within solid-state devices, and the storage "cells" within solid-state memory devices are easily addressed by driving the "address" lines of the device with the proper binary values.
http://www.ijettjournal.org
A ROM can be written, or programmed, with data such that the address lines of the ROM serve as inputs and the data lines of the ROM serve as outputs, generating the characteristic response of a particular logic function. Table lookup can replace any coefficient multiplication or unary operation. Although table lookup is often simpler than the actual calculation, the table size grows exponentially with the input word length. However, for image and video applications, most signals are unsigned 8-bit values, which require only 256 possible cases, so the table-based approach can be implemented at a reasonable cost. Consider its use to implement coefficient multiplication, where the coefficient is 0.6834. To avoid using a multiplier, traditional lossless transforms approximate the given coefficient with a dyadic fraction (for example, ¾). The coefficient multiplication can then be implemented using shifts and additions, as shown; table lookup is also depicted. Unlike the dyadic fraction case, table-based multiplication yields a much more accurate approximation of the original coefficient.
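The accuracy difference can be checked with a minimal Python sketch, assuming unsigned 8-bit inputs, the coefficient 0.6834, and the ¾ dyadic approximation mentioned above. The names TABLE, table_multiply and dyadic_multiply are illustrative only.

```python
COEFF = 0.6834

# Precomputed table: one rounded product for each unsigned 8-bit input (256 entries).
TABLE = [round(x * COEFF) for x in range(256)]

def table_multiply(x):
    """Coefficient multiplication by a single table read."""
    return TABLE[x]

def dyadic_multiply(x):
    """Dyadic approximation 3/4 = (2x + x) / 4, using only shifts and one addition."""
    return ((x << 1) + x) >> 2

# Worst-case absolute error of each method over all 256 inputs.
table_err = max(abs(table_multiply(x) - x * COEFF) for x in range(256))
dyadic_err = max(abs(dyadic_multiply(x) - x * COEFF) for x in range(256))
print(f"max error: table {table_err:.3f}, dyadic {dyadic_err:.3f}")
```

The table error is bounded by rounding (at most 0.5), while the ¾ approximation drifts by more than 16 at the top of the 8-bit range.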
Literature Survey:
a. The efficient memory-based VLSI array designs for DFT and DCT
Guo, J.-I.; Liu, C.-M.; Jen, C.-W., Nat. Chiao Tung Univ., Hsinchu: Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform and discrete cosine transform are presented. The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays, which are characterized by small numbers of I/O channels and low I/O bandwidth.
b. On the design automation of the memory-based VLSI architectures for FIR filters
Lee, H.-R.; Jen, C.-W.; Liu, C.-M., Dept. of Electron. Eng., Nat. Chiao Tung Univ., Hsinchu: An approach to automating the design of memory-based VLSI architectures for FIR filters has been developed. The automation is based on the exploration of the design space and on schemes for efficient memory replacement, algorithm formulation, architecture design, and evaluation.
c. A memory-efficient realization of cyclic convolution and its application to discrete cosine transform
A memory-efficient design for realizing cyclic convolution, and its application to the discrete cosine transform, is presented. It adopts distributed arithmetic computation and exploits the symmetry property of the DCT coefficients to merge the elements in the matrix of the DCT kernel, and then separates the kernel into two perfect cyclic forms. This facilitates an efficient realization of a 1-D N-point DCT using (N-1)/2 adders or subtractors, one small ROM module, a barrel shifter, and (N-1)/2+1 accumulators.
PROPOSED TECHNIQUE:
LUT optimization is the key factor in this project for reducing power and area. The following techniques are implemented in the LUT to obtain the required qualities:
1. Anti-symmetric product coding (APC)
2. Modified odd-multiple storage (OMS)
This project targets the reduction of look-up-table (LUT) size in memory-based multipliers for digital signal processing applications. It is shown that, by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel anti-symmetric product coding (APC) scheme is proposed to reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input widths can be used for efficient implementation of high-precision LUT-multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than existing LUT-multipliers.
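The remark about fixed offsets can be illustrated with a short Python sketch of input operand decomposition. It assumes an unsigned input X split into 5-bit chunks and the APC form chunk × A = 16A ± A·x'. Here A·x' is computed directly as a stand-in for the small LUT, and the names CHUNK, chunk_product and decomposed_multiply are hypothetical.

```python
CHUNK = 5  # bits per input chunk (assumed, matching the L = 5 design)

def chunk_product(a, chunk):
    """Return (sign, magnitude) with chunk*a == 16*a + sign*magnitude (APC form).

    The magnitude x'*a stands in for the small-LUT output in this sketch."""
    x4 = (chunk >> 4) & 1
    low = chunk & 0xF
    xp = low if x4 else 16 - low          # assumed APC word x'
    return (1 if x4 else -1), xp * a

def decomposed_multiply(a, x, width):
    """A*X for an unsigned width-bit X, assembled from 5-bit chunk products."""
    n_chunks = (width + CHUNK - 1) // CHUNK
    # The fixed offsets 16A * 2**(5k) of all chunks are summed once and used
    # to initialize the accumulator, instead of being added per lookup.
    acc = sum((16 * a) << (CHUNK * k) for k in range(n_chunks))
    for k in range(n_chunks):
        chunk = (x >> (CHUNK * k)) & ((1 << CHUNK) - 1)
        sign, mag = chunk_product(a, chunk)
        acc += sign * (mag << (CHUNK * k))   # shift-accumulate each chunk result
    return acc

print(decomposed_multiply(37, 50000, 16) == 37 * 50000)
```

Because the offset sum depends only on A and the number of chunks, it can be precomputed for a fixed coefficient, which is the point made in the paragraph above.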
The proposed APC–OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word (s1s0) for the barrel shifter. The precomputed values $P_i = A \times (2i + 1)$, for $i = 0, 1, 2, \ldots, 7$, are stored at eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for input X = (00000) at LUT address "1000", as specified in Table III. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, $\{w_i : 0 \le i \le 8\}$, to select the referenced word from the LUT. The 4-to-9-line decoder is a simple modification of a 3-to-8-line decoder, as shown in Fig. 4(a). The control bits s0 and s1, used by the barrel shifter to produce the desired number of shifts of the LUT output, are generated by the control circuit according to the relations
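The address generator, control word, and 4-to-9-line decoder can be modelled behaviourally in Python. This is a sketch under stated assumptions: the APC word x' lies in 1..16 (x' = 0 is served by RESET), the shift count s1s0 equals the number of trailing zeros of x', and the decoder gating shown is one possible modification of a 3-to-8-line decoder, not necessarily the circuit of Fig. 4(a). The function names are hypothetical.

```python
def address_and_control(xp):
    """Return (4-bit LUT address, shift count s = s1s0) for an APC word x' in 1..16."""
    if not 1 <= xp <= 16:
        raise ValueError("x' = 0 is handled by RESET, not by a LUT read")
    if xp == 16:
        return 0b1000, 3              # 2A stored at "1000"; shifting by 3 gives 16A
    s = (xp & -xp).bit_length() - 1   # number of trailing zeros of x'
    return (xp >> s) >> 1, s          # odd part 2i + 1 is stored at address i

def decode_4_to_9(addr):
    """Nine word-select lines w0..w8 for a 4-bit address.

    w0..w7 come from a 3-to-8 decoder on the low three bits, enabled only when
    a3 = 0; w8 is simply a3 (addresses 1001..1111 are assumed never to occur)."""
    a3 = (addr >> 3) & 1
    return [int(not a3 and (addr & 0b111) == i) for i in range(8)] + [a3]

for xp in (3, 12, 16):
    addr, s = address_and_control(xp)
    print(xp, format(addr, "04b"), s, decode_4_to_9(addr).index(1))
# 3 0001 0 1
# 12 0001 2 1
# 16 1000 3 8
```

Every x' in 1..16 needs a shift of at most 3, which is why a 2-bit control word s1s0 suffices for the barrel shifter.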
APC:
Step 2: Calculate the APC word of X.
Step 3: If X(4) = 1 then
    output <= 16A + APC word(X)
else
    output <= 16A - APC word(X)
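The APC steps can be sketched in Python under the commonly used formulation, which is an assumption here: the APC word x' is the lower four bits of X when X(4) = 1, and 16 minus them (their 5-bit two's complement) when X(4) = 0, so that A·X = 16A ± A·x'. Under this definition the LUT output is added to 16A when X(4) = 1 and subtracted when X(4) = 0. The function names are hypothetical.

```python
def build_apc_lut(a):
    """Sixteen words A*x' for x' = 1..16; x' = 0 is served by a reset, not stored."""
    return [a * xp for xp in range(1, 17)]

def apc_word(x):
    """APC word x' of a 5-bit input X (0 <= X < 32), under the assumed formulation."""
    low = x & 0xF
    return low if (x >> 4) & 1 else 16 - low

def apc_multiply(a, x, lut):
    """A*X = 16A + A*x' when X(4) = 1, and 16A - A*x' when X(4) = 0."""
    xp = apc_word(x)
    mag = 0 if xp == 0 else lut[xp - 1]    # reset forces a zero LUT output
    return 16 * a + mag if (x >> 4) & 1 else 16 * a - mag

lut = build_apc_lut(13)
print(len(lut), apc_multiply(13, 9, lut), apc_multiply(13, 27, lut))
# 16 117 351
```

Sixteen stored words serve all 32 input values, which is the halving of the LUT attributed to APC.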
OMS:
Step 1: Take the last four bits of X.
Step 2: Calculate s0, s1 and the LUT address.
Step 3: Shift the LUT output according to s0 and s1, and store it as the final output.
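The three OMS steps can be sketched in Python for a 4-bit operand, assuming that only the odd multiples A, 3A, ..., 15A are stored and that an even operand is served by left-shifting the multiple of its odd part. The function names are hypothetical.

```python
def build_oms_lut(a):
    """Eight odd multiples A, 3A, ..., 15A replace the sixteen products of a 4-bit input."""
    return [a * (2 * i + 1) for i in range(8)]

def oms_multiply(a, x, lut):
    x &= 0xF                               # Step 1: last four bits of X
    if x == 0:
        return 0                           # no odd multiple applies; output cleared
    s = (x & -x).bit_length() - 1          # Step 2: shift count (s1 s0) = trailing zeros
    address = (x >> s) >> 1                # odd part 2i + 1 is stored at address i
    return lut[address] << s               # Step 3: barrel-shift the stored word

lut = build_oms_lut(13)
print([oms_multiply(13, x, lut) for x in (6, 8, 15)])
# [78, 104, 195]
```

For example, 6A is read as the stored word 3A shifted left once, and 8A as A shifted left three times.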
Proposed System Architecture
A new approach to LUT design is presented, in which only the odd multiples of the fixed coefficient need to be stored; this is referred to as the odd-multiple-storage scheme in this paper. In addition, it is shown that, with the anti-symmetric product coding approach, the LUT size can also be reduced to half, where the product words are recoded as anti-symmetric pairs.
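Combining both schemes for L = 5 can be sketched in Python using the nine-word LUT described for the combined design. The sketch assumes the APC word x' = lower four bits of X when X(4) = 1 and 16 minus them when X(4) = 0, applies OMS to x', and ends with selective sign reversal around 16A. The function names are hypothetical.

```python
def build_lut(a):
    """Nine words: P_i = A*(2i+1) for i = 0..7, plus 2A at address 8 ("1000")."""
    return [a * (2 * i + 1) for i in range(8)] + [2 * a]

def apc_oms_multiply(a, x, lut):
    """A*X for a 5-bit X (0 <= X < 32) using only the nine stored words."""
    x4 = (x >> 4) & 1
    low = x & 0xF
    xp = low if x4 else 16 - low           # APC word x' in 0..16 (assumed formulation)
    if xp == 0:                            # X = 16: RESET forces the LUT output to zero
        mag = 0
    elif xp == 16:                         # X = 0: 2A shifted left by 3 gives 16A
        mag = lut[8] << 3
    else:
        s = (xp & -xp).bit_length() - 1    # OMS: shift amount = trailing zeros of x'
        mag = lut[(xp >> s) >> 1] << s     # odd part 2i + 1 is stored at index i
    return 16 * a + mag if x4 else 16 * a - mag   # selective sign reversal around 16A

lut = build_lut(13)
print([apc_oms_multiply(13, x, lut) for x in (0, 16, 31)])
# [0, 208, 403]
```

All 32 products are recovered from nine stored words, one barrel shift of at most three places, and one addition or subtraction.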
Fig: Architecture of the present method
If the input bit size is 5, the memory stores $2^5/2 = 16$ locations, which results in a reduction in LUT size by a factor of 2.
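The word counts for each scheme can be checked with a short Python sketch. It assumes an l-bit input, an APC word x' ranging over 0..2^(l-1) with x' = 0 served by RESET, and OMS storing only the distinct odd parts of x' plus one extra word (2A) for the largest x'. The function name is hypothetical.

```python
def lut_sizes(l):
    """Word counts (conventional, APC, APC+OMS) for an l-bit input, l >= 3."""
    conventional = 2 ** l                 # one word per input value
    apc = 2 ** (l - 1)                    # A*x' for x' = 1 .. 2**(l-1); x' = 0 uses RESET
    odd_parts = set()
    for xp in range(1, 2 ** (l - 1)):     # every APC word except the largest
        odd = xp
        while odd % 2 == 0:
            odd //= 2
        odd_parts.add(odd)                # OMS stores only the odd part
    apc_oms = len(odd_parts) + 1          # plus one word (2A) for x' = 2**(l-1)
    return conventional, apc, apc_oms

print(lut_sizes(5))
# (32, 16, 9)
```

For a 5-bit input, APC alone halves the conventional 32 words to 16, and the combined scheme needs 9 words, roughly one-fourth of the conventional LUT.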
Hardware Environment:
FPGA Implementation
FPGA stands for field-programmable gate array, an integrated circuit that can be configured by the customer or designer after manufacturing. Field-programmable gate arrays are so called because, rather than having a structure similar to a PAL or other programmable device, they are structured very much like a gate-array ASIC. This makes FPGAs well suited to prototyping ASICs, or to places where an ASIC will eventually be used. For example, an FPGA may be used in a design that needs to get to market quickly regardless of cost; later, an ASIC can replace the FPGA when the production volume increases, in order to reduce cost. FPGAs are programmed using a logic circuit diagram or source code in an HDL to specify how the chip will work.
FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together". The programmable logic blocks are called configurable logic blocks, and the reconfigurable interconnects are called switch boxes. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
Fig: Flow chart of proposed technique
SIMULATION RESULT OF LUT OPTIMIZATION:
APPLICATIONS:
The applications of LUT optimization for memory-based computation are:
1. Bio-medical: Total-body wireless operation systems contain nano components such as nano cameras and CROs. Nano cameras must be designed with low area occupancy in order to be embedded in the human body, so LUTs play a vital role in the design of these nano devices.
CONCLUSION: Finally, an efficient LUT-based multiplier with reduced area has been designed using a barrel shifter. The design can improve throughput and has a wide range of applications. This type of LUT implementation can play a vital role in applications such as biomedical devices, telecommunications, and military systems.
REFERENCES:
[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 1989.
[2] S. A. White, "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review", IEEE ASSP Magazine, July 1989, pp. 4-19.
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Area-Delay Tradeoff in Distributed Arithmetic based Implementation of FIR Filters", VLSI Design 97, pp. 134-129.
[4] S. Wolter, A. Schubert, H. Matz and R. Laur, "On the Comparison between Architectures for the Implementation of Distributed Arithmetic", ISCAS 93, pp. 1829-1832.
[5] K. Nourji and N. Demassieux, "Optimization of Real-Time VLSI Architectures for Distributed Arithmetic based Algorithms: Application to HDTV Filters", ISCAS 94, vol. 4, pp. 223-226.
[6] E. M. Sentovich et al., "SIS: A System for Sequential Circuit Synthesis", Memorandum No. UCB/ERL M92/41.
[8] V. S. Rosa, E. Costa and S. Bampi, "A High Performance Parallel FIR Filters Generation Tool", in Iberchip, San Jose, Costa Rica, 2006.
[9] Altera Corporation, 101 Innovation Drive, San Jose, California 95134, USA. http://www.altera.com
[10] Xilinx, Inc. http://www.xilinx.com
[11] R. W. Hamming, "Digital Filters", Prentice Hall, 3rd ed., 1989.
[12] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, and Applications. Piscataway, NJ: IEEE Press, 2003.
[13] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009.
[14] P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1–4.
[15] International Technology Roadmap for Semiconductors. [Online].
[16] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. ISIC, Dec. 2009, pp. 663–666.
[17] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad and T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347–2354, Sep. 2002.
[18] H.-C. Chen, J.-I. Guo, T.-S. Chang and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445–453, Mar. 2005.
[19] P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041–1050, Sep. 2006.
[20] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009, pp. 453–456.