Design and Implementation of an Enhanced LUT System in Security... International Journal of Engineering Trends and Technology (IJETT) - Volume4...

advertisement
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
Design and Implementation of an Enhanced LUT System in Security Based Computation
dama.dhanalakshmi1, K.Annapurna2
Vignan Universit, Guntur district
ABSTRACT: In this project, the anti-symmetric
process generations, and so do not take deep-submicron
product coding (APC) and odd-multiple-storage (OMS)
electrical effects into account. In the present work, we
techniques for lookup-table (LUT) design for memory-
perform detailed transistor-level design of circuits and
based multipliers are presented to be used in digital
perform appropriate buffer and transistor sizing for all the
signal processing applications. All these techniques
logic and routing elements. Field Programmable Gate
results in the reduction of the LUT size by a factor of
Arrays (FPGAs) are an attractive hardware design option,
two. We present a different form of APC and a modified
making technology mapping for FPGAs an important
OMS scheme, in order to combine them for efficient
EDA problem. For an excellent overview of the classical
memory-based multiplication. The proposed combined
and recent work on FPGA technology mapping, focusing
approach provides a reduction in LUT size to one-fourth
on area, delay, and power minimization, the reader is
of the conventional LUT. It has also suggested a simple
referred to [2]. The recent advanced algorithms for FPGA
technique for selective sign reversal to be used in the
mapping, such as [2][12][16][23], focus on area
proposed design. It is shown that the proposed LUT
minimization under delay constraints. If delay constraints
design for small input sizes can be used for efficient
are not given, first the optimum delay for the given logic
implementation of high-precision multiplication by
structure is found and then area is minimized without
input operand decomposition.
changing delay. In terms of the algorithms employed, the
Keywords:
anti-symmetric
odd-
mappers are divided into structural and functional.
multiple-storage, lookup-table, Digital signal processing
Structural mappers consider the circuit graph as a given
INTRODUCTION: A look-up table (LUT) size of 4 is
and find a covering of the graph with K-input subgraphs
the most area efficient in a non clustered context. A
corresponding to LUTs. Since functional mappers explore
LUT size of 5 to 6 gave the best performance. The work
a larger solution space, they tend to be time-consuming,
in [12] has suggested that using a heterogeneous
which limits their use to small designs. In practice, FPGA
mixture of LUT sizes of 2 and 3 was equivalent in area
mapping for large designs is done using structural
efficiency to a LUT size of 4 and, hence, could be a
mappers, whereas the functional mappers are used for
good choice. In addition, [1] states that a logic structure
resynthesis after technology mapping. In this paper, we
using two three-input LUTs was most beneficial in
consider
terms of area and speed. However, it must be noted that
representative of the advanced structural technology
both these last two papers did not perform a full area or
mapping for LUT-based FPGAs and refer to it as “the
delay study where a range of LUT sizes was examined.
previous work” and discuss several ways of improving it.
First, prior work focused on non clustered logic blocks,
LOOK UP TABLE:
which are known to have a significant impact on the
LUT means “Look Up Table.” It’s helpful to think of it
area and delay [21]. Second, most prior studies tended
like a math problem: R= S+L
to look at area or delay, but not both as we will here.
“R” being your result or what you want to attain.“S”
Third, prior results were based on IC process
being your source or what you start with. “L” being your
generations that are several factors larger than current
LUT or the difference needed to make up between your
ISSN: 2231-5381
product
coding,
the
http://www.ijettjournal.org
recent
work
on
DAOmap
Page 3308
[2]
as
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
source and your desired outcome. In all cases of LUT
as the address for the LUT, then the corresponding
use, the LUT is the means to make up the difference
product value A · Xi is available as its output. Let input be
between source and result.((All cases assume the
“X” , and it should be multiplied with A. The products are
colorist (or you) is grading through a correctly
as shown in second column of above table. In our design
calibrated monitor for evaluation and finishing. LUTs in
product values are stored in LUT’S. Each product value is
no way replace proper calibration or color correction.
stored in separate row. For the selection of product value,
In computer science, a lookup table is an array that
input data is acts as a address. If the input size is of length
replaces runtime computation with a simpler array
5 then 25 values are to be stored. If the input length
indexing operation. The savings in terms of processing
increases more number of data is to be stored and it
time can be significant, since retrieving a value from
requires more memory.
memory is often faster than undergoing an 'expensive'
PROPOSED TECHNIQUE:
computation or input/output operation.
[1]
The tables may
Present technique:
be precalculated and stored in static program storage,
LUT optimization is the main key factor in our project, in
calculated (or "pre-fetched") as part of a program's
order to reduce power and area. The following techniques
initialization phase (memoization), or even stored in
have to be implemented in LUT to get exact optimized
hardware in application-specific platforms. Lookup
results.
tables are also used extensively to validate input values
1. Anti symmetric Product coding (A.P.C)
by matching against a list of valid (or invalid) items in
2. Modified Odd multiple storage (O.M.S)
an array and, in some programming languages, may
include pointer functions (or offsets to labels) to process
the matching input.
A conventional lookup-table (LUT)-based multiplier is
shown in Fig. 1, where A is a fixed coefficient, and X is
an input word to be multiplied with A. Assuming X to be
a positive binary number of word length L, there can be
2L possible values of X, and accordingly, there can be
2L possible values of product C = A · X. Therefore, for
memory-based multiplication, an LUT of 2L words,
consisting of precomputed. Fig 1: Conventional LUTbased multiplier product values corresponding to all
possible values of X, is conventionally used. The
product word A · Xi is stored at the location Xi for 0 ≤ Xi
≤ 2L − 1, such that if an L-bit binary value of Xi is used
In this project, for the reduction of look-up-table (LUT)
size of memory-based multipliers to be used in digital
signal processing applications. It is shown that by simple
sign-bit exclusion, the LUT size is reduced by half at the
cost of a marginal area overhead. Moreover, a novel antisymmetric product coding (APC) scheme is proposed to
ISSN: 2231-5381
http://www.ijettjournal.org
Page 3309
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
reduce the LUT size by further half, where the LUT
found to involve less area and less multiplication time
output is added with or subtracted from a fixed value. It
than the existing LUT-multipliers.
is shown that the
optimized LUTs for small input width could be used for
efficient
implementation
of
high-precision
LUT-
multipliers, where the total contribution of all such fixed
offsets could be added to the final result or could be
initialized for successive accumulations. The proposed
LUT-multiplier and the existing ones are coded in
VHDL and synthesized by Synopsys Design Compiler
using TSMC 90 nanometer library. The proposed
optimized LUT-multiplier is
Table 1.1: General LUT table
The proposed APC–OMS combined design of
control circuit for generating the RESET signal
the LUT for L = 5 and for any coefficient
and control word (s1s0) for the barrel shifter.
width W is shown in Fig. 3. It consists of an
The precomputed values of A × (2i + 1) are
LUT of nine words of (W + 4)-bit width, a
stored as Pi, for i = 0, 1, 2, . . . , 7, at the eight
four-to-nine-line address decoder, a barrel
consecutive locations of the memory array, as
shifter, an address generation circuit, and a
specified in Table II, while 2A is stored for
ISSN: 2231-5381
http://www.ijettjournal.org
Page 3310
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013
input X = (00000) at LUT address “1000,” as
simple modification of 3-to-8-line decoder, as
specified in Table III. The decoder takes the 4-
shown in Fig. 4(a). The control bits s0 and s1
bit address from the address generator and
to be used by the barrel shifter to produce the
generates nine word-select signals, i.e., {wi,
desired number of shifts of the LUT output are
for 0 ≤ i ≤ 8}, to select the referenced word
generated by the control circuit, according to
from the LUT. The 4-to-9-line decoder is a
the relations
ASM chart of LUT optimization
Step2: Calculate APC word of X
Step3: If X(4)=1 then
ALGORITHM:
output <= 16A - APC word(X)
Step1: Load input multiplicand value into X
Else
register
Output <= 16A + APC word(X)
Step2: Deside whether APC or OMS technique
OMS:
Step3: If X(4)=1 then select APC technique
Step1:Takes last four bits of X
Step4: Else select OMS technique
Step2: Calculate s0, s1 and address
APC:
Step3: Depends on s0, s1 output is shifted and
Step1: Take 2’s complement of X and pass to
stored into final output
next block
Fig: Flow chart of proposed technique
ISSN: 2231-5381
http://www.ijettjournal.org
Page 3311
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013
SIMULATION RESULT OF LUT OPTIMIZATION:
RTL Internal block:
APPLICATIONS:
less area and less multiplication time than the
The applications of LUT optimization for
existing LUT-multipliers.
memory based computation are:
1.
Communications:
Finally, combined approach
The future wireless
provides a reduction in LUT size to one-fourth
mutually
conflicting
of the conventional LUT. We will design a
demands, e.g., high computational-bandwith,
simple technique for selective sign reversal to be
low-power consumption and reconfigurability.
used in the proposed design. In future, we are
Such a set of demands will continue to be a
further going to reduce the power consumption
challenge to the designers of computing circuits
that has been consumed by the proposed LUT.
and systems for the next generation wireless
REFERENCES:
communication. The lookup-table (LUT)-based
[1] International Technology Roadmap for Semiconductors.
arithmetic circuits have significant potential to
[Online].
systems
have
three
Available: http://public.itrs.net/
satisfy these requirements to a great extent.
2.
3.
[2] J.-I. Guo, C.-M. Liu, and C.-W. Jen, “The efficient
This is also applicable in the DSP
memory-based VLSI array design for DFT and DCT,” IEEE
processors.
Trans. Circuits Syst. II, Analog Digit. Signal Process., vol.
This project is also useful in FIR,
39, no. 10, pp. 723–733, Oct. 1992.
[3] H.-R. Lee, C.-W. Jen, and C.-M. Liu, “On the design
FFT processors.
automation of the memory-based VLSI architectures for FIR
CONCLUSION & FUTURE SCOPE:
The proposed LUT-multiplier and the
existing ones are coded in VHDL and
filters,” IEEE Trans. Consum.Electron., vol. 39, no. 3, pp.
619–629, Aug. 1993.
[4] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T.
Stouraitis, “A systolic array architecture for the discrete sine
synthesized by Synopsys Design Compiler using
transform,” IEEE Trans. Signal Process., vol. 50, no. 9, pp.
TSMC 90 nanometer library. The proposed
2347–2354, Sep. 2002.
optimized LUT-multiplier is found to involve
[5] H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, “A
memory-efficient realization of cyclic convolution and its
application to discrete cosine transform,” IEEE Trans.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 3312
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8August 2013
Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445–453,
Mar. 2005.
[6] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T.
Stouraitis, “Systolic algorithms and a memory-based design
approach for a unified architecture for the computation of
DCT/DST/IDCT/IDST,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 52, no. 6, pp. 1125–1137, Jun. 2005.
[7] P. K. Meher, “Systolic designs for DCT using a lowcomplexity concurrent convolutional formulation,” IEEE
Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp.
1041–1050, Sep. 2006.
[8] P. K. Meher, “Memory-based hardware for resourceconstrained digital signal processing systems,” in Proc. 6th
Int. Conf. ICICS, Dec. 2007, pp. 1–4.
[9] P. K. Meher, “New approach to LUT implementation
and accumulation for memory-based multiplication,” in
Proc. IEEE ISCAS, May 2009,pp. 453–456.
[10] P. K. Meher, “New look-up-table optimizations for
memory-based multiplication,” in Proc. ISIC, Dec. 2009, pp.
663–666.
[11] A. K. Sharma, Advanced Semiconductor Memories:
Architectures, Designs, andApplications. Piscataway, NJ:
IEEE Press, 2003.
[12] TSC4000 0.35m CMOS Standard Cell, Macro Library
Summary, Texas Instmments, Application Specific
Integrated Circuits, 1995.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 3313
Download