International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
Modular delay Commutator for DHT algorithm
Thamma Sai Sireesha¹, G. Malyadri²
¹PG Student (M.Tech), Dept. of ECE, KKR & KSR Institute of Technology & Sciences, Guntur
²Assistant Professor, Dept. of ECE, KKR & KSR Institute of Technology & Sciences, Guntur
____________________________________________________________________________________
Abstract: This paper proposes a new discrete Hartley transform (DHT) algorithm that is well suited to VLSI
implementation on a highly parallel and modular architecture with a regular structure. The algorithm can be
split into several parallel parts that execute concurrently, and it can serve as the basis for a novel VLSI
architecture for the DHT. Moreover, the proposed algorithm is well suited to the subexpression sharing
technique, which can significantly reduce the hardware complexity of a highly parallel VLSI implementation.
Because multipliers with the same constant can be shared efficiently, the number of multipliers is
significantly smaller than in existing algorithms.
Keywords—Fast Fourier transforms (FFT), Discrete Hartley Transform (DHT), VLSI architecture, Domain
Processing.
I. INTRODUCTION
Image compression, the art and science of reducing
the amount of data required to represent an image, is one
of the most useful and commercially successful
technologies in the field of digital image processing.
Digital image and video compression is now
essential. Internet teleconferencing, High Definition
Television (HDTV), satellite communications and
digital storage of movies would not be feasible unless a
high degree of compression is achieved.
Compression is useful because it reduces the use of
expensive resources, such as memory (hard disks) or
transmission bandwidth. Smaller representations are
therefore preferred wherever storage or bandwidth is
limited. On the downside, lossy compression schemes
introduce distortion, and additional computational
resources are required for compression and
decompression of the data.
The discrete Fourier transform (DFT) is used in
many digital signal processing applications, such as signal
and image compression techniques, filter banks [1],
signal representation, and harmonic analysis [2].
ISSN: 2231-5381
The discrete Hartley transform (DHT) can be used
to efficiently replace the DFT when the input sequence
is real. The classical split-radix algorithm is difficult to
implement in VLSI because of its irregular computational
structure and because the butterflies differ
significantly from stage to stage. Thus, it is
necessary to derive new algorithms that are suited
to a parallel VLSI system.
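To illustrate why the DHT can stand in for the DFT on real input, the following minimal sketch computes both transforms directly and checks the standard relation H_k = Re(F_k) - Im(F_k). It assumes the cos + sin kernel for the DHT and the e^{-j2πnk/N} kernel for the DFT, as used later in the paper; the function names and the sample values are illustrative only and do not come from the authors' design.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT with the e^{-j 2 pi n k / N} kernel."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def dht(x):
    """Direct O(N^2) DHT with the cos + sin kernel; real in, real out."""
    n_pts = len(x)
    return [
        sum(
            x[n] * (math.cos(2 * math.pi * n * k / n_pts)
                    + math.sin(2 * math.pi * n * k / n_pts))
            for n in range(n_pts)
        )
        for k in range(n_pts)
    ]


def dht_from_dft(x):
    """For a real sequence x, H_k = Re(F_k) - Im(F_k)."""
    return [f.real - f.imag for f in dft(x)]


# Illustrative real-valued samples (hypothetical data).
samples = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
direct = dht(samples)
via_dft = dht_from_dft(samples)
max_diff = max(abs(a - b) for a, b in zip(direct, via_dft))
print("largest difference between the two routes:", max_diff)
```

The DHT route never leaves real arithmetic, which is the property a real-input hardware datapath can exploit.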
In the first step of encoding process the image f(x, y)
is mapped to a format to reduce spatial redundancy [2].
The various transforms used for mapping are
• Discrete cosine transform
• Discrete wavelet transform
• Discrete Hartley transform
Next quantization is done, where the loss of
information takes place. Since it is an irreversible
process, we can omit this step for a lossless coding
technique.
The final step is symbol coding, where various
coding techniques can be used to represent the
information in the minimum possible number of bits. The
coding techniques used include Huffman coding,
run-length coding, LZW coding, bit-plane coding, block
transform coding and many others.
http://www.ijettjournal.org
Formula
Formally, the discrete Hartley transform is a linear,
invertible function $H: \mathbb{R}^N \to \mathbb{R}^N$ (where $\mathbb{R}$ denotes the set of
real numbers). The N real numbers $x_0, x_1, \ldots, x_{N-1}$ are
transformed into N real numbers $H_0, H_1, \ldots, H_{N-1}$
according to the forward transform formula given below.
Figure 1 Functional block diagram of a general image
compression system
FPGAs need to be programmed, i.e. their logic
circuits and interconnection switches must be configured
to implement a desired structural circuit.
Applications of FPGAs include digital signal processing,
software-defined radio, aerospace and defence systems,
ASIC prototyping, medical imaging, computer vision,
speech recognition, cryptography, bioinformatics,
computer hardware emulation, radio astronomy, metal
detection and a growing range of other areas.
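$$H_k = \sum_{n=0}^{N-1} x_n \left[ \cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right) \right], \qquad k = 0, 1, \ldots, N-1$$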
The inverse transform is given by:
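$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} H_k \left[ \cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right) \right], \qquad n = 0, 1, \ldots, N-1$$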
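The forward and inverse formulas differ only by the 1/N factor, so the inverse can reuse the forward routine. The sketch below is a direct O(N^2) evaluation meant only to make that property concrete; it is not the fast algorithm or the hardware structure proposed in this paper, and the names `dht`, `idht` and the test vector are illustrative assumptions.

```python
import math


def cas(theta):
    """cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht(x):
    """Forward DHT: H_k = sum_n x_n * cas(2 pi n k / N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cas(2 * math.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idht(h):
    """Inverse DHT: the same kernel as the forward transform, scaled by 1/N."""
    n_pts = len(h)
    return [value / n_pts for value in dht(h)]


# Hypothetical real input of length 6.
x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
h = dht(x)
x_back = idht(h)
round_trip_error = max(abs(a - b) for a, b in zip(x, x_back))
print("round-trip error:", round_trip_error)
```

Because one kernel serves both directions, a single datapath with a selectable scaling step is enough to support the transform and its inverse.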
The cas function is given by:
$$\mathrm{cas}\left(\frac{2\pi nk}{N}\right) = \cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right)$$
And one of the properties of cas function is:
$$2\,\mathrm{cas}(a + b) = \mathrm{cas}(a)\,\mathrm{cas}(b) + \mathrm{cas}(-a)\,\mathrm{cas}(b) + \mathrm{cas}(a)\,\mathrm{cas}(-b) - \mathrm{cas}(-a)\,\mathrm{cas}(-b)$$
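The signs in this identity are easy to lose, so it is worth a numerical spot check. The sketch below evaluates the standard form 2cas(a+b) = cas(a)cas(b) + cas(-a)cas(b) + cas(a)cas(-b) - cas(-a)cas(-b) on a few arbitrary angles; the angle list is an illustrative assumption.

```python
import math


def cas(theta):
    """cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def lhs(a, b):
    return 2.0 * cas(a + b)


def rhs(a, b):
    # Three positive products and one negative product.
    return (cas(a) * cas(b) + cas(-a) * cas(b)
            + cas(a) * cas(-b) - cas(-a) * cas(-b))


# Arbitrary test angles in radians (hypothetical values).
angles = [0.0, 0.3, 1.1, 2.5, -0.7]
max_gap = max(abs(lhs(a, b) - rhs(a, b)) for a in angles for b in angles)
print("largest gap between the two sides:", max_gap)
```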
The two-dimensional DHT of an array x(m, n) of size
M×N may be defined as:
$$X(k, l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n)\, \mathrm{cas}\left(\frac{2\pi mk}{M} + \frac{2\pi nl}{N}\right)$$
The inverse transform is given by the same kernel
along with a scaling factor of 1/MN, i.e.
$$x(m, n) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} X(k, l)\, \mathrm{cas}\left(\frac{2\pi mk}{M} + \frac{2\pi nl}{N}\right)$$
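As a concrete check of the two-dimensional definition, the sketch below evaluates the non-separable cas kernel directly on a small array and recovers the input by applying the same routine with the 1/MN scaling. It assumes summation indices running from 0 to M-1 and 0 to N-1, uses plain nested lists, and is far too slow for real images; the names and the 3×4 test array are illustrative.

```python
import math


def cas(theta):
    return math.cos(theta) + math.sin(theta)


def dht2(x):
    """Direct 2-D DHT with kernel cas(2 pi m k / M + 2 pi n l / N)."""
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(
                x[m][n] * cas(2 * math.pi * (m * k / rows + n * l / cols))
                for m in range(rows)
                for n in range(cols)
            )
            for l in range(cols)
        ]
        for k in range(rows)
    ]


def idht2(coeffs):
    """Inverse 2-D DHT: same kernel, scaled by 1 / (M * N)."""
    rows, cols = len(coeffs), len(coeffs[0])
    return [[v / (rows * cols) for v in row] for row in dht2(coeffs)]


# Hypothetical 3 x 4 block of pixel values.
block = [
    [52.0, 55.0, 61.0, 66.0],
    [63.0, 59.0, 55.0, 90.0],
    [62.0, 59.0, 68.0, 113.0],
]
coeffs = dht2(block)
restored = idht2(coeffs)
max_error = max(
    abs(a - b) for row_a, row_b in zip(block, restored) for a, b in zip(row_a, row_b)
)
print("2-D round-trip error:", max_error)
```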
II. DISCRETE HARTLEY TRANSFORM
The Hartley transform is an integral transform
closely related to the Fourier transform, but which
transforms real-valued functions to real-valued functions.
It was proposed as an alternative to the Fourier
transform by R. V. L. Hartley in 1942 [8]. Compared to
the Fourier transform, the Hartley transform has the
advantages of transforming real functions to real
functions (as opposed to requiring complex numbers)
and of being its own inverse. The discrete version of the
transform, the discrete Hartley transform, was
introduced by R. N. Bracewell in 1983.
Image compression is minimizing the size in bytes of
a graphics file without degrading the quality of the
image to an unacceptable level. The reduction in file size
allows more images to be stored in a given amount of
disk or memory space. It also reduces the time and
bandwidth required for images to be sent over the
Internet or downloaded from Web pages.
There are several different ways in which image
files can be compressed. For Internet use, the two most
common compressed graphic image formats are the
JPEG format and the GIF format. The JPEG method is
more often used for photographs, while the GIF method
is commonly used for line art and other images in which
geometric shapes are relatively simple.
First of all the image is divided into blocks of
8x8 pixel values. These blocks are then fed to the
encoder from where we obtain the compressed image.
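The block-splitting step can be sketched as follows. This is a minimal illustration that assumes the image dimensions are exact multiples of the block size and that the image is stored as a list of rows; the function names are hypothetical, and padding of partial blocks is deliberately left out.

```python
def split_into_blocks(image, block=8):
    """Split a 2-D list of pixels into a grid of block x block tiles."""
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    return [
        [[row[c:c + block] for row in image[r:r + block]]
         for c in range(0, cols, block)]
        for r in range(0, rows, block)
    ]


def merge_blocks(blocks):
    """Inverse of split_into_blocks: stitch the tiles back into one image."""
    image = []
    for block_row in blocks:
        height = len(block_row[0])
        for i in range(height):
            line = []
            for tile in block_row:
                line.extend(tile[i])
            image.append(line)
    return image


# Hypothetical 16 x 24 test image whose pixel value encodes its position.
image = [[r * 24 + c for c in range(24)] for r in range(16)]
blocks = split_into_blocks(image)
rebuilt = merge_blocks(blocks)
print(len(blocks), "block rows,", len(blocks[0]), "block columns")
```

Each tile can then be transformed and quantized independently, and the decoder reverses the split with `merge_blocks`.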
The next step is mapping of the pixel intensity value
to another domain. The mapper transforms images into a
(usually non-visual) format designed to reduce spatial
and temporal redundancy.
Quantizing the transformed coefficients results
in the loss of information that is irrelevant for the specified
purpose. Source coding is the process of encoding
information using fewer bits (or other information-bearing
units) than an unencoded representation would use,
through use of specific encoding schemes.
The block diagram of these steps is shown in Figure 2.
Figure 2 Energy quantization based image compression
encoder
For retrieving the image back, the steps have to be
reversed from the forward process. First the data is
decoded using the decoder. Next inverse transform
(IDHT) is calculated to get the 8x8 blocks. These blocks
are then connected to form the final image. From the
reconstructed image pixel values it is clear that some of
the high frequency components are preserved. This
indicates that the edge property of the image is
preserved.
The pixel values must be converted into another
domain in which they are easier to compress. A transform
operates on an image's pixel values and converts them
to a set of less correlated transformed coefficients.
Natural images (which are the most common images to
be compressed) have a lot of spatial correlation between
the intensities of neighboring pixels. These
correlations can be exploited by the transform, so
the spatial and temporal redundancy is reduced. This
operation is generally reversible and may or may not
reduce the data content of the images. Here the discrete
Hartley transform (DHT) is used for generating the
coefficients.
Figure 3 Energy quantization based image compression
decoder
Quantization is the process of approximating a
continuous range of values (or a very large set of
possible discrete values) by a relatively small ("finite")
set of discrete symbols or values. In other words, it
means mapping a broad range of input values to a
limited number of output values. It reduces the accuracy
of the transformed coefficients in accordance with a
pre-established fidelity criterion. The goal is to reduce the
amount of irrelevant information present in the image.
The human eye is fairly good at seeing small
differences in brightness over a relatively large area, but
not so good at distinguishing the exact strength of a high
frequency brightness variation. This fact allows one to
get away with a greatly reduced amount of information
in the high frequency components. This is done by
simply dividing each component in the frequency
domain by a constant for that component, and then
rounding to the nearest integer.
This is the main lossy operation in the whole
process. As a result of this, it is typically the case that
many of the higher frequency components are rounded
to zero, and many of the rest become small positive or
negative numbers.
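The divide-and-round step described above can be sketched in a few lines. The step sizes and coefficient values below are hypothetical and are not the paper's quantization matrix; Python's built-in `round` is used as the "round to nearest integer" operation.

```python
def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [
        [int(round(c / q)) for c, q in zip(coeff_row, step_row)]
        for coeff_row, step_row in zip(coeffs, steps)
    ]


def dequantize(levels, steps):
    """Approximate reconstruction: multiply each level by its step size."""
    return [
        [level * q for level, q in zip(level_row, step_row)]
        for level_row, step_row in zip(levels, steps)
    ]


# Hypothetical 2 x 2 corner of a coefficient block and its step sizes;
# larger steps are assigned to the higher-frequency positions.
coeffs = [[620.0, -31.0], [14.0, 3.0]]
steps = [[16, 24], [24, 40]]

levels = quantize(coeffs, steps)
approx = dequantize(levels, steps)
print(levels)
print(approx)
```

The small high-frequency coefficient collapses to zero, and the reconstructed values differ from the originals, which is where the loss of information occurs.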
The quantization matrices are formed for
different transforms according to their frequency
distribution in the coefficient matrix. Quantization
matrix for DCT can be easily obtained but it is difficult
for DHT since the scanning order is special for DHT.
Figure 4 Scanning Order for DHT
Block diagram description
Because computing the DFT of an N-point
sequence requires N summations, each involving N
operations, the total computation requires $O(N^2)$
operations. Writing out the entire computation by hand
will show, however, that many of these operations are
redundant and can be eliminated.
Using Danielson and Lanczos' [3] observation
that an N-point DFT can be expressed as the sum
of two N/2-point DFTs, these redundancies can be
eliminated, as shown in the block diagram above, by
adopting the conventional definition. The basic radix-2
FFT algorithm is very symmetrical, but it accepts
general complex input when all that we need here is the
ability to transform real sequences. The Fourier
transform of a real sequence has conjugate symmetry
(the real part of the transform is even while the
imaginary part is odd), which can be exploited to reduce
the number of computations in an FFT by one half.
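The Danielson and Lanczos decomposition can be illustrated with a short recursive sketch: an N-point DFT is built from the DFTs of the even-indexed and odd-indexed halves. The code assumes N is a power of two, compares the result with a direct O(N^2) DFT, and checks the conjugate symmetry of a real input. It is a textbook radix-2 recursion with illustrative names and data, not the radix-4 structure used later in this paper.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT: combine two N/2-point DFTs into one N-point DFT."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Hypothetical real input of length 8.
x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
fast = fft_radix2(x)
reference = dft(x)
fft_error = max(abs(a - b) for a, b in zip(fast, reference))
# Conjugate symmetry of a real sequence: F[N - k] equals conj(F[k]).
symmetry_error = max(abs(fast[-k] - fast[k].conjugate()) for k in range(1, len(x)))
print(fft_error, symmetry_error)
```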
The proposed DHT needs only 4 ALUs and 7 adders,
whereas the DCT needs 9 ALUs and 6 adders.
The blocks indicate the multiplications involved in
the convolution of the inputs with the twiddle factors.
The FFT operation is shown in Fig. 7, in which the
imaginary multiplication involved is restricted, which
reduces the complexity as well as the area.
The Discrete Fourier Transform converts discrete
data from a time wave into a frequency spectrum. Using
the DFT implies that the finite segment that is analysed
is one period of an infinitely extended periodic signal.
The DFT equation is
$$F(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2\pi kn / N}$$
x(k) is the time wave that is converted to a frequency
spectrum by the DFT. The key concepts required to
understand a DFT are as follows.
The "sampling rate", sr, is the number of samples taken
per unit of time. For simplicity we will make the time
interval between samples equal. This is the "sample
interval".
Figure 7 Multiplications involved in the DHT algorithm
Figure 5 Functional block diagram
Figure 6 Twiddle factor constant multiplier
The fundamental period, T, is the period of all the
samples taken.
This is also called the "window". The "fundamental
frequency" is f0, which is 1/T. f0 is the first harmonic,
the second harmonic is 2*f0, the third is 3*f0, etc.
The number of samples is N.
The "Nyquist Frequency", fc, is half the sampling
rate.
The Nyquist frequency is the maximum frequency
that can be detected for a given sampling rate. This is
because in order to measure a wave you need at least
two sample points to identify it (trough and peak).
1. "Euler's formula"
2. The sampled part of the time wave, x(t), should
be "typical" of how the wave behaves over all time that
it exists.
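The quantities defined above are related by simple arithmetic, which the following sketch makes explicit. The sampling rate of 1000 Hz and the 500 samples are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def sampling_summary(sample_rate_hz, num_samples):
    """Derive the window length and key frequencies from sr and N."""
    sample_interval = 1.0 / sample_rate_hz        # time between samples
    window = num_samples * sample_interval        # fundamental period T
    fundamental = 1.0 / window                    # f0 = 1 / T
    nyquist = sample_rate_hz / 2.0                # fc = sr / 2
    harmonics = [m * fundamental for m in (1, 2, 3)]
    return {
        "sample_interval": sample_interval,
        "window": window,
        "fundamental": fundamental,
        "nyquist": nyquist,
        "harmonics": harmonics,
    }


# Hypothetical settings: 1000 samples per second, 500 samples in the window.
summary = sampling_summary(1000.0, 500)
for name, value in summary.items():
    print(name, value)
```

With these numbers the window lasts half a second, so the DFT bins are spaced 2 Hz apart up to the 500 Hz Nyquist frequency.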
Compared with the DCT adder/subtractor of existing
algorithms, the DHT adder/subtractor requires a smaller
total number of ALUs and adders.
MDC Architecture
The goal is to convert the input streams in Fig. 8
to the format in Fig. 9. There are 12 memory banks at the
input stage for converting the parallel input streams into
serial blocks, such that one butterfly at each stage can
compute the four data streams without an idle period.
The 12 memory banks are grouped into four memory
sets as shown in Fig. 4(a), that is, memory sets a, b, c,
and d, which are used to store the input streams A, B, C,
and D, respectively. There are two kinds of grouping
methods, namely grouping for even-indexed symbols
and grouping for odd-indexed symbols. Let the index of
the OFDM symbols begin from 0. For even-indexed OFDM
symbols, the grouping method on the left side of Fig. 8 is
used, and for odd-indexed OFDM symbols, the grouping
method on the right side of Fig. 8 is used. Fig. 8 illustrates
the memory scheduling for even-indexed OFDM
symbols. The scheduling for odd-indexed OFDM
symbols will become clear after the illustration for
even-indexed OFDM symbols. Let us take N = 2048 as an
example and explain the input scheduling as follows.
Initially the 12 memory banks are logically grouped into
four sets {a1, a2, a3}, {b1, b2, b3}, {c1, c2, c3}, and
{d1, d2, d3} as shown in Fig. 4.1(a). Each set is in
charge of one input stream. From the first to the 3N/4th
cycle, the memory banks keep the first to 3N/4th
samples of each input stream. For the case of N = 2048,
the memory banks {a1, a2, a3}, {b1, b2, b3}, {c1, c2,
c3}, and {d1, d2, d3} store the samples {1st–512th,
513th–1024th, 1025th–1536th} of the first, the second,
the third, and the fourth input streams, respectively.
From the (3N/4+1)th to the Nth cycle, shown in Fig.
4.1(b), the radix-4 butterfly processes the data read out
from the memory set {a1, a2, a3}, and this memory
set is then updated with the incoming samples from streams
B, C, and D. That is, together with the previously stored
first to 3N/4th samples, the radix-4 butterfly can now
process the samples of stream A, because the (3N/4+1)th
to the Nth samples are ready at this moment. Also,
since only one butterfly is used at each stage, the (3N/4+1)th
to the Nth samples of input streams B, C, and D are
stored in the vacated memories a1, a2, and a3,
respectively. Continuing with the example of N = 2048,
at the end of the 2048th clock cycle, the radix-4 butterfly
has computed the 2048 samples of stream A, and the
memory set {a1, a2, a3} is updated with the 1537th to
the 2048th samples of streams B, C, and D, respectively.
Similarly, in the next N/4 cycles, the contents of
memory set b are updated as shown in Fig. 4(c). The
processor reads out the 2048 samples of stream B from
the memory banks a1 and {b1, b2, b3} and sends them to the
radix-4 butterfly. Then, the empty memories a1 and {b1,
b2, b3} are updated with the first to the N/4th samples of
streams A, B, C, and D, respectively, of the second
OFDM symbol. Continuing with the example of N =
2048, at the end of the 2560th clock cycle, the radix-4
butterfly has computed the 2048 samples of stream B,
and the memories a1 and {b1, b2, b3} are updated with
the first to the 512th samples of streams A, B, C, and D,
respectively, of the second OFDM symbol.
Continuing with the example of N = 2048, at the end of
the 3072nd and the 3584th clock cycles, the radix-4
butterfly has handled streams C and D, respectively.
Moreover, at the end of the 3584th clock cycle, all the
memories are updated with the first to the 1536th
samples of the second OFDM symbol. Next, procedures
similar to those above are used to handle the
second OFDM symbol. For a practical implementation,
the control mechanism of the proposed input scheduling
is summarized in Fig. 4.2, where the switch-box at stage
s updates the routing rule every $N/4^{s+1}$ OFDM symbol
time. Each of the four scheduled sequences occupies 1/4
of one OFDM symbol time, hence all four scheduled
sequences can be handled within one OFDM symbol
duration using one radix-4 butterfly at each stage. As a
result, the utilization rates for adders, multipliers and
memories are 100%. The computational complexity for
each stage is thus one radix-4 butterfly, three twiddle-factor
multipliers, and a switch-box with first-in first-out
buffers (FIFOs). Since stage s needs $3N/4^s$ words of FIFOs,
together with the input scheduling memory of 3N
words, the overall required memory size of the
proposed radix-4 MDC FFT/IFFT processor with four
parallel input streams is
$$3N + \sum_{s=1}^{\log_4 N - 1} \frac{3N}{4^s}.$$
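The cycle counts and the memory total quoted above can be reproduced with a small calculation. The sketch below assumes that the first 3N/4 cycles only fill the memory banks and that each stream then occupies the butterfly for N/4 cycles, as described in the text. The memory function additionally assumes that N is an exact power of 4 so that log4 N is an integer; that restriction belongs to this sketch, since the N = 2048 example relies on a radix-8 last stage. All function names are illustrative.

```python
def stream_finish_cycles(n):
    """Clock cycle at which the butterfly finishes streams A..D of the first symbol."""
    fill_cycles = 3 * n // 4  # cycles spent only storing samples
    return {
        name: fill_cycles + (i + 1) * n // 4
        for i, name in enumerate("ABCD")
    }


def radix4_stage_count(n):
    """Return log4(n); raises if n is not a power of 4 (assumption of this sketch)."""
    stages, m = 0, n
    while m > 1:
        if m % 4:
            raise ValueError("n must be a power of 4 in this sketch")
        m //= 4
        stages += 1
    return stages


def total_memory_words(n):
    """3N scheduling words plus 3N / 4^s FIFO words for s = 1 .. log4(N) - 1."""
    stages = radix4_stage_count(n)
    return 3 * n + sum(3 * n // 4 ** s for s in range(1, stages))


finish = stream_finish_cycles(2048)
memory_1024 = total_memory_words(1024)
print(finish)
print(memory_1024)
```

For N = 2048 the finish cycles come out as 2048, 2560, 3072 and 3584, matching the walkthrough in the text.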
Butterfly Operations:
The proposed FFT/IFFT processor uses radix-4
butterflies as fundamental computing elements, as shown
in Fig. 8. Each stage adopts the same radix-4 butterfly,
while the last stage uses a radix-8 butterfly which can
also be configured as a radix-4 butterfly. As for the
storage requirement of the twiddle factors, Lin suggested
keeping only the twiddle factors whose phase indices are
within N/8; the rest of the twiddle factors can be derived
by quadrant conversion. As for the complex
multiplications, each radix-4 butterfly needs three
multipliers and five real adders. We adopted the routing
rule proposed for the switch-box.
We propose a configurable radix-8/radix-4
butterfly for the last stage, where the multiplications by
twiddle factors can be realized by constant multipliers.
This butterfly is composed of one radix-4 and four
radix-2 butterflies, as shown in Fig. 4.2. When a radix-4
instead of a radix-8 computation is needed, this butterfly
enables only the internal radix-4 computations and
disables the other radix-2 computations.
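A radix-4 butterfly is a 4-point DFT whose internal twiddle factors are 1, -j, -1 and j, so it needs only additions, subtractions and a swap of real and imaginary parts. The sketch below is a generic textbook butterfly written under that assumption and checked against a direct 4-point DFT; it does not model the paper's configurable radix-8/radix-4 unit, and the names and inputs are illustrative.

```python
import cmath
import math


def times_j(z):
    """Multiply by j without a multiplier: swap parts and negate the new real part."""
    return complex(-z.imag, z.real)


def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only additions and subtractions."""
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = b - d
    return (t0 + t2, t1 - times_j(t3), t0 - t2, t1 + times_j(t3))


def dft4(values):
    """Direct 4-point DFT, used as the reference."""
    return tuple(
        sum(values[n] * cmath.exp(-2j * math.pi * n * k / 4) for n in range(4))
        for k in range(4)
    )


# Hypothetical complex inputs.
inputs = (1 + 2j, -0.5 + 1j, 3 - 1j, 2 + 0.5j)
fast = radix4_butterfly(*inputs)
reference = dft4(inputs)
butterfly_error = max(abs(p - q) for p, q in zip(fast, reference))
print(butterfly_error)
```

In an FFT stage, the three non-trivial twiddle-factor multipliers mentioned above act on the butterfly outputs (or inputs) rather than inside the butterfly itself.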
An environment with a larger register set could be used
to exploit the advantages of larger and more complex
algorithms such as vector-radix techniques. Many
promising hybrid techniques have also been developed
and deserve attention.
Image should also support saving the FHT buffer to
disk as well as its power spectrum. The ability to view
and alter the amplitude and phase of images should also
be supported. Finally, the dyadic frequency domain
operations deserve optimization, since their speed could
be doubled without too much difficulty.
Figure 8 MDC architecture for the Proposed System
Figure 10 Stage box 1
Figure 9 Memory scheduling for the Even and Odd
indexed terms.
The above simulation shows the twiddle factors
distributed to the multipliers: W0..W3 represent the
twiddle factors in memory, and f0..f3 represent the
switching activity of the multiplier.
III. RESULTS AND CONCLUSIONS
An 8-bit-input DHT has been implemented using a
radix-4 structure. The selection line switches between
the DHT and its inverse. Inputs of variable length have
been tested and synthesized.
The FFT consumes 8 adders and 4 multipliers,
whereas the proposed scheme needs only 4 adders and 2
shared multipliers, since one of the twiddle factors is one.
The implementation is written in Verilog using the
Xilinx tool.
Split-radix techniques are very attractive since they
provide both compact size and minimum operation
counts. As processors evolve, the finite register set
limitation also becomes less stringent.
Figure 11 Stage box 2
The above simulation shows the twiddle factors
distributed to the multipliers: W0..W3 represent the
twiddle factors in memory, and f0..f3 represent the
switching activity of the multiplier. The multiplier
constants consist of 1 and -1.
The above simulation shows the multiplier output for
the twiddle factors and the inputs. The outputs are shown
by out1 to out4.
Figure 8 Stage 3
The above simulation shows the twiddle factors
distributed to the multipliers: W0..W3 represent the
twiddle factors in memory, and f0..f3 represent the
switching activity of the multiplier. As the multiplier
constants consist of the common digits -1 and 1, the
multiplier can be shared.
Figure 11 Imaginary multiplication
The above simulation shows the multiplier output for the
twiddle factors and the inputs. The imaginary
multiplication is done by using the inverse polarity.
Figs. 16 and 17 below show the RTL schematics of the
DHT module.
Figure 9 Top module
The above simulation shows the real values obtained,
with the imaginary values set to zero. Two successive
inputs are given; the sel line selects between the inverse
and the normal DHT output.
Figure 12 RTL schematic DHT Module
Figure 13 RTL schematic DHT Module
Figure 10 Multiplication
Figure 14 Comparison for Modular DHT and MDC DHT
REFERENCES
1. G. Bi, Y. Chen, and Y. Zeng, "Fast algorithms for generalized discrete Hartley transform of composite sequence length," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 9, pp. 893–901, Sep. 2000.
2. D. F. Chiper, "Radix-2 fast algorithm for computing discrete Hartley transform of type III," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 5, pp. 297–301, May 2012.
3. H. Z. Shu, J. S. Wu, C. F. Yang, and L. Senhadji, "Fast radix-3 algorithm for the generalized discrete Hartley transform of type II," IEEE Signal Process. Lett., vol. 19, no. 6, pp. 348–351, Jun. 2012.
4. G. Bi, "New split-radix algorithm for the discrete Hartley transform," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 297–302, Feb. 1997.
5. P. K. Meher, J. C. Patra, and M. N. S. Swamy, "High throughput memory-based architecture for DHT using a new convolutional formulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 7, pp. 606–610, Jul. 2007.
6. P. K. Meher, T. Srikanthan, and J. C. Patra, "Scalable and modular memory-based systolic array architectures for discrete Hartley transform," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 5, pp. 1065–1077, May 2006.
7. P. K. Meher, "LUT optimization for memory-based computation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 4, pp. 285–289, Apr. 2010.
8. R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.
Authors Profile:
Thamma Sai Sireesha is pursuing her Master degree (M.Tech) in Very Large Scale Integration (VLSI) Systems at KKR & KSR Institute of Technology & Science.
G. Malyadri is working as Assistant Professor in KKR & KSR Institute of Technology & Science. He has over seven years of teaching experience.