Design and Implementation of VLSI DHT Algorithm for Image Compression

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
Design and Implementation of VLSI DHT
Algorithm for Image Compression
Vimal kumar C1, Raghavendra D2, M N Eshwarappa3
1
PG student (VLSI Design and Embedded systems), SIET, Tumkur, Karnataka, India
2
Assistant Prof., Dept. of E&C, SIET, Tumkur, Karnataka, India
3
HOD, Dept. of E&C, SIET, Tumkur, Karnataka, India
Abstract— A new very large scale integration (VLSI) algorithm
for a discrete Hartley transform (DHT) that can be efficiently
implemented on a highly modular and parallel VLSI architecture
having a regular structure is presented. The concurrent
execution of the DHT algorithm can be achieved by splitting on
several parallel parts efficiently. In addition, the hardware
complexity can be significantly reduced using sub expression
sharing technique of the proposed algorithm in highly parallel
VLSI implementation. With efficient sharing of multipliers
having the same constant and using the advantages of the
proposed algorithm, the numbers of multipliers and adders used
has been significantly reduced and is kept at a minimum
compared with that of the existing algorithms. Efficient
implementation of multipliers with a constant is possible in
VLSI. Digital image processing is the use of computer algorithms
to perform image processing on digital images. The basic
operation performed by a simple digital camera is to convert the
light energy to electrical energy, then the energy is converted to
digital format and a compression algorithm is used to reduce
memory requirement for storing the image. In this project, image
compression has been taken as an application to prove the
functionality of DHT algorithm in the field of digital signal
processing.
Keywords— Discrete Hartley
processing, image compression.
transform,
DHT
domain
I. INTRODUCTION
The Discrete Fourier transform (DFT) is used in
many digital signal processing applications like
signal and image compression, filter banks [1],
signal representation, or harmonic analysis [2]. The
DHT [2], [3] can be replaced efficiently in place of
DFT whenever the input sequence is real.
A DHT is a Fourier-related transform of discrete,
periodic data similar to the DFT, with analogous
applications in signal processing and related fields.
The DFT and DHT play an important role in many
Digital signal processing applications like image
processing, filter banks, etc. In contrast to Discrete
Cosine Transform (DCT) and DFT where complex
computations are required, DHT involves only real
computations of the input which is one of the main
ISSN: 2231-5381
attractive. The attention was drawn by Bracewell,
where DFT was replaced by DHT and has been an
efficient replacement where the real input sequence
is real. The necessity to derive the dedicated
hardware implementations using the VLSI
technology comes from the fact that the DHT is
computationally intensive.
Some fast algorithms have been derived in
literature for the computation of DHT [4]-[7] and
some algorithms for the computation of generalized
DHT [8]-[10]. Several Split-radix algorithms are
also available for computing DHT with a low
arithmetic cost. Systolic array is a representation of
one of the method of VLSI implementations. Using
parallelism Systolic array architectures achieve
modular and regular structure but lacks parallel
processing. A large portion of the chip area will be
consumed by multipliers and introduce significant
delays in a VLSI structure. Due to this fact,
memory-based solutions using multipliers have
been more in literature.
In [8], a highly parallel and modular solution for
the implementation of type-III DHT based on a new
VLSI algorithm is proposed. In [15], we have a
highly parallel solution for the implementation of
DHT based on a direct implementation of fast
Hartley transform (FHT). The hardware
implementations of FHT are rare considering the
literatures.
Compression is useful as it helps in reduction of
the transmission bandwidth required or the usage of
expensive resources, such as memory (hard disks).
The term data compression refers to the process of
reducing the required data to represent a quantity of
information. Because various amounts of data can
be used to represent the same amount of
information, representations that contain irrelevant
http://www.ijettjournal.org
Page 218
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
or repeated information are said to contain
The matrix defined above helps one to
redundant data. Digital image compression is the understand and implement the DHT algorithm in a
most important part in applications like multimedia better way to optimize the design as far as possible.
which aims to minimize the number of bits in an
image data for its efficient storage (less storage
and i and j are integers from 0 to N-1
area).
The cas function is given by:
Two-dimensional intensity arrays suffer from
three principle data redundancies that can be
identified and exploited:
Spatial and temporal redundancy
Irrelevant information
Coding redundancy.
In this brief, a new VLSI DHT algorithm that is
well suited for a VLSI implementation on a highly
parallel and modular architecture is proposed. It can
be used for designing a completely novel VLSI
architecture for DHT. Furthermore, using
subexpression sharing technique and sharing the
multipliers with the same constant, the hardware
complexity can be reduced significantly because of
the reduction in the number of multipliers and
adders. In the proposed solution, we have used both
multipliers with a constant and sharing the common
blocks that can be efficiently implemented in VLSI.
The designed DHT algorithm with reduction in
operational complexity is been implemented on
Fig. 1 DHT matrix, H7.
image compression which is a standard application.
II. VLSI ALGORITHM FOR DHT
The DHT is a linear, invertible function H: R→R
(where R denotes the set of real numbers). The N
real numbers 0 1… −1 are transformed into N real
numbers 0 1…. −1 according to the formula:
For DHT matrix of H7 shown in Fig. 1, one can
observe some of the symmetries in the forward and
backward diagonal as well as the number of
computations having the same value. This can be
exploited to reduce the number of operators
required.
III. COMMON SUBEXPRESSION SHARING
for k=0, 1.....N-1.
Equation (1) can be represented in matrix form
as follows:
h0
1
1
h1
1
h2
1
h3
1
1
Where HN is the NxN DHT matrix.
Fig. 2 Common Subexpression sharing
ISSN: 2231-5381
http://www.ijettjournal.org
Page 219
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
V. ARITHMETIC COST
One of the ways used to reduce the hardware
complexity is by using common subexpression
Table I lists the required number of
technique. Common subexpression sharing shares multiplications and additions for the proposed
the subexpression among several multiplication algorithm, the Sorensen one and Bi algorithm,
operations so that there can be a reduction in the where rotations are implemented with four
total operation counts. The circled teams of digits multiplications and two additions (Radix-2[13]∗)
have an equivalent subexpression, in order that they and with three multiplications and three additions
will share an equivalent computation unit as shown (Radix-2[13]∗∗).
in Fig. 2.
TABLE I
COMPUTATIONAL COMPLEXITY
This approach effectively reduces the hardware
value of multiple constant multiplications,
Radix
Radix
Radix
Proposed
2[13]
2[13]*
2[11]**
particularly for the filter-like operation. The number
N
M
A
M
A
M
A
M
A
of additions is also reduced by sharing the common
subexpression.
8
4
26
2
24
First calculation of common subexpression part is
16
10
62
10
62
20
74
been done by cos and sin terms using CORDIC
algorithm pre-existing in Xilinx tool and further has
32
40
168
38
174
69
194
32
31
been instantiated in the main program for several
64
118
418
98
438
196
482
computations. If one can maximally find these
common subexpressions, much computation can be
128 320
1008
258
1070
516
1154
saved by using subexpression sharing.
IV. ALGORITHM FOR A SMALL DHT
From Sections II and III, it can be understood that
the operators can be reduced. This can be illustrated
using an 8-point DHT algorithm which has been
presented below.
With c= 2
Where,
Due to the fact that we have the same constant „c‟
being multiplied, sharing of multipliers would be
possible enabling reduction in the number of
multipliers.
ISSN: 2231-5381
256
806
2354
642
2518
1284
2690
-
-
M Multipliers, A Adders
The values of M in the proposed algorithm are
computed considering both the multipliers with the
same constant and common subexpression sharing.
The number of multipliers in Sorensen algorithm
[11] is significantly greater than that in the
proposed one. However, the split-radix algorithm
has an irregular structure and is difficult to be
implemented in hardware as opposed to the
algorithm been proposed that has a regular and
modular structure and can be very easily
implemented in parallel for a DHT of length N = 32.
By reducing the number of number of multipliers
and adders as shown in Table I mean that the cost
as well as the hardware complexity will be reduced
significantly.
VI. PARALLEL VLSI ARCHITECTURE OF DHT
The multiplier „MUL‟ blocks are used to
implement the shared multipliers with a constant.
This block contains two multipliers with a constant.
Each multiplier is shared by four input sequences
and are multiplied with the same constant. The
http://www.ijettjournal.org
Page 220
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
process is done in an interleaved manner using intensities in its neighbourhood. The spatial and
multiplexers and demultiplexers controlled by two temporal redundancy is reduced by using the
clocks. One of the advantages of this algorithm and transform where it exploits the correlations just
architecture is the fact that the multiplications with mentioned. This operation is generally reversible
the same constant are shared in the MUL blocks. and may or may not reduce the data content of the
Thus, the number of multipliers is significantly less images. Here discrete Hartley transform (DHT) has
than that of the existing algorithms given in Table I been used for generating the coefficients.
which has become now only 32.
Quantization is the process of approximating a
continuous range of values (or a very large set of
possible discrete values) by a relatively small
("finite") set of discrete symbols or values. In other
words it means limited number of output values are
mapped by a broad range of input values. The
accuracy of the transformed coefficients will be
reduced in accordance with a pre-established
fidelity criterion. The goal is to reduce the
irrelevant amount of information present in the
image. The process is irreversible since the
information is lost in this process. So this step must
be avoided to keep the whole information intact in
error-free techniques.
Quantization matrix for DCT can be easily
obtained but the scanning order is special for DHT
and hence is difficult for DHT. Since the
quantization matrix is difficult to design, energy
quantization method can be applied. In this method,
Fig. 3 VLSI architecture for DHT of length N = 32.
the energy content of the transformed coefficients
VII.
PROPOSED OVERALL SYSTEM
of each matrix is obtained by using the following
formula. The normalized energy is given by:
QUANTISATIO
DHT
N&
THRESHOLDIN
G
Source
image data
HUFFMAN
ENCODER
Compressed
image data
Fig. 4 Block diagram of the proposed system
Since the image data is easier to compress when
pixel values are converted into another domain,
hence the need for transformation. The image‟s
pixel values is operated by the transform and
converts them to a set of less correlated transformed
coefficients. Natural images (which are the most
common images to be compressed) usually have a
lot of spatial correlation between the pixel
ISSN: 2231-5381
Where M and N are the widths of the sample
block and x (m, n) is the transformed sample.
Next a threshold value is selected and
transformed values will be neglected or kept intact
according to this threshold value. Since the
threshold value is determined as a percentage of the
energy content of the matrix i.e. the intensity of
pixel values, hence the threshold value is not a
global value and hence varies for each matrix. The
percentage value is only pre-decided. The
transformed co-efficient is truncated if the
transformed co-efficient is less than the threshold
value otherwise it is kept intact. By using the
method just stated, helps in sustaining the required
http://www.ijettjournal.org
Page 221
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
information in different regions of the images and
also helps in treating the image in segments.
An entropy encoding is a lossless data
compression scheme that is independent of the
specific characteristics of the medium. The main
objective of entropy coding is to encode the
quantized coefficients. In one of the main types of
entropy coding, a unique prefix code will be
generated and assigned to each unique symbol that
Fig. 7 Simulated waveform of output from the proposed System
occurs in the input.
Huffman coding is one of the most popular
The RTL Schematic of the proposed system is
techniques for removing coding redundancy. The shown in Fig. 5. Fig. 6 and Fig. 7 shows the 8x8
term refers to encoding a source symbol (such as a image‟s pixel values and the output respectively,
character in a file) with the use of a variable-length obtained from the proposed system. data_in and
code table. Based on the estimated probability of data_out are the inputs and outputs of the system
occurrence for each possible value of the source respectively. An 8x8 image applied is 8x8 image,
symbol, the variable-length code table has been so a total of 64 pixel values is been applied. In the
derived in a particular way. A Huffman coder forms output, we have another output „i‟ displaying the
a data tree from the original data symbols and their count value, where it can be clearly seen that it
associated probabilities and determines the shows 49 which are the pixel values obtained at the
compressed symbols.
output.
IX. CONCLUSION
VIII.
RESULTS
A. Advanced HDL synthesis report
1) Macro Statistics (32-bit DHT):
Multipliers
10x10-bit multiplier
10x2-bit multiplier
Adder Trees
10-bit/63-inputs adder tree
:
:
:
:
:
32
31
01
31
01
B. Advanced HDL synthesis report
In this brief, a new highly parallel VLSI
algorithm for the computation of a length-N = 2n
DHT having a modular and regular structure has
been presented. Moreover, this algorithm can be
implemented on a highly parallel architecture
having a modular and regular structure with a low
hardware complexity by extensively using a
subexpression sharing technique and the sharing of
multipliers having the same constant. So from the
obtained results, it can be concluded that even
though after optimizing the DHT, it can be used for
applications in signal processing domain. In this
paper, the DHT algorithm has been used for image
compression as an application.
ACKNOWLEDGMENT
I would like to thank Mr. Raghavendra. D,
Assistant Professor, department of Electronics and
Communication Engineering, for accepting to be
my internal guide and for his constant and valuable
guidance in preparing this paper and I express my
deep sense of gratitude to Dr. M.N. Eshwarappa,
Professor and Head, department of Electronics and
Communication Engineering for providing me the
Fig. 5 RTL Schematic of the proposed System
Fig. 6 Simulated waveform of output from the proposed System
ISSN: 2231-5381
http://www.ijettjournal.org
Page 222
International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015
support for correcting my mistakes and guide me in
preparing this paper. I take this opportunity to
extend
my
whole
hearted
thanks
to
all the teaching staff of Electronics and
Communication Engineering department.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing.
Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.
Z. Wang, “Harmonic analysis with a real frequency function, I
Aperiodic case, II Periodic and bounded cases, and III data sequence,”
Appl. Math Comput., vol. 9, no. 1, pp. 53–73, Jul. 1981.
J. Xi and J. F. Chicharo, “Computing running discrete Hartley
transform and running discreteWtransforms based on the adaptive
LMS algorithm,”IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process., vol. 44, no. 3,pp. 257–260, Mar. 1997.
S.K.Pattanaik and K.K.Mahapatra,“DHT Based JPEG Image
Compression Using a Novel Energy Quantization Method”IEEE
International conference on industrial technology, pp. 2827 – 2832,
Dec 2006.
R.C. Gonzalez, R.E. Woods, Digital Image Processing, Pearson
Education 3rd Edition 2008.
F. Vahid and T. Givargis, Embedded system design: A unified
hardware/software introduction, Wiley India (P.) Ltd, 3rd edition 2009.
A. Amira, “An FPGA based system for discrete hartley transforms.”
IEEE publication, pp. 137-140, 2003.
D. F. Chiper, “Radix-2 fast algorithm for computing discrete Hartley
transform of type III,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 9,
no. 5, pp. 297–301, May 2012.
H. Z. Shu, J. S. Wu, C. F. Yang, and L. Senhadji, “Fast radix-3
algorithm for the generalized discrete Hartley transform of type II,”
IEEE Signal Process. Lett., vol. 19, no. 6, pp. 348–351, Jun. 2012.
D. F. Chiper, “Fast radix-2 algorithm for the discrete Hartley transform
of type II,” IEEE Signal Process. Lett., vol. 18, no. 11, pp. 687–689,
Nov. 2011.
H. V. Sorensen, D. L. Jones, C. S. Burrus, and M. T. Heideman, “On
computing the discrete Hartley transform,” IEEE Trans. Acoust.,
Speech, Signal Process., vol. ASSP-33, no. 5, pp. 1231–1238, Oct.
1985.
H. S. Malvar, Signal Processing With Lapped Transforms. Norwood,
MA, USA: Artech House, 1992.
G. Bi, “New split-radix algorithm for the discrete Hartley transform,”
IEEE Trans. Signal Process., vol. 45, no. 2, pp. 297–302, Feb. 1997.
C. W. Kok, “Fast algorithm for computing discrete cosine transform,”
IEEE Trans. Signal Process., vol. 45, no. 3, pp. 757–760, Mar. 1997.
A. Erickson and B. S. Fagin, “Calculating the FHT in hardware,” IEEE
Trans. Signal Process., vol. 40, no. 6, pp. 1341–1353, Jun. 1992.
K. K. Parhi, VLSI Digital Signal Processing. New York, NY,
USA:Wiley,1999.
J.-I. Guo, C.-M. Liu, and C.-W. Jen, “The efficient memory-based
VLSI array design for DFT and DCT,” IEEE Trans. Circuits Syst. II,
Analog Digit. Signal Process., vol. 39, no. 10, pp. 723–733, Oct. 1992.
P. K. Meher, “LUT optimization for memory-based computation,”
IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 4, pp. 285–289,
Apr. 2010.
P. K. Meher, “New approach to look-up table design and memorybased realization of FIR digital filters,” IEEE Trans. Circuits Syst. I,
Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
D. F. Chiper and P. Ungureanu, “Novel VLSI algorithm and
architecture with good quantization properties for a high-throughput
area efficient systolic array implementation of DCT,” EURASIP J. Adv.
Signal Process., vol. 2011, no. 1, pp. 1–14, Jan. 2011.
R. I. Hartley, “Subexpression sharing in filters using canonic signed
digit multipliers,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process., vol. 43, no. 10, pp. 677–688, Oct. 1996.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 223
Download