International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 Design and Implementation of VLSI DHT Algorithm for Image Compression Vimal kumar C1, Raghavendra D2, M N Eshwarappa3 1 PG student (VLSI Design and Embedded systems), SIET, Tumkur, Karnataka, India 2 Assistant Prof., Dept. of E&C, SIET, Tumkur, Karnataka, India 3 HOD, Dept. of E&C, SIET, Tumkur, Karnataka, India Abstract— A new very large scale integration (VLSI) algorithm for a discrete Hartley transform (DHT) that can be efficiently implemented on a highly modular and parallel VLSI architecture having a regular structure is presented. The concurrent execution of the DHT algorithm can be achieved by splitting on several parallel parts efficiently. In addition, the hardware complexity can be significantly reduced using sub expression sharing technique of the proposed algorithm in highly parallel VLSI implementation. With efficient sharing of multipliers having the same constant and using the advantages of the proposed algorithm, the numbers of multipliers and adders used has been significantly reduced and is kept at a minimum compared with that of the existing algorithms. Efficient implementation of multipliers with a constant is possible in VLSI. Digital image processing is the use of computer algorithms to perform image processing on digital images. The basic operation performed by a simple digital camera is to convert the light energy to electrical energy, then the energy is converted to digital format and a compression algorithm is used to reduce memory requirement for storing the image. In this project, image compression has been taken as an application to prove the functionality of DHT algorithm in the field of digital signal processing. Keywords— Discrete Hartley processing, image compression. transform, DHT domain I. INTRODUCTION The Discrete Fourier transform (DFT) is used in many digital signal processing applications like signal and image compression, filter banks [1], signal representation, or harmonic analysis [2]. The DHT [2], [3] can be replaced efficiently in place of DFT whenever the input sequence is real. A DHT is a Fourier-related transform of discrete, periodic data similar to the DFT, with analogous applications in signal processing and related fields. The DFT and DHT play an important role in many Digital signal processing applications like image processing, filter banks, etc. In contrast to Discrete Cosine Transform (DCT) and DFT where complex computations are required, DHT involves only real computations of the input which is one of the main ISSN: 2231-5381 attractive. The attention was drawn by Bracewell, where DFT was replaced by DHT and has been an efficient replacement where the real input sequence is real. The necessity to derive the dedicated hardware implementations using the VLSI technology comes from the fact that the DHT is computationally intensive. Some fast algorithms have been derived in literature for the computation of DHT [4]-[7] and some algorithms for the computation of generalized DHT [8]-[10]. Several Split-radix algorithms are also available for computing DHT with a low arithmetic cost. Systolic array is a representation of one of the method of VLSI implementations. Using parallelism Systolic array architectures achieve modular and regular structure but lacks parallel processing. A large portion of the chip area will be consumed by multipliers and introduce significant delays in a VLSI structure. Due to this fact, memory-based solutions using multipliers have been more in literature. In [8], a highly parallel and modular solution for the implementation of type-III DHT based on a new VLSI algorithm is proposed. In [15], we have a highly parallel solution for the implementation of DHT based on a direct implementation of fast Hartley transform (FHT). The hardware implementations of FHT are rare considering the literatures. Compression is useful as it helps in reduction of the transmission bandwidth required or the usage of expensive resources, such as memory (hard disks). The term data compression refers to the process of reducing the required data to represent a quantity of information. Because various amounts of data can be used to represent the same amount of information, representations that contain irrelevant http://www.ijettjournal.org Page 218 International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 or repeated information are said to contain The matrix defined above helps one to redundant data. Digital image compression is the understand and implement the DHT algorithm in a most important part in applications like multimedia better way to optimize the design as far as possible. which aims to minimize the number of bits in an image data for its efficient storage (less storage and i and j are integers from 0 to N-1 area). The cas function is given by: Two-dimensional intensity arrays suffer from three principle data redundancies that can be identified and exploited: Spatial and temporal redundancy Irrelevant information Coding redundancy. In this brief, a new VLSI DHT algorithm that is well suited for a VLSI implementation on a highly parallel and modular architecture is proposed. It can be used for designing a completely novel VLSI architecture for DHT. Furthermore, using subexpression sharing technique and sharing the multipliers with the same constant, the hardware complexity can be reduced significantly because of the reduction in the number of multipliers and adders. In the proposed solution, we have used both multipliers with a constant and sharing the common blocks that can be efficiently implemented in VLSI. The designed DHT algorithm with reduction in operational complexity is been implemented on Fig. 1 DHT matrix, H7. image compression which is a standard application. II. VLSI ALGORITHM FOR DHT The DHT is a linear, invertible function H: R→R (where R denotes the set of real numbers). The N real numbers 0 1… −1 are transformed into N real numbers 0 1…. −1 according to the formula: For DHT matrix of H7 shown in Fig. 1, one can observe some of the symmetries in the forward and backward diagonal as well as the number of computations having the same value. This can be exploited to reduce the number of operators required. III. COMMON SUBEXPRESSION SHARING for k=0, 1.....N-1. Equation (1) can be represented in matrix form as follows: h0 1 1 h1 1 h2 1 h3 1 1 Where HN is the NxN DHT matrix. Fig. 2 Common Subexpression sharing ISSN: 2231-5381 http://www.ijettjournal.org Page 219 International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 V. ARITHMETIC COST One of the ways used to reduce the hardware complexity is by using common subexpression Table I lists the required number of technique. Common subexpression sharing shares multiplications and additions for the proposed the subexpression among several multiplication algorithm, the Sorensen one and Bi algorithm, operations so that there can be a reduction in the where rotations are implemented with four total operation counts. The circled teams of digits multiplications and two additions (Radix-2[13]∗) have an equivalent subexpression, in order that they and with three multiplications and three additions will share an equivalent computation unit as shown (Radix-2[13]∗∗). in Fig. 2. TABLE I COMPUTATIONAL COMPLEXITY This approach effectively reduces the hardware value of multiple constant multiplications, Radix Radix Radix Proposed 2[13] 2[13]* 2[11]** particularly for the filter-like operation. The number N M A M A M A M A of additions is also reduced by sharing the common subexpression. 8 4 26 2 24 First calculation of common subexpression part is 16 10 62 10 62 20 74 been done by cos and sin terms using CORDIC algorithm pre-existing in Xilinx tool and further has 32 40 168 38 174 69 194 32 31 been instantiated in the main program for several 64 118 418 98 438 196 482 computations. If one can maximally find these common subexpressions, much computation can be 128 320 1008 258 1070 516 1154 saved by using subexpression sharing. IV. ALGORITHM FOR A SMALL DHT From Sections II and III, it can be understood that the operators can be reduced. This can be illustrated using an 8-point DHT algorithm which has been presented below. With c= 2 Where, Due to the fact that we have the same constant „c‟ being multiplied, sharing of multipliers would be possible enabling reduction in the number of multipliers. ISSN: 2231-5381 256 806 2354 642 2518 1284 2690 - - M Multipliers, A Adders The values of M in the proposed algorithm are computed considering both the multipliers with the same constant and common subexpression sharing. The number of multipliers in Sorensen algorithm [11] is significantly greater than that in the proposed one. However, the split-radix algorithm has an irregular structure and is difficult to be implemented in hardware as opposed to the algorithm been proposed that has a regular and modular structure and can be very easily implemented in parallel for a DHT of length N = 32. By reducing the number of number of multipliers and adders as shown in Table I mean that the cost as well as the hardware complexity will be reduced significantly. VI. PARALLEL VLSI ARCHITECTURE OF DHT The multiplier „MUL‟ blocks are used to implement the shared multipliers with a constant. This block contains two multipliers with a constant. Each multiplier is shared by four input sequences and are multiplied with the same constant. The http://www.ijettjournal.org Page 220 International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 process is done in an interleaved manner using intensities in its neighbourhood. The spatial and multiplexers and demultiplexers controlled by two temporal redundancy is reduced by using the clocks. One of the advantages of this algorithm and transform where it exploits the correlations just architecture is the fact that the multiplications with mentioned. This operation is generally reversible the same constant are shared in the MUL blocks. and may or may not reduce the data content of the Thus, the number of multipliers is significantly less images. Here discrete Hartley transform (DHT) has than that of the existing algorithms given in Table I been used for generating the coefficients. which has become now only 32. Quantization is the process of approximating a continuous range of values (or a very large set of possible discrete values) by a relatively small ("finite") set of discrete symbols or values. In other words it means limited number of output values are mapped by a broad range of input values. The accuracy of the transformed coefficients will be reduced in accordance with a pre-established fidelity criterion. The goal is to reduce the irrelevant amount of information present in the image. The process is irreversible since the information is lost in this process. So this step must be avoided to keep the whole information intact in error-free techniques. Quantization matrix for DCT can be easily obtained but the scanning order is special for DHT and hence is difficult for DHT. Since the quantization matrix is difficult to design, energy quantization method can be applied. In this method, Fig. 3 VLSI architecture for DHT of length N = 32. the energy content of the transformed coefficients VII. PROPOSED OVERALL SYSTEM of each matrix is obtained by using the following formula. The normalized energy is given by: QUANTISATIO DHT N& THRESHOLDIN G Source image data HUFFMAN ENCODER Compressed image data Fig. 4 Block diagram of the proposed system Since the image data is easier to compress when pixel values are converted into another domain, hence the need for transformation. The image‟s pixel values is operated by the transform and converts them to a set of less correlated transformed coefficients. Natural images (which are the most common images to be compressed) usually have a lot of spatial correlation between the pixel ISSN: 2231-5381 Where M and N are the widths of the sample block and x (m, n) is the transformed sample. Next a threshold value is selected and transformed values will be neglected or kept intact according to this threshold value. Since the threshold value is determined as a percentage of the energy content of the matrix i.e. the intensity of pixel values, hence the threshold value is not a global value and hence varies for each matrix. The percentage value is only pre-decided. The transformed co-efficient is truncated if the transformed co-efficient is less than the threshold value otherwise it is kept intact. By using the method just stated, helps in sustaining the required http://www.ijettjournal.org Page 221 International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 information in different regions of the images and also helps in treating the image in segments. An entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium. The main objective of entropy coding is to encode the quantized coefficients. In one of the main types of entropy coding, a unique prefix code will be generated and assigned to each unique symbol that Fig. 7 Simulated waveform of output from the proposed System occurs in the input. Huffman coding is one of the most popular The RTL Schematic of the proposed system is techniques for removing coding redundancy. The shown in Fig. 5. Fig. 6 and Fig. 7 shows the 8x8 term refers to encoding a source symbol (such as a image‟s pixel values and the output respectively, character in a file) with the use of a variable-length obtained from the proposed system. data_in and code table. Based on the estimated probability of data_out are the inputs and outputs of the system occurrence for each possible value of the source respectively. An 8x8 image applied is 8x8 image, symbol, the variable-length code table has been so a total of 64 pixel values is been applied. In the derived in a particular way. A Huffman coder forms output, we have another output „i‟ displaying the a data tree from the original data symbols and their count value, where it can be clearly seen that it associated probabilities and determines the shows 49 which are the pixel values obtained at the compressed symbols. output. IX. CONCLUSION VIII. RESULTS A. Advanced HDL synthesis report 1) Macro Statistics (32-bit DHT): Multipliers 10x10-bit multiplier 10x2-bit multiplier Adder Trees 10-bit/63-inputs adder tree : : : : : 32 31 01 31 01 B. Advanced HDL synthesis report In this brief, a new highly parallel VLSI algorithm for the computation of a length-N = 2n DHT having a modular and regular structure has been presented. Moreover, this algorithm can be implemented on a highly parallel architecture having a modular and regular structure with a low hardware complexity by extensively using a subexpression sharing technique and the sharing of multipliers having the same constant. So from the obtained results, it can be concluded that even though after optimizing the DHT, it can be used for applications in signal processing domain. In this paper, the DHT algorithm has been used for image compression as an application. ACKNOWLEDGMENT I would like to thank Mr. Raghavendra. D, Assistant Professor, department of Electronics and Communication Engineering, for accepting to be my internal guide and for his constant and valuable guidance in preparing this paper and I express my deep sense of gratitude to Dr. M.N. Eshwarappa, Professor and Head, department of Electronics and Communication Engineering for providing me the Fig. 5 RTL Schematic of the proposed System Fig. 6 Simulated waveform of output from the proposed System ISSN: 2231-5381 http://www.ijettjournal.org Page 222 International Journal of Engineering Trends and Technology (IJETT) – Volume23 Number 5- May 2015 support for correcting my mistakes and guide me in preparing this paper. I take this opportunity to extend my whole hearted thanks to all the teaching staff of Electronics and Communication Engineering department. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983. Z. Wang, “Harmonic analysis with a real frequency function, I Aperiodic case, II Periodic and bounded cases, and III data sequence,” Appl. Math Comput., vol. 9, no. 1, pp. 53–73, Jul. 1981. J. Xi and J. F. Chicharo, “Computing running discrete Hartley transform and running discreteWtransforms based on the adaptive LMS algorithm,”IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 44, no. 3,pp. 257–260, Mar. 1997. S.K.Pattanaik and K.K.Mahapatra,“DHT Based JPEG Image Compression Using a Novel Energy Quantization Method”IEEE International conference on industrial technology, pp. 2827 – 2832, Dec 2006. R.C. Gonzalez, R.E. Woods, Digital Image Processing, Pearson Education 3rd Edition 2008. F. Vahid and T. Givargis, Embedded system design: A unified hardware/software introduction, Wiley India (P.) Ltd, 3rd edition 2009. A. Amira, “An FPGA based system for discrete hartley transforms.” IEEE publication, pp. 137-140, 2003. D. F. Chiper, “Radix-2 fast algorithm for computing discrete Hartley transform of type III,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 9, no. 5, pp. 297–301, May 2012. H. Z. Shu, J. S. Wu, C. F. Yang, and L. Senhadji, “Fast radix-3 algorithm for the generalized discrete Hartley transform of type II,” IEEE Signal Process. Lett., vol. 19, no. 6, pp. 348–351, Jun. 2012. D. F. Chiper, “Fast radix-2 algorithm for the discrete Hartley transform of type II,” IEEE Signal Process. Lett., vol. 18, no. 11, pp. 687–689, Nov. 2011. H. V. Sorensen, D. L. Jones, C. S. Burrus, and M. T. Heideman, “On computing the discrete Hartley transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 5, pp. 1231–1238, Oct. 1985. H. S. Malvar, Signal Processing With Lapped Transforms. Norwood, MA, USA: Artech House, 1992. G. Bi, “New split-radix algorithm for the discrete Hartley transform,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 297–302, Feb. 1997. C. W. Kok, “Fast algorithm for computing discrete cosine transform,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 757–760, Mar. 1997. A. Erickson and B. S. Fagin, “Calculating the FHT in hardware,” IEEE Trans. Signal Process., vol. 40, no. 6, pp. 1341–1353, Jun. 1992. K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA:Wiley,1999. J.-I. Guo, C.-M. Liu, and C.-W. Jen, “The efficient memory-based VLSI array design for DFT and DCT,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 10, pp. 723–733, Oct. 1992. P. K. Meher, “LUT optimization for memory-based computation,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 4, pp. 285–289, Apr. 2010. P. K. Meher, “New approach to look-up table design and memorybased realization of FIR digital filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010. D. F. Chiper and P. Ungureanu, “Novel VLSI algorithm and architecture with good quantization properties for a high-throughput area efficient systolic array implementation of DCT,” EURASIP J. Adv. Signal Process., vol. 2011, no. 1, pp. 1–14, Jan. 2011. R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 10, pp. 677–688, Oct. 1996. ISSN: 2231-5381 http://www.ijettjournal.org Page 223