International Journal of Engineering Trends and Technology (IJETT), Volume 6, Number 2, Dec 2013
ISSN: 2231-5381, http://www.ijettjournal.org, Pages 69–75

An Efficient Architecture for 3-D Discrete Wavelet Transform

Bala Tejavath (1), N. Suresh Babu (2)
(1) PG Student (M. Tech), Dept. of ECE, Chirala Engineering College, Chirala, A.P, India.
(2) Professor, Vice Principal & HOD-ECE, Chirala Engineering College, Chirala, A.P, India.

Abstract: The 3-D discrete wavelet transform (DWT) has been widely used in applications such as image compression, signal processing and speech compression because of its multi-resolution representation of signals with localization in both time and frequency. In the past, many architectures have been proposed that aim to provide high-speed 3-D DWT computation while using a reasonable amount of hardware resources. These architectures can be broadly classified into separable and non-separable architectures. The separable method is the most straightforward implementation method. In the separable method, a 3-D filtering operation is carried out as a sequence of 1-D filtering operations, for example one processing the data row-wise and another column-wise. In this method the intermediate coefficients are first stored in a frame memory. A 1-D DWT is then performed in the other direction on these intermediate coefficients to complete a one-level 3-D DWT. Because of its size, this frame memory is usually assumed to be off chip. The non-separable method, however, performs the 1-D DWT in both directions simultaneously. In this paper, a separable pipeline architecture for fast computation of the 3-D DWT with less memory and low latency is proposed. The low latency and reduced memory are achieved by proper design of the three 1-D DWT filtering processes and by efficiently transferring the data between the three 1-D DWT filters.

Keywords: Discrete wavelet transforms, image compression, lifting, video, VLSI architecture.

1. Introduction

Nowadays, most applications require real-time DWT engines with large computing potentiality, for which a fast and dedicated very-large-scale integration (VLSI) architecture appears to be the best possible solution. While it ensures high resource utilization, that too in cost-effective platforms like field programmable gate array (FPGA), designing such an architecture does offer some flexibilities like speeding up the computation by adopting more pipelined structures and parallel processing, possibilities of reduced memory consumption through better task scheduling, or low-power and portability features.

Nevertheless, the memory requirement remains one of the toughest problems associated with 3-D DWT architectures. Attempts with block-based [7], [8] or scan-based architectures with independent group of pictures (GOP) transform have been reported. However, blocking degrades the PSNR quality, while the independent GOPs introduce annoying jerks in video playback due to the PSNR drop at transform boundaries [1]. Alternatively, some successful scan-based running transform architectures with convolution filtering have been reported, avoiding these limitations.

After the advent of the lifting scheme in 1994, the computation of the DWT has experienced a sea change. While providing a reduced computational complexity, with facilities like in-place computation and ease in building non-linear and inverse wavelets [6], lifting also reduces the memory requirement. Thus, it has become a powerful tool for researchers for computation of both 2-D and 3-D DWT in several applications. Some lifting-based temporal transform techniques have been reported in the literature solely with infinite GOPs and related power and memory requirements. To overcome their increasing memory referencing, regularize the computation and reduce the storage requirement thereafter, the computation of a lifting step is carried out in two stages and performed sequentially. In effect, this doubles the memory consumption while increasing the required processing speed by two fold. Besides, those are merely temporal transform methods; clearly, there is a gap in the literature for a complete 3-D DWT architecture which employs lifting and running transform with infinite GOP in its working principle.

2. Pipeline for the 3-D DWT Computation

In a pipeline structure for the DWT computation, multiple stages are used to carry out the computations of the various decomposition levels of the transform. The computation corresponding to each decomposition level needs to be mapped to a stage or stages of the pipeline. In order to design a pipeline structure capable of performing a fast computation of the DWT with low expense on hardware resources and low design complexity, an optimal mapping of the overall task of the DWT computation to the various stages of the pipeline needs to be determined. Any distribution of the overall task of the DWT computation to stages must consider the inherent nature of the sequential computations of the decomposition levels that limit the computational parallelism of the pipeline stages, and consequently the latency of the pipeline. Further, in order to minimize the expense on the hardware resources of the pipeline, the number of filter units used by each stage ought to be minimum and proportional to the amount of the task assigned to the stage.

Figure 1 Pipeline structure with N stages

3. Synchronization of stages

The distribution of the computational load among the three stages, and the hardware resources made available to them, are in the ratio 8:2:1. The stages of the pipeline need to be synchronized in such a way that each stage starts its operation at the earliest possible time when the required data become available for its operation. Once the operation of a stage is started, it must continue until the task assigned to it is fully completed.

Consider the timing diagram given in Fig. 2 for the operation of the three stages, where t1, t2 and t3 are the times taken individually by stages 1, 2 and 3, respectively, to complete their assigned tasks, and ta and tb are the times elapsed between the starting points of the tasks by stages 1 and 2, and by stages 2 and 3, respectively.

Figure 2 Timing diagram for the operations of three stages

Note that the lengths of the times t1, t2 and t3 to complete the tasks by the individual stages are approximately the same, since the ratios of the tasks assigned and the resources made available to the three stages are the same. The average times to compute one output sample by stages 1, 2 and 3 are in the ratio 1:4:8. In Fig. 2 the relative widths of the slots in the three stages are shown to reflect this ratio. Our objective is to minimise the total computation time ta+tb+t3 by minimizing ta, tb and t3 individually.

Design of stages

In the proposed three-stage architecture, stages 1 and 2 perform the computations of levels 1 and 2, respectively, and stage 3 that of all the remaining levels. Fig. 3 shows the block diagram of the three-stage architecture.

Figure 3 Block diagram of the three-stage architecture

4. Different types of transforms

1. FT (Fourier Transform).
2. DCT (Discrete Cosine Transform).
3. DWT (Discrete Wavelet Transform).

Discrete Wavelet Transform (DWT)

The discrete wavelet transform (DWT) refers to wavelet transforms for which the wavelets are discretely sampled. It is a transform which localizes a function both in space and scaling and has some desirable properties compared to the Fourier transform. The transform is based on a wavelet matrix, which can be computed more quickly than the analogous Fourier matrix. Most notably, the discrete wavelet transform is used for signal coding, where the properties of the transform are exploited to represent a discrete signal in a more redundant form, often as a preconditioning for data compression. The discrete wavelet transform has a huge number of applications in Science, Engineering, Mathematics and Computer Science.

Wavelet compression is a form of data compression well suited for image compression (sometimes also video compression and audio compression). The goal is to store image data in as little space as possible in a file. A certain loss of quality is accepted (lossy compression).

Figure 4 Proposed architecture

Using a wavelet transform, the wavelet compression methods are better at representing transients, such as percussion sounds in audio, or high-frequency components in two-dimensional images, for example an image of stars on a night sky. Such a signal can be represented by a smaller amount of information than would be the case if some other transform, such as the more widespread discrete cosine transform, had been used.

First a wavelet transform is applied. This produces as many coefficients as there are pixels in the image (i.e., there is no compression yet since it is only a transform). These coefficients can then be compressed more easily because the information is statistically concentrated in just a few coefficients.

5. Results & Conclusions

In this paper, an architecture for fast computation of the 3-D DWT with less memory and low latency is proposed. The low latency and reduced memory are achieved by proper design of the three 1-D DWT filtering processes and by efficiently transferring the data between the three 1-D DWT architectures. This architecture is simulated, synthesized and implemented in the Verilog language using the Xilinx ISE tool. Figs. 5, 6 and 7 show the simulation results for the implemented 3-D, 2-D and 1-D wavelet transforms, respectively. Fig. 8 shows the RTL schematic of the proposed system. The device utilization summary is shown in Table-1.

Figure 5 Top module 3-D DWT

Figure 6 Simulation result for 2-D DWT

Figure 7 Simulation result for 1-D DWT

Figure 8 RTL schematic of the proposed system

Table-1 Device utilization summary (estimated values) for the device xc3s500e4fg320

Logic Utilization             Used    Available    Utilization
Number of Slices              202     4656         4%
Number of Slice Flip Flops    215     9312         2%
Number of 4 input LUTs        355     9312         3%
Number of bonded IOBs         82      232          35%
Number of GCLKs               1       24           4%

Total 16.917 ns (11.928 ns logic, 4.990 ns route) (70.5% logic, 29.5% route)

Total memory usage is 198724 kilobytes

Acknowledgements

The authors would like to thank the anonymous reviewers for their comments, which were very helpful in improving the quality and presentation of this paper.

References:

[1] M. Vishwanath, R. Owens, and M. J. Irwin, "VLSI architectures for the discrete wavelet transform," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 42, no. 5, pp. 305–316, May 1995.
[2] C. Chakrabarti and M. Vishwanath, "Efficient realizations of the discrete and continuous wavelet transforms: From single chip implementations to mapping on SIMD array computers," IEEE Trans. Signal Process., vol. 43, no. 3, pp. 759–771, Mar. 1995.
[3] H. Y. Liao, M. K. Mandal, and B. F. Cockburn, "Efficient architectures for 1-D and 2-D lifting-based wavelet transforms," IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1315–1326, May 2004.
[4] D. Guevorkian, P. Liuha, A. Launiainen, and V. Lappalainen, "Architectures for Discrete Wavelet Transforms," U.S. 6976046, Dec. 13, 2005.
[5] M. Alam, W. Badawy, V. Dimitrov, and G. Jullien, "An efficient architecture for a lifted 2-D biorthogonal DWT," J. VLSI Signal Process., vol. 40, pp. 333–342, 2005.
[6] C. Yu and S.-J. Chen, "VLSI implementation of 2-D discrete wavelet transform for real-time video signal processing," IEEE Trans. Consum. Electron., vol. 43, no. 4, pp. 1270–79, Nov. 1997.
[7] P.-C. Wu and L.-G. Chen, "An efficient architecture for two-dimensional discrete wavelet transform," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 536–545, Apr. 2001.
[8] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Memory Analysis and Architecture for Three-Dimensional Discrete Wavelet Transform," in Proceedings of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2004, pp. V13–V16.
[9] P.-C. Tseng, C.-T. Huang, and L.-G. Chen, "Generic RAM-Based Architecture for Three-Dimensional Discrete Wavelet Transform with Line-Based Method," in Proceedings of the Asia-Pacific Conference on Circuits and Systems, 2002, pp. 363–366.
[10] M. Vishwanath, R. Owens, and M. J. Irwin, "VLSI architectures for the discrete wavelet transform," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 42, no. 5, pp. 305–316, May 1995.

Authors Profile:

Bala Tejavath is pursuing his M. Tech from Chirala Engineering College, Chirala, in the department of Electronics & Communications Engineering (ECE) with specialization in VLSI & Embedded Systems.

Prof. N. Suresh Babu is Vice-Principal & HOD of the ECE Dept. in CEC, Chirala. He got his M.Tech in Microwave Engineering from Birla Institute of Technology, Ranchi. He has 14 years of teaching experience and 2 years of industrial experience in various organisations.
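The separable method described in the abstract can be illustrated with a short Python sketch. It applies a one-level 1-D transform along the rows, then the columns, then the temporal direction of a small volume. The Haar filter, the function names `haar_1d` and `dwt3d_separable`, and the `volume[t][y][x]` layout are assumptions made for illustration only; the paper does not name its filter bank, and this is not the proposed hardware architecture.

```python
def haar_1d(samples):
    """One-level 1-D Haar DWT: pairwise averages followed by pairwise differences."""
    half = len(samples) // 2
    low = [(samples[2 * i] + samples[2 * i + 1]) / 2 for i in range(half)]
    high = [(samples[2 * i] - samples[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def dwt3d_separable(volume):
    """One-level separable 3-D DWT of volume[t][y][x]; every dimension must be even."""
    frames, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    # Pass 1: filter every row (x direction).
    out = [[haar_1d(volume[t][y]) for y in range(rows)] for t in range(frames)]
    # Pass 2: filter every column (y direction) of the intermediate coefficients.
    for t in range(frames):
        for x in range(cols):
            column = haar_1d([out[t][y][x] for y in range(rows)])
            for y in range(rows):
                out[t][y][x] = column[y]
    # Pass 3: filter along the temporal (t) direction.
    for y in range(rows):
        for x in range(cols):
            line = haar_1d([out[t][y][x] for t in range(frames)])
            for t in range(frames):
                out[t][y][x] = line[t]
    return out


# Hypothetical 2x2x2 test volume (two frames of 2x2 pixels).
volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
coeffs = dwt3d_separable(volume)
```

Each pass in this sketch reads the complete output of the previous pass, which mirrors the intermediate-coefficient storage that the straightforward separable method needs.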
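The in-place computation and easy inversion that the introduction attributes to the lifting scheme can be sketched as follows. This is a minimal Python illustration using a Haar-style predict and update step; the choice of wavelet and the names `lifting_forward` and `lifting_inverse` are assumptions, since the paper does not state which lifting steps its filters use.

```python
def lifting_forward(x):
    """One-level Haar lifting, in place: even slots hold averages, odd slots details."""
    for i in range(0, len(x) - 1, 2):
        x[i + 1] -= x[i]        # predict: detail = odd sample - even sample
        x[i] += x[i + 1] / 2    # update: approximation = even sample + detail / 2
    return x


def lifting_inverse(x):
    """Undo lifting_forward by running the same steps backwards with flipped signs."""
    for i in range(0, len(x) - 1, 2):
        x[i] -= x[i + 1] / 2    # undo update
        x[i + 1] += x[i]        # undo predict
    return x


data = [4, 8, 10, 2]
lifting_forward(data)           # overwrites data with interleaved coefficients
transformed = list(data)
lifting_inverse(data)           # restores the original samples
```

No second buffer is allocated in either direction, which is the memory saving the text refers to; the inverse is obtained purely by reversing the order of the steps.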
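The ratios quoted in the section on synchronization of stages can be checked with a small arithmetic sketch. It assumes, as a simplification not stated in the paper, that the time a stage needs is its load divided by its number of filter units and that the time per output sample is inversely proportional to the number of units. The function name `stage_timing` is hypothetical.

```python
from fractions import Fraction


def stage_timing(loads, units):
    """Return (relative time per stage, relative time per output sample)."""
    # Assumed model: stage time = assigned load / available filter units.
    stage_time = [Fraction(load, unit) for load, unit in zip(loads, units)]
    # Assumed model: time per output sample is inversely proportional to units.
    per_sample = [Fraction(1, unit) for unit in units]
    fastest = min(per_sample)
    return stage_time, [p / fastest for p in per_sample]


# Load and resources are both distributed 8:2:1 over stages 1, 2 and 3.
stage_time, per_sample = stage_timing(loads=(8, 2, 1), units=(8, 2, 1))
```

Under these assumptions the three stage times come out equal, matching the statement that t1, t2 and t3 are approximately the same, and the per-sample times come out as 1:4:8.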
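The remark that a wavelet transform produces as many coefficients as input samples, with the information concentrated in a few of them, can be illustrated on a 1-D signal. The sketch below is an assumption-laden example: it uses a Haar decomposition, a hypothetical test signal with one transient, and an arbitrary threshold of 0.3; none of these come from the paper.

```python
def haar_multilevel(signal):
    """Full Haar decomposition of a signal whose length is a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        low = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        high = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = low + high   # approximations first, details after them
        n = half                  # next level works on the approximations only
    return coeffs


def threshold(coeffs, limit):
    """Zero every coefficient whose magnitude does not exceed limit (lossy step)."""
    return [c if abs(c) > limit else 0 for c in coeffs]


signal = [10, 10, 10, 10, 10, 10, 12, 10]   # flat signal with a single transient
coeffs = haar_multilevel(signal)
kept = threshold(coeffs, 0.3)
nonzero = sum(1 for c in kept if c != 0)
```

The transform alone does not shrink the data: `coeffs` has the same length as `signal`. The saving only appears after the lossy thresholding step, which leaves a small number of nonzero coefficients to encode.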