Implementation of Discrete Wavelet Transform for Embedded Applications using TMS320VC5510

Abhisek Ukil, Member, IEEE
Integrated Sensor Systems (RD.S4), Corporate Research, ABB Switzerland Ltd
CH-5405 Baden-Daettwil, Switzerland
abhisek.ukil@ch.abb.com

Adrian Bärlocher
Student, Dept. of Electrical Engineering, Swiss Federal Institute of Technology (ETH) Zurich, Switzerland
adrianb@ee.ethz.ch

Abstract: The wavelet transform is an important signal analysis tool. However, unlike the Fourier transform, it is not well supported in the embedded systems domain, and the few existing embedded implementations of the wavelet transform concentrate on specific applications such as image processing. In this paper, we present two fixed-point implementation frameworks of the discrete wavelet transform for real-time, one-dimensional embedded signal processing applications. The frameworks are based on the convolution approach and on the lifting scheme. They are realized and tested on the 16-bit fixed-point Texas Instruments TMS320VC5510 DSP. Application examples, performance statistics, and directions for future work are presented.

Keywords: discrete wavelet transform; DWT; embedded application; fixed-point implementation; TMS320VC5510

I. INTRODUCTION

The wavelet transform is an important and powerful signal analysis tool that often offers clear advantages over traditional Fourier transform (FFT)-based signal analysis. Despite its broad applicability, the wavelet transform is a relatively new field compared with the well-established Fourier transform. Hence, from an implementation point of view, it is not yet well adopted in the semiconductor industry. For example, an FFT library function is available for most digital signal processors (DSPs), such as the Texas Instruments TMS family and the Analog Devices ADSP family.
For the discrete wavelet transform (DWT), however, standard DSP library functions are rare or nonexistent, apart from DSPs dedicated to image-processing or video applications.

In this paper, we present a generic implementation of the DWT for embedded platforms. The implementation framework is intended to become a standard library that could, ideally, be reused generically in future projects, much like the FFT library function. The wavelet transform is briefly reviewed in Section II. The implementation details are provided in Section III, followed by the results and the conclusion in Sections IV and V, respectively.

II. WAVELET TRANSFORM

A. Continuous and Discrete Wavelet Transform

The wavelet transform (WT), like the Fourier transform (FT), is a mathematical tool for signal analysis. A wavelet is an oscillatory waveform of effectively limited duration that has an average value of zero. Fourier analysis breaks up a signal into sine waves of various frequencies. Similarly, wavelet analysis breaks up a signal into shifted and scaled versions of the original (or mother) wavelet. Fig. 1 shows the basis functions for the FT (sine wave) and the WT (db10: Daubechies 10 mother wavelet) [1].

The continuous wavelet transform (CWT) is defined as the sum over all time of the signal multiplied by scaled and shifted versions of the wavelet function $\psi$. The CWT of a signal $x(t)$ is defined as

$$\mathrm{CWT}(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}^{*}(t)\,dt, \tag{1}$$

where

$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right), \tag{2}$$

$\psi(t)$ is the mother wavelet, the asterisk in (1) denotes complex conjugation, and $a, b \in \mathbb{R}$, $a \neq 0$ ($\mathbb{R}$ is the set of real numbers), are the scaling and shifting parameters, respectively. The factor $|a|^{-1/2}$ normalizes $\psi_{a,b}(t)$ so that, if $\psi(t)$ has unit energy (unit $L^2$ norm), its scaled version $\psi_{a,b}(t)$ also has unit energy.
This work was supported by Corporate Research, ABB Switzerland Ltd.

Figure 1. Basis functions for the Fourier transform and the wavelet transform.

Instead of continuous scaling and shifting, the mother wavelet may be scaled and shifted discretely by choosing $a = a_0^{m}$, $b = n a_0^{m} b_0$, and $t = kT$ in (1) and (2), where $T = 1.0$ and $k, m, n \in \mathbb{Z}$ ($\mathbb{Z}$ is the set of integers). The discrete wavelet transform (DWT) is then given by

$$\mathrm{DWT}(m,n) = a_0^{-m/2} \sum_{k} x[k]\,\psi^{*}\!\left(\frac{k - n a_0^{m} b_0}{a_0^{m}}\right). \tag{3}$$

B. Multiresolution Signal Decomposition

With a careful selection of $a_0$ and $b_0$ (typically the dyadic choice $a_0 = 2$, $b_0 = 1$), the family of scaled and shifted versions of a suitable mother wavelet constitutes an orthonormal basis. With this choice, a fast algorithm known as the multiresolution signal decomposition (MSD) technique [2] decomposes a signal into scales with different time and frequency resolutions. At scale 1, the MSD technique decomposes a given signal $x[n]$ into its smoothed version $c_1[n]$ and its detailed version $d_1[n]$, defined as

$$c_1[n] = \sum_{k} h[k-2n]\,x[k], \tag{4}$$

$$d_1[n] = \sum_{k} g[k-2n]\,x[k], \tag{5}$$

where $h[n]$ and $g[n]$ are the associated filter coefficients that decompose $x[n]$ into $c_1[n]$ and $d_1[n]$, respectively. The index $2n$ in (4) and (5) means that only every second filter output is kept, i.e., the filter outputs are decimated by two. The next higher scale decomposition is based on the smoothed version $c_1[n]$, and the process can be iterated. The MSD technique can be realized with cascaded quadrature mirror filter (QMF) banks [3]. A QMF pair consists of two finite impulse response (FIR) filters, a low-pass filter (LPF) and a high-pass filter (HPF), as shown in Fig. 2.

C. Example Applications

Consider a signal with varying frequency, as shown in Fig. 3, plot (i). Using the traditional FT, we obtain only the frequency content of the signal, not the instant at which the frequency changes (Fig. 3, plot (ii)).
The WT, in contrast, reveals this instant, as shown in Fig. 3, plot (iii), where the Daubechies 5 (db5) [1] mother wavelet has been used. Therefore, by combining the WT description with the FT description, we gain a more powerful insight into the signal. The WT has been quite successful in transient analysis, fast change detection, image processing, signal denoising, data compression, and so forth.

Figure 2. Multiresolution signal decomposition and QMF.

Figure 3. Varying frequency signal analysis using the FT and the WT.

III. IMPLEMENTATION OF DWT

Although the WT, specifically the DWT, has certain advantages over FT-based signal analysis, it is yet to be fully adopted in the embedded systems domain. There has been ongoing focus on using the DWT in embedded systems for image processing, video applications, etc. [4]. However, for other embedded scenarios, such as process instrumentation and signal analysis, the DWT is rarely available as a standard optimized library function. In this paper, we focus on implementing the DWT as a standard library function, using the lifting scheme [5] and the filter bank scheme [3, 5]. We considered the standard Haar [1] and Daubechies 4 (db4) [1] mother wavelets. The algorithms were first tested in Matlab® and then implemented in C on a TMS320VC5510 DSK board [6], which comprises a 16-bit fixed-point Texas Instruments TMS320VC5510 DSP [6]. Details of the DSP core can be found in [6].

A. Fixed-point Implementation

A main focus of this paper is the fixed-point implementation on the DSP. In principle, the wavelet transform can be applied to floating-point signals; however, studies showed that computing the decomposition coefficients this way takes many more cycles (about a factor of ten). Therefore, we concentrate on fixed-point calculations for both implementations.
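As an illustration of the fixed-point arithmetic involved, the sketch below converts a real coefficient into a 16-bit integer scaled by $2^{15}$ (commonly called Q15 format) and multiplies two such values through a 32-bit intermediate, mirroring the 16-bit DATA and 32-bit LDATA types of the TMS320C55x library [6]. The helper names, the round-to-nearest rule, and the saturation policy are assumptions for illustration; they are not TI library functions.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: real value -> 16-bit integer scaled by 2^15 (Q15),
 * rounded half away from zero and saturated to the int16_t range. */
int16_t to_q15(double v)
{
    assert(!isnan(v));                          /* NaN has no Q15 value */
    const double scaled = v * 32768.0;          /* multiply by 2^15 */
    if (scaled >= 32767.0) return INT16_MAX;    /* saturate high (e.g. 1.0) */
    if (scaled <= -32768.0) return INT16_MIN;   /* saturate low */
    /* The cast truncates toward zero, so +/-0.5 gives round-to-nearest. */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: Q15 x Q15 product. The full product is kept in
 * 32 bits (like LDATA) and shifted right by 15 to return to Q15. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;     /* Q30 intermediate */
    prod >>= 15;                                /* arithmetic shift assumed */
    if (prod > INT16_MAX) prod = INT16_MAX;     /* only (-1) * (-1) overflows */
    return (int16_t)prod;
}
```

For example, the Haar coefficient $1/\sqrt{2} \approx 0.70710678$ maps to 23170, and q15_mul(16384, 16384), i.e. $0.5 \cdot 0.5$, returns 8192, which represents 0.25.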
For this purpose, the decomposition filter coefficients are multiplied by a factor (preferably a power of two, so that the scaling can later be removed by a bit shift) to obtain scaled coefficients larger than one that can be stored as integers. This involves a trade-off: the filter coefficients and the input signal should use as large a value range as possible for accurate decomposition values, but a larger range increases the risk of overflow, or increases the execution time if data types with an even larger value range are used. In both implementations, the input and output data are of type DATA [6], a 16-bit signed fixed-point integer type. The LDATA [6] type, in comparison, is a 32-bit signed fixed-point integer type.

B. Convolution Approach

The convolution-based approach computes the convolution sums of the smoothed and detailed filter coefficients with the input signal, as shown in (4) and (5). The fixed-point implementation is shown in the flowchart in Fig. 4. The implementation uses the convolution function ‘convol2’ [6] from the TMS320C55x DSP library. The current implementation has no overflow handling (nor does the convol2 function), so this is left to the programmer. Likewise, since the current implementation performs no memory allocation, the programmer has to provide sufficient memory space. As described above, the DSP performs its calculations in fixed point; therefore, the filter coefficients are multiplied by $2^{15}$.

The implemented DWT function expects four input arguments: the input vector (already extended), the number of values in the original input vector (not counting the extended values), a pointer to the output vector, and the wavelet name. Based on the wavelet name, the pointers to the decomposition filters (smoothed and detailed) and their lengths are then set. The ‘convol2’ [6] function is used to calculate the two decomposition vectors, filtered by the low-pass and high-pass filters.
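The convolution path of this section (filter with coefficients scaled by $2^{15}$, shift right by 15 bits, decimate by two, store both halves consecutively) can be sketched in portable C as follows. This is not the authors' code: a plain FIR loop stands in for the TMS320C55x ‘convol2’ routine, whose argument list is not reproduced here; int16_t and int32_t stand in for DATA and LDATA; and the function name, the Haar tap tables, and the requirement of an even, pre-extended input are hypothetical. The loop evaluates only the samples kept after decimation, which gives the same result as a full convolution followed by decimation. The final saturation to 16 bits is an addition of this sketch; the 32-bit accumulator itself is not overflow-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tap tables: Haar analysis filters scaled by 2^15
 * (1/sqrt(2) * 32768 ~= 23170). */
const int16_t haar_lo_q15[2] = { 23170,  23170 };
const int16_t haar_hi_q15[2] = { 23170, -23170 };

static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One decomposition level following (4)-(5):
 *   c1[n] = sum_j h[j] * x[2n + j],   d1[n] = sum_j g[j] * x[2n + j].
 * x must hold n_in + taps - 2 samples (input already extended), n_in even.
 * out receives n_in/2 smoothed values followed by n_in/2 detail values. */
void dwt_conv_q15(const int16_t *x, size_t n_in,
                  const int16_t *h, const int16_t *g, size_t taps,
                  int16_t *out)
{
    assert(n_in % 2 == 0 && taps >= 2);
    const size_t half = n_in / 2;
    for (size_t n = 0; n < half; ++n) {
        int32_t acc_lo = 0, acc_hi = 0;         /* 32-bit (LDATA-like) sums */
        for (size_t j = 0; j < taps; ++j) {
            acc_lo += (int32_t)h[j] * x[2 * n + j];
            acc_hi += (int32_t)g[j] * x[2 * n + j];
        }
        /* Undo the 2^15 coefficient scaling (arithmetic shift assumed). */
        out[n]        = sat16(acc_lo >> 15);
        out[half + n] = sat16(acc_hi >> 15);
    }
}
```

With an arithmetic right shift, the input {100, 60, -40, -40} yields the smoothed half {113, -57} and the detail half {28, 0}, i.e. the values $(x[2n] \pm x[2n+1])/\sqrt{2}$ rounded toward minus infinity by the shift.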
The resulting convolved vector is right-shifted by 15 bits, which cancels the $2^{15}$ scaling of the left-shifted filter coefficients and yields a fixed-point output result. After the convolution, the output vectors are decimated and stored consecutively.

C. Lifting Scheme

The lifting scheme takes an iterative approach based on the MSD and the QMF structure shown in Fig. 2. The lifting scheme is also used in hardware implementations of the DWT [4]. Fig. 5 shows the flowchart of the lifting scheme-based implementation. To represent the coefficients in fixed point, they are multiplied by a factor of $2^4 = 16$. If the coefficients are multiplied by a factor, the other operands in the addition also have to be multiplied by the same value to preserve the proportions of the operands. A relatively small multiplication factor of 16 is chosen to be certain not to exceed the value range of the data types DATA [6] and LDATA [6].

The implemented DWT function expects six input arguments: the input vector, its length N, the two (decomposition) output vectors, the wavelet name, and the extension mode. In contrast to the convolution approach, the signal extension is performed within the function. In the difference-equation-based [5] decomposition stage, the input values and the coefficients are shifted (multiplied) several times; for the final result, these factors have to be canceled out by right shifting.

IV. RESULTS

A. Performance on TMS320VC5510

We tested the implementations on the TMS320VC5510 DSP [6]. The performance characteristics in terms of code size and execution cycles (including and excluding external function calls) are shown in Table I.

Figure 4. Flowchart of the implementation of the convolution approach.

Figure 5. Flowchart of the implementation of the lifting scheme.

TABLE I. PERFORMANCE OF DIFFERENT IMPLEMENTATIONS

As Table I shows, the convolution-based approach is about three times faster than the lifting scheme.
In the convolution approach, the decimation of the output vector takes even longer than the convolution itself. In the lifting scheme implementation, many external function calls are performed.

B. Discussion

• In this paper, we considered two implementation frameworks: the convolution-based approach and the lifting scheme. A convolution function is generally available as a standard library function on DSPs; therefore, the convolution approach could be used generically, independent of the DSP. The lifting scheme approach is based on a difference-equation-based implementation of the wavelet function [5].
• Implementation and testing of the DWT on the TMS320VC5510 are promising. Additionally, the implementation schemes could be extended to other DSP platforms by customizing the function input and output values while keeping the framework the same.
• Fixed-point implementation is particularly important for real-time embedded applications, e.g., in the process instrumentation industry.
• No assembler-level optimization, special instructions (e.g., MAC), or optimized memory access (e.g., ring buffers) have been used. This leaves scope for further performance improvement.
• The implementations were tested (within the scope of this paper) using fixed input signals, without using the DSP's internal DMA controller. Since we are interested in real-time processing of small and medium-sized one-dimensional signals, memory access plays a less important role here than when transforming images or videos.

C. Future Work

• Further optimization of the implementations at the assembler level, with particular attention to overflow handling.
• Testing the implementation in different real-time process instrumentation and embedded one-dimensional signal processing applications, e.g., fast transient detection, abrupt change detection, adaptive signal denoising, time-frequency analysis, and analysis of multi-frequency signals.
• Extending and testing the implementation framework from the TMS320VC5510 DSP [6] to other platforms.
• Implementation of other important mother wavelets, such as Daubechies 10 [1], coiflets [2], and symlets [1].
• Implementation of the inverse discrete wavelet transform (IDWT) for both the convolution and the lifting scheme approaches.

V. CONCLUSION

The wavelet transform is an important and powerful signal analysis tool that often offers clear advantages over traditional Fourier transform-based signal analysis. Despite its broad applicability, it is a relatively new field and hence, from an implementation point of view, it is not yet well adopted in the embedded systems domain. In this paper, we presented two generic frameworks for implementing the discrete wavelet transform (DWT) as a standard library function. In the first approach, we used the convolution sum for the decomposition filters (smoothed: low-pass; detailed: high-pass). In the second approach, we used the lifting scheme, based on the concept of multiresolution signal decomposition and the iterative quadrature mirror filter tree structure, with difference-equation-based descriptions of the wavelet functions and iterative techniques. Both frameworks were realized as fixed-point implementations on the 16-bit fixed-point Texas Instruments TMS320VC5510 DSP. The performance of the implementations is promising for developing a generic, standard fixed-point DWT library function for embedded applications.

REFERENCES

[1] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
[2] S. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.
[3] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, 1996.
[4] K. Andra, C. Chakrabarti, and T. Acharya, "A VLSI architecture for lifting-based forward and inverse wavelet transform," IEEE Transactions on Signal Processing, vol. 50, no. 4, pp. 966-977, 2002.
[5] A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, Springer, Heidelberg, 2001.
[6] Texas Instruments, TMS320VC5510 datasheet and manual, 2006. Available: http://www.ti.com