Implementation of Discrete Wavelet Transform for
Embedded Applications using TMS320VC5510
Abhisek Ukil, Member, IEEE
Integrated Sensor Systems (RD.S4)
Corporate Research, ABB Switzerland Ltd
CH-5405 Baden-Daettwil, Switzerland
abhisek.ukil@ch.abb.com
Adrian Bärlocher
Student, Dept. of Electrical Engineering
Swiss Federal Institute of Technology (ETH)
Zurich, Switzerland
adrianb@ee.ethz.ch
Abstract—Wavelet transform is an important signal analysis tool.
However, it is not as well supported as the Fourier transform in
the embedded systems domain. The few existing implementations of
the wavelet transform in this domain concentrate on specific
applications such as image processing. In this paper, we present
two fixed-point implementation frameworks for the discrete wavelet
transform, aimed at real-time one-dimensional signal processing in
embedded applications. They are based on convolution and on the
lifting scheme. Both frameworks are realized and tested on a 16-bit
fixed-point Texas Instruments TMS320VC5510 DSP. Application
examples, performance statistics and directions for future work are
presented.
Keywords- Discrete wavelet transform; DWT; embedded
application; fixed-point implementation; TMS320VC5510
I. INTRODUCTION
The wavelet transform is an important and powerful signal
analysis tool, and it often offers clear advantages over
traditional signal analysis based on the Fourier transform.
Despite its wide applicability, the wavelet transform is a
relatively new field compared to the established Fourier
transform domain. Hence, from an implementation point of view,
it is not yet well adopted in the semiconductor industry. For
example, a fast Fourier transform (FFT) library function is
available for most digital signal processors (DSPs), such as the
Texas Instruments TMS family and the Analog Devices ADSP family.
For the discrete wavelet transform (DWT), however, a standard
library function is rarely if ever available, apart from DSPs
dedicated to image-processing or video applications.
II. WAVELET TRANSFORM
A. Continuous and Discrete Wavelet Transform
The wavelet transform (WT), like the Fourier transform (FT), is a
mathematical tool for signal analysis. A wavelet is an
oscillatory waveform of effectively limited duration that has an
average value of zero. Fourier analysis consists of breaking up
a signal into sine waves of various frequencies. Similarly,
wavelet analysis is the breaking up of a signal into shifted and
scaled versions of the original (or mother) wavelet. Fig. 1
shows the basis functions for the FT (Sine wave) and the WT
(db10: Daubechies 10 mother wavelet) [1].
The Continuous Wavelet Transform (CWT) is defined as the sum over all time of the signal multiplied by the scaled and shifted versions of the wavelet function $\psi$. The CWT of a signal $x(t)$ is defined as

$$\mathrm{CWT}(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt, \tag{1}$$

where

$$\psi_{a,b}(t) = a^{-1/2}\, \psi\big((t - b)/a\big). \tag{2}$$

$\psi(t)$ is the mother wavelet, the asterisk in (1) denotes a complex conjugate, and $a, b \in R$, $a \neq 0$ (R is the set of real numbers), are the scaling and shifting parameters respectively. $a^{-1/2}$ is the normalization factor of $\psi_{a,b}(t)$, so that if $\psi(t)$ has unit length, then its scaled version $\psi_{a,b}(t)$ also has unit length.
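The unit-length property can be checked with the substitution $u = (t - b)/a$. The following short derivation is a sketch that assumes $a > 0$ for simplicity; for negative $a$ the same steps go through with $|a|$ in place of $a$.

$$\int_{-\infty}^{\infty} |\psi_{a,b}(t)|^{2}\, dt = a^{-1} \int_{-\infty}^{\infty} \big|\psi\big((t - b)/a\big)\big|^{2}\, dt = a^{-1} \int_{-\infty}^{\infty} |\psi(u)|^{2}\, a\, du = \int_{-\infty}^{\infty} |\psi(u)|^{2}\, du.$$

The factor $a$ produced by $dt = a\, du$ cancels the squared normalization factor $a^{-1}$, so the energy of the scaled and shifted wavelet equals that of the mother wavelet.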
In this paper, we present a generic implementation of the DWT
for embedded platforms. The implementation framework is intended
to be developed into a standard library that could, ideally, be
used generically in future projects, much like the FFT library
functions.
The wavelet transform is briefly reviewed in Section II. The
implementation details are provided in Section III, followed by
the results and the conclusion in Sections IV and V.
This work was supported by Corporate Research, ABB Switzerland Ltd.
Figure 1. Basis functions for the Fourier transform and the wavelet transform.
Instead of continuous scaling and shifting, the mother wavelet may be scaled and shifted discretely by choosing $a = a_0^{m}$, $b = n a_0^{m} b_0$, $t = kT$ in (1) and (2), where $T = 1.0$ and $k, m, n \in Z$ (Z is the set of positive integers). Then, the Discrete Wavelet Transform (DWT) is given by

$$\mathrm{DWT}(m, n) = a_0^{-m/2} \sum_{k} x[k]\, \psi^{*}\big[(k - n a_0^{m} b_0) / a_0^{m}\big]. \tag{3}$$
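As an illustration of (3), assume the common dyadic choice $a_0 = 2$, $b_0 = 1$ (an assumption made here; the text does not state the values used). Substituting gives

$$\mathrm{DWT}(m, n) = 2^{-m/2} \sum_{k} x[k]\, \psi^{*}\big[(k - n 2^{m}) / 2^{m}\big] = 2^{-m/2} \sum_{k} x[k]\, \psi^{*}\big[2^{-m} k - n\big],$$

so each increment of $m$ doubles the scale of the wavelet, and the shift step $n 2^{m}$ grows by the same factor.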
B. Multiresolution Signal Decomposition
By careful selection of $a_0$ and $b_0$, the family of scaled and shifted mother wavelets constitutes an orthonormal basis. With this choice of $a_0$ and $b_0$, there exists an algorithm, known as the multiresolution signal decomposition (MSD) technique [2], that decomposes a signal into scales with different time and frequency resolution. At scale 1, the MSD technique decomposes a given signal $x[n]$ into $c_1[n]$ and $d_1[n]$, where $c_1[n]$ is the smoothed version of the original signal and $d_1[n]$ is the detailed version. They are defined as

$$c_1[n] = \sum_{k} h[k - 2n]\, x[k], \tag{4}$$

$$d_1[n] = \sum_{k} g[k - 2n]\, x[k], \tag{5}$$

where $h[n]$ and $g[n]$ are the associated filter coefficients that decompose $x[n]$ into $c_1[n]$ and $d_1[n]$ respectively. The decomposition at the next higher scale is based on the smoothed version $c_1[n]$, and the process can be iterated. The MSD technique can be realized with cascaded quadrature mirror filter (QMF) [3] banks. A QMF pair consists of two finite impulse response filters, one being a low-pass filter (LPF) and the other a high-pass filter (HPF), as shown in Fig. 2.
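One decomposition level as defined in (4) and (5) can be sketched in plain C. This is an illustrative floating-point sketch, not the implementation discussed in this paper: the function name msd_level is hypothetical, the signal length is assumed to be even, and the signal boundary is handled by periodic wrap-around, which is one of several possible extension modes.

```c
#include <stddef.h>

/* One level of the decomposition in (4) and (5):
 *   c1[n] = sum_k h[k - 2n] x[k],  d1[n] = sum_k g[k - 2n] x[k].
 * Substituting j = k - 2n turns each sum into a loop over the
 * filter taps j = 0..flen-1.
 * Assumptions (not from the paper): floating-point data, even len,
 * periodic wrap-around at the signal boundary. */
void msd_level(const double *x, size_t len,
               const double *h, const double *g, size_t flen,
               double *c1, double *d1)
{
    for (size_t n = 0; n < len / 2; ++n) {
        double c = 0.0;
        double d = 0.0;
        for (size_t j = 0; j < flen; ++j) {
            size_t k = (2 * n + j) % len;   /* periodic extension */
            c += h[j] * x[k];
            d += g[j] * x[k];
        }
        c1[n] = c;   /* smoothed (low-pass) output  */
        d1[n] = d;   /* detailed (high-pass) output */
    }
}
```

With the Haar filters h = {1/√2, 1/√2} and g = {1/√2, −1/√2} (one common sign convention), the input {4, 6, 10, 12} gives c1 ≈ {7.07, 15.56} and d1 ≈ {−1.41, −1.41}. Calling the function again on c1 yields the next higher scale.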
C. Example Applications
We consider a signal with varying frequency as shown in
Fig. 3, plot (i). Using the traditional FT, we can obtain only
the frequency content of the signal, not the instant at which
the frequency changes (Fig. 3, plot (ii)). The WT, in contrast,
captures this instant, as shown in Fig. 3, plot (iii). Here, the
Daubechies 5 (db5) [1] mother wavelet has been used. Therefore,
by combining the WT description with the FT description, we gain
more powerful insight into the signal. The WT has been quite
successful in the fields of transient analysis, fast change
detection, image processing, signal denoising, data compression
and so forth.
Figure 2. Multiresolution signal decomposition and QMF.
Figure 3. Varying frequency signal analysis using the FT and the WT.
III. IMPLEMENTATION OF DWT
Although the WT, specifically the DWT, has certain advantages
over FT-based signal analysis, it is yet to be fully adopted in
the embedded systems domain. There has been an ongoing focus on
the use of the DWT in embedded systems for image processing,
video applications, etc. [4]. However, for other embedded
scenarios such as process instrumentation and signal analysis,
the DWT is rarely available as a standard optimized library
function.
In this paper, we focus on implementations of the DWT as a
standard library function, using the lifting scheme [5] and the
filter bank scheme [3,5]. We looked into implementations of the
standard Haar [1] and Daubechies 4 (db4) [1] mother wavelets.
The algorithms were first tested in Matlab® and then implemented
in C on a TMS320VC5510 DSK board [6], which comprises a 16-bit
fixed-point Texas Instruments TMS320VC5510 DSP [6]. Details of
the DSP core can be found in [6].
A. Fixed-point Implementation
A main focus of this paper is the fixed-point implementation on
the DSP. In principle, the wavelet transform can be applied to
floating-point signals; however, studies showed that calculating
the decomposition coefficients this way takes many more cycles
(about a factor of ten). Therefore, we concentrate on fixed-point
calculations for both implementations.
Under these conditions, the decomposition filters are multiplied
by a factor (preferably a power of two) to obtain filter
coefficients that are larger than one. There is a trade-off: for
accurate decomposition values, the filter coefficients and the
input signal should have as large a value range as possible, but
a larger range increases the risk of an overflow, or the
execution time if data types with an even larger value range are
used. In both implementations, the input and the output data
structures are of the type DATA [6], which is a 16-bit
fixed-point signed integer type. In comparison, the LDATA [6]
data structure is a 32-bit fixed-point signed integer type.
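The scaling idea can be illustrated with a short C sketch. It is not the code used in this paper: DATA_t and LDATA_t are hypothetical stand-ins for the DATA and LDATA types of [6], the helper names are invented for illustration, and the sketch assumes that right-shifting a negative value is an arithmetic shift, as it is on most compilers.

```c
#include <stdint.h>

typedef int16_t DATA_t;   /* stand-in for the 16-bit DATA type  */
typedef int32_t LDATA_t;  /* stand-in for the 32-bit LDATA type */

/* Scale a real coefficient by 2^shift (shift <= 15), round to the
 * nearest integer and clamp to the 16-bit range. Meant to be run
 * offline when preparing the filter tables. */
DATA_t to_fixed(double coeff, int shift)
{
    double scaled = coeff * (double)(1L << shift);
    LDATA_t r = (LDATA_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (DATA_t)r;
}

/* Multiply a sample by a scaled coefficient using a 32-bit
 * intermediate, then undo the scaling with a right shift.
 * Assumes an arithmetic shift for negative products. */
DATA_t fixed_mul(DATA_t sample, DATA_t coeff_fixed, int shift)
{
    LDATA_t prod = (LDATA_t)sample * (LDATA_t)coeff_fixed;
    return (DATA_t)(prod >> shift);
}
```

For example, the coefficient 1/√2 ≈ 0.7071 scaled by 2^15 becomes 23170, and multiplying the sample 1000 by it gives 707 after the 15-bit right shift. A smaller shift leaves more headroom against overflow at the cost of coefficient precision, which is the trade-off described above.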
B. Convolution Approach
The convolution-based approach is built on the convolution sums
for the smoothed and the detailed filter coefficients, shown in
(4) and (5). The fixed-point implementation is shown in the
flowchart in Fig. 4.
The implementation utilizes the convolution function ‘convol2’
[6] from the TMS320C55x DSP library. Neither the current
implementation nor the convol2 function handles overflow, so
overflow handling is left to the programmer. As the current
implementation does not perform any memory allocation, the
programmer also has to provide sufficient memory space. As
described above, the DSP performs its calculations in
fixed-point; therefore the filter coefficients are multiplied by
$2^{15}$.
The implemented DWT function expects four input arguments: the
input vector (already extended), the number of values in the
original input vector (not counting the extended values), a
pointer to the output vector, and the wavelet name. The pointers
to the decomposition filters (smoothed and detailed) and the
corresponding lengths are then set based on the wavelet name. The
‘convol2’ [6] convolution function is used to calculate the two
decomposition vectors, filtered by the low-pass and high-pass
filters. Each resulting convolved vector is right-shifted by 15
bits, which, together with the left-shifted filter coefficients,
yields a fixed-point output result. After the convolution, the
output vectors are decimated and stored consecutively.
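The data flow of this approach can be sketched in portable C. The sketch below is an assumption-laden illustration rather than the implementation described here: the function name dwt_conv_sketch is hypothetical, a plain loop stands in for the ‘convol2’ library call, and the filter is evaluated only at the retained even positions, which merges the convolve, shift and decimate steps into one loop instead of performing them separately. It follows the index convention of (4) and (5), has no overflow handling, and assumes an arithmetic right shift for negative sums.

```c
#include <stddef.h>
#include <stdint.h>

/* xext  : input, already extended by the caller to at least
 *         n + flen - 2 samples
 * n     : number of samples of the original input (assumed even)
 * h_q15 : low-pass filter coefficients, scaled by 2^15
 * g_q15 : high-pass filter coefficients, scaled by 2^15
 * out   : n values; n/2 smoothed values followed by n/2 detailed
 *         values, stored consecutively */
void dwt_conv_sketch(const int16_t *xext, size_t n,
                     const int16_t *h_q15, const int16_t *g_q15,
                     size_t flen, int16_t *out)
{
    for (size_t i = 0; i < n / 2; ++i) {
        int32_t lo = 0;
        int32_t hi = 0;
        for (size_t j = 0; j < flen; ++j) {
            int32_t s = xext[2 * i + j];
            lo += s * h_q15[j];          /* 32-bit accumulation */
            hi += s * g_q15[j];
        }
        /* Undo the 2^15 coefficient scaling. */
        out[i] = (int16_t)(lo >> 15);
        out[n / 2 + i] = (int16_t)(hi >> 15);
    }
}
```

With the Haar filters in this scaling, h = {23170, 23170} and g = {23170, −23170}, the input {400, 300, 200, 100} produces the output {494, 212, 70, 70}: two smoothed values followed by two detailed values.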
C. Lifting Scheme
The lifting scheme takes the iterative approach based on the
MSD and the QMF structure shown in Fig. 2. The lifting scheme is
also used in hardware implementations of the DWT [4]. Fig. 5
shows the flowchart of the lifting scheme-based implementation.
To solve the fixed-point problem of the coefficients, they are
multiplied by a factor of $2^4 = 16$. If the coefficients are
multiplied by a factor, the other operands in the addition also
have to be multiplied by the same value to preserve the
proportion of the operands. A relatively small multiplication
factor of 16 is chosen to be certain not to exceed the value
range of the data types DATA [6] and LDATA [6].
The implemented DWT function expects six input arguments: the
input vector of length N, two (decomposition) output vectors,
the wavelet name and the extension mode. In contrast to the
convolution approach, the extension is performed within the
function in this implementation. In the difference
equation-based [5] decomposition stage, the input values and the
coefficients are shifted/multiplied several times. For the final
result, these factors have to be canceled out by right shifting.
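The role of the factor 16 can be illustrated with the Haar wavelet. The sketch below assumes the standard unnormalized Haar lifting steps (a predict step that takes the pairwise difference and an update step that adds half of it back); the difference equations actually used in this implementation are not listed here, so this is an illustration only. The function name is hypothetical, the length is assumed to be even, there is no overflow handling, and an arithmetic right shift is assumed for negative values.

```c
#include <stddef.h>
#include <stdint.h>

#define LIFT_SHIFT 4   /* scale factor 2^4 = 16 */

/* x : n input samples (n assumed even)
 * s : n/2 smoothed outputs (pairwise means, rounded down)
 * d : n/2 detailed outputs (pairwise differences) */
void haar_lift_sketch(const int16_t *x, size_t n, int16_t *s, int16_t *d)
{
    /* The update coefficient 0.5 becomes 0.5 * 16 = 8. */
    const int32_t half_scaled = 1 << (LIFT_SHIFT - 1);

    for (size_t i = 0; i < n / 2; ++i) {
        int32_t even = x[2 * i];
        int32_t odd = x[2 * i + 1];

        /* Predict: detail is the difference of the pair. */
        int32_t diff = odd - even;

        /* Update: even + 0.5 * diff. Since the coefficient is
         * scaled by 16, the other operand of the addition (even)
         * is scaled by 16 as well. */
        int32_t sum_scaled = even * (1 << LIFT_SHIFT) + half_scaled * diff;

        d[i] = (int16_t)diff;
        /* Cancel the factor 16 by right shifting. */
        s[i] = (int16_t)(sum_scaled >> LIFT_SHIFT);
    }
}
```

For the input {10, 14, 20, 26} the sketch yields d = {4, 6} and s = {12, 23}. Because Haar only combines neighboring pairs, no boundary extension is needed in this particular case; longer wavelets such as db4 would require one.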
IV. RESULTS
A. Performance on TMS320VC5510
We tested the implementations on the TMS320VC5510
DSP [6]. The performance characteristics in terms of the code
size, execution cycle times (including and excluding external
function calls) are shown in Table I.
Figure 5. Flowchart of the implementation of the lifting scheme.
Figure 4. Flowchart of the implementation of the convolution approach.
TABLE I. PERFORMANCE OF DIFFERENT IMPLEMENTATIONS
The convolution-based approach is about three times faster than
the lifting scheme. Within the convolution approach, the
decimation of the output vector takes even longer than the
convolution itself. The lifting scheme implementation performs
many external function calls.
B. Discussion
• In this paper, we considered two implementation
frameworks: the convolution-based approach and the lifting
scheme. A convolution function is generally available as a
standard library function for DSPs. Therefore, the
convolution-based approach could be used generically,
independent of the DSP. The lifting scheme approach is
based on a difference equation-based implementation of the
wavelet function [5].
• The implementation and testing of the DWT on the
TMS320VC5510 is promising. Additionally, the
implementation schemes could be extended to other DSP
platforms by customizing the function input-output values
while keeping the framework the same.
• Fixed-point implementation is particularly important
for real-time embedded applications, e.g., in the process
instrumentation industry.
• No assembler-level optimization, use of special
instructions (e.g., MAC) or optimized memory access (e.g.,
ring buffers) has been performed. This leaves scope for
further performance improvement.
• The implementations were tested (in the scope of this
paper) using fixed input signals, without using the
internal DMA controller of the DSP. Since we are
interested in real-time processing of small and
medium-sized one-dimensional signals, memory access does
not play as important a role in this context as it does
when transforming images or videos.
C. Future Work
• Further optimization of the implementations at the
assembler level, with particular attention to overflow
handling.
• Testing the implementation for different real-time
process instrumentation and embedded one-dimensional
signal processing applications, e.g., fast transient
detection, abrupt change detection, adaptive signal
denoising, time-frequency analysis, analysis of
multi-frequency signals and so on.
• Extending and testing the implementation framework
from the TMS320VC5510 DSP [6] to other platforms.
• Implementation of other important mother wavelets
such as Daubechies 10 [1], coiflets [2] and symmlets [1].
• Implementation of the inverse discrete wavelet
transform (IDWT) for the convolution approach and the
lifting scheme.
V. CONCLUSION
The wavelet transform is an important and powerful signal
analysis tool that often offers clear advantages over
traditional Fourier transform-based signal analysis. Despite its
wide applicability, the wavelet transform is a relatively new
field and hence, from an implementation point of view, it is not
yet well adopted in the embedded systems domain. In this paper,
we presented two generic frameworks for implementing the
discrete wavelet transform (DWT) as a standard library function.
In the first approach, we utilized the convolution sums for the
decomposition filters (smoothed: low-pass, detailed: high-pass).
In the second approach, we used the lifting scheme, which
utilizes the concept of multiresolution signal decomposition and
an iterative quadrature mirror filter tree structure, together
with difference equation-based descriptions of the wavelet
functions. Both frameworks are implemented in fixed-point on a
16-bit fixed-point Texas Instruments TMS320VC5510 DSP. The
performance of the implementations is promising for the further
development of a generic, standard fixed-point DWT library
function for embedded applications.
REFERENCES
[1] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
[2] S. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.
[3] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, 1996.
[4] K. Andra, C. Chakrabarti, T. Acharya, "A VLSI architecture for lifting-based forward and inverse wavelet transform," IEEE Transactions on Signal Processing, vol. 50, no. 4, pp. 966-977, 2002.
[5] A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, Springer, Heidelberg, 2001.
[6] Texas Instruments, TMS320VC5510 datasheet and manual, 2006. Available: http://www.ti.com