Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400
Burlington, MA 01803
This book is printed on acid-free paper.
Copyright © 2009 by Elsevier Inc. All rights reserved.
Designations used by companies to distinguish their products are often claimed as trade-marks or
registered trademarks. In all instances in which Academic Press is aware of a claim, the product
names appear in initial capital or all capital letters. Readers, however, should contact the appropriate
companies for more complete information regarding trademarks and registration.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, scanning, or otherwise, without prior written
permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in
Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com.
You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by
selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”
Library of Congress Cataloging-in-Publication Data
Application submitted
ISBN 13: 978-0-12-374370-1
For information on all Academic Press publications,
visit our Website at www.books.elsevier.com
Printed in the United States
08 09 10 11 12 10 9 8 7 6 5 4 3 2 1
In memory of my father, Alexandre.
For my mother, Francine.
Preface to the Sparse Edition
I cannot help but find striking resemblances between scientific communities and
schools of fish. We interact in conferences and through articles, and we move
together while a global trajectory emerges from individual contributions. Some of
us like to be at the center of the school, others prefer to wander around, and a few
swim in multiple directions in front. To avoid dying of starvation in a progressively
narrower and more specialized domain, a scientific community also needs to move on.
Computational harmonic analysis is still very much alive because it went beyond
wavelets. Writing such a book is about decoding the trajectory of the school and
gathering the pearls that have been uncovered on the way. Wavelets are no longer
the central topic, despite the previous edition's original title. They are just an important
tool, as the Fourier transform is. Sparse representation and processing are now at
the core.
In the 1980s, many researchers were focused on building time-frequency decompositions, trying to avoid the uncertainty barrier, and hoping to discover the ultimate
representation. Along the way came the construction of wavelet orthogonal bases,
which opened new perspectives through collaborations with physicists and mathematicians. Designing orthogonal bases with Xlets became a popular sport with
compression and noise-reduction applications. Connections with approximations
and sparsity also became more apparent. The search for sparsity has taken over,
leading to new grounds where orthonormal bases are replaced by redundant dictionaries of waveforms.
During these last seven years, I also encountered the industrial world. With
a lot of naiveté, some bandlets, and more mathematics, I cofounded a start-up
with Christophe Bernard, Jérome Kalifa, and Erwan Le Pennec. It took us some
time to learn that in three months good engineering should produce robust algorithms that operate in real time, as opposed to the three years we were used
to having for writing new ideas with promising perspectives. Yet, we survived
because mathematics is a major source of industrial innovations for signal processing. Semiconductor technology offers amazing computational power and flexibility.
However, ad hoc algorithms often do not scale easily and mathematics accelerates
the trial-and-error development process. Sparsity decreases computations, memory,
and data communications. Although it brings beauty, mathematical understanding
is not a luxury. It is required by increasingly sophisticated information-processing
devices.
New Additions
Putting sparsity at the center of the book implied rewriting many parts and adding sections. Chapters 12 and 13 are new. They introduce sparse representations in redundant dictionaries, and inverse problems, super-resolution, and compressive sensing. Here is a small catalog of new elements in this third edition:
■ Radon transform and tomography
■ Lifting for wavelets on surfaces, bounded domains, and fast computations
■ JPEG-2000 image compression
■ Block thresholding for denoising
■ Geometric representations with adaptive triangulations, curvelets, and bandlets
■ Sparse approximations in redundant dictionaries with pursuit algorithms
■ Noise reduction with model selection in redundant dictionaries
■ Exact recovery of sparse approximation supports in dictionaries
■ Multichannel signal representations and processing
■ Dictionary learning
■ Inverse problems and super-resolution
■ Compressive sensing
■ Source separation
Teaching
This book is intended as a graduate-level textbook. Its evolution is also the result
of teaching courses in electrical engineering and applied mathematics. A new website, http://wavelet-tour.com, provides software for reproducible experimentations, exercise solutions, and teaching material such as slides with figures and MATLAB software for numerical classes.
More exercises have been added at the end of each chapter, ordered by level of difficulty. Level 1 exercises are direct applications of the course. Level 2 exercises require more thinking. Level 3 includes some technical derivation exercises. Level 4 exercises are projects at the interface of research that are possible topics for a final course project or independent study. More exercises and projects can be found on the website.
Sparse Course Programs
The Fourier transform and analog-to-digital conversion through linear sampling
approximations provide a common ground for all courses (Chapters 2 and 3).
These chapters introduce basic signal representations and review important mathematical
and algorithmic tools needed afterward. Many trajectories are then possible to
explore and teach sparse signal processing. The following list notes several topics that can orient a course’s structure with elements that can be covered along
the way.
Sparse representations with bases and applications:
■ Principles of linear and nonlinear approximations in bases (Chapter 9)
■ Lipschitz regularity and wavelet coefficients decay (Chapter 6)
■ Wavelet bases (Chapter 7)
■ Properties of linear and nonlinear wavelet basis approximations (Chapter 9)
■ Image wavelet compression (Chapter 10)
■ Linear and nonlinear diagonal denoising (Chapter 11)
Sparse time-frequency representations:
■ Time-frequency wavelet and windowed Fourier ridges for audio processing (Chapter 4)
■ Local cosine bases (Chapter 8)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Audio compression (Chapter 10)
■ Audio denoising and block thresholding (Chapter 11)
■ Compression and denoising in redundant time-frequency dictionaries with best bases or pursuit algorithms (Chapter 12)
Sparse signal estimation:
■ Bayes versus minimax and linear versus nonlinear estimations (Chapter 11)
■ Wavelet bases (Chapter 7)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Thresholding estimation (Chapter 11)
■ Minimax optimality (Chapter 11)
■ Model selection for denoising in redundant dictionaries (Chapter 12)
■ Compressive sensing (Chapter 13)
Sparse compression and information theory:
■ Wavelet orthonormal bases (Chapter 7)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Compression and sparse transform codes in bases (Chapter 10)
■ Compression in redundant dictionaries (Chapter 12)
■ Compressive sensing (Chapter 13)
■ Source separation (Chapter 13)
Dictionary representations and inverse problems:
■ Frames and Riesz bases (Chapter 5)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Ideal redundant dictionary approximations (Chapter 12)
■ Pursuit algorithms and dictionary incoherence (Chapter 12)
■ Linear and thresholding inverse estimators (Chapter 13)
■ Super-resolution and source separation (Chapter 13)
■ Compressive sensing (Chapter 13)
Geometric sparse processing:
■ Time-frequency spectral lines and ridges (Chapter 4)
■ Frames and Riesz bases (Chapter 5)
■ Multiscale edge representations with wavelet maxima (Chapter 6)
■ Sparse approximation supports in bases (Chapter 9)
■ Approximations with geometric regularity, curvelets, and bandlets (Chapters 9 and 12)
■ Sparse signal compression and geometric bit budget (Chapters 10 and 12)
■ Exact recovery of sparse approximation supports (Chapter 12)
■ Super-resolution (Chapter 13)
ACKNOWLEDGMENTS
Some things do not change with new editions, in particular the traces left by the
ones who were, and remain, for me important references. As always, I am deeply
grateful to Ruzena Bajcsy and Yves Meyer.
I spent the last few years with three brilliant and kind colleagues—Christophe
Bernard, Jérome Kalifa, and Erwan Le Pennec—in a pressure cooker called a “start-up.” Pressure means stress, despite very good moments. The resulting sauce was a blend of what all of us could provide, which brought new flavors to our personalities.
I am thankful to them for the ones I got, some of which I am still discovering.
This new edition is the result of a collaboration with Gabriel Peyré, who made
these changes not only possible, but also very interesting to do. I thank him for his
remarkable work and help.
Stéphane Mallat
Notations
⟨f, g⟩             Inner product (A.6)
‖f‖               Euclidean or Hilbert space norm
‖f‖₁              L¹ or ℓ¹ norm
‖f‖∞              L∞ norm
f[n] = O(g[n])    Order of: there exists K such that f[n] ≤ K g[n]
f[n] = o(g[n])    Small order of: lim_{n→+∞} f[n]/g[n] = 0
f[n] ∼ g[n]       Equivalent to: f[n] = O(g[n]) and g[n] = O(f[n])
A < +∞            A is finite
A ≫ B             A is much bigger than B
z*                Complex conjugate of z ∈ C
⌊x⌋               Largest integer n ≤ x
⌈x⌉               Smallest integer n ≥ x
(x)₊              max(x, 0)
n mod N           Remainder of the integer division of n modulo N
Sets
N       Positive integers including 0
Z       Integers
R       Real numbers
R⁺      Positive real numbers
C       Complex numbers
|Λ|     Number of elements in a set Λ
Signals
f(t)       Continuous time signal
f[n]       Discrete signal
δ(t)       Dirac distribution (A.30)
δ[n]       Discrete Dirac (3.32)
1_[a,b]    Indicator function that is 1 in [a, b] and 0 outside
Spaces
C⁰        Uniformly continuous functions (7.207)
Cᵖ        p times continuously differentiable functions
C∞        Infinitely differentiable functions
Wˢ(R)     Sobolev s times differentiable functions (9.8)
L²(R)     Finite energy functions: ∫ |f(t)|² dt < +∞
Lᵖ(R)     Functions such that ∫ |f(t)|ᵖ dt < +∞
ℓ²(Z)     Finite energy discrete signals: Σ_{n=−∞}^{+∞} |f[n]|² < +∞
ℓᵖ(Z)     Discrete signals such that Σ_{n=−∞}^{+∞} |f[n]|ᵖ < +∞
C^N       Complex signals of size N
U ⊕ V     Direct sum of two vector spaces
U ⊗ V     Tensor product of two vector spaces (A.19)
NullU     Null space of an operator U
ImU       Image space of an operator U
Operators
Id          Identity
f′(t)       Derivative df(t)/dt
f⁽ᵖ⁾(t)     Derivative dᵖf(t)/dtᵖ of order p
∇f(x, y)    Gradient vector (6.51)
f ⋆ g(t)    Continuous time convolution (2.2)
f ⋆ g[n]    Discrete convolution (3.33)
f ⊛ g[n]    Circular convolution (3.73)
Transforms
f̂(ω)          Fourier transform (2.6), (3.39)
f̂[k]          Discrete Fourier transform (3.49)
Sf(u, s)      Short-time windowed Fourier transform (4.11)
P_S f(u, ξ)   Spectrogram (4.12)
Wf(u, s)      Wavelet transform (4.31)
P_W f(u, ξ)   Scalogram (4.55)
P_V f(u, ξ)   Wigner-Ville distribution (4.120)
Probability
X               Random variable
E{X}            Expected value
H(X)            Entropy (10.4)
H_d(X)          Differential entropy (10.20)
Cov(X₁, X₂)     Covariance (A.22)
F[n]            Random vector
R_F[k]          Autocovariance of a stationary process (A.26)
CHAPTER 1
Sparse Representations
Signals carry overwhelming amounts of data in which relevant information is often
more difficult to find than a needle in a haystack. Processing is faster and simpler
in a sparse representation where few coefficients reveal the information we are
looking for. Such representations can be constructed by decomposing signals over
elementary waveforms chosen in a family called a dictionary. But the search for
the Holy Grail of an ideal sparse transform adapted to all signals is a hopeless quest.
The discovery of wavelet orthogonal bases and local time-frequency dictionaries has
opened the door to a huge jungle of new transforms. Adapting sparse representations to signal properties, and deriving efficient processing operators, is therefore a
necessary survival strategy.
An orthogonal basis is a dictionary of minimum size that can yield a sparse representation if designed to concentrate the signal energy over a set of few vectors. This set gives a geometric signal description. Efficient signal compression and noise-reduction algorithms are then implemented with diagonal operators computed with fast algorithms. But this is not always optimal.
In natural languages, a richer dictionary helps to build shorter and more precise
sentences. Similarly, dictionaries of vectors that are larger than bases are needed
to build sparse representations of complex signals. But choosing is difficult and
requires more complex algorithms. Sparse representations in redundant dictionaries
can improve pattern recognition, compression, and noise reduction, but also the resolution of new inverse problems, including super-resolution, source separation, and compressive sensing.
This first chapter is a sparse book representation, providing the story line and
the main ideas. It gives a sense of orientation for choosing a path to travel.
1.1 COMPUTATIONAL HARMONIC ANALYSIS
Fourier and wavelet bases are the journey’s starting point. They decompose signals over oscillatory waveforms that reveal many signal properties and provide
a path to sparse representations. Discretized signals often have a very large size N ≥ 10⁶, and thus can only be processed by fast algorithms, typically implemented with O(N log N) operations and memories. Fourier and wavelet transforms
illustrate the strong connection between well-structured mathematical tools and
fast algorithms.
1.1.1 The Fourier Kingdom
The Fourier transform is everywhere in physics and mathematics because it diagonalizes time-invariant convolution operators. It rules over linear time-invariant signal
processing, the building blocks of which are frequency filtering operators.
Fourier analysis represents any finite energy function f(t) as a sum of sinusoidal waves e^{iωt}:

    f(t) = (1/2π) ∫_{−∞}^{+∞} f̂(ω) e^{iωt} dω.    (1.1)

The amplitude f̂(ω) of each sinusoidal wave e^{iωt} is equal to its correlation with f, also called the Fourier transform:

    f̂(ω) = ∫_{−∞}^{+∞} f(t) e^{−iωt} dt.    (1.2)

The more regular f(t), the faster the decay of the sinusoidal wave amplitude |f̂(ω)| when the frequency ω increases.
When f(t) is defined only on an interval, say [0, 1], then the Fourier transform becomes a decomposition in a Fourier orthonormal basis {e^{i2πmt}}_{m∈Z} of L²[0, 1]. If f(t) is uniformly regular, then its Fourier transform coefficients also have a fast decay when the frequency 2πm increases, so it can be easily approximated with few low-frequency Fourier coefficients. The Fourier transform therefore defines a sparse representation of uniformly regular functions.
Over discrete signals, the Fourier transform is a decomposition in a discrete orthogonal Fourier basis {e^{i2πkn/N}}_{0≤k<N} of C^N, which has properties similar to a Fourier transform on functions. Its embedded structure leads to fast Fourier transform (FFT) algorithms, which compute discrete Fourier coefficients with O(N log N) operations instead of N². This FFT algorithm is a cornerstone of discrete signal processing.
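As a minimal numerical illustration (not from the book, and in Python with NumPy rather than the book's MATLAB software), the FFT can be checked against a direct O(N²) evaluation of the same discrete Fourier coefficients:

```python
import numpy as np

# Direct O(N^2) evaluation of the discrete Fourier coefficients
# fhat[k] = sum_n f[n] exp(-i 2 pi k n / N).
def dft_direct(f):
    N = len(f)
    n = np.arange(N)
    return np.array([np.sum(f * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

N = 256
f = np.random.randn(N)

fhat_fast = np.fft.fft(f)   # O(N log N) fast Fourier transform
fhat_slow = dft_direct(f)   # same coefficients, O(N^2) operations
print(np.allclose(fhat_fast, fhat_slow))  # True
```

For a signal of size N ≥ 10⁶ the direct sum is impractical, while the FFT remains fast, which is the point of the complexity gap above.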
As long as we are satisfied with linear time-invariant operators or uniformly
regular signals, the Fourier transform provides simple answers to most questions.
Its richness makes it suitable for a wide range of applications such as signal
transmissions or stationary signal processing. However, to represent a transient
phenomenon—a word pronounced at a particular time, an apple located in the
left corner of an image—the Fourier transform becomes a cumbersome tool that
requires many coefficients to represent a localized event. Indeed, the support of e^{iωt} covers the whole real line, so f̂(ω) depends on the values f(t) for all times t ∈ R. This global “mix” of information makes it difficult to analyze or represent any local property of f(t) from f̂(ω).
1.1.2 Wavelet Bases
Wavelet bases, like Fourier bases, reveal the signal regularity through the amplitude of coefficients, and their structure leads to a fast computational algorithm.
However, wavelets are well localized and few coefficients are needed to represent
local transient structures. As opposed to a Fourier basis, a wavelet basis defines a
sparse representation of piecewise regular signals,which may include transients and
singularities. In images, large wavelet coefficients are located in the neighborhood
of edges and irregular textures.
The story began in 1910, when Haar [291] constructed a piecewise constant function

    ψ(t) = 1 if 0 ≤ t < 1/2,  −1 if 1/2 ≤ t < 1,  0 otherwise,

the dilations and translations of which generate an orthonormal basis

    { ψ_{j,n}(t) = 2^{−j/2} ψ((t − 2^j n)/2^j) }_{(j,n)∈Z²}

of the space L²(R) of signals having a finite energy

    ‖f‖² = ∫_{−∞}^{+∞} |f(t)|² dt < +∞.
Let us write ⟨f, g⟩ = ∫_{−∞}^{+∞} f(t) g*(t) dt, the inner product in L²(R). Any finite energy signal f can thus be represented by its wavelet inner-product coefficients

    ⟨f, ψ_{j,n}⟩ = ∫_{−∞}^{+∞} f(t) ψ_{j,n}(t) dt

and recovered by summing them in this wavelet orthonormal basis:

    f = Σ_{j=−∞}^{+∞} Σ_{n=−∞}^{+∞} ⟨f, ψ_{j,n}⟩ ψ_{j,n}.    (1.3)
Each Haar wavelet ψ_{j,n}(t) has a zero average over its support [2^j n, 2^j(n + 1)]. If f is locally regular and 2^j is small, then f is nearly constant over this interval and the wavelet coefficient ⟨f, ψ_{j,n}⟩ is nearly zero. This means that large wavelet coefficients are located at sharp signal transitions only.
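This localization can be observed numerically. The following sketch (my own illustration, not the book's code; Python with NumPy) computes one level of discrete Haar detail coefficients d[n] = (f[2n] − f[2n+1])/√2 on a piecewise constant signal: the only nonzero coefficient is the one whose support straddles the jump.

```python
import numpy as np

# One level of discrete Haar detail coefficients:
# d[n] = (f[2n] - f[2n+1]) / sqrt(2), a zero-average difference that
# vanishes wherever the signal is constant over the sample pair.
def haar_details(f):
    return (f[0::2] - f[1::2]) / np.sqrt(2)

# Piecewise constant signal with one jump falling inside a sample pair.
f = np.concatenate([np.zeros(63), np.ones(65)])
d = haar_details(f)

nonzero = np.nonzero(d)[0]
print(nonzero)           # [31]: only the pair straddling the jump
print(np.abs(d).max())   # large coefficient at the sharp transition
```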
With a jump in time, the story continues in 1980, when Strömberg [449] found a piecewise linear function ψ that also generates an orthonormal basis and gives better approximations of smooth functions. Meyer was not aware of this result, and, motivated by the work of Morlet and Grossmann on the continuous wavelet transform, he tried to prove that there exists no regular wavelet ψ that generates
an orthonormal basis. This attempt was a failure since he ended up constructing
a whole family of orthonormal wavelet bases, with functions that are infinitely
continuously differentiable [375]. This was the fundamental impulse that led to a
widespread search for new orthonormal wavelet bases, which culminated in the
celebrated Daubechies wavelets of compact support [194].
The systematic theory for constructing orthonormal wavelet bases was established by Meyer and Mallat through the elaboration of multiresolution signal
approximations [362], as presented in Chapter 7. It was inspired by original ideas
developed in computer vision by Burt and Adelson [126] to analyze images at several resolutions. Digging deeper into the properties of orthogonal wavelets and
multiresolution approximations brought to light a surprising link with filter banks
constructed with conjugate mirror filters, and a fast wavelet transform algorithm
decomposing signals of size N with O(N ) operations [361].
Filter Banks
Motivated by speech compression, in 1976 Croisier, Esteban, and Galand [189] introduced an invertible filter bank, which decomposes a discrete signal f [n] into two
signals of half its size using a filtering and subsampling procedure. They showed
that f [n] can be recovered from these subsampled signals by canceling the aliasing
terms with a particular class of filters called conjugate mirror filters. This breakthrough led to a 10-year research effort to build a complete filter bank theory.
Necessary and sufficient conditions for decomposing a signal in subsampled components with a filtering scheme, and recovering the same signal with an inverse
transform, were established by Smith and Barnwell [444],Vaidyanathan [469], and
Vetterli [471].
The multiresolution theory of Mallat [362] and Meyer [44] proves that any
conjugate mirror filter characterizes a wavelet that generates an orthonormal basis
of L 2 (R), and that a fast discrete wavelet transform is implemented by cascading
these conjugate mirror filters [361]. The equivalence between this continuous time
wavelet theory and discrete filter banks led to a new fruitful interface between
digital signal processing and harmonic analysis, first creating a culture shock that is
now well resolved.
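A hedged illustration of this equivalence (my own sketch in Python with NumPy, not the book's code): the Haar filters are the simplest conjugate mirror filters. A filtering-and-subsampling analysis step splits f[n] into two half-size signals, and the synthesis step cancels the aliasing and restores f exactly; cascading such stages on the low-pass output yields the O(N) fast wavelet transform.

```python
import numpy as np

# Two-channel Haar filter bank. Analysis: filter and subsample into
# half-size approximation (low-pass) and detail (high-pass) signals.
def analysis(f):
    a = (f[0::2] + f[1::2]) / np.sqrt(2)   # low-pass branch
    d = (f[0::2] - f[1::2]) / np.sqrt(2)   # high-pass branch
    return a, d

# Synthesis: upsample and filter; the aliasing terms cancel and the
# original signal is recovered exactly (perfect reconstruction).
def synthesis(a, d):
    f = np.empty(2 * len(a))
    f[0::2] = (a + d) / np.sqrt(2)
    f[1::2] = (a - d) / np.sqrt(2)
    return f

f = np.random.randn(128)
a, d = analysis(f)
print(len(a), len(d))                   # 64 64
print(np.allclose(synthesis(a, d), f))  # True
```

Applying `analysis` recursively to the output `a` computes the discrete Haar wavelet coefficients at successive scales, with a total cost proportional to N.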
Continuous versus Discrete and Finite
Originally, many signal processing engineers wondered what the point was of considering wavelets and signals as functions, since all computations are performed over discrete signals with conjugate mirror filters. Why bother with the convergence of infinite convolution cascades if in practice we only compute a finite number of
convolutions? Answering these important questions is necessary in order to understand why this book alternates between theorems on continuous time functions
and discrete algorithms applied to finite sequences.
A short answer would be “simplicity.” In L²(R), a wavelet basis is constructed by dilating and translating a single function ψ. Several important theorems relate the
amplitude of wavelet coefficients to the local regularity of the signal f . Dilations
are not defined over discrete sequences, and discrete wavelet bases are therefore
more complex to describe. The regularity of a discrete sequence is not well defined
either, which makes it more difficult to interpret the amplitude of wavelet coefficients. A theory of continuous-time functions gives asymptotic results for discrete
sequences with sampling intervals decreasing to zero. This theory is useful because
these asymptotic results are precise enough to understand the behavior of discrete
algorithms.
But continuous time or space models are not sufficient for elaborating discrete signal-processing algorithms. The transition between continuous and discrete signals must be done with great care to maintain important properties such as orthogonality. Restricting the constructions to finite discrete signals adds another layer of
complexity because of border problems. How these border issues affect numerical implementations is carefully addressed once the properties of the bases are
thoroughly understood.
Wavelets for Images
Wavelet orthonormal bases of images can be constructed from wavelet orthonormal bases of one-dimensional signals. Three mother wavelets ψ¹(x), ψ²(x), and ψ³(x), with x = (x₁, x₂) ∈ R², are dilated by 2^j and translated by 2^j n with n = (n₁, n₂) ∈ Z². This yields an orthonormal basis of the space L²(R²) of finite energy functions f(x) = f(x₁, x₂):

    { ψ^k_{j,n}(x) = 2^{−j} ψ^k((x − 2^j n)/2^j) }_{j∈Z, n∈Z², 1≤k≤3}.
The support of a wavelet ψ^k_{j,n} is a square of width proportional to the scale 2^j. Two-dimensional wavelet bases are discretized to define orthonormal bases of images including N pixels. Wavelet coefficients are calculated with the fast O(N) algorithm described in Chapter 7.
As in one dimension, a wavelet coefficient ⟨f, ψ^k_{j,n}⟩ has a small amplitude if f(x) is regular over the support of ψ^k_{j,n}. It has a large amplitude near sharp transitions such as edges. Figure 1.1(b) is the array of N wavelet coefficients. Each direction k and scale 2^j corresponds to a subimage, which shows in black the position of the largest coefficients above a threshold: |⟨f, ψ^k_{j,n}⟩| ≥ T.
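The following sketch (my own illustration in Python with NumPy, not the book's code) builds one level of a separable 2D Haar transform, whose three detail subbands play the role of the three mother wavelets, and verifies that for an image that is constant except across a vertical edge, the large coefficients appear only in the subband and positions straddling the edge.

```python
import numpy as np

# 1D Haar averages (a) and differences (d) along a given axis.
def haar1d(x, axis):
    even = np.take(x, range(0, x.shape[axis], 2), axis)
    odd = np.take(x, range(1, x.shape[axis], 2), axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

# One separable 2D level: subbands aa (approximation) and ad, da, dd
# (details), analogous to the three wavelets psi^1, psi^2, psi^3.
def haar2d_level(img):
    a_rows, d_rows = haar1d(img, axis=1)   # horizontal filtering
    aa, ad = haar1d(a_rows, axis=0)        # then vertical filtering
    da, dd = haar1d(d_rows, axis=0)
    return aa, ad, da, dd

# Image constant everywhere except across a vertical edge.
img = np.zeros((64, 64))
img[:, 33:] = 1.0

aa, ad, da, dd = haar2d_level(img)
print(np.count_nonzero(ad), np.count_nonzero(dd))  # 0 0
print(np.count_nonzero(da))  # one column of large coefficients at the edge
```

The transform is orthonormal, so the subband energies sum to the image energy, and thresholding these coefficients keeps only the column along the edge.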
1.2 APPROXIMATION AND PROCESSING IN BASES
Analog-to-digital signal conversion is the first step of digital signal processing.
Chapter 3 explains that it amounts to projecting the signal over a basis of an approximation space. Most often, the resulting digital representation remains much too
large and needs to be further reduced. A digital image typically includes more than 10⁶ samples, and a CD music recording has 40 × 10³ samples per second. Sparse
representations that reduce the number of parameters can be obtained by thresholding coefficients in an appropriate orthogonal basis. Efficient compression and
noise-reduction algorithms are then implemented with simple operators in this
basis.
FIGURE 1.1
(a) Discrete image f [n] of N = 256² pixels. (b) Array of N orthogonal wavelet coefficients ⟨f, ψ^k_{j,n}⟩ for k = 1, 2, 3, and 4 scales 2^j; black points correspond to |⟨f, ψ^k_{j,n}⟩| > T. (c) Linear approximation from the N/16 wavelet coefficients at the three largest scales. (d) Nonlinear approximation from the M = N/16 wavelet coefficients of largest amplitude shown in (b).
Stochastic versus Deterministic Signal Models
A representation is optimized relative to a signal class, corresponding to all potential signals encountered in an application. This requires building signal models that
carry available prior information.
A signal f can be modeled as a realization of a random process F, the probability distribution of which is known a priori. A Bayesian approach then tries to minimize
the expected approximation error. Linear approximations are simpler because they
only depend on the covariance. Chapter 9 shows that optimal linear approximations are obtained on the basis of principal components that are the eigenvectors
of the covariance matrix. However, the expected error of nonlinear approximations depends on the full probability distribution of F. This distribution is most often
not known for complex signals, such as images or sounds, because their transient
structures are not adequately modeled as realizations of known processes such as
Gaussian ones.
To optimize nonlinear representations, weaker but sufficiently powerful deterministic models can be elaborated. A deterministic model specifies a set Θ where the signal belongs. This set is defined by any prior information—for example, on the time-frequency localization of transients in musical recordings or on the geometric regularity of edges in images. Simple models can also define Θ as a ball in a functional space, with a specific regularity norm such as a total variation norm. A stochastic model is richer because it provides the probability distribution in Θ. When this distribution is not available, the average error cannot be calculated and is replaced by the maximum error over Θ. Optimizing the representation then amounts to minimizing this maximum error, which is called a minimax optimization.
1.2.1 Sampling with Linear Approximations
Analog-to-digital signal conversion is most often implemented with a linear approximation operator that filters and samples the input analog signal. From these samples,
a linear digital-to-analog converter recovers a projection of the original analog signal
over an approximation space whose dimension depends on the sampling density.
Linear approximations project signals in spaces of lowest possible dimensions to
reduce computations and storage cost, while controlling the resulting error.
Sampling Theorems
Let us consider finite energy signals f̄ 2 ⫽ | f̄ (x)|2 dx of finite support, which is
normalized to [0, 1] or [0, 1]2 for images. A sampling process implements a filtering
¯ s (x) and a uniform sampling to output
of f̄ (x) with a low-pass impulse response
a discrete signal:
¯ s (ns)
f [n] ⫽ f̄
for
0⭐n⬍N.
In two dimensions, n ⫽ (n1 , n2 ) and x ⫽ (x1 , x2 ). These filtered samples can also be
written as inner products:
¯ s (ns) ⫽ f (u) ¯ s (ns ⫺ u) du ⫽ f (x), s (x ⫺ ns)
f̄
¯ s (⫺x). Chapter 3 explains that s is chosen, like in the claswith s (x) ⫽
sic Shannon–Whittaker sampling theorem, so that a family of functions {s
(x ⫺ ns)}1⭐n⭐N is a basis of an appropriate approximation space UN . The best linear approximation of f̄ in UN recovered from these samples is the orthogonal
7
8
CHAPTER 1 Sparse Representations
projection f̄N of f in UN , and if the basis is orthonormal, then
N ⫺1
f̄N (x) ⫽
f [n] s (x ⫺ ns).
(1.4)
n⫽0
A sampling theorem states that if f̄ ∈ U_N, then f̄ = f̄_N, so (1.4) recovers f̄(x) from the measured samples. Most often, f̄ does not belong to this approximation space, and the resulting error is called aliasing in the context of Shannon–Whittaker sampling, where U_N is the space of functions having a frequency support restricted to the N lower frequencies. The approximation error ‖f̄ − f̄_N‖² must then be controlled.
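A concrete instance (my own illustration, not from the book): take φ_s proportional to the indicator of [0, s), so the samples are local averages and (1.4) reconstructs the orthogonal projection of f̄ on the space U_N of functions that are constant on each interval [ns, (n+1)s). A signal already in U_N is recovered exactly, as the sampling theorem states.

```python
import numpy as np

# Orthogonal projection on U_N, the space of functions constant on each
# interval [ns, (n+1)s) with s = 1/N. The samples f[n] are block averages
# (inner products with the normalized indicator phi_s), and the
# reconstruction (1.4) repeats each sample over its interval.
def project(fbar, N):
    P = len(fbar)                       # fine grid of P points over [0, 1]
    samples = fbar.reshape(N, P // N).mean(axis=1)
    return np.repeat(samples, P // N)   # piecewise constant f_N(x)

P, N = 1024, 32
x = np.arange(P) / P
fbar = np.sin(2 * np.pi * x)

fN = project(fbar, N)
err = np.sum((fbar - fN) ** 2) / P      # aliasing/approximation error

# A signal already piecewise constant on the grid lies in U_N and is
# recovered exactly from its samples:
g = np.repeat(np.random.randn(N), P // N)
print(np.allclose(project(g, N), g))    # True
```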
Linear Approximation Error
The approximation error is computed by finding an orthogonal basis B ⫽
{ḡm (x)}0⭐m⬍⫹⬁ of the whole analog signal space L 2 [0, 1]2 , with the first N vector {ḡm (x)}0⭐m⬍N that defines an orthogonal basis of UN . Thus, the orthogonal
projection on UN can be rewritten as
N ⫺1
f̄N (x) ⫽
f̄ , ḡm ḡm (x).
m⫽0
Since f̄ ⫽ ⫹⬁
m⫽0 f̄ , ḡm ḡm , the approximation error is the energy of the removed
inner products:
⫹⬁
l (N , f ) ⫽ f̄ ⫺ f̄N 2 ⫽
| f̄ , ḡm |2 .
m⫽N
This error decreases quickly when N increases if the coefficient amplitudes | f̄ , ḡm |
have a fast decay when the index m increases. The dimension N is adjusted to the
desired approximation error.
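This decay condition can be observed numerically. In the following sketch (my own illustration in Python with NumPy), the orthonormal family ḡ_m is the discrete Fourier basis ordered by increasing frequency, and the removed energy ε_l(N, f) decays much faster for a uniformly regular signal than for a discontinuous one.

```python
import numpy as np

# Linear approximation error in an orthonormal Fourier basis: keep the
# N lowest-frequency coefficients and sum the energy of the removed ones.
def linear_approx_error(f, N):
    c = np.fft.fft(f) / np.sqrt(len(f))   # orthonormal DFT coefficients
    # order the coefficients by increasing |frequency|: 0, +-1, +-2, ...
    order = np.argsort(np.abs(np.fft.fftfreq(len(f))), kind="stable")
    removed = c[order][N:]
    return np.sum(np.abs(removed) ** 2)

t = np.arange(1024) / 1024
smooth = np.exp(-np.cos(2 * np.pi * t))   # uniformly regular signal
step = (t > 0.5).astype(float)            # discontinuous signal

for N in (16, 64, 256):
    print(N, linear_approx_error(smooth, N), linear_approx_error(step, N))
```

For the regular signal the error is negligible already at small N, while for the step it decays only slowly, which is why a uniform (linear) grid wastes many coefficients on non-smooth signals.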
Figure 1.1(a) shows a discrete image f[n] approximated with N = 256² pixels. Figure 1.1(c) displays a lower-resolution image f_{N/16} projected on a space U_{N/16} of dimension N/16, generated by N/16 large-scale wavelets. It is calculated by setting all the wavelet coefficients to zero at the first two smaller scales. The approximation error is ‖f − f_{N/16}‖²/‖f‖² = 14 × 10⁻³. Reducing the resolution introduces more blur and errors. A linear approximation space U_N corresponds to a uniform grid that approximates precisely uniformly regular signals. Since images f̄ are often not uniformly regular, it is necessary to measure them at a high resolution N. This is why digital cameras have a resolution that increases as technology improves.
1.2.2 Sparse Nonlinear Approximations
Linear approximations reduce the space dimensionality but can introduce important
errors when reducing the resolution if the signal is not uniformly regular, as shown
by Figure 1.1(c). To improve such approximations, more coefficients should be
kept where needed—not in regular regions but near sharp transitions and edges.
This requires defining an irregular sampling adapted to the local signal regularity.
This optimized irregular sampling has a simple equivalent solution through nonlinear
approximations in wavelet bases.
Nonlinear approximations operate in two stages. First, a linear operator approximates the analog signal f̄ with N samples written f[n] = f̄ ⋆ φ̄_s(ns). Then, a nonlinear approximation of f[n] is computed to reduce the N coefficients f[n] to M ≪ N coefficients in a sparse representation.
The discrete signal f can be considered as a vector of C^N. Inner products and norms in C^N are written

    ⟨f, g⟩ = Σ_{n=0}^{N−1} f[n] g*[n]  and  ‖f‖² = Σ_{n=0}^{N−1} |f[n]|².

To obtain a sparse representation with a nonlinear approximation, we choose a new orthonormal basis B = {g_m[n]}_{m∈Γ} of C^N, which concentrates the signal energy as much as possible over few coefficients. Signal coefficients {⟨f, g_m⟩}_{m∈Γ} are computed from the N input sample values f[n] with an orthogonal change of basis that takes N² operations in nonstructured bases. In wavelet or Fourier bases, fast algorithms require, respectively, O(N) and O(N log₂ N) operations.
Approximation by Thresholding
For $M < N$, an approximation $f_M$ is computed by selecting the “best” $M < N$ vectors within $\mathcal{B}$. The orthogonal projection of $f$ on the space $\mathbf{V}_\Lambda$ generated by $M$ vectors $\{g_m\}_{m \in \Lambda}$ in $\mathcal{B}$ is

$$f_\Lambda = \sum_{m \in \Lambda} \langle f, g_m \rangle\, g_m . \quad (1.5)$$

Since $f = \sum_{m \in \Gamma} \langle f, g_m \rangle\, g_m$, the resulting error is

$$\|f - f_\Lambda\|^2 = \sum_{m \notin \Lambda} |\langle f, g_m \rangle|^2 . \quad (1.6)$$

We write $|\Lambda|$ for the size of the set $\Lambda$. The best $M = |\Lambda|$ term approximation, which minimizes $\|f - f_\Lambda\|^2$, is thus obtained by selecting the $M$ coefficients of largest amplitude. These coefficients are above a threshold $T$ that depends on $M$:

$$f_M = f_{\Lambda_T} = \sum_{m \in \Lambda_T} \langle f, g_m \rangle\, g_m \quad \text{with} \quad \Lambda_T = \{m \in \Gamma : |\langle f, g_m \rangle| \geq T\}. \quad (1.7)$$

This approximation is nonlinear because the approximation set $\Lambda_T$ changes with $f$. The resulting approximation error is

$$\epsilon_n(M, f) = \|f - f_M\|^2 = \sum_{m \notin \Lambda_T} |\langle f, g_m \rangle|^2 . \quad (1.8)$$
Figure 1.1(b) shows that the approximation support $\Lambda_T$ of an image in a wavelet orthonormal basis depends on the geometry of edges and textures. Keeping large wavelet coefficients is equivalent to constructing an adaptive approximation grid specified by the scale–space support $\Lambda_T$. It increases the approximation resolution where the signal is irregular. The geometry of $\Lambda_T$ gives the spatial distribution of sharp image transitions and edges, and their propagation across scales. Chapter 6 proves that wavelet coefficients give important information about singularities and local Lipschitz regularity. This example illustrates how an approximation support provides “geometric” information on $f$, relative to a dictionary, which is a wavelet basis in this example.
Figure 1.1(d) gives the nonlinear wavelet approximation $f_M$ recovered from the $M = N/16$ large-amplitude wavelet coefficients, with an error $\|f - f_M\|^2 / \|f\|^2 = 5 \times 10^{-3}$. This error is nearly three times smaller than the linear approximation error obtained with the same number of wavelet coefficients, and the image quality is much better.
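The gap between linear and nonlinear $M$-term approximations can be checked numerically in an orthonormal Haar wavelet basis. The following sketch (Python; the test signal and the choice $M = N/16$ are illustrative, not from the text) compares keeping the $M$ coarsest-scale coefficients against keeping the $M$ largest ones:

```python
import numpy as np

def haar_transform(x):
    """Orthonormal Haar wavelet transform of a signal of dyadic length."""
    c = x.astype(float).copy()
    out = np.empty_like(c)
    n = len(c)
    while n > 1:
        a = (c[0:n:2] + c[1:n:2]) / np.sqrt(2)  # averages (coarser scale)
        d = (c[0:n:2] - c[1:n:2]) / np.sqrt(2)  # details (wavelet coefficients)
        out[n // 2:n] = d
        c[:n // 2] = a
        n //= 2
    out[0] = c[0]
    return out

def inverse_haar(w):
    c = w.astype(float).copy()
    n = 1
    while n < len(c):
        a, d = c[:n].copy(), c[n:2 * n].copy()
        c[0:2 * n:2] = (a + d) / np.sqrt(2)
        c[1:2 * n:2] = (a - d) / np.sqrt(2)
        n *= 2
    return c

# Piecewise-regular signal: smooth oscillation plus one sharp jump.
N = 1024
t = np.linspace(0, 1, N)
f = np.sin(4 * np.pi * t) + (t > 0.37)

w = haar_transform(f)
M = N // 16

# Linear approximation: keep the M coarsest-scale coefficients.
w_lin = np.zeros(N); w_lin[:M] = w[:M]
# Nonlinear approximation: keep the M largest-amplitude coefficients.
idx = np.argsort(np.abs(w))[-M:]
w_nl = np.zeros(N); w_nl[idx] = w[idx]

err_lin = np.sum((f - inverse_haar(w_lin))**2) / np.sum(f**2)
err_nl = np.sum((f - inverse_haar(w_nl))**2) / np.sum(f**2)
# The adaptive (nonlinear) support yields a smaller relative error.
print(err_nl < err_lin)
```

The jump produces a few large fine-scale coefficients that the adaptive support keeps and the uniform coarse-scale grid misses, which is exactly the mechanism described above.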
An analog signal can be recovered from the discrete nonlinear approximation $f_M$:

$$\bar f_M(x) = \sum_{n=0}^{N-1} f_M[n]\, \phi_s(x - ns).$$

Since all projections are orthogonal, the overall approximation error on the original analog signal $\bar f(x)$ is the sum of the analog sampling error and the discrete nonlinear error:

$$\|\bar f - \bar f_M\|^2 = \|\bar f - \bar f_N\|^2 + \|f - f_M\|^2 = \epsilon_l(N, f) + \epsilon_n(M, f).$$

In practice, $N$ is imposed by the resolution of the signal-acquisition hardware, and $M$ is typically adjusted so that $\epsilon_n(M, f) \geq \epsilon_l(N, f)$.
Sparsity with Regularity
Sparse representations are obtained in a basis that takes advantage of some form
of regularity of the input signals, creating many small-amplitude coefficients. Since
wavelets have localized support, functions with isolated singularities produce few
large-amplitude wavelet coefficients in the neighborhood of these singularities. Nonlinear wavelet approximation produces a small error over spaces of functions that
do not have “too many” sharp transitions and singularities. Chapter 9 shows that
functions having a bounded total variation norm are useful models for images with
nonfractal (finite length) edges.
Edges often define regular geometric curves. Wavelets detect the location of
edges but their square support cannot take advantage of their potential geometric
regularity. Sparser representations are defined in dictionaries of curvelets or bandlets, which have elongated supports in multiple directions that can be adapted to this geometric regularity. In such dictionaries, the approximation support $\Lambda_T$ is smaller but provides explicit information about edges' local geometric properties such as their orientation. In this context, geometry does not just apply to multidimensional signals. Audio signals, such as musical recordings, also have a complex geometric regularity in time-frequency dictionaries.
1.2.3 Compression
Storage limitations and fast transmission through narrow bandwidth channels
require compression of signals while minimizing degradation. Transform codes
compress signals by coding a sparse representation. Chapter 10 introduces the
information theory needed to understand these codes and to optimize their
performance.
In a compression framework, the analog signal has already been discretized into
a signal $f[n]$ of size $N$. This discrete signal is decomposed in an orthonormal basis $\mathcal{B} = \{g_m\}_{m \in \Gamma}$ of $\mathbb{C}^N$:

$$f = \sum_{m \in \Gamma} \langle f, g_m \rangle\, g_m .$$
Coefficients $\langle f, g_m \rangle$ are approximated by quantized values $Q(\langle f, g_m \rangle)$. If $Q$ is a uniform quantizer of step $\Delta$, then $|x - Q(x)| \leq \Delta/2$; and if $|x| < \Delta/2$, then $Q(x) = 0$. The signal $\tilde f$ restored from quantized coefficients is

$$\tilde f = \sum_{m \in \Gamma} Q(\langle f, g_m \rangle)\, g_m .$$
An entropy code records these coefficients with $R$ bits. The goal is to minimize the signal-distortion rate $d(R, f) = \|\tilde f - f\|^2$.
The coefficients not quantized to zero correspond to the set $\Lambda_T = \{m \in \Gamma : |\langle f, g_m \rangle| \geq T\}$ with $T = \Delta/2$. For sparse signals, Chapter 10 shows that the bit budget $R$ is dominated by the number of bits needed to code $\Lambda_T$ in $\Gamma$, which is nearly proportional to its size $|\Lambda_T|$. This means that the “information” about a sparse representation is mostly geometric. Moreover, the distortion is dominated by the nonlinear approximation error $\|f - f_{\Lambda_T}\|^2$, for $f_{\Lambda_T} = \sum_{m \in \Lambda_T} \langle f, g_m \rangle\, g_m$. Compression is thus a sparse approximation problem. For a given distortion $d(R, f)$, minimizing $R$ requires reducing $|\Lambda_T|$ and thus optimizing the sparsity.
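The uniform quantizer described above can be sketched in a few lines (Python; the step $\Delta$ and the test coefficients are illustrative):

```python
import numpy as np

def uniform_quantize(x, delta):
    """Uniform quantizer of step delta: rounds to the nearest multiple of delta,
    so that |x - Q(x)| <= delta/2 and Q(x) = 0 whenever |x| < delta/2."""
    return delta * np.round(x / delta)

delta = 0.5
coeffs = np.array([1.3, -0.6, 0.2, -0.1, 0.9])  # transform coefficients <f, g_m>
q = uniform_quantize(coeffs, delta)

# Nonzero quantized coefficients correspond to |<f, g_m>| >= T with T = delta/2.
support = np.flatnonzero(q != 0)
print(q)        # quantized values
print(support)  # indices of the approximation set Lambda_T
```

Coding `support` (the geometry of $\Lambda_T$) plus the quantized amplitudes on it is the structure of a transform code.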
The number of bits to code ⌳T can take advantage of any prior information on
the geometry. Figure 1.1(b) shows that large wavelet coefficients are not randomly
distributed. They have a tendency to be aggregated toward larger scales, and at fine
scales they are regrouped along edge curves or in texture regions. Using such prior
geometric models is a source of gain in coders such as JPEG-2000.
Chapter 10 describes the implementation of audio transform codes. Image transform codes in block cosine bases and wavelet bases are introduced, together with
the JPEG and JPEG-2000 compression standards.
1.2.4 Denoising
Signal-acquisition devices add noise that can be reduced by estimators using prior
information on signal properties. Signal processing has long remained mostly
Bayesian and linear. Nonlinear smoothing algorithms existed in statistics, but
these procedures were often ad hoc and complex. Two statisticians, Donoho and
Johnstone [221], changed the “game” by proving that simple thresholding in sparse
representations can yield nearly optimal nonlinear estimators. This was the beginning of a considerable refinement of nonlinear estimation algorithms that is still
ongoing.
Let us consider digital measurements that add a random noise $W[n]$ to the original signal $f[n]$:

$$X[n] = f[n] + W[n] \quad \text{for} \quad 0 \leq n < N.$$

The signal $f$ is estimated by transforming the noisy data $X$ with an operator $D$:

$$\tilde F = DX.$$

The risk of the estimator $\tilde F$ of $f$ is the average error, calculated with respect to the probability distribution of the noise $W$:

$$r(D, f) = E\{\|f - DX\|^2\}.$$
Bayes versus Minimax
To optimize the estimation operator $D$, one must take advantage of prior information available about the signal $f$. In a Bayes framework, $f$ is considered a realization of a random vector $F$ and the Bayes risk is the expected risk calculated with respect to the prior probability distribution $\pi$ of the random signal model $F$:

$$r(D, \pi) = E_\pi\{r(D, F)\}.$$

Optimizing $D$ among all possible operators yields the minimum Bayes risk:

$$r_n(\pi) = \inf_{\text{all } D} r(D, \pi).$$
In the 1940s, Wald brought a new perspective on statistics with a decision theory partly imported from the theory of games. This point of view uses deterministic models, where signals are elements of a set $\Theta$, without specifying their probability distribution in this set. To control the risk for any $f \in \Theta$, we compute the maximum risk:

$$r(D, \Theta) = \sup_{f \in \Theta} r(D, f).$$

The minimax risk is the lower bound computed over all operators $D$:

$$r_n(\Theta) = \inf_{\text{all } D} r(D, \Theta).$$

In practice, the goal is to find an operator $D$ that is simple to implement and yields a risk close to the minimax lower bound.
Thresholding Estimators
It is tempting to restrict calculations to linear operators D because of their simplicity.
Optimal linear Wiener estimators are introduced in Chapter 11. Figure 1.2(a) is an
image contaminated by Gaussian white noise. Figure 1.2(b) shows an optimized
linear filtering estimation $\tilde F = X \star h[n]$, which is therefore diagonal in a Fourier basis $\mathcal{B}$. This convolution operator averages the noise but also blurs the image and keeps low-frequency noise by retaining the image's low frequencies.

FIGURE 1.2
(a) Noisy image $X$. (b) Noisy wavelet coefficients above threshold, $|\langle X, \psi_{j,n} \rangle| \geq T$. (c) Linear estimation $X \star h$. (d) Nonlinear estimator recovered from thresholded wavelet coefficients over several translated bases.
If $f$ has a sparse representation in a dictionary, then projecting $X$ on the vectors of this sparse support can considerably improve linear estimators. The difficulty is identifying the sparse support of $f$ from the noisy data $X$. Donoho and Johnstone [221] proved that, in an orthonormal basis, a simple thresholding of noisy coefficients does the trick. Noisy signal coefficients in an orthonormal basis $\mathcal{B} = \{g_m\}_{m \in \Gamma}$ are

$$\langle X, g_m \rangle = \langle f, g_m \rangle + \langle W, g_m \rangle \quad \text{for} \quad m \in \Gamma.$$
Thresholding these noisy coefficients yields an orthogonal projection estimator

$$\tilde F = X_{\tilde\Lambda_T} = \sum_{m \in \tilde\Lambda_T} \langle X, g_m \rangle\, g_m \quad \text{with} \quad \tilde\Lambda_T = \{m \in \Gamma : |\langle X, g_m \rangle| \geq T\}. \quad (1.9)$$

The set $\tilde\Lambda_T$ is an estimate of an approximation support of $f$. It is hopefully close to the optimal approximation support $\Lambda_T = \{m \in \Gamma : |\langle f, g_m \rangle| \geq T\}$.
Figure 1.2(b) shows the estimated approximation set $\tilde\Lambda_T$ of noisy wavelet coefficients, $|\langle X, \psi_{j,n} \rangle| \geq T$, which can be compared to the optimal approximation support $\Lambda_T$ shown in Figure 1.1(b). The estimation in Figure 1.2(d) from wavelet coefficients in $\tilde\Lambda_T$ has considerably reduced the noise in regular regions while keeping the sharpness of edges by preserving large wavelet coefficients. This estimation is improved with a translation-invariant procedure that averages this estimator over several translated wavelet bases. Thresholding wavelet coefficients implements an adaptive smoothing, which averages the data $X$ with a kernel that depends on the estimated regularity of the original signal $f$.
Donoho and Johnstone proved that for a Gaussian white noise of variance $\sigma^2$, choosing $T = \sigma \sqrt{2 \log_e N}$ yields a risk $E\{\|f - \tilde F\|^2\}$ of the order of $\|f - f_{\Lambda_T}\|^2$, up to a $\log_e N$ factor. This spectacular result shows that the estimated support $\tilde\Lambda_T$ does nearly as well as the optimal unknown support $\Lambda_T$. The resulting risk is small if the representation is sparse and precise.
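This thresholding estimator is easy to sketch numerically. The fragment below (Python) uses the unitary discrete Fourier basis as the orthonormal basis and a signal that is sparse in it; the test signal, noise level, and basis choice are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
sigma = 0.5

# Signal that is sparse in the unitary discrete Fourier basis: two cosines.
t = np.arange(N)
f = 2 * np.cos(2 * np.pi * 5 * t / N) + np.cos(2 * np.pi * 19 * t / N)
X = f + sigma * rng.standard_normal(N)       # noisy observation X = f + W

a = np.fft.fft(X) / np.sqrt(N)               # coefficients in a unitary basis
T = sigma * np.sqrt(2 * np.log(N))           # universal threshold
a = a * (np.abs(a) >= T)                     # hard thresholding: support estimate
F = np.real(np.fft.ifft(a) * np.sqrt(N))     # estimator recovered from kept coefficients

risk_thresh = np.mean((f - F) ** 2)
risk_raw = np.mean((f - X) ** 2)             # no processing: about sigma^2
print(risk_thresh < risk_raw)                # thresholding reduces the risk
```

Almost all noise coefficients fall below $T$, so only the few large signal coefficients survive, which is why the risk drops far below $\sigma^2$.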
The set $\tilde\Lambda_T$ in Figure 1.2(b) “looks” different from the $\Lambda_T$ in Figure 1.1(b) because it has more isolated points. This indicates that some prior information on the geometry of $\Lambda_T$ could be used to improve the estimation. For audio noise reduction, thresholding estimators are applied in sparse representations provided by time-frequency bases. Similar isolated time-frequency coefficients produce a highly annoying “musical noise.” Musical noise is removed with a block thresholding that regularizes the geometry of the estimated support $\tilde\Lambda_T$ and avoids leaving isolated points. Block thresholding also improves wavelet estimators.
If W is a Gaussian noise and signals in ⌰ have a sparse representation in B, then
Chapter 11 proves that thresholding estimators can produce a nearly minimax risk.
In particular, wavelet thresholding estimators have a nearly minimax risk for large
classes of piecewise smooth signals, including bounded variation images.
1.3 TIME-FREQUENCY DICTIONARIES
Motivated by quantum mechanics, in 1946 the physicist Gabor [267] proposed
decomposing signals over dictionaries of elementary waveforms which he called
time-frequency atoms that have a minimal spread in a time-frequency plane.
By showing that such decompositions are closely related to our perception of
sounds, and that they exhibit important structures in speech and music recordings,
Gabor demonstrated the importance of localized time-frequency signal processing. Beyond sounds, large classes of signals have sparse decompositions as sums of
time-frequency atoms selected from appropriate dictionaries. The key issue is to
understand how to construct dictionaries with time-frequency atoms adapted to
signal properties.
1.3.1 Heisenberg Uncertainty
A time-frequency dictionary $\mathcal{D} = \{\phi_\gamma\}_{\gamma \in \Gamma}$ is composed of waveforms of unit norm $\|\phi_\gamma\| = 1$, which have a narrow localization in time and frequency. The time localization $u$ of $\phi_\gamma$ and its spread around $u$ are defined by

$$u = \int t\, |\phi_\gamma(t)|^2\, dt \quad \text{and} \quad \sigma_{t,\gamma}^2 = \int |t - u|^2\, |\phi_\gamma(t)|^2\, dt.$$

Similarly, the frequency localization $\xi$ and spread of $\hat\phi_\gamma$ are defined by

$$\xi = (2\pi)^{-1} \int \omega\, |\hat\phi_\gamma(\omega)|^2\, d\omega \quad \text{and} \quad \sigma_{\omega,\gamma}^2 = (2\pi)^{-1} \int |\omega - \xi|^2\, |\hat\phi_\gamma(\omega)|^2\, d\omega.$$
The Fourier Parseval formula

$$\langle f, \phi_\gamma \rangle = \int_{-\infty}^{+\infty} f(t)\, \phi_\gamma^*(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat\phi_\gamma^*(\omega)\, d\omega \quad (1.10)$$
shows that $\langle f, \phi_\gamma \rangle$ depends mostly on the values $f(t)$ and $\hat f(\omega)$, where $\phi_\gamma(t)$ and $\hat\phi_\gamma(\omega)$ are nonnegligible, and hence for $(t, \omega)$ in a rectangle centered at $(u, \xi)$, of size $\sigma_{t,\gamma} \times \sigma_{\omega,\gamma}$. This rectangle, illustrated by Figure 1.3 in the time-frequency plane $(t, \omega)$, can be interpreted as a “quantum of information” over an elementary resolution cell.

FIGURE 1.3
Heisenberg box representing an atom $\phi_\gamma$.

The uncertainty principle theorem proves (see Chapter 2) that this rectangle has a minimum surface that limits the joint time-frequency resolution:

$$\sigma_{t,\gamma}\, \sigma_{\omega,\gamma} \geq \frac{1}{2}. \quad (1.11)$$

Constructing a dictionary of time-frequency atoms can thus be thought of as covering the time-frequency plane with resolution cells having a time width $\sigma_{t,\gamma}$ and a frequency width $\sigma_{\omega,\gamma}$, which may vary but with a surface larger than one-half.
Windowed Fourier and wavelet transforms are two important examples.
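For a Gaussian window, which reaches the minimum of the Heisenberg bound, the two spreads can be checked numerically (a sketch in Python; the grid size and time support are illustrative choices):

```python
import numpy as np

# Numerical check of the Heisenberg bound for a Gaussian window,
# which attains the minimum sigma_t * sigma_w = 1/2.
N = 2**14
t = np.linspace(-50, 50, N)
dt = t[1] - t[0]

g = np.exp(-t**2 / 2)
g /= np.sqrt(np.sum(np.abs(g)**2) * dt)    # unit norm: integral of |g|^2 dt = 1

sigma_t = np.sqrt(np.sum(t**2 * np.abs(g)**2) * dt)

g_hat = dt * np.fft.fft(g)                 # approximates the Fourier integral
w = 2 * np.pi * np.fft.fftfreq(N, d=dt)    # angular frequencies
dw = w[1] - w[0]
# Frequency spread with the (2*pi)^{-1} weighting used in the definitions above.
sigma_w = np.sqrt(np.sum(w**2 * np.abs(g_hat)**2) * dw / (2 * np.pi))

print(round(sigma_t * sigma_w, 3))         # close to the lower bound 0.5
```

Any other window gives a strictly larger product, which is what makes Gaussian (Gabor) atoms a natural choice.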
1.3.2 Windowed Fourier Transform
A windowed Fourier dictionary is constructed by translating in time and frequency a time window $g(t)$, of unit norm $\|g\| = 1$, centered at $t = 0$:

$$\mathcal{D} = \left\{ g_{u,\xi}(t) = g(t - u)\, e^{i\xi t} \right\}_{(u,\xi) \in \mathbb{R}^2}.$$

The atom $g_{u,\xi}$ is translated by $u$ in time and by $\xi$ in frequency. The time and frequency spread of $g_{u,\xi}$ is independent of $u$ and $\xi$. This means that each atom $g_{u,\xi}$ corresponds to a Heisenberg rectangle that has a size $\sigma_t \times \sigma_\omega$ independent of its position $(u, \xi)$, as shown by Figure 1.4.
The windowed Fourier transform projects $f$ on each dictionary atom $g_{u,\xi}$:

$$Sf(u, \xi) = \langle f, g_{u,\xi} \rangle = \int_{-\infty}^{+\infty} f(t)\, g(t - u)\, e^{-i\xi t}\, dt. \quad (1.12)$$

It can be interpreted as a Fourier transform of $f$ at the frequency $\xi$, localized by the window $g(t - u)$ in the neighborhood of $u$. This windowed Fourier transform is highly redundant and represents one-dimensional signals by a time-frequency image in $(u, \xi)$. It is thus necessary to understand how to select many fewer time-frequency coefficients that represent the signal efficiently.
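A discrete version of this transform can be sketched by sliding a window along the signal and taking an FFT of each frame (Python; the chirp test signal, Hann window, and hop size are illustrative assumptions):

```python
import numpy as np

def windowed_fourier(f, g, hop):
    """Sketch of a discrete windowed Fourier transform:
    Sf[u, k] = sum_n f[u + n] g[n] exp(-2j*pi*k*n/L), for window positions u
    taken on a grid of step `hop`."""
    L = len(g)
    frames = []
    for u in range(0, len(f) - L + 1, hop):
        frames.append(np.fft.fft(f[u:u + L] * g))
    return np.array(frames)      # rows: time positions u, columns: frequencies k

# Linear chirp: a single frequency that increases with time.
N = 4096
t = np.arange(N)
f = np.cos(2 * np.pi * (0.01 + 0.00005 * t) * t)

g = np.hanning(256)
g = g / np.linalg.norm(g)        # unit-norm window
S = windowed_fourier(f, g, hop=64)
print(S.shape)                   # (number of window positions, window length)
```

The location of the maximum of $|S|$ in each row traces the chirp's rising frequency, a discrete analog of the ridge points discussed next.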
FIGURE 1.4
Time-frequency boxes (“Heisenberg rectangles”) representing the energy spread of two windowed Fourier atoms, $g_{u,\xi}$ and $g_{v,\gamma}$.
When listening to music, we perceive sounds whose frequency varies in time. Chapter 4 shows that a spectral line of $f$ creates high-amplitude windowed Fourier coefficients $Sf(u, \xi)$ at frequencies $\xi(u)$ that depend on time $u$. These spectral components are detected and characterized by ridge points, which are local maxima in this time-frequency plane. Ridge points define a time-frequency approximation support $\Lambda$ of $f$ with a geometry that depends on the time-frequency evolution of the signal's spectral components. Modifications of sound duration or audio transpositions are implemented by modifying the geometry of the ridge support in time-frequency.
A windowed Fourier transform decomposes signals over waveforms that have
the same time and frequency resolution. It is thus effective as long as the signal does
not include structures having different time-frequency resolutions, some being very
localized in time and others very localized in frequency. Wavelets address this issue
by changing the time and frequency resolution.
1.3.3 Continuous Wavelet Transform
In reflection seismology, Morlet knew that the waveforms sent underground have a
duration that is too long at high frequencies to separate the returns of fine, closely
spaced geophysical layers. Such waveforms are called wavelets in geophysics.
Instead of emitting pulses of equal duration, he thought of sending shorter waveforms at high frequencies. These waveforms were obtained by scaling the mother
wavelet, hence the name of this transform. Although Grossmann was working in
theoretical physics, he recognized in Morlet’s approach some ideas that were close
to his own work on coherent quantum states.
Nearly forty years after Gabor, Morlet and Grossmann reactivated a fundamental collaboration between theoretical physics and signal processing, which led to
the formalization of the continuous wavelet transform [288]. These ideas were not
totally new to mathematicians working in harmonic analysis, or to computer vision
researchers studying multiscale image processing. It was thus only the beginning of
a rapid catalysis that brought together scientists with very different backgrounds.
A wavelet dictionary is constructed from a mother wavelet $\psi$ of zero average

$$\int_{-\infty}^{+\infty} \psi(t)\, dt = 0,$$

which is dilated with a scale parameter $s$ and translated by $u$:

$$\mathcal{D} = \left\{ \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - u}{s}\right) \right\}_{u \in \mathbb{R},\, s > 0}. \quad (1.13)$$
The continuous wavelet transform of $f$ at any scale $s$ and position $u$ is the projection of $f$ on the corresponding wavelet atom:

$$Wf(u, s) = \langle f, \psi_{u,s} \rangle = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t - u}{s}\right) dt. \quad (1.14)$$

It represents one-dimensional signals by highly redundant time-scale images in $(u, s)$.
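A sampled version of (1.14) can be sketched with a real wavelet and discrete correlations (Python; the derivative-of-Gaussian wavelet, the scales, and the step test signal are illustrative assumptions):

```python
import numpy as np

def gauss_deriv_wavelet(t):
    """First derivative of a Gaussian: a real wavelet with zero average."""
    return -t * np.exp(-t**2 / 2)

def cwt(f, scales, dt=1.0):
    """Sampled continuous wavelet transform:
    Wf(u, s) ~ sum_t f(t) (1/sqrt(s)) psi((t - u)/s) dt   (psi real here)."""
    W = np.zeros((len(scales), len(f)))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + dt, dt)        # effective support ~ [-4s, 4s]
        atom = gauss_deriv_wavelet(t / s) / np.sqrt(s)
        # Correlation of f with the scaled, translated wavelet.
        W[i] = np.convolve(f, atom[::-1], mode='same') * dt
    return W

# Step signal: the sharp transition creates large-amplitude coefficients.
N = 512
f = np.zeros(N); f[N // 2:] = 1.0
W = cwt(f, scales=[2.0, 4.0, 8.0])

# At every scale, the maximum response localizes the singularity near n = N/2.
print([int(np.argmax(np.abs(W[i]))) for i in range(3)])
```

Following these maxima across scales is precisely the zooming mechanism used later for singularity detection.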
17
18
CHAPTER 1 Sparse Representations
Varying Time-Frequency Resolution
As opposed to windowed Fourier atoms, wavelets have a time-frequency resolution that changes. The wavelet $\psi_{u,s}$ has a time support centered at $u$ and proportional to $s$. Let us choose a wavelet $\psi$ whose Fourier transform $\hat\psi(\omega)$ is nonzero in a positive frequency interval centered at $\eta$. The Fourier transform $\hat\psi_{u,s}(\omega)$ is dilated by $1/s$ and thus is localized in a positive frequency interval centered at $\xi = \eta/s$; its size is scaled by $1/s$. In the time-frequency plane, the Heisenberg box of a wavelet atom $\psi_{u,s}$ is therefore a rectangle centered at $(u, \eta/s)$, with time and frequency widths, respectively, proportional to $s$ and $1/s$. When $s$ varies, the time and frequency width of this time-frequency resolution cell changes, but its area remains constant, as illustrated by Figure 1.5.
Large-amplitude wavelet coefficients can detect and measure short high-frequency variations because they have a narrow time localization at high frequencies. At low frequencies their time resolution is lower, but they have a better frequency resolution. This modification of time and frequency resolution is adapted to represent sounds with sharp attacks, or radar signals having a frequency that may vary quickly at high frequencies.
Multiscale Zooming
A wavelet dictionary is also adapted to analyze the scaling evolution of transients with zooming procedures across scales. Suppose now that $\psi$ is real. Since it has a zero average, a wavelet coefficient $Wf(u, s)$ measures the variation of $f$ in a neighborhood of $u$ that has a size proportional to $s$. Sharp signal transitions create large-amplitude wavelet coefficients.
FIGURE 1.5
Heisenberg time-frequency boxes of two wavelets, $\psi_{u,s}$ and $\psi_{u_0,s_0}$. When the scale $s$ decreases, the time support is reduced but the frequency spread increases and covers an interval that is shifted toward high frequencies.
Signal singularities have specific scaling invariance characterized by Lipschitz
exponents. Chapter 6 relates the pointwise regularity of f to the asymptotic decay
of the wavelet transform amplitude |Wf (u, s)| when s goes to zero. Singularities are detected by following the local maxima of the wavelet transform across
scales.
In images, wavelet local maxima indicate the position of edges, which are sharp variations of image intensity. This defines a scale–space approximation support of $f$ from which precise image approximations are reconstructed. At different scales, the geometry of this local maxima support provides contours of image structures of varying sizes. This multiscale edge detection is particularly effective for pattern recognition in computer vision [146].
The zooming capability of the wavelet transform not only locates isolated singular events, but can also characterize more complex multifractal signals having
nonisolated singularities. Mandelbrot [41] was the first to recognize the existence
of multifractals in most corners of nature. Scaling one part of a multifractal produces a signal that is statistically similar to the whole. This self-similarity appears in
the continuous wavelet transform, which modifies the analyzing scale. From global
measurements of the wavelet transform decay, Chapter 6 measures the singularity distribution of multifractals. This is particularly important in analyzing their
properties and testing multifractal models in physics or in financial time series.
1.3.4 Time-Frequency Orthonormal Bases
Orthonormal bases of time-frequency atoms remove all redundancy and define stable representations. A wavelet orthonormal basis is an example of a time-frequency basis, obtained by scaling a wavelet with dyadic scales $s = 2^j$ and translating it by $2^j n$, which is written $\psi_{j,n}$. In the time-frequency plane, the Heisenberg resolution box of $\psi_{j,n}$ is a dilation by $2^j$ and translation by $2^j n$ of the Heisenberg box of $\psi$. A wavelet orthonormal basis is thus a subdictionary of the continuous wavelet transform dictionary, which yields a perfect tiling of the time-frequency plane, as illustrated in Figure 1.6.
One can construct many other orthonormal bases of time-frequency atoms, corresponding to different tilings of the time-frequency plane. Wavelet packet and local cosine bases are two important examples constructed in Chapter 8, with time-frequency atoms that split the frequency and the time axis, respectively, in intervals of varying sizes.
Wavelet Packet Bases
Wavelet bases divide the frequency axis into intervals of one-octave bandwidth. Coifman, Meyer, and Wickerhauser [182] have generalized this construction with bases that split the frequency axis into intervals whose bandwidths may be adjusted.
Each frequency interval is covered by the Heisenberg time-frequency boxes of
wavelet packet functions translated in time, in order to cover the whole plane,
as shown by Figure 1.7.
FIGURE 1.6
The time-frequency boxes of a wavelet basis define a tiling of the time-frequency plane.

FIGURE 1.7
A wavelet packet basis divides the frequency axis in separate intervals of varying sizes. A tiling is obtained by translating in time the wavelet packets covering each frequency interval.
As for wavelets, wavelet packet coefficients are obtained with a filter bank of conjugate mirror filters that splits the frequency axis into several frequency intervals. Different frequency segmentations correspond to different wavelet packet bases. For images, a filter bank divides the image frequency support into squares of dyadic sizes that can be adjusted.
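One stage of such a two-channel split can be sketched with the Haar conjugate mirror pair (Python; the recursion policy that would produce a full wavelet packet tree is only hinted at in comments):

```python
import numpy as np

def haar_split(x):
    """One stage of a two-channel filter bank: lowpass and highpass subbands,
    each downsampled by 2 (orthonormal Haar conjugate mirror filters)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency half
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency half
    return a, d

def haar_merge(a, d):
    """Inverse stage: perfect reconstruction from the two subbands."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(1)
x = rng.standard_normal(64)

a, d = haar_split(x)
# A wavelet basis would split only `a` again; a wavelet packet basis may
# split `a`, `d`, or both, producing different frequency segmentations.
aa, ad = haar_split(a)
da, dd = haar_split(d)

# Each stage is invertible, so any segmentation reconstructs x exactly.
x_rec = haar_merge(haar_merge(aa, ad), haar_merge(da, dd))
print(np.allclose(x_rec, x))   # perfect reconstruction
```

Choosing which subbands to split further is exactly the choice of a frequency segmentation, and hence of a wavelet packet basis.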
Local Cosine Bases
Local cosine orthonormal bases are constructed by dividing the time axis instead of the frequency axis. The time axis is segmented in successive intervals $[a_p, a_{p+1}]$. The local cosine bases of Malvar [368] are obtained by designing smooth windows $g_p(t)$ that cover each interval $[a_p, a_{p+1}]$, and by multiplying them by cosine functions $\cos(\xi t + \phi)$ of different frequencies. This is yet another idea that has been independently studied in physics, signal processing, and mathematics. Malvar's original construction was for discrete signals. At the same time, the physicist Wilson [486] was designing a local cosine basis with smooth windows of infinite support,
to analyze the properties of quantum coherent states.

FIGURE 1.8
A local cosine basis divides the time axis with smooth windows $g_p(t)$ and translates these windows into frequency.

Malvar bases were also rediscovered and generalized by the harmonic analysts Coifman and Meyer [181]. These
different views of the same bases brought to light mathematical and algorithmic
properties that opened new applications.
A multiplication by $\cos(\xi t + \phi)$ translates the Fourier transform $\hat g_p(\omega)$ of $g_p(t)$ by $\pm\xi$. Over positive frequencies, the time-frequency box of the modulated window $g_p(t)\cos(\xi t + \phi)$ is therefore equal to the time-frequency box of $g_p$ translated by $\xi$ along frequencies. Figure 1.8 shows the time-frequency tiling corresponding to such a local cosine basis. For images, a two-dimensional cosine basis is constructed by dividing the image support in squares of varying sizes.
1.4 SPARSITY IN REDUNDANT DICTIONARIES
In natural languages, large dictionaries are needed to refine ideas with short sentences, and they evolve with usage. Eskimos have eight different words to describe
snow quality, whereas a single word is typically sufficient in a Parisian dictionary.
Similarly, large signal dictionaries of vectors are needed to construct sparse representations of complex signals. However, computing and optimizing a signal
approximation by choosing the best M dictionary vectors is much more difficult.
1.4.1 Frame Analysis and Synthesis
Suppose that a sparse family of vectors $\{\phi_p\}_{p \in \Lambda}$ has been selected to approximate a signal $f$. An approximation can be recovered as an orthogonal projection in the space $\mathbf{V}_\Lambda$ generated by these vectors. We then face one of the following two problems.

1. In a dual-synthesis problem, the orthogonal projection $f_\Lambda$ of $f$ in $\mathbf{V}_\Lambda$ must be computed from dictionary coefficients, $\{\langle f, \phi_p \rangle\}_{p \in \Lambda}$, provided by an analysis operator. This is the case when a signal transform $\{\langle f, \phi_p \rangle\}_{p \in \Gamma}$ is calculated in some large dictionary and a subset of inner products is selected. Such inner products may correspond to coefficients above a threshold or to local maxima values.

2. In a dual-analysis problem, the decomposition coefficients of $f_\Lambda$ must be computed on a family of selected vectors $\{\phi_p\}_{p \in \Lambda}$. This problem appears when sparse representation algorithms select vectors as opposed to inner products. This is the case for pursuit algorithms, which compute approximation supports in highly redundant dictionaries.
The frame theory gives energy equivalence conditions to solve both problems with stable operators. A family $\{\phi_p\}_{p \in \Lambda}$ is a frame of the space $\mathbf{V}$ it generates if there exist $B \geq A > 0$ such that

$$\forall h \in \mathbf{V}, \quad A \|h\|^2 \leq \sum_{p \in \Lambda} |\langle h, \phi_p \rangle|^2 \leq B \|h\|^2 .$$

The representation is stable since any perturbation of frame coefficients implies a modification of similar magnitude on $h$. Chapter 5 proves the existence of a dual frame $\{\tilde\phi_p\}_{p \in \Lambda}$ that solves both the dual-synthesis and dual-analysis problems:

$$f_\Lambda = \sum_{p \in \Lambda} \langle f, \phi_p \rangle\, \tilde\phi_p = \sum_{p \in \Lambda} \langle f, \tilde\phi_p \rangle\, \phi_p . \quad (1.15)$$
Algorithms are provided to calculate these decompositions. The dual frame is also stable:

$$\forall f \in \mathbf{V}, \quad B^{-1} \|f\|^2 \leq \sum_{p \in \Lambda} |\langle f, \tilde\phi_p \rangle|^2 \leq A^{-1} \|f\|^2 .$$

The frame bounds $A$ and $B$ are redundancy factors. If the vectors $\{\phi_p\}_{p \in \Gamma}$ are normalized and linearly independent, then $A \leq 1 \leq B$. Such a dictionary is called a Riesz basis of $\mathbf{V}$ and the dual frame is biorthogonal:

$$\forall (p, p') \in \Lambda^2, \quad \langle \phi_p, \tilde\phi_{p'} \rangle = \delta[p - p'].$$

When the basis is orthonormal, both bases are equal, and the analysis and synthesis problems are then identical.
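For a finite frame, the bounds and the dual frame are directly computable. A small numpy sketch (the three-vector "Mercedes-Benz" frame in $\mathbb{R}^2$ is a standard illustrative example, not from the text):

```python
import numpy as np

# Three unit vectors in R^2 (the "Mercedes-Benz" frame): redundant, not a basis.
angles = np.array([np.pi / 2, np.pi / 2 + 2 * np.pi / 3, np.pi / 2 + 4 * np.pi / 3])
Phi = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # rows are the phi_p

# Frame bounds A and B are the extreme eigenvalues of the frame operator S.
S = Phi.T @ Phi
eigvals = np.linalg.eigvalsh(S)
A, B = eigvals[0], eigvals[-1]
print(A, B)   # a tight frame: A = B = 3/2

# Dual frame vectors: phi~_p = S^{-1} phi_p; they give exact reconstruction.
Phi_dual = Phi @ np.linalg.inv(S)
f = np.array([0.3, -1.2])
f_rec = Phi_dual.T @ (Phi @ f)   # sum_p <f, phi_p> phi~_p
print(np.allclose(f_rec, f))
```

Here $A = B$ because this frame is tight; its redundancy factor $3/2$ is the ratio of the number of vectors to the dimension.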
The frame theory is also used to construct redundant dictionaries that define complete, stable, and redundant signal representations, where $\mathbf{V}$ is then the whole signal space. The frame bounds measure the redundancy of such dictionaries. Chapter 5 studies the construction of windowed Fourier and wavelet frame dictionaries by sampling their time, frequency, and scaling parameters, while controlling frame bounds. In two dimensions, directional wavelet frames include wavelets sensitive to directional image structures such as textures or edges.
To improve the sparsity of images having edges along regular geometric curves,
Candès and Donoho [134] introduced curvelet frames, with elongated waveforms
having different directions, positions, and scales. Images with piecewise regular
edges have representations that are asymptotically more sparse by thresholding
curvelet coefficients than wavelet coefficients.
1.4.2 Ideal Dictionary Approximations
In a redundant dictionary $\mathcal{D} = \{\phi_p\}_{p \in \Gamma}$, we would like to find the best approximation support $\Lambda$ with $M = |\Lambda|$ vectors, which minimizes the error $\|f - f_\Lambda\|^2$. Chapter 12 proves that this is equivalent to finding $\Lambda_T$, which minimizes the corresponding approximation Lagrangian

$$\mathcal{L}_0(T, f, \Lambda) = \|f - f_\Lambda\|^2 + T^2 |\Lambda|, \quad (1.16)$$

for some multiplier $T$.
Compression and denoising are two applications of redundant dictionary approximations. When compressing signals by quantizing dictionary coefficients, the distortion rate varies, like the Lagrangian (1.16), with a multiplier $T$ that depends on the quantization step. Optimizing the coder is thus equivalent to minimizing this approximation Lagrangian. For sparse representations, most of the bits are devoted to coding the geometry of the sparse approximation set $\Lambda_T$ in $\Gamma$.
Estimators reducing noise from observations $X = f + W$ are also optimized by finding a best orthogonal projector over a set of dictionary vectors. The model selection theory of Barron, Birgé, and Massart [97] proves that finding the $\tilde\Lambda_T$ that minimizes this same Lagrangian $\mathcal{L}_0(T, X, \Lambda)$ defines an estimator whose risk is of the same order as the minimum approximation error $\|f - f_{\Lambda_T}\|^2$, up to a logarithmic factor. This is similar to the optimality result obtained for thresholding estimators in an orthonormal basis.
The bad news is that minimizing the approximation Lagrangian $\mathcal{L}_0$ is an NP-hard problem and is therefore computationally intractable. It is therefore necessary to find algorithms that are sufficiently fast to compute suboptimal, but “good enough,” solutions.
Dictionaries of Orthonormal Bases
To reduce the complexity of optimal approximations, the search can be reduced to
subfamilies of orthogonal dictionary vectors. In a dictionary of orthonormal bases,
any family of orthogonal dictionary vectors can be complemented to form an orthogonal basis B included in D. As a result, the best approximation of f from orthogonal
vectors in B is obtained by thresholding the coefficients of f in a “best basis” in D.
For tree dictionaries of orthonormal bases obtained by a recursive split of
orthogonal vector spaces, the fast dynamic programming algorithm of Coifman and Wickerhauser [182] finds such a best basis with $O(P)$ operations, where $P$ is the dictionary size.
dictionary size.
Wavelet packet and local cosine bases are examples of tree dictionaries of time-frequency orthonormal bases of size $P = N \log_2 N$. A best basis is a time-frequency tiling that best matches the signal's time-frequency structures.
To approximate geometrically regular edges, wavelets are not as efficient as
curvelets, but wavelets provide more sparse representations of singularities that are
not distributed along geometrically regular curves. Bandlet dictionaries, introduced
by Le Pennec, Mallat, and Peyré [342, 365], are dictionaries of orthonormal bases
that can adapt to the variability of images’ geometric regularity. Minimax optimal
asymptotic rates are derived for compression and denoising.
1.4.3 Pursuit in Dictionaries
Approximating signals only from orthogonal vectors brings a rigidity that limits the ability to optimize the representation. Pursuit algorithms remove this constraint with flexible procedures that search for sparse, although not necessarily optimal, dictionary approximations. Such approximations are computed by optimizing the choice of dictionary vectors $\{\phi_p\}_{p\in\Lambda}$.
Matching Pursuit
Matching pursuit algorithms, introduced by Mallat and Zhang [366], are greedy algorithms that optimize approximations by selecting dictionary vectors one by one. The vector $\phi_{p_0} \in \mathcal{D}$ that best approximates a signal $f$ is
\[ \phi_{p_0} = \arg\max_{p \in \Gamma} |\langle f, \phi_p\rangle|, \]
and the residual approximation error is
\[ Rf = f - \langle f, \phi_{p_0}\rangle\, \phi_{p_0}. \]
A matching pursuit further approximates the residue $Rf$ by selecting another best vector $\phi_{p_1}$ from the dictionary and continues this process over next-order residues $R^m f$, which produces a signal decomposition:
\[ f = \sum_{m=0}^{M-1} \langle R^m f, \phi_{p_m}\rangle\, \phi_{p_m} + R^M f. \]
The approximation from the $M$ selected vectors $\{\phi_{p_m}\}_{0\le m<M}$ can be refined with an orthogonal back projection on the space generated by these vectors. An orthogonal matching pursuit further improves this decomposition by progressively orthogonalizing the projection directions $\phi_{p_m}$ during the decomposition. The resulting decompositions are applied to compression, denoising, and pattern recognition of various types of signals, images, and videos.
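The greedy loop above is short enough to sketch directly in NumPy (an illustrative sketch, not the book's code; the dictionary and the 2-sparse test signal are arbitrary examples):

```python
import numpy as np

def matching_pursuit(f, D, n_iter):
    """Greedy matching pursuit: D has unit-norm atoms in its columns.

    At each step, select the atom most correlated with the residual,
    subtract its projection, and accumulate the coefficient.
    """
    residual = f.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        correlations = D.T @ residual            # <R^m f, phi_p> for all p
        p = int(np.argmax(np.abs(correlations))) # best-matching atom
        coeffs[p] += correlations[p]
        residual -= correlations[p] * D[:, p]    # next-order residue R^{m+1} f
    return coeffs, residual

# Example: a dictionary mixing an identity basis with random unit-norm atoms
rng = np.random.default_rng(0)
D = np.column_stack([np.eye(8), rng.standard_normal((8, 8))])
D /= np.linalg.norm(D, axis=0)                   # normalize the atoms
f = 2.0 * D[:, 3] - 1.5 * D[:, 10]               # 2-sparse synthetic signal
coeffs, residual = matching_pursuit(f, D, n_iter=20)
print(np.linalg.norm(residual))                  # residual norm, non-increasing
```

When the dictionary is an orthonormal basis, the pursuit reduces to picking the largest coefficients one at a time, which gives a simple sanity check of the sketch.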
Basis Pursuit
Approximating $f$ with a minimum number of nonzero coefficients $a[p]$ in a dictionary $\mathcal{D}$ is equivalent to minimizing the $\ell^0$ norm $\|a\|_0$, which gives the number of nonzero coefficients. This $\ell^0$ norm is highly nonconvex, which explains why the resulting minimization is NP-hard. Donoho and Chen [158] thus proposed replacing the $\ell^0$ norm by the $\ell^1$ norm $\|a\|_1 = \sum_{p\in\Gamma} |a[p]|$, which is convex. The resulting basis pursuit algorithm computes a synthesis
\[ f = \sum_{p\in\Gamma} a[p]\, \phi_p, \quad \text{which minimizes} \quad \|a\|_1 = \sum_{p\in\Gamma} |a[p]|. \tag{1.17} \]
This optimal solution is calculated with a linear programming algorithm.
A basis pursuit is computationally more intensive than a matching pursuit, but it is a more global optimization that yields representations that can be sparser.
In approximation, compression, or denoising applications, $f$ is recovered with an error bounded by a precision parameter $\epsilon$. The optimization (1.17) is thus relaxed by finding a synthesis such that
\[ \Big\| f - \sum_{p\in\Gamma} a[p]\, \phi_p \Big\| \le \epsilon, \quad \text{which minimizes} \quad \|a\|_1 = \sum_{p\in\Gamma} |a[p]|. \tag{1.18} \]
This is a convex minimization problem, with a solution calculated by minimizing the corresponding $\ell^1$ Lagrangian
\[ \mathcal{L}_1(T, f, a) = \Big\| f - \sum_{p\in\Gamma} a[p]\, \phi_p \Big\|^2 + T\, \|a\|_1, \]
where $T$ is a Lagrange multiplier that depends on $\epsilon$. This is called an $\ell^1$ Lagrangian pursuit in this book. A solution $\tilde a[p]$ is computed with iterative algorithms that are guaranteed to converge. The number of nonzero coordinates of $\tilde a$ typically decreases as $T$ increases.
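One standard family of such iterative algorithms is iterative soft thresholding (ISTA), sketched below assuming the dictionary is stored as a matrix `Phi` with unit-norm atoms in its columns (an illustrative sketch; the problem sizes, the value of `T`, and the iteration count are arbitrary):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(f, Phi, T, n_iter=500):
    """Minimize ||f - Phi a||^2 + T ||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2          # spectral norm^2: step-size control
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - f)         # (half) gradient of the quadratic term
        a = soft_threshold(a - grad / L, T / (2 * L))
    return a

rng = np.random.default_rng(1)
Phi = rng.standard_normal((32, 64))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm atoms
a_true = np.zeros(64); a_true[[5, 20]] = [3.0, -2.0]
f = Phi @ a_true                             # noiseless 2-sparse synthesis
a_hat = ista(f, Phi, T=0.05)
print(np.count_nonzero(np.abs(a_hat) > 1e-3))  # number of significant coefficients
```

Soft thresholding is the proximal operator of the $\ell^1$ norm, so each iteration is a gradient step on the quadratic term followed by a shrinkage, and the Lagrangian decreases monotonically.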
Incoherence for Support Recovery
Matching pursuit and $\ell^1$ Lagrangian pursuits are optimal if they recover the approximation support $\Lambda_T$, which minimizes the approximation Lagrangian
\[ \mathcal{L}_0(T, f, \Lambda) = \|f - f_\Lambda\|^2 + T^2\, |\Lambda|, \]
where $f_\Lambda$ is the orthogonal projection of $f$ in the space $\mathbf{V}_\Lambda$ generated by $\{\phi_p\}_{p\in\Lambda}$. This is not always true and depends on $\Lambda_T$. An exact recovery criterion proved by Tropp [464] guarantees that pursuit algorithms do recover the optimal support $\Lambda_T$ if
\[ \mathrm{ERC}(\Lambda_T) = \max_{q \notin \Lambda_T} \sum_{p \in \Lambda_T} |\langle \tilde\phi_p, \phi_q\rangle| < 1, \tag{1.19} \]
where $\{\tilde\phi_p\}_{p\in\Lambda_T}$ is the biorthogonal basis of $\{\phi_p\}_{p\in\Lambda_T}$ in $\mathbf{V}_{\Lambda_T}$. This criterion implies that dictionary vectors $\phi_q$ outside $\Lambda_T$ should have a small inner product with vectors in $\Lambda_T$.
This recovery is stable relative to noise perturbations if $\{\phi_p\}_{p\in\Lambda_T}$ has Riesz bounds that are not too far from 1. These vectors should be nearly orthogonal and hence have small inner products. These small inner-product conditions are interpreted as a form of incoherence. A stable recovery of $\Lambda_T$ is possible if vectors in $\Lambda_T$ are incoherent with respect to other dictionary vectors and incoherent among themselves. It depends on the geometric configuration of $\Lambda_T$ in $\Gamma$.
1.5 INVERSE PROBLEMS
Most digital measurement devices, such as cameras, microphones, or medical imaging systems, can be modeled as a linear transformation of an incoming analog signal, plus noise due to intrinsic measurement fluctuations or to electronic noise. This linear transformation can be decomposed into a stable analog-to-digital linear conversion followed by a discrete operator $U$ that carries the specific transfer function of the measurement device. The resulting measured data can be written
\[ Y[q] = U f[q] + W[q], \]
where $f \in \mathbb{C}^N$ is the high-resolution signal we want to recover, and $W[q]$ is the measurement noise. For a camera with an optic that is out of focus, the operator $U$ is a low-pass convolution producing a blur. For a magnetic resonance imaging system, $U$ is a Radon transform integrating the signal along rays, and the number $Q$ of measurements is smaller than $N$. In such problems, $U$ is not invertible, and recovering an estimate of $f$ is an ill-posed inverse problem.
Inverse problems are among the most difficult signal-processing problems, with considerable applications. When data acquisition is difficult, costly, or dangerous, or when the signal is degraded, super-resolution is important to recover the highest possible resolution information. This applies to satellite observations, seismic exploration, medical imaging, radar, camera phones, or degraded Internet videos displayed on high-resolution screens. Separating mixed information sources from fewer measurements is yet another super-resolution problem in telecommunication or audio recognition.
Incoherence, sparsity, and geometry play a crucial role in the solution of ill-defined inverse problems. With a sensing matrix $U$ with random coefficients, Candès and Tao [139] and Donoho [217] proved that super-resolution becomes stable for signals having a sufficiently sparse representation in a dictionary. This remarkable result opens the door to new compressive-sensing devices and algorithms that recover high-resolution signals from a few randomized linear measurements.
1.5.1 Diagonal Inverse Estimation
In an ill-posed inverse problem,
\[ Y = Uf + W, \]
the image space $\mathrm{Im}\,U = \{Uh : h \in \mathbb{C}^N\}$ of $U$ is of dimension $Q$ smaller than the dimension $N$ of the high-resolution space where $f$ belongs. Inverse problems include two difficulties. In the image space $\mathrm{Im}\,U$, where $U$ is invertible, its inverse may amplify the noise $W$, which then needs to be reduced by an efficient denoising procedure. In the null space $\mathrm{Null}\,U$, all signals $h$ satisfy $Uh = 0$ and thus disappear in the measured data $Y$. Recovering the projection of $f$ in $\mathrm{Null}\,U$ requires using some strong prior information. A super-resolution estimator recovers an estimation of $f$ in a space of dimension larger than $Q$, and hopefully equal to $N$, but this is not always possible.
Singular Value Decompositions
Let $f = \sum_{m\in\Gamma} a[m]\, g_m$ be the representation of $f$ in an orthonormal basis $\mathcal{B} = \{g_m\}_{m\in\Gamma}$. An approximation must be recovered from
\[ Y = \sum_{m\in\Gamma} a[m]\, U g_m + W. \]
A basis $\mathcal{B}$ of singular vectors diagonalizes $U^* U$. Then $U$ transforms a subset of $Q$ vectors $\{g_m\}_{m\in\Gamma_Q}$ of $\mathcal{B}$ into an orthogonal basis $\{U g_m\}_{m\in\Gamma_Q}$ of $\mathrm{Im}\,U$ and sets all other vectors to zero. A singular value decomposition estimates the coefficients $a[m]$ of $f$ by projecting $Y$ on this singular basis and renormalizing the resulting coefficients:
\[ \forall m \in \Gamma, \quad \tilde a[m] = \frac{\langle Y, U g_m\rangle}{\|U g_m\|^2 + h_m^2}, \]
where the $h_m^2$ are regularization parameters.
Such estimators recover nonzero coefficients in a space of dimension $Q$ and thus bring no super-resolution. If $U$ is a convolution operator, then $\mathcal{B}$ is the Fourier basis and a singular value estimation implements a regularized inverse convolution.
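For a circular convolution operator, this diagonal estimator can be sketched with the FFT, which provides the singular basis (an illustrative sketch; the blur kernel, the test signal, and the regularization constant are arbitrary, and $h_m^2$ is taken constant across frequencies):

```python
import numpy as np

def regularized_inverse_convolution(Y, u, h2):
    """Diagonal estimate in the Fourier (singular) basis of a circular
    convolution operator U: a~[m] = <Y, Ug_m> / (||Ug_m||^2 + h_m^2)."""
    U_hat = np.fft.fft(u)                 # eigenvalues of circular convolution
    F_hat = np.conj(U_hat) * np.fft.fft(Y) / (np.abs(U_hat) ** 2 + h2)
    return np.real(np.fft.ifft(F_hat))

N = 128
t = np.arange(N)
f = np.sin(2 * np.pi * t / N) + (t > 64)          # smooth wave plus a step
u = np.zeros(N); u[:5] = 1 / 5                    # 5-tap moving-average blur
Y = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(u)))   # Y = U f (noise-free)
f_est = regularized_inverse_convolution(Y, u, h2=1e-3)
print(np.linalg.norm(f - f_est) / np.linalg.norm(f))      # small relative error
```

The regularization $h^2$ prevents division by the near-zero Fourier coefficients of the blur kernel, at the price of suppressing the corresponding frequencies of $f$.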
Diagonal Thresholding Estimation
The basis that diagonalizes $U^* U$ rarely provides a sparse signal representation. For example, a Fourier basis that diagonalizes convolution operators does not efficiently approximate signals that include singularities.
Donoho [214] introduced more flexibility by looking for a basis $\mathcal{B}$ providing a sparse signal representation, where a subset of $Q$ vectors $\{g_m\}_{m\in\Gamma_Q}$ is transformed by $U$ into a Riesz basis $\{U g_m\}_{m\in\Gamma_Q}$ of $\mathrm{Im}\,U$, while the others are set to zero. With an appropriate renormalization, $\{\lambda_m^{-1}\, U g_m\}_{m\in\Gamma_Q}$ has a biorthogonal basis $\{\tilde\phi_m\}_{m\in\Gamma_Q}$ that is normalized, $\|\tilde\phi_m\| = 1$. The sparse coefficients of $f$ in $\mathcal{B}$ can then be estimated with a thresholding
\[ \forall m \in \Gamma_Q, \quad \tilde a[m] = \rho_{T_m}\!\big(\lambda_m^{-1} \langle Y, \tilde\phi_m\rangle\big) \quad \text{with} \quad \rho_T(x) = x\, \mathbf{1}_{|x|>T}, \]
for appropriately defined thresholds $T_m$.
For classes of signals that are sparse in $\mathcal{B}$, such thresholding estimators may yield a nearly minimax risk, but they provide no super-resolution since this nonlinear projector remains in a space of dimension $Q$. This result applies to classes of convolution operators $U$ in wavelet or wavelet packet bases. Diagonal inverse estimators are computationally efficient and potentially optimal in cases where super-resolution is not possible.
1.5.2 Super-resolution and Compressive Sensing
Suppose that $f$ has a sparse representation in some dictionary $\mathcal{D} = \{g_p\}_{p\in\Gamma}$ of $P$ normalized vectors. The $P$ vectors of the transformed dictionary $\mathcal{D}_U = U\mathcal{D} = \{U g_p\}_{p\in\Gamma}$ belong to the space $\mathrm{Im}\,U$ of dimension $Q < P$ and thus define a redundant dictionary. Vectors in the approximation support $\Lambda$ of $f$ are not restricted a priori to a particular subspace of $\mathbb{C}^N$. Super-resolution is possible if the approximation support $\Lambda$ of $f$ in $\mathcal{D}$ can be estimated by decomposing the noisy data $Y$ over $\mathcal{D}_U$. It depends on the properties of the approximation support $\Lambda$ of $f$ in $\Gamma$.
Geometric Conditions for Super-resolution
Let $w_\Lambda = f - f_\Lambda$ be the approximation error of a sparse representation $f_\Lambda = \sum_{p\in\Lambda} a[p]\, g_p$ of $f$. The observed signal can be written as
\[ Y = Uf + W = \sum_{p\in\Lambda} a[p]\, U g_p + U w_\Lambda + W. \]
If the support $\Lambda$ can be identified by finding a sparse approximation of $Y$ in $\mathcal{D}_U$,
\[ Y_\Lambda = \sum_{p\in\Lambda} \tilde a[p]\, U g_p, \]
then we can recover a super-resolution estimation of $f$:
\[ \tilde F = \sum_{p\in\Lambda} \tilde a[p]\, g_p. \]
This shows that super-resolution is possible if the approximation support $\Lambda$ can be identified by decomposing $Y$ in the redundant transformed dictionary $\mathcal{D}_U$. If the exact recovery criterion is satisfied, $\mathrm{ERC}(\Lambda) < 1$, and if $\{U g_p\}_{p\in\Lambda}$ is a Riesz basis, then $\Lambda$ can be recovered using pursuit algorithms with controlled error bounds.
For most operators $U$, not all sparse approximation sets can be recovered. It is necessary to impose some further geometric conditions on $\Lambda$ in $\Gamma$, which makes super-resolution difficult and often unstable. Numerical applications to sparse spike deconvolution, tomography, super-resolution zooming, and inpainting illustrate these results.
Compressive Sensing with Randomness
Candès and Tao [139] and Donoho [217] proved that stable super-resolution is possible for any sufficiently sparse signal $f$ if $U$ is an operator with random coefficients. Compressive sensing then becomes possible by recovering a close approximation of $f \in \mathbb{C}^N$ from $Q \ll N$ linear measurements [133].
A recovery is stable for a sparse approximation set $|\Lambda| \le M$ only if the corresponding dictionary family $\{U g_m\}_{m\in\Lambda}$ is a Riesz basis of the space it generates. The $M$-restricted isometry condition of Candès, Tao, and Donoho [217] imposes uniform Riesz bounds for all sets $\Lambda \subset \Gamma$ with $|\Lambda| \le M$:
\[ \forall c \in \mathbb{C}^{|\Lambda|}, \quad (1 - \delta_M)\, \|c\|^2 \le \Big\| \sum_{m\in\Lambda} c[m]\, U g_m \Big\|^2 \le (1 + \delta_M)\, \|c\|^2. \tag{1.20} \]
This is a strong incoherence condition on the $P$ vectors of $\{U g_m\}_{m\in\Gamma}$, which supposes that any subset of fewer than $M$ vectors is nearly uniformly distributed on the unit sphere of $\mathrm{Im}\,U$.
For an orthogonal basis $\mathcal{D} = \{g_m\}_{m\in\Gamma}$, this is possible for $M \le C\, Q\, (\log N)^{-1}$ if $U$ is a matrix with independent Gaussian random coefficients. A pursuit algorithm then provides a stable approximation of any $f \in \mathbb{C}^N$ having a sparse approximation from vectors in $\mathcal{D}$.
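A small numerical experiment illustrates such a recovery (an illustrative sketch with arbitrary sizes; orthogonal matching pursuit plays the role of the pursuit algorithm, and the dictionary is the Dirac basis so that the transformed dictionary is simply the columns of $U$):

```python
import numpy as np

def omp(Y, A, n_atoms):
    """Orthogonal matching pursuit: greedily select columns of A, then
    back-project Y on the selected columns by least squares."""
    residual = Y.astype(float).copy()
    support, coeffs = [], np.zeros(0)
    for _ in range(n_atoms):
        p = int(np.argmax(np.abs(A.T @ residual)))   # best-correlated column
        if p not in support:
            support.append(p)
        coeffs, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residual = Y - A[:, support] @ coeffs
    a = np.zeros(A.shape[1])
    a[support] = coeffs
    return a

rng = np.random.default_rng(2)
N, Q, M = 256, 128, 4                          # signal size, measurements, sparsity
U = rng.standard_normal((Q, N)) / np.sqrt(Q)   # random Gaussian sensing matrix
f = np.zeros(N)
f[rng.choice(N, M, replace=False)] = rng.uniform(1.0, 3.0, M) * rng.choice([-1.0, 1.0], M)
Y = U @ f                                      # Q < N linear measurements, no noise
f_hat = omp(Y, U, n_atoms=2 * M)               # a few extra greedy steps for safety
print(np.linalg.norm(f - f_hat))               # small when the support is recovered
```

With these sizes the incoherence of the random columns makes the greedy support identification succeed with high probability, and the final least-squares back projection then recovers the sparse coefficients.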
These results open a new compressive-sensing approach to signal acquisition and representation. Instead of first discretizing the signal linearly at a high resolution $N$ and then computing a nonlinear representation over $M$ coefficients in some dictionary, compressive sensing directly measures $M$ randomized linear coefficients. A reconstructed signal is then recovered by a nonlinear algorithm, producing an error that can be of the same order of magnitude as the error obtained by the more classic two-step approximation process, with a more economical acquisition process. These results remain valid for several types of random matrices $U$. Examples of applications to single-pixel cameras, video super-resolution, new analog-to-digital converters, and MRI are described.
Blind Source Separation
Sparsity in redundant dictionaries also provides efficient strategies to separate a family of signals $\{f_s\}_{0\le s<S}$ that are linearly mixed in $K \le S$ observed signals with noise:
\[ Y_k[n] = \sum_{s=0}^{S-1} u_{k,s}\, f_s[n] + W_k[n] \quad \text{for } 0 \le n < N \text{ and } 0 \le k < K. \]
Separating the sounds of $S$ musical instruments from a stereo recording is an example of source separation with $K = 2$. Most often the mixing matrix $U = \{u_{k,s}\}_{0\le k<K,\, 0\le s<S}$ is unknown. Source separation is a super-resolution problem since $SN$ data values must be recovered from $Q = KN \le SN$ measurements. Not knowing the operator $U$ makes it even more complicated.
If each source $f_s$ has a sparse approximation support $\Lambda_s$ in a dictionary $\mathcal{D}$, with $\sum_{s=0}^{S-1} |\Lambda_s| \ll N$, then it is likely that the sets $\{\Lambda_s\}_{0\le s<S}$ are nearly disjoint. In this case, the operator $U$, the supports $\Lambda_s$, and the sources $f_s$ are approximated by computing sparse approximations of the observed data $Y_k$ in $\mathcal{D}$. The distribution of these coefficients identifies the coefficients of the mixing matrix $U$ and the nearly disjoint source supports. Time-frequency separation of sounds illustrates these results.
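The role of nearly disjoint supports can be illustrated on a toy mixture where the Dirac basis plays the role of the dictionary and the mixing matrix is chosen arbitrarily (an illustrative sketch, not a full estimation algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
idx = rng.choice(N, 40, replace=False)        # disjoint supports for the two sources
f0 = np.zeros(N); f0[idx[:20]] = rng.standard_normal(20)
f1 = np.zeros(N); f1[idx[20:]] = rng.standard_normal(20)
U = np.array([[1.0, 0.5],
              [0.3, 1.0]])                    # mixing matrix, unknown in practice
Y = U @ np.vstack([f0, f1])                   # K = 2 observed mixtures

# Where only one source is active, the ratio Y[1,n]/Y[0,n] equals the ratio of
# one column of U; the two mixing directions show up as two clusters of ratios.
active = np.abs(Y[0]) > 1e-12
ratios = Y[1, active] / Y[0, active]
print(np.unique(np.round(ratios, 6)))          # two clusters: 0.3 and 2.0
```

Clustering such coefficient ratios identifies the columns of the mixing matrix, after which each coefficient can be attributed to its source.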
1.6 TRAVEL GUIDE
1.6.1 Reproducible Computational Science
This book covers the whole spectrum from theorems on functions of continuous
variables to fast discrete algorithms and their applications. Section 1.1.2 argues
that models based on continuous time functions give useful asymptotic results for
understanding the behavior of discrete algorithms. Still, a mathematical analysis
alone is often unable to fully predict the behavior and suitability of algorithms
for specific signals. Experiments are necessary and such experiments should be
reproducible, just like experiments in other fields of science [124].
The reproducibility of experiments requires having complete software and full
source code for inspection, modification, and application under varied parameter settings. Following this perspective, computational algorithms presented in
this book are available as MATLAB subroutines or in other software packages.
Figures can be reproduced and the source code is available. Software demonstrations and selected exercise solutions are available at http://wavelet-tour.com. For
the instructor, solutions are available at www.elsevierdirect.com/9780123743701.
1.6.2 Book Road Map
Some redundancy is introduced between sections to avoid imposing a linear progression through the book. The preface describes several possible programs for a
sparse signal-processing course.
All theorems are explained in the text, and reading the proofs is not necessary to understand the results. Most of the book's theorems are proved in detail, and important techniques are included. Exercises at the end of each chapter give examples of mathematical, algorithmic, and numeric applications, ordered by level of difficulty from 1 to 4, and selected solutions can be found at http://wavelet-tour.com.
The book begins with Chapters 2 and 3, which review the Fourier transform and linear discrete signal processing. They provide the necessary background for readers without prior signal-processing experience. Important properties of linear operators, projectors, and vector spaces can be found in the Appendix. Local time-frequency transforms and dictionaries are presented in Chapter 4; the wavelet and windowed Fourier transforms are introduced and compared. The measurement of instantaneous frequencies illustrates the limitations of time-frequency resolution. Dictionary stability and redundancy are introduced in Chapter 5 through frame theory, with examples of windowed Fourier, wavelet, and curvelet frames. Chapter 6
explains the relationship between wavelet coefficient amplitude and local signal
regularity. It is applied to the detection of singularities and edges and to the analysis
of multifractals.
Wavelet bases and fast filter bank algorithms are important tools presented in
Chapter 7. An overdose of orthonormal bases can strike the reader while studying the construction and properties of wavelet packets and local cosine bases
in Chapter 8. It is thus important to read Chapter 9, which describes sparse
approximations in bases. Signal-compression and denoising applications described
in Chapters 10 and 11 give life to most theoretical and algorithmic results in the
book. These chapters offer a practical perspective on the relevance of linear and
nonlinear signal-processing algorithms. Chapter 12 introduces sparse decompositions in redundant dictionaries and their applications. The resolution of inverse
problems is studied in Chapter 13, with super-resolution, compressive sensing, and
source separation.
CHAPTER 2
The Fourier Kingdom
The story begins in 1807 when Fourier presents a memoir to the Institut de France,
where he claims that any periodic function can be represented as a series of
harmonically related sinusoids. This idea had a profound impact in mathematical
analysis, physics, and engineering, but it took a century and a half to understand the
convergence of Fourier series and complete the theory of Fourier integrals.
Fourier was motivated by the study of heat diffusion, which is governed by a
linear differential equation. However, the Fourier transform diagonalizes all linear
time-invariant operators—the building blocks of signal processing. It therefore is
not only the starting point of our exploration but also the basis of all further
developments.
2.1 LINEAR TIME-INVARIANT FILTERING
Classic signal-processing operations, such as signal transmission, stationary noise removal, or predictive coding, are implemented with linear time-invariant operators.
The time invariance of an operator $L$ means that if the input $f(t)$ is delayed by $\tau$, $f_\tau(t) = f(t - \tau)$, then the output is also delayed by $\tau$:
\[ g(t) = Lf(t) \;\Longrightarrow\; g(t - \tau) = Lf_\tau(t). \tag{2.1} \]
For numerical stability, the operator $L$ must have a weak form of continuity, which means that $Lf$ is modified by a small amount if $f$ is slightly modified. This weak continuity is formalized by the theory of distributions [61, 64], which guarantees that we are on safe ground without having to worry further about it.
2.1.1 Impulse Response
Linear time-invariant systems are characterized by their response to a Dirac impulse, defined in Section A.7 in the Appendix. If $f$ is continuous, its value at $t$ is obtained by an "integration" against a Dirac located at $t$. Let $\delta_u(t) = \delta(t - u)$:
\[ f(t) = \int_{-\infty}^{+\infty} f(u)\, \delta_u(t)\, du. \]
The continuity and linearity of $L$ imply that
\[ Lf(t) = \int_{-\infty}^{+\infty} f(u)\, L\delta_u(t)\, du. \]
Let $h$ be the impulse response of $L$:
\[ h(t) = L\delta(t). \]
The time invariance proves that $L\delta_u(t) = h(t - u)$ and therefore
\[ Lf(t) = \int_{-\infty}^{+\infty} f(u)\, h(t - u)\, du = \int_{-\infty}^{+\infty} h(u)\, f(t - u)\, du = h \star f(t). \tag{2.2} \]
A time-invariant linear filter is thus equivalent to a convolution with the impulse response $h$. The continuity of $f$ is not necessary. This formula remains valid for any signal $f$ for which the convolution integral converges.
Let us recall a few useful properties of convolution products:
■ Commutativity:
\[ f \star h(t) = h \star f(t). \tag{2.3} \]
■ Differentiation:
\[ \frac{d}{dt}(f \star h)(t) = \frac{df}{dt} \star h(t) = f \star \frac{dh}{dt}(t). \tag{2.4} \]
■ Dirac convolution:
\[ f \star \delta_\tau(t) = f(t - \tau). \tag{2.5} \]
Stability and Causality
A filter is said to be causal if $Lf(t)$ does not depend on the values $f(u)$ for $u > t$. Since
\[ Lf(t) = \int_{-\infty}^{+\infty} h(u)\, f(t - u)\, du, \]
this means that $h(u) = 0$ for $u < 0$. Such impulse responses are said to be causal.
The stability property guarantees that $Lf(t)$ is bounded if $f(t)$ is bounded. Since
\[ |Lf(t)| \le \int_{-\infty}^{+\infty} |h(u)|\, |f(t - u)|\, du \le \sup_{u\in\mathbb{R}} |f(u)| \int_{-\infty}^{+\infty} |h(u)|\, du, \]
it is sufficient that $\int_{-\infty}^{+\infty} |h(u)|\, du < +\infty$. One can verify that this condition is also necessary if $h$ is a function. We thus say that $h$ is stable if it is integrable.
EXAMPLE 2.1
An amplification and delay system is defined by
\[ Lf(t) = \lambda\, f(t - \tau). \]
The impulse response of this filter is $h(t) = \lambda\, \delta(t - \tau)$.
EXAMPLE 2.2
A uniform averaging of $f$ over intervals of size $T$ is calculated by
\[ Lf(t) = \frac{1}{T} \int_{t-T/2}^{t+T/2} f(u)\, du. \]
This integral can be rewritten as a convolution of $f$ with the impulse response $h = \frac{1}{T}\, \mathbf{1}_{[-T/2,\, T/2]}$.
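In discrete time, this averaging can be checked directly with NumPy (a sketch with an arbitrary test signal):

```python
import numpy as np

T = 5                                   # averaging window of 5 samples
h = np.ones(T) / T                      # discrete impulse response: 1/T on the window
f = np.arange(10, dtype=float)          # arbitrary test signal

averaged = np.convolve(f, h, mode='valid')
# Each output sample is the mean of T consecutive input samples:
print(averaged[0], f[:T].mean())        # both equal 2.0
```

The `valid` mode keeps only the output samples whose averaging window lies entirely inside the signal, avoiding boundary effects.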
2.1.2 Transfer Functions
Complex exponentials $e^{i\omega t}$ are eigenvectors of convolution operators. Indeed,
\[ L e^{i\omega t} = \int_{-\infty}^{+\infty} h(u)\, e^{i\omega(t-u)}\, du, \]
which yields
\[ L e^{i\omega t} = e^{i\omega t} \int_{-\infty}^{+\infty} h(u)\, e^{-i\omega u}\, du = \hat h(\omega)\, e^{i\omega t}. \]
The eigenvalue
\[ \hat h(\omega) = \int_{-\infty}^{+\infty} h(u)\, e^{-i\omega u}\, du \]
is the Fourier transform of $h$ at the frequency $\omega$. Since complex sinusoidal waves $e^{i\omega t}$ are the eigenvectors of time-invariant linear systems, it is tempting to try to decompose any function $f$ as a sum of these eigenvectors. We are then able to express $Lf$ directly from the eigenvalues $\hat h(\omega)$. The Fourier analysis proves that, under weak conditions on $f$, it is indeed possible to write it as a Fourier integral.
2.2 FOURIER INTEGRALS
To avoid convergence issues, the Fourier integral is first defined over the space $\mathbf{L}^1(\mathbb{R})$ of integrable functions [54]. It is then extended to the space $\mathbf{L}^2(\mathbb{R})$ of finite-energy functions [23].
2.2.1 Fourier Transform in $\mathbf{L}^1(\mathbb{R})$
The Fourier integral
\[ \hat f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \tag{2.6} \]
measures "how much" oscillation at the frequency $\omega$ there is in $f$. If $f \in \mathbf{L}^1(\mathbb{R})$, this integral does converge and
\[ |\hat f(\omega)| \le \int_{-\infty}^{+\infty} |f(t)|\, dt < +\infty. \tag{2.7} \]
Thus, the Fourier transform is bounded, and one can verify that it is a continuous function of $\omega$ (Exercise 2.1). If $\hat f$ is also integrable, Theorem 2.1 gives the inverse Fourier transform.

Theorem 2.1: Inverse Fourier Transform. If $f \in \mathbf{L}^1(\mathbb{R})$ and $\hat f \in \mathbf{L}^1(\mathbb{R})$, then
\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, e^{i\omega t}\, d\omega. \tag{2.8} \]
Proof. Replacing $\hat f(\omega)$ by its integral expression yields
\[ \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \exp(i\omega t)\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u)\, \exp[i\omega(t-u)]\, du\, d\omega. \]
We cannot apply the Fubini Theorem A.2 directly because $f(u)\exp[i\omega(t-u)]$ is not integrable in $\mathbb{R}^2$. To avoid this technical problem, we multiply by $\exp(-\epsilon^2\omega^2/4)$, which converges to 1 when $\epsilon$ goes to 0. Let us define
\[ I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u)\, \exp\!\left(\frac{-\epsilon^2\omega^2}{4}\right) \exp[i\omega(t-u)]\, du\, d\omega. \tag{2.9} \]
We compute $I_\epsilon$ in two different ways using the Fubini theorem. The integration with respect to $u$ gives
\[ I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \exp\!\left(\frac{-\epsilon^2\omega^2}{4}\right) \exp(i\omega t)\, d\omega. \]
Since
\[ \left| \hat f(\omega)\, \exp\!\left(\frac{-\epsilon^2\omega^2}{4}\right) \exp(i\omega t) \right| \le |\hat f(\omega)| \]
and since $\hat f$ is integrable, we can apply the dominated convergence Theorem A.1, which proves that
\[ \lim_{\epsilon\to 0} I_\epsilon(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \exp(i\omega t)\, d\omega. \tag{2.10} \]
Let us now compute the integral (2.9) differently by applying the Fubini theorem and integrating with respect to $\omega$:
\[ I_\epsilon(t) = \int_{-\infty}^{+\infty} g_\epsilon(t - u)\, f(u)\, du, \tag{2.11} \]
with
\[ g_\epsilon(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(ix\omega)\, \exp\!\left(\frac{-\epsilon^2\omega^2}{4}\right) d\omega. \]
A change of variable $\omega' = \epsilon\omega$ shows that $g_\epsilon(x) = \epsilon^{-1}\, g_1(\epsilon^{-1} x)$, and it is proved in (2.32) that $g_1(x) = \pi^{-1/2}\, e^{-x^2}$. The Gaussian $g_1$ has an integral equal to 1 and a fast decay. The squeezed Gaussians $g_\epsilon$ have an integral that remains equal to 1, and thus they converge to a Dirac $\delta$ when $\epsilon$ goes to 0. By inserting (2.11), one can thus verify that
\[ \lim_{\epsilon\to 0} \int_{-\infty}^{+\infty} |I_\epsilon(t) - f(t)|\, dt \le \lim_{\epsilon\to 0} \int\!\!\int g_\epsilon(t-u)\, |f(u) - f(t)|\, du\, dt = 0. \]
Inserting (2.10) proves (2.8). ■
The inversion formula (2.8) decomposes $f$ as a sum of sinusoidal waves $e^{i\omega t}$ of amplitude $\hat f(\omega)$. By using this formula, we can show (Exercise 2.1) that the hypothesis $\hat f \in \mathbf{L}^1(\mathbb{R})$ implies that $f$ must be continuous. Therefore the reconstruction (2.8) is not proved for discontinuous functions. The extension of the Fourier transform to the space $\mathbf{L}^2(\mathbb{R})$ will address this issue.
The most important property of the Fourier transform for signal-processing applications is the convolution theorem (Theorem 2.2). It is another way to express the fact that sinusoidal waves $e^{i\omega t}$ are eigenvectors of convolution operators.
Theorem 2.2: Convolution. Let $f \in \mathbf{L}^1(\mathbb{R})$ and $h \in \mathbf{L}^1(\mathbb{R})$. The function $g = h \star f$ is in $\mathbf{L}^1(\mathbb{R})$ and
\[ \hat g(\omega) = \hat h(\omega)\, \hat f(\omega). \tag{2.12} \]
Proof.
\[ \hat g(\omega) = \int_{-\infty}^{+\infty} \exp(-it\omega) \left( \int_{-\infty}^{+\infty} f(t - u)\, h(u)\, du \right) dt. \]
Since $|f(t-u)|\,|h(u)|$ is integrable in $\mathbb{R}^2$, we can apply the Fubini Theorem A.2, and the change of variable $(t, u) \to (v = t - u, u)$ yields
\[ \hat g(\omega) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \exp[-i(u+v)\omega]\, f(v)\, h(u)\, du\, dv = \left( \int_{-\infty}^{+\infty} \exp(-iv\omega)\, f(v)\, dv \right) \left( \int_{-\infty}^{+\infty} \exp(-iu\omega)\, h(u)\, du \right), \]
which verifies (2.12). ■
The response $Lf = g = f \star h$ of a linear time-invariant system can be calculated from its Fourier transform $\hat g(\omega) = \hat f(\omega)\, \hat h(\omega)$ with the inverse Fourier formula
\[ g(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat g(\omega)\, e^{i\omega t}\, d\omega, \tag{2.13} \]
which yields
\[ Lf(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat h(\omega)\, \hat f(\omega)\, e^{i\omega t}\, d\omega. \tag{2.14} \]
Each frequency component $e^{i\omega t}$ of amplitude $\hat f(\omega)$ is amplified or attenuated by $\hat h(\omega)$. Such a convolution is thus called a frequency filtering, and $\hat h$ is the transfer function of the filter.
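The discrete analog of this frequency filtering is that the FFT diagonalizes circular convolution, so a filter can be applied by multiplying by a transfer function in the Fourier domain (an illustrative sketch with an arbitrary two-tone signal):

```python
import numpy as np

N = 64
t = np.arange(N)
f = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 20 * t / N)

# Ideal low-pass transfer function: keep frequencies |k| <= 5
h_hat = np.zeros(N)
h_hat[:6] = 1.0      # k = 0, ..., 5
h_hat[-5:] = 1.0     # k = -1, ..., -5

g = np.real(np.fft.ifft(h_hat * np.fft.fft(f)))   # filtering in the Fourier domain
print(np.allclose(g, np.sin(2 * np.pi * 3 * t / N)))  # the 20-cycle term is removed
```

The 3-cycle component lies in the passband and is kept exactly, while the 20-cycle component is annihilated by the zero transfer-function values.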
Table 2.1 Fourier Transform Properties

Each row maps a function $f(t)$ to its Fourier transform $\hat f(\omega)$:
Inverse: $\hat f(t) \;\longrightarrow\; 2\pi f(-\omega)$  (2.15)
Convolution: $f_1 \star f_2(t) \;\longrightarrow\; \hat f_1(\omega)\, \hat f_2(\omega)$  (2.16)
Multiplication: $f_1(t)\, f_2(t) \;\longrightarrow\; \frac{1}{2\pi}\, \hat f_1 \star \hat f_2(\omega)$  (2.17)
Translation: $f(t - u) \;\longrightarrow\; e^{-iu\omega}\, \hat f(\omega)$  (2.18)
Modulation: $e^{i\xi t} f(t) \;\longrightarrow\; \hat f(\omega - \xi)$  (2.19)
Scaling: $f(t/s) \;\longrightarrow\; |s|\, \hat f(s\omega)$  (2.20)
Time derivatives: $f^{(p)}(t) \;\longrightarrow\; (i\omega)^p\, \hat f(\omega)$  (2.21)
Frequency derivatives: $(-it)^p f(t) \;\longrightarrow\; \hat f^{(p)}(\omega)$  (2.22)
Complex conjugate: $f^*(t) \;\longrightarrow\; \hat f^*(-\omega)$  (2.23)
Hermitian symmetry: $f(t) \in \mathbb{R} \;\longrightarrow\; \hat f(-\omega) = \hat f^*(\omega)$  (2.24)
Table 2.1 summarizes important Fourier transform properties that are often used
in calculations. Most of the formulas are proved with a change of variable in the
Fourier integral.
2.2.2 Fourier Transform in $\mathbf{L}^2(\mathbb{R})$
The Fourier transform of the indicator function $f = \mathbf{1}_{[-1,1]}$ is
\[ \hat f(\omega) = \int_{-1}^{1} e^{-i\omega t}\, dt = \frac{2\sin\omega}{\omega}. \]
This function is not integrable because $f$ is not continuous, but its square is integrable. The inverse Fourier transform, Theorem 2.1, thus does not apply. This motivates the extension of the Fourier transform to the space $\mathbf{L}^2(\mathbb{R})$ of functions $f$ with a finite energy $\int_{-\infty}^{+\infty} |f(t)|^2\, dt < +\infty$. By working in the Hilbert space $\mathbf{L}^2(\mathbb{R})$, we also have access to all the facilities provided by the existence of an inner product. The inner product of $f \in \mathbf{L}^2(\mathbb{R})$ and $g \in \mathbf{L}^2(\mathbb{R})$ is
\[ \langle f, g\rangle = \int_{-\infty}^{+\infty} f(t)\, g^*(t)\, dt, \]
and the resulting norm in $\mathbf{L}^2(\mathbb{R})$ is
\[ \|f\|^2 = \langle f, f\rangle = \int_{-\infty}^{+\infty} |f(t)|^2\, dt. \]
Theorem 2.3 proves that inner products and norms in $\mathbf{L}^2(\mathbb{R})$ are conserved by the Fourier transform up to a factor of $2\pi$. Equations (2.25) and (2.26) are called the Parseval and Plancherel formulas, respectively.

Theorem 2.3. If $f$ and $h$ are in $\mathbf{L}^1(\mathbb{R}) \cap \mathbf{L}^2(\mathbb{R})$, then
\[ \int_{-\infty}^{+\infty} f(t)\, h^*(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h^*(\omega)\, d\omega. \tag{2.25} \]
For $h = f$ it follows that
\[ \int_{-\infty}^{+\infty} |f(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega. \tag{2.26} \]
Proof. Let $g = f \star \bar h$ with $\bar h(t) = h^*(-t)$. The convolution Theorem 2.2 and property (2.23) show that $\hat g(\omega) = \hat f(\omega)\, \hat h^*(\omega)$. The reconstruction formula (2.8) applied to $g(0)$ yields
\[ \int_{-\infty}^{+\infty} f(t)\, h^*(t)\, dt = g(0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat g(\omega)\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat h^*(\omega)\, d\omega. \quad ■ \]
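A discrete counterpart of the Plancherel formula holds for the FFT, with the factor $2\pi$ replaced by the transform size $N$ (an illustrative numerical check on an arbitrary random signal):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 128
f = rng.standard_normal(N)
f_hat = np.fft.fft(f)

energy_time = np.sum(np.abs(f) ** 2)
energy_freq = np.sum(np.abs(f_hat) ** 2) / N   # discrete Plancherel: 1/N replaces 1/(2*pi)
print(np.isclose(energy_time, energy_freq))
```

The same check with two different signals and an inner product verifies the discrete Parseval formula.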
Density Extension in $\mathbf{L}^2(\mathbb{R})$
If $f \in \mathbf{L}^2(\mathbb{R})$ but $f \notin \mathbf{L}^1(\mathbb{R})$, its Fourier transform cannot be calculated with the Fourier integral (2.6) because $f(t)\, e^{i\omega t}$ is not integrable. It is defined as a limit using the Fourier transforms of functions in $\mathbf{L}^1(\mathbb{R}) \cap \mathbf{L}^2(\mathbb{R})$.
Since $\mathbf{L}^1(\mathbb{R}) \cap \mathbf{L}^2(\mathbb{R})$ is dense in $\mathbf{L}^2(\mathbb{R})$, one can find a family $\{f_n\}_{n\in\mathbb{Z}}$ of functions in $\mathbf{L}^1(\mathbb{R}) \cap \mathbf{L}^2(\mathbb{R})$ that converges to $f$:
\[ \lim_{n\to+\infty} \|f - f_n\| = 0. \]
Since $\{f_n\}_{n\in\mathbb{Z}}$ converges, it is a Cauchy sequence, which means that $\|f_n - f_p\|$ is arbitrarily small if $n$ and $p$ are large enough. Moreover, $f_n \in \mathbf{L}^1(\mathbb{R})$, so its Fourier transform $\hat f_n$ is well defined. The Plancherel formula (2.26) proves that $\{\hat f_n\}_{n\in\mathbb{Z}}$ is also a Cauchy sequence because
\[ \|\hat f_n - \hat f_p\| = \sqrt{2\pi}\, \|f_n - f_p\| \]
is arbitrarily small for large enough $n$ and $p$. A Hilbert space (Appendix A.2) is complete, which means that all Cauchy sequences converge to an element of the space. Thus, there exists $\hat f \in \mathbf{L}^2(\mathbb{R})$ such that
\[ \lim_{n\to+\infty} \|\hat f - \hat f_n\| = 0. \]
By definition, $\hat f$ is the Fourier transform of $f$. This extension of the Fourier transform to $\mathbf{L}^2(\mathbb{R})$ satisfies the convolution theorem, the Parseval and Plancherel formulas, as well as all properties (2.15) to (2.24).
Diracs
Diracs are often used in calculations; their properties are summarized in Section A.7 in the Appendix. A Dirac $\delta$ associates its value to a function at $t = 0$. Since $e^{i\omega t} = 1$ at $t = 0$, it seems reasonable to define its Fourier transform by
\[ \hat\delta(\omega) = \int_{-\infty}^{+\infty} \delta(t)\, e^{-i\omega t}\, dt = 1. \tag{2.27} \]
This formula is justified mathematically by the extension of the Fourier transform to tempered distributions [61, 64].
2.2.3 Examples
The following examples often appear in Fourier calculations. They also illustrate
important Fourier transform properties.
The indicator function $f = \mathbf{1}_{[-T,T]}$ is discontinuous at $t = \pm T$. Its Fourier transform is therefore not integrable:
\[ \hat f(\omega) = \int_{-T}^{T} e^{-i\omega t}\, dt = \frac{2\sin(T\omega)}{\omega}. \tag{2.28} \]
An ideal low-pass filter has a transfer function $\hat\phi = \mathbf{1}_{[-\xi,\xi]}$ that selects low frequencies over $[-\xi, \xi]$. The impulse response is calculated with the inverse Fourier integral (2.8):
\[ \phi(t) = \frac{1}{2\pi} \int_{-\xi}^{\xi} e^{i\omega t}\, d\omega = \frac{\sin(\xi t)}{\pi t}. \tag{2.29} \]
A passive electronic circuit implements analog filters with resistances, capacitors, and inductors. The input voltage $f(t)$ is related to the output voltage $g(t)$ by a differential equation with constant coefficients:
\[ \sum_{k=0}^{K} a_k\, f^{(k)}(t) = \sum_{k=0}^{M} b_k\, g^{(k)}(t). \tag{2.30} \]
Suppose that the circuit is not charged for $t < 0$, which means that $f(t) = g(t) = 0$. The output $g$ is a linear time-invariant function of $f$ and thus can be written $g = f \star \phi$. Computing the Fourier transform of (2.30) and applying the time-derivative property (2.21) proves that
\[ \hat\phi(\omega) = \frac{\hat g(\omega)}{\hat f(\omega)} = \frac{\sum_{k=0}^{K} a_k\, (i\omega)^k}{\sum_{k=0}^{M} b_k\, (i\omega)^k}. \tag{2.31} \]
It is therefore a rational function of $i\omega$. An ideal low-pass transfer function $\mathbf{1}_{[-\xi,\xi]}$ thus cannot be implemented by an analog circuit. It must be approximated by a rational function. Chebyshev or Butterworth filters are often used for this purpose [14].
A Gaussian $f(t) = \exp(-t^2)$ is a $\mathbf{C}^\infty$ function with a fast asymptotic decay. Its Fourier transform is also a Gaussian:
\[ \hat f(\omega) = \sqrt{\pi}\, \exp(-\omega^2/4). \tag{2.32} \]
This Fourier transform is computed by showing with an integration by parts that $\hat f(\omega) = \int_{-\infty}^{+\infty} \exp(-t^2)\, e^{-i\omega t}\, dt$ is differentiable and satisfies the differential equation
\[ 2 \hat f{}'(\omega) + \omega\, \hat f(\omega) = 0. \tag{2.33} \]
The solution to this equation is a Gaussian $\hat f(\omega) = K \exp(-\omega^2/4)$, and since $\hat f(0) = \int_{-\infty}^{+\infty} \exp(-t^2)\, dt = \sqrt{\pi}$, we obtain (2.32).
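Formula (2.32) can be verified numerically by approximating the Fourier integral (2.6) with a Riemann sum (an illustrative check; the grid and the truncation range are arbitrary):

```python
import numpy as np

t = np.linspace(-10, 10, 4001)          # fine grid; exp(-t^2) is negligible for |t| > 10
dt = t[1] - t[0]
f = np.exp(-t ** 2)

for w in [0.0, 1.0, 2.5]:
    fhat = np.sum(f * np.exp(-1j * w * t)) * dt        # Riemann sum for (2.6)
    exact = np.sqrt(np.pi) * np.exp(-w ** 2 / 4)       # formula (2.32)
    print(abs(fhat - exact))                            # very small discrepancies
```

Because the integrand is smooth and decays fast, the Riemann sum is extremely accurate here, so the numerical transform matches the closed form essentially to machine precision.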
A Gaussian chirp $f(t) = \exp[-(a - ib)t^2]$ has a Fourier transform calculated with a similar differential equation:
\[ \hat f(\omega) = \sqrt{\frac{\pi}{a - ib}}\, \exp\!\left(\frac{-(a + ib)\,\omega^2}{4(a^2 + b^2)}\right). \tag{2.34} \]
A translated Dirac $\delta_\tau(t) = \delta(t - \tau)$ has a Fourier transform calculated by evaluating $e^{-i\omega t}$ at $t = \tau$:
\[ \hat\delta_\tau(\omega) = \int_{-\infty}^{+\infty} \delta(t - \tau)\, e^{-i\omega t}\, dt = e^{-i\tau\omega}. \tag{2.35} \]
The Dirac comb is a sum of translated Diracs,
\[ c(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT), \]
used to uniformly sample analog signals. Its Fourier transform is derived from (2.35):
\[ \hat c(\omega) = \sum_{n=-\infty}^{+\infty} e^{-inT\omega}. \tag{2.36} \]
The Poisson formula proves that it is also equal to a Dirac comb with a spacing equal to $2\pi/T$.
Theorem 2.4: Poisson Formula. In the sense of distribution equalities (A.29),
\[ \sum_{n=-\infty}^{+\infty} e^{-inT\omega} = \frac{2\pi}{T} \sum_{k=-\infty}^{+\infty} \delta\!\left(\omega - \frac{2\pi k}{T}\right). \tag{2.37} \]
Proof. The Fourier transform ĉ in (2.36) is periodic with period 2/T . To verify the Poisson
formula,it is therefore sufficient to prove that the restriction of ĉ to [⫺/T , /T ] is equal
41
42
CHAPTER 2 The Fourier Kingdom
to (2π/T) δ. The formula (2.37) is proved in the sense of a distribution equality (A.29) by showing that for any test function θ̂(ω) with a support included in [−π/T, π/T],

    ⟨ĉ, θ̂⟩ = lim_{N→+∞} ∫_{−∞}^{+∞} Σ_{n=−N}^{N} exp(−inTω) θ̂(ω) dω = (2π/T) θ̂(0).

The sum of the geometric series is

    Σ_{n=−N}^{N} exp(−inTω) = sin[(N + 1/2)Tω] / sin[Tω/2].   (2.38)

Thus,

    ⟨ĉ, θ̂⟩ = lim_{N→+∞} (2π/T) ∫_{−π/T}^{π/T} (sin[(N + 1/2)Tω] / (πω)) · (Tω/2 / sin[Tω/2]) θ̂(ω) dω.   (2.39)

Let

    ψ̂(ω) = (Tω/2 / sin[Tω/2]) θ̂(ω)  if |ω| ≤ π/T,   ψ̂(ω) = 0  if |ω| > π/T,

and ψ(t) be the inverse Fourier transform of ψ̂(ω). Since 2ω⁻¹ sin(aω) is the Fourier transform of 1_{[−a,a]}(t), the Parseval formula (2.25) implies

    ⟨ĉ, θ̂⟩ = lim_{N→+∞} (2π/T) ∫_{−∞}^{+∞} (sin[(N + 1/2)Tω] / (πω)) ψ̂(ω) dω = lim_{N→+∞} (2π/T) ∫_{−(N+1/2)T}^{(N+1/2)T} ψ(t) dt.   (2.40)

When N goes to +∞ the integral converges to ψ̂(0) = θ̂(0).   ■
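The Poisson formula can also be checked numerically by applying it to a Gaussian test function: summing f(nT) must match (1/T) Σ_k f̂(2πk/T). A minimal NumPy sketch (not part of the original text), with T = 1 and the Gaussian transform pair from (2.32):

```python
import numpy as np

# Poisson summation for f(t) = exp(-t^2), fhat(w) = sqrt(pi)*exp(-w^2/4):
# sum_n f(n T) = (1/T) * sum_k fhat(2 pi k / T).  Both sides are summed
# over a symmetric index range; the terms decay extremely fast.
T = 1.0
n = np.arange(-50, 51, dtype=float)
lhs = np.sum(np.exp(-(n * T) ** 2))
rhs = np.sum(np.sqrt(np.pi) * np.exp(-(2 * np.pi * n / T) ** 2 / 4)) / T
print(lhs, rhs)
```

Both sides agree to machine precision; on the frequency side only the k = 0 and k = ±1 terms are non-negligible.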
2.3 PROPERTIES
2.3.1 Regularity and Decay
The global regularity of a signal f depends on the decay of |f̂(ω)| when the frequency ω increases. The differentiability of f is studied. If f̂ ∈ L¹(R), then the Fourier inversion formula (2.8) implies that f is continuous and bounded:

    |f(t)| ≤ (1/2π) ∫_{−∞}^{+∞} |e^{iωt} f̂(ω)| dω = (1/2π) ∫_{−∞}^{+∞} |f̂(ω)| dω < +∞.   (2.41)
Theorem 2.5 applies this property to obtain a sufficient condition that guarantees
the differentiability of f at any order p.
Theorem 2.5. A function f is bounded and p times continuously differentiable with bounded derivatives if

    ∫_{−∞}^{+∞} |f̂(ω)| (1 + |ω|^p) dω < +∞.   (2.42)
Proof. The Fourier transform of the kth-order derivative f^{(k)}(t) is (iω)^k f̂(ω). Applying (2.41) to this derivative proves that

    |f^{(k)}(t)| ≤ ∫_{−∞}^{+∞} |f̂(ω)| |ω|^k dω.

Condition (2.42) implies that ∫_{−∞}^{+∞} |f̂(ω)| |ω|^k dω < +∞ for any k ≤ p, so f^{(k)}(t) is continuous and bounded.   ■
This result proves that if a constant K and ε > 0 exist such that

    |f̂(ω)| ≤ K / (1 + |ω|^{p+1+ε}),   then f ∈ C^p.

If f̂ has a compact support, then (2.42) implies that f ∈ C^∞.
The decay of |f̂(ω)| depends on the worst singular behavior of f. For example, f = 1_{[−T,T]} is discontinuous at t = ±T, so |f̂(ω)| decays like |ω|⁻¹. In this case, it could also be important to know that f(t) is regular for t ≠ ±T. This information cannot be derived from the decay of |f̂(ω)|. To characterize the local regularity of a signal f, it is necessary to decompose it over waveforms that are sufficiently localized in time, as opposed to sinusoidal waves e^{iωt}. Section 6.1.3 explains that wavelets are particularly appropriate for this purpose.
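The |ω|⁻¹ decay for the discontinuous f = 1_{[−T,T]} can be observed numerically. Its transform is 2 sin(Tω)/ω, so |ω f̂(ω)| stays bounded (by 2, since |2 sin(Tω)| ≤ 2) while |f̂(ω)| itself keeps returning to values of order 1/|ω| however far out one looks. A small NumPy sketch (an illustration, not from the text):

```python
import numpy as np

T = 1.0
w = np.linspace(1.0, 1000.0, 100000)
fhat = 2.0 * np.sin(T * w) / w          # Fourier transform of 1_[-T,T]

bound = np.max(np.abs(w * fhat))        # |w * fhat(w)| is bounded by 2
slow_decay = np.max(np.abs(fhat[w > 500]))  # still of order 1/w far out
print(bound, slow_decay)
```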
2.3.2 Uncertainty Principle
Can we construct a function f with an energy that is highly localized in time and with a Fourier transform f̂ having an energy concentrated in a small frequency interval? The Dirac δ(t − u) has a support restricted to t = u, but its Fourier transform e^{−iωu} has an energy uniformly spread over all frequencies. We know that |f̂(ω)| decays quickly at high frequencies only if f has regular variations in time. The energy of f therefore must be spread over a relatively large domain.
To reduce the time spread of f, we can scale it by s < 1, while keeping its total energy constant. If

    f_s(t) = (1/√s) f(t/s),   then ‖f_s‖² = ‖f‖².

The Fourier transform f̂_s(ω) = √s f̂(sω) is dilated by 1/s, so we lose in frequency localization what we gained in time. Underlying is a trade-off between time and frequency localization.
Time and frequency energy concentrations are restricted by the Heisenberg
uncertainty principle. This principle has a particularly important interpretation in
quantum mechanics as an uncertainty on the position and momentum of a free
particle. The state of a one-dimensional particle is described by a wave function
f ∈ L²(R). The probability density that this particle is located at t is (1/‖f‖²) |f(t)|². The probability density that its momentum is equal to ω is (1/(2π‖f‖²)) |f̂(ω)|². The average location of this particle is

    u = (1/‖f‖²) ∫_{−∞}^{+∞} t |f(t)|² dt,   (2.43)

and the average momentum is

    ξ = (1/(2π‖f‖²)) ∫_{−∞}^{+∞} ω |f̂(ω)|² dω.   (2.44)

The variances around these average values are, respectively,

    σ_t² = (1/‖f‖²) ∫_{−∞}^{+∞} (t − u)² |f(t)|² dt   (2.45)

and

    σ_ω² = (1/(2π‖f‖²)) ∫_{−∞}^{+∞} (ω − ξ)² |f̂(ω)|² dω.   (2.46)
The larger σ_t, the more uncertainty there is concerning the position of the free particle; the larger σ_ω, the more uncertainty there is concerning its momentum.
Theorem 2.6: Heisenberg Uncertainty. The temporal variance and the frequency variance of f ∈ L²(R) satisfy

    σ_t² σ_ω² ≥ 1/4.   (2.47)

This inequality is an equality if and only if there exist (u, ξ, a, b) ∈ R² × C² such that

    f(t) = a exp[iξt − b(t − u)²].   (2.48)
Proof. The following proof, from Weyl [70], supposes that lim_{|t|→+∞} √t f(t) = 0, but the theorem is valid for any f ∈ L²(R). If the average time and frequency localization of f is u and ξ, then the average time and frequency location of exp(−iξt) f(t + u) is zero. Thus, it is sufficient to prove the theorem for u = ξ = 0. Observe that

    σ_t² σ_ω² = (1/(2π‖f‖⁴)) ∫_{−∞}^{+∞} |t f(t)|² dt ∫_{−∞}^{+∞} |ω f̂(ω)|² dω.   (2.49)

Since iω f̂(ω) is the Fourier transform of f′(t), the Plancherel identity (2.26) applied to iω f̂(ω) yields

    σ_t² σ_ω² = (1/‖f‖⁴) ∫_{−∞}^{+∞} |t f(t)|² dt ∫_{−∞}^{+∞} |f′(t)|² dt.   (2.50)
Schwarz's inequality implies

    σ_t² σ_ω² ≥ (1/‖f‖⁴) [ ∫_{−∞}^{+∞} |t f′(t) f*(t)| dt ]²
            ≥ (1/‖f‖⁴) [ ∫_{−∞}^{+∞} (t/2) [f′(t) f*(t) + f′*(t) f(t)] dt ]²
            ≥ (1/(4‖f‖⁴)) [ ∫_{−∞}^{+∞} t (|f(t)|²)′ dt ]².

Since lim_{|t|→+∞} √t f(t) = 0, an integration by parts gives

    σ_t² σ_ω² ≥ (1/(4‖f‖⁴)) [ ∫_{−∞}^{+∞} |f(t)|² dt ]² = 1/4.   (2.51)

To obtain an equality, Schwarz's inequality applied to (2.50) must be an equality. This implies that there exists b ∈ C such that

    f′(t) = −2bt f(t).   (2.52)

Thus, there exists a ∈ C such that f(t) = a exp(−bt²). The other steps of the proof are then equalities so that the lower bound is indeed reached. When u ≠ 0 and ξ ≠ 0, the corresponding time and frequency translations yield (2.48).   ■
In quantum mechanics, this theorem shows that we cannot arbitrarily reduce the uncertainty as to the position and the momentum of a free particle. In signal processing, the modulated Gaussians (2.48) that have a minimum joint time-frequency localization are called Gabor chirps. As expected, they are smooth functions with fast time asymptotic decay.
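Equality in the Heisenberg bound (2.47) can be checked numerically for the Gaussian f(t) = exp(−t²/2), whose Fourier transform is √(2π) exp(−ω²/2); each of the variances (2.45) and (2.46) equals 1/2 with u = ξ = 0, so their product is exactly 1/4. A NumPy sketch (an illustration added by the editor, not part of the text):

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 40001)
f = np.exp(-t**2 / 2)
norm2 = np.trapz(f**2, t)                       # ||f||^2

# sigma_t^2 from (2.45) with u = 0.
var_t = np.trapz(t**2 * f**2, t) / norm2

# sigma_w^2 from (2.46) with xi = 0, using fhat(w) = sqrt(2 pi) exp(-w^2/2).
w = t
fhat = np.sqrt(2 * np.pi) * np.exp(-w**2 / 2)
var_w = np.trapz(w**2 * fhat**2, w) / (2 * np.pi * norm2)

print(var_t * var_w)  # the Heisenberg lower bound 1/4 is attained
```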
Compact Support
Despite the Heisenberg uncertainty bound, we might still be able to construct a
function of compact support whose Fourier transform has a compact support. Such
a function would be very useful for constructing a finite impulse response filter with
a band-limited transfer function. Unfortunately, Theorem 2.7 proves that it does
not exist.
Theorem 2.7. If f ≠ 0 has a compact support, then f̂(ω) cannot be zero on a whole interval. Similarly, if f̂ ≠ 0 has a compact support, then f(t) cannot be zero on a whole interval.
Proof. We prove only the first statement because the second is derived from the first by applying the Fourier transform. If f̂ has a compact support included in [−b, b], then

    f(t) = (1/2π) ∫_{−b}^{b} f̂(ω) exp(iωt) dω.   (2.53)

If f(t) = 0 for t ∈ [c, d], by differentiating n times under the integral at t₀ = (c + d)/2, we obtain

    f^{(n)}(t₀) = (1/2π) ∫_{−b}^{b} f̂(ω) (iω)ⁿ exp(iωt₀) dω = 0.   (2.54)

Since

    f(t) = (1/2π) ∫_{−b}^{b} f̂(ω) exp[iω(t − t₀)] exp(iωt₀) dω,   (2.55)

developing exp[iω(t − t₀)] as an infinite series yields, for all t ∈ R,

    f(t) = (1/2π) Σ_{n=0}^{+∞} ([i(t − t₀)]ⁿ / n!) ∫_{−b}^{b} f̂(ω) ωⁿ exp(iωt₀) dω = 0.   (2.56)

This contradicts our assumption that f ≠ 0.   ■
2.3.3 Total Variation
Total variation measures the total amplitude of signal oscillations. It plays an important role in image processing,where its value depends on the length the image level
sets. We show that a low-pass filter can considerably amplify the total variation by
creating Gibbs oscillations.
Variations and Oscillations
If f is differentiable, its total variation is defined by

    ‖f‖_V = ∫_{−∞}^{+∞} |f′(t)| dt.   (2.57)

If {x_p}_p are the abscissa of the local extrema of f, where f′(x_p) = 0, then

    ‖f‖_V = Σ_p |f(x_{p+1}) − f(x_p)|.

Thus, it measures the total amplitude of the oscillations of f. For example, if f(t) = exp(−t²), then ‖f‖_V = 2. If f(t) = sin(πt)/(πt), then f has a local extremum at x_p ∈ [p, p + 1] for any p ∈ Z. Since |f(x_{p+1}) − f(x_p)| ∼ |p|⁻¹, we derive that ‖f‖_V = +∞.
The total variation of nondifferentiable functions can be calculated by considering the derivative in the general sense of distributions [61, 75]. This is equivalent to approximating the derivative by a finite difference on an interval h that goes to zero:

    ‖f‖_V = lim_{h→0} ∫_{−∞}^{+∞} (|f(t) − f(t − h)| / |h|) dt.   (2.58)

The total variation of discontinuous functions is thus well defined. For example, if f = 1_{[a,b]}, then (2.58) gives ‖f‖_V = 2. We say that f has a bounded variation if ‖f‖_V < +∞.
Whether f′ is the standard derivative of f or its generalized derivative in the sense of distributions, its Fourier transform is f̂′(ω) = iω f̂(ω). Therefore,

    |ω| |f̂(ω)| ≤ ∫_{−∞}^{+∞} |f′(t)| dt = ‖f‖_V,

which implies that

    |f̂(ω)| ≤ ‖f‖_V / |ω|.   (2.59)

However, |f̂(ω)| = O(|ω|⁻¹) is not a sufficient condition to guarantee that f has bounded variation. For example, if f(t) = sin(πt)/(πt), then f̂ = 1_{[−π,π]} satisfies |f̂(ω)| ≤ π|ω|⁻¹ although ‖f‖_V = +∞. In general, the total variation of f cannot be evaluated from |f̂(ω)|.
Discrete Signals
Let f_N[n] = f ⋆ φ_N(n/N) be a discrete signal obtained with an averaging filter, φ_N(t) = 1_{[0,N⁻¹]}(t), and a uniform sampling at intervals N⁻¹. The discrete total variation is calculated by approximating the signal derivative by a finite difference over the sampling distance, h = N⁻¹, and replacing the integral (2.58) by a Riemann sum, which gives

    ‖f_N‖_V = Σ_n |f_N[n] − f_N[n − 1]|.   (2.60)

If n_p are the abscissa of the local extrema of f_N, then

    ‖f_N‖_V = Σ_p |f_N[n_{p+1}] − f_N[n_p]|.

The total variation thus measures the total amplitude of the oscillations of f. In accordance with (2.58), we say that the discrete signal has a bounded variation if ‖f_N‖_V is bounded by a constant independent of the resolution N.
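The discrete total variation (2.60) is a one-line sum of absolute finite differences. A minimal NumPy sketch (added for illustration), checked against two of the continuous examples above — a unit step has a single jump, and a sampled Gaussian rises to 1 and falls back, giving a total variation close to 2:

```python
import numpy as np

def total_variation(x):
    # Discrete total variation (2.60): sum of |x[n] - x[n-1]|.
    return float(np.sum(np.abs(np.diff(x))))

n = np.arange(2001)
step = (n >= 1000).astype(float)    # one unit jump
tv_step = total_variation(step)

t = np.linspace(-5.0, 5.0, 2001)
gauss = np.exp(-t**2)               # monotone up to 1 at t = 0, then down
tv_gauss = total_variation(gauss)
print(tv_step, tv_gauss)
```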
Gibbs Oscillations
Filtering a signal with a low-pass filter can create oscillations that have an infinite total variation. Let f_ξ = f ⋆ φ_ξ be the filtered signal obtained with an ideal low-pass filter whose transfer function is φ̂_ξ = 1_{[−ξ,ξ]}. If f ∈ L²(R), then f_ξ converges to f in L²(R) norm: lim_{ξ→+∞} ‖f − f_ξ‖ = 0. Indeed, f̂_ξ = f̂ 1_{[−ξ,ξ]} and the Plancherel formula (2.26) imply that

    ‖f − f_ξ‖² = (1/2π) ∫_{−∞}^{+∞} |f̂(ω) − f̂_ξ(ω)|² dω = (1/2π) ∫_{|ω|>ξ} |f̂(ω)|² dω,

which goes to zero as ξ increases. However, if f is discontinuous in t₀, then we show that f_ξ has Gibbs oscillations in the neighborhood of t₀, which prevents sup_{t∈R} |f(t) − f_ξ(t)| from converging to zero as ξ increases.
Let f be a bounded variation function, ‖f‖_V < +∞, that has an isolated discontinuity at t₀, with a left limit f(t₀⁻) and right limit f(t₀⁺). It is decomposed as a sum of f_c, which is continuous in the neighborhood of t₀, plus a Heaviside step of amplitude f(t₀⁺) − f(t₀⁻):

    f(t) = f_c(t) + [f(t₀⁺) − f(t₀⁻)] u(t − t₀),   with   u(t) = 1 if t ≥ 0 and 0 otherwise.   (2.61)

Thus,

    f_ξ(t) = f_c ⋆ φ_ξ(t) + [f(t₀⁺) − f(t₀⁻)] u ⋆ φ_ξ(t − t₀).   (2.62)

Since f_c has bounded variation and is uniformly continuous in the neighborhood of t₀, one can prove (Exercise 2.15) that f_c ⋆ φ_ξ(t) converges uniformly to f_c(t) in a neighborhood of t₀. The following theorem shows that this is not true for u ⋆ φ_ξ, which creates Gibbs oscillations.
Theorem 2.8: Gibbs. For any ξ > 0,

    u ⋆ φ_ξ(t) = ∫_{−∞}^{ξt} (sin x / (πx)) dx.   (2.63)
Proof. The impulse response of an ideal low-pass filter, calculated in (2.29), is φ_ξ(t) = sin(ξt)/(πt). Thus,

    u ⋆ φ_ξ(t) = ∫_{−∞}^{+∞} u(τ) (sin ξ(t − τ) / (π(t − τ))) dτ = ∫_{0}^{+∞} (sin ξ(t − τ) / (π(t − τ))) dτ.

The change of variable x = ξ(t − τ) gives (2.63).   ■

The function

    s(t) = ∫_{−∞}^{t} (sin x / (πx)) dx
is a sigmoid that increases from 0 at t = −∞ to 1 at t = +∞, with s(0) = 1/2. It has oscillations of period π, which are attenuated when the distance to 0 increases; however, their total variation is infinite: ‖s‖_V = +∞. The maximum amplitude of the Gibbs oscillations occurs at t = ±π/ξ, with an amplitude independent of ξ:

    A = s(π) − 1 = ∫_{−∞}^{π} (sin x / (πx)) dx − 1 ≈ 0.09.
Inserting (2.63) into (2.62) shows that

    f_ξ(t) − f(t) = [f(t₀⁺) − f(t₀⁻)] [s(ξ(t − t₀)) − u(t − t₀)] + ε(ξ, t),   (2.64)

where lim_{ξ→+∞} sup_{|t−t₀|<α} |ε(ξ, t)| = 0 in some neighborhood of size α > 0 around t₀. The sigmoid s(ξ(t − t₀)) centered at t₀ creates a maximum error of fixed amplitude
FIGURE 2.1
Gibbs oscillations created by low-pass filters with cut-off frequencies that decrease from left
to right.
for all ξ. This is seen in Figure 2.1, where the Gibbs oscillations have an amplitude proportional to the jump f(t₀⁺) − f(t₀⁻) at all frequencies ξ.
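The Gibbs overshoot amplitude A = s(π) − 1 can be evaluated numerically: the part of the integral over (−∞, 0] equals 1/2, so A = 1/2 + Si(π)/π − 1, where Si(π) = ∫₀^π (sin x / x) dx. A NumPy sketch (an editorial illustration, not part of the text):

```python
import numpy as np

# Si(pi) by trapezoidal integration; sin(x)/x tends to 1 at 0, so we
# start the grid just above zero.
x = np.linspace(1e-12, np.pi, 200001)
si_pi = np.trapz(np.sin(x) / x, x)       # about 1.8519

A = 0.5 + si_pi / np.pi - 1.0            # Gibbs overshoot per unit jump
print(A)  # about 0.0895, i.e. roughly 9% of the jump
```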
Image Total Variation
The total variation of an image f(x₁, x₂) depends on the amplitude of its variations as well as the length of the contours along which they occur. Suppose that f(x₁, x₂) is differentiable. The total variation is defined by

    ‖f‖_V = ∫∫ |∇f(x₁, x₂)| dx₁ dx₂,   (2.65)

where the modulus of the gradient vector is

    |∇f(x₁, x₂)| = [ |∂f(x₁, x₂)/∂x₁|² + |∂f(x₁, x₂)/∂x₂|² ]^{1/2}.
As in one dimension, the total variation is extended to discontinuous functions by taking the derivatives in the general sense of distributions. An equivalent norm is obtained by approximating the partial derivatives by finite differences:

    |Δ_h f(x₁, x₂)| = [ |(f(x₁, x₂) − f(x₁ − h, x₂))/h|² + |(f(x₁, x₂) − f(x₁, x₂ − h))/h|² ]^{1/2}.

One can verify that

    ‖f‖_V ≤ lim_{h→0} ∫∫ |Δ_h f(x₁, x₂)| dx₁ dx₂ ≤ √2 ‖f‖_V.   (2.66)

The finite difference integral gives a larger value when f(x₁, x₂) is discontinuous along a diagonal line in the (x₁, x₂) plane.
The total variation of f is related to the length of its level sets. Let us define

    Ω_y = {(x₁, x₂) ∈ R² : f(x₁, x₂) > y}.

If f is continuous, then the boundary ∂Ω_y of Ω_y is the level set of all (x₁, x₂) such that f(x₁, x₂) = y. Let H¹(∂Ω_y) be the length of ∂Ω_y. Formally, this length is calculated in the sense of the mono-dimensional Hausdorff measure. Theorem 2.9 relates the total variation of f to the length of its level sets.
Theorem 2.9: Co-area Formula. If ‖f‖_V < +∞, then

    ‖f‖_V = ∫_{−∞}^{+∞} H¹(∂Ω_y) dy.   (2.67)
Proof. The proof is a highly technical result that is given in [75]. Here we give an intuitive explanation when f is continuously differentiable. In this case, ∂Ω_y is a differentiable curve x(y, s) ∈ R², which is parameterized by arc-length s. Let τ(x) be the vector tangent to this curve in the plane. The gradient ∇f(x) is orthogonal to τ(x). The Frenet coordinate system along ∂Ω_y is composed of τ(x) and of the unit vector n(x) parallel to ∇f(x). Let ds and dn be the Lebesgue measures in the direction of τ and n. We then have

    |∇f(x)| = ∇f(x) · n = dy/dn,   (2.68)

where dy is the differential of amplitudes across level sets. The idea of the proof is to decompose the total variation integral over the plane as an integral along the level sets and across level sets, which is written:

    ‖f‖_V = ∫∫ |∇f(x₁, x₂)| dx₁ dx₂ = ∫∫_{∂Ω_y} |∇f(x(y, s))| ds dn.   (2.69)

By using (2.68), we can get

    ‖f‖_V = ∫∫_{∂Ω_y} ds dy.

But ∫_{∂Ω_y} ds = H¹(∂Ω_y) is the length of the level set, which justifies (2.67).   ■
The co-area formula gives an important geometrical interpretation of the total image variation. Images have a bounded gray level. Thus, the integral (2.67) is calculated over a finite interval and is proportional to the average length of level sets. It is finite as long as the level sets are not fractal curves. Let f = α 1_Ω be proportional to the indicator function of a set Ω ⊂ R² that has a boundary ∂Ω of length L. The co-area formula (2.67) implies that ‖f‖_V = αL. In general, bounded variation images must have step edges of finite length.
Discrete Images
A camera measures light intensity with photoreceptors that perform an averaging and a uniform sampling over a grid that is supposed to be uniform. For a resolution N, the sampling interval is N⁻¹. The resulting image can be written f_N[n₁, n₂] = f ⋆ φ_N(n₁/N, n₂/N), where φ_N = 1_{[0,N⁻¹]²} and f is the averaged analog image. Its total variation is defined by approximating derivatives by finite differences and the integral (2.66) by a Riemann sum:

    ‖f_N‖_V = (1/N) Σ_{n₁} Σ_{n₂} [ |f_N[n₁, n₂] − f_N[n₁ − 1, n₂]|² + |f_N[n₁, n₂] − f_N[n₁, n₂ − 1]|² ]^{1/2}.   (2.70)

In accordance with (2.66), we say that the image has bounded variation if ‖f_N‖_V is bounded by a constant independent of resolution N. The co-area formula proves that it depends on the length of the level sets as the image resolution increases. The √2 upper-bound factor in (2.66) comes from the fact that the length of a diagonal line can be increased by √2 if it is approximated by a zig-zag line that remains on the horizontal and vertical segments of the image-sampling grid. Figure 2.2(a) shows a bounded variation image and Figure 2.2(b) displays the level sets obtained by uniformly discretizing the amplitude variable y. The total variation of this image remains nearly constant as resolution varies.
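The discrete image total variation (2.70) and the co-area prediction ‖α 1_Ω‖_V = αL can be checked together on a synthetic image: a unit-amplitude square of side 1/2 inside the unit image has boundary length L = 2. A NumPy sketch (an editorial illustration; the small tolerance accounts for corner pixels, where the two differences combine into √2 instead of 2):

```python
import numpy as np

def image_total_variation(f, N):
    # Discrete total variation (2.70) with backward finite differences,
    # cropped so both differences refer to the same pixel.
    dx = np.diff(f, axis=0)[:, 1:]
    dy = np.diff(f, axis=1)[1:, :]
    return float(np.sum(np.sqrt(dx**2 + dy**2)) / N)

N = 200
f = np.zeros((N, N))
f[N // 4: 3 * N // 4, N // 4: 3 * N // 4] = 1.0   # square of side 1/2
tv = image_total_variation(f, N)
print(tv)  # close to the boundary length L = 2
```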
2.4 TWO-DIMENSIONAL FOURIER TRANSFORM
The Fourier transform in Rⁿ is a straightforward extension of the one-dimensional Fourier transform. The two-dimensional case is briefly reviewed for image-processing applications. The Fourier transform of a two-dimensional integrable
FIGURE 2.2
(a) The total variation of this image remains nearly constant when resolution N increases.
(b) Level sets ⭸⍀y obtained by uniformly sampling amplitude variable y.
function, f ∈ L¹(R²), is

    f̂(ω₁, ω₂) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x₁, x₂) exp[−i(ω₁x₁ + ω₂x₂)] dx₁ dx₂.   (2.71)
In polar coordinates, exp[i(ω₁x₁ + ω₂x₂)] can be rewritten

    exp[i(ω₁x₁ + ω₂x₂)] = exp[iξ(x₁ cos θ + x₂ sin θ)],

with ξ = √(ω₁² + ω₂²). It is a plane wave that propagates in the direction of θ and oscillates at frequency ξ. The properties of a two-dimensional Fourier transform are essentially the same as in one dimension. Let us summarize a few important results.
We write ω = (ω₁, ω₂), x = (x₁, x₂), ω · x = ω₁x₁ + ω₂x₂, and ∫∫ f(x₁, x₂) dx₁ dx₂ = ∫ f(x) dx.

■ If f ∈ L¹(R²) and f̂ ∈ L¹(R²), then

    f(x) = (1/4π²) ∫ f̂(ω) exp[i(ω · x)] dω.   (2.72)
■ If f ∈ L¹(R²) and h ∈ L¹(R²), then the convolution

    g(x) = f ⋆ h(x) = ∫ f(u) h(x − u) du

has a Fourier transform

    ĝ(ω) = f̂(ω) ĥ(ω).   (2.73)
■ The Parseval formula proves that

    ∫ f(x) g*(x) dx = (1/4π²) ∫ f̂(ω) ĝ*(ω) dω.   (2.74)

If f = g, we obtain the Plancherel equality

    ∫ |f(x)|² dx = (1/4π²) ∫ |f̂(ω)|² dω.   (2.75)

The Fourier transform of a finite-energy function thus has finite energy. With the same density-based argument as in one dimension, energy equivalence makes it possible to extend the Fourier transform to any function f ∈ L²(R²).
■
If f ∈ L 2 (R2 ) is separable, which means that
f (x) ⫽ f (x1 , x2 ) ⫽ g(x1 ) h(x2 ),
then its Fourier transform is
f̂ () ⫽ f̂ (1 , 2 ) ⫽ ĝ(1 ) ĥ(2 ),
where ĥ and ĝ are the one-dimensional Fourier transforms of g and h. For
example, the indicator function,
f (x1 , x2 ) ⫽
1
0
if |x1 | ⭐ T , |x2 | ⭐ T
⫽ 1[⫺T ,T ] (x1 ) ⫻ 1[⫺T ,T ] (x2 ),
otherwise
is a separable function, the Fourier transform of which is derived from (2.28):
f̂ (1 , 2 ) ⫽
■
4 sin(T 1 ) sin(T 2 )
.
1 2
■ If f(x₁, x₂) is rotated by θ,

    f_θ(x₁, x₂) = f(x₁ cos θ − x₂ sin θ, x₁ sin θ + x₂ cos θ),

then its Fourier transform is rotated by θ:

    f̂_θ(ω₁, ω₂) = f̂(ω₁ cos θ − ω₂ sin θ, ω₁ sin θ + ω₂ cos θ).   (2.76)
Radon Transform
A Radon transform computes integrals of f ∈ L²(R²) along rays. It provides a good model for some tomographic systems, such as X-ray measurements in medical imaging. It is then necessary to invert the Radon transform to reconstruct the two- or three-dimensional body from these integrals.

Let us write τ_θ = (cos θ, sin θ). A ray Δ_{t,θ} is a line defined by its equation

    x · τ_θ = x₁ cos θ + x₂ sin θ = t.

The projection p_θ of f along a parallel line of orientation θ is defined by

    ∀θ ∈ [0, π), ∀t ∈ R,   p_θ(t) = ∫_{Δ_{t,θ}} f(x) ds = ∫∫ f(x) δ(x · τ_θ − t) dx,   (2.77)
FIGURE 2.3
The Radon transform and its reconstruction with an increasing number of back projections.
where δ is the Dirac distribution. The Radon transform maps f(x) to p_θ(t) for θ ∈ [0, π).

In medical imaging applications, a scanner is rotated around an object to compute the projection p_θ for many angles θ ∈ [0, π), as illustrated in Figure 2.3. The Fourier slice theorem (Theorem 2.10) relates the Fourier transform of p_θ to slices of the Fourier transform of f.
Theorem 2.10: Fourier Slice. The Fourier transform of projections satisfies

    ∀θ ∈ [0, π), ∀ξ ∈ R,   p̂_θ(ξ) = f̂(ξ cos θ, ξ sin θ).

Proof. The Fourier transform of the projection is

    p̂_θ(ξ) = ∫_{−∞}^{+∞} ( ∫∫ f(x) δ(x · τ_θ − t) dx ) e^{−iξt} dt = ∫∫ f(x) exp(−iξ(x · τ_θ)) dx = f̂(ξ τ_θ).   ■
An image f can be recovered from its projections p_θ thanks to the projection slice theorem. Indeed, the Fourier transform f̂ is known along each ray of direction θ, and f is thus obtained with the 2D inverse Fourier transform (2.72). The backprojection theorem (Theorem 2.11) gives an inversion formula.
Theorem 2.11: Backprojection. The image f is recovered using a one-dimensional filter h(t):

    f(x) = (1/2π) ∫_0^π p_θ ⋆ h(x · τ_θ) dθ,   with   ĥ(ξ) = |ξ|.
Proof. The inverse Fourier transform (2.72) in polar coordinates (ω₁, ω₂) = (ξ cos θ, ξ sin θ), with dω₁ dω₂ = |ξ| dξ dθ, can be written

    f(x) = (1/4π²) ∫_0^{2π} ∫_0^{+∞} f̂(ξ cos θ, ξ sin θ) exp(iξ(x · τ_θ)) ξ dξ dθ.

Using the Fourier slice theorem (Theorem 2.10), with p_{θ+π}(t) = p_θ(−t), this is rewritten as

    f(x) = (1/2π) ∫_0^π (1/2π) ∫_{−∞}^{+∞} |ξ| p̂_θ(ξ) exp(iξ(x · τ_θ)) dξ dθ.   (2.78)

The inner integral is the inverse Fourier transform of p̂_θ(ξ)|ξ| evaluated at x · τ_θ. The convolution formula (2.73) shows that it is equal to p_θ ⋆ h(x · τ_θ).   ■
In medical imaging applications, only a limited number of projections are available; thus, the Fourier transform f̂ is only partially known. In this case, an approximation of f can still be recovered by summing the corresponding filtered backprojections p_θ ⋆ h(x · τ_θ). Figure 2.3 shows this process, and the reconstruction of an image with a geometric object, using an increasing number of evenly spaced projections. Section 13.3 describes a nonlinear super-resolution reconstruction algorithm that recovers a more precise image by using a sparse representation.
2.5 EXERCISES
2.1 [1] Prove that if f ∈ L¹(R), then f̂(ω) is a continuous function of ω, and that if f̂ ∈ L¹(R), then f(t) is continuous.
2.2 [1] Prove that a filter with impulse response h(t) is stable only if ∫_{−∞}^{+∞} |h(t)| dt < +∞.

2.3 [1] Prove the translation, scaling, and time-derivative properties of the Fourier transform.
2.4 [1] Let f_r(t) = Re[f(t)] and f_i(t) = Im[f(t)] be the real and imaginary parts of f(t). Prove that f̂_r(ω) = [f̂(ω) + f̂*(−ω)]/2 and f̂_i(ω) = [f̂(ω) − f̂*(−ω)]/(2i).
Note: Exercises have been ordered by level of difficulty: Level 1 exercises are direct applications of the course. Level 2 require more thinking. Level 3 exercises include some technical derivations. Level 4 are projects at the interface of research; they are possible topics for a final course project or independent study.
2.5 [1] Prove that if f̂(ω) is differentiable and f̂(0) = f̂′(0) = 0, then

    ∫_{−∞}^{+∞} f(t) dt = 0   and   ∫_{−∞}^{+∞} t f(t) dt = 0.

2.6 [1] By using the Fourier transform, verify that

    ∫_{−∞}^{+∞} (sin(πt)/(πt)) dt = 1   and   ∫_{−∞}^{+∞} (sin³(πt)/(πt)³) dt = 3/4.
2.7 [2] Show that the Fourier transform of f(t) = exp(−(a − ib)t²) is

    f̂(ω) = √(π/(a − ib)) exp(−(a + ib)ω² / (4(a² + b²))).

Hint: write a differential equation similar to (2.33).
2.8 [3] Riemann-Lebesgue. Prove that if f ∈ L¹(R), then lim_{|ω|→∞} f̂(ω) = 0. Hint: Prove it first for C¹ functions with a compact support and use a density argument.
2.9 [2] Stability of passive circuits:
(a) Let p be a complex number with Re[p] < 0. Compute the Fourier transforms of f(t) = exp(pt) 1_{[0,+∞)}(t) and of f(t) = tⁿ exp(pt) 1_{[0,+∞)}(t).
(b) A passive circuit relates the input voltage f to the output voltage g by a differential equation with constant coefficients:

    Σ_{k=0}^{K} a_k f^{(k)}(t) = Σ_{k=0}^{M} b_k g^{(k)}(t).

Prove that this system is stable and causal if and only if the roots of the equation Σ_{k=0}^{M} b_k z^k = 0 have a strictly negative real part.
(c) A Butterworth filter satisfies

    |ĥ(ω)|² = 1 / (1 + (ω/ω₀)^{2N}).

For N = 3, compute ĥ(ω) and h(t) so that this filter can be implemented by a stable electronic circuit.
2.10 [1] For any A > 0, construct f such that the time and frequency spreads measured, respectively, by σ_t and σ_ω in (2.45) and (2.46) satisfy σ_t > A and σ_ω > A.
2.11 [3] Suppose that f(t) ≥ 0 and that its support is in [−T, T]. Verify that |f̂(ω)| ≤ f̂(0). Let ω_c be the half-power point defined by |f̂(ω_c)|² = |f̂(0)|²/2 and |f̂(ω)|² > |f̂(0)|²/2 for ω < ω_c. Prove that ω_c T ≥ π/2.
2.12 [2] Rectification. A rectifier computes g(t) = |f(t)| for recovering the envelope of modulated signals [54].
(a) Show that if f(t) = a(t) sin ω₀t with a(t) ≥ 0, then g(t) = |f(t)| satisfies

    ĝ(ω) = −(2/π) Σ_{n=−∞}^{+∞} â(ω − 2nω₀) / (4n² − 1).

(b) Suppose that â(ω) = 0 for |ω| > ω₀. Find h such that a(t) = h ⋆ g(t).
2.13 [2] Amplitude modulation. For 0 ≤ n < N, we suppose that f_n(t) is real and that f̂_n(ω) = 0 for |ω| > ω₀. An amplitude-modulated multiplexed signal is defined by

    g(t) = Σ_n f_n(t) cos(2nω₀t).

Compute ĝ(ω) and verify that the width of its support is 4Nω₀. Find a demodulation algorithm that recovers each f_n from g.
2.14 [1] Show that ‖φ‖_V = +∞ if φ(t) = sin(πt)/(πt). Show that ‖φ‖_V = 2 if φ(t) = 1_{[a,b]}(t).

2.15 [3] Let f_ξ = f ⋆ φ_ξ with φ̂_ξ = 1_{[−ξ,ξ]}. Suppose that f has a bounded variation ‖f‖_V < +∞ and that it is continuous in a neighborhood of t₀. Prove that in a neighborhood of t₀, f_ξ(t) converges uniformly to f(t) when ξ goes to +∞.
2.16 [1] Compute the two-dimensional Fourier transforms of f(x) = 1_{[0,1]²}(x₁, x₂) and of f(x) = e^{−(x₁²+x₂²)}.

2.17 [1] Compute the Radon transform of the indicator function of the unit circle:

    f(x₁, x₂) = 1_{x₁²+x₂²≤1}.
2.18 [2] Let f(x₁, x₂) be an image that has a discontinuity of amplitude A along a straight line that has an angle θ in the plane (x₁, x₂). Compute the amplitude of the Gibbs oscillations of f ⋆ h_ξ(x₁, x₂) as a function of ξ, θ, and A for ĥ_ξ(ω₁, ω₂) = 1_{[−ξ,ξ]}(ω₁) 1_{[−ξ,ξ]}(ω₂).
CHAPTER 3
Discrete Revolution
Digital signal processing has taken over. First used in the 1950s at the service
of analog signal processing to simulate analog transforms, digital algorithms have
invaded most traditional fortresses, including phones, music recording, cameras,
televisions, and all information processing. Analog computations performed with
electronic circuits are faster than digital algorithms implemented with microprocessors but are less precise and less flexible. Thus, analog circuits are often replaced by
digital chips once the computational performance of microprocessors is sufficient
to operate in real time for a given application.
Whether sound recordings or images, most discrete signals are obtained by sampling an analog signal. An analog-to-digital conversion is a linear approximation that
introduces an error dependent on the sampling rate. Once more, the Fourier transform is unavoidable because the eigenvectors of discrete time-invariant operators
are sinusoidal waves. The Fourier transform is discretized for signals of finite size
and implemented with a fast Fourier transform (FFT) algorithm.
3.1 SAMPLING ANALOG SIGNALS
The simplest way to discretize an analog signal f is to record its sample values
{ f (ns)}n∈Z at interval s. An approximation of f (t) at any t ∈ R may be recovered by
interpolating these samples. The Shannon-Whittaker sampling theorem gives a sufficient condition on the support of the Fourier transform f̂ to recover f (t) exactly.
Aliasing and approximation errors are studied when this condition is not satisfied.
Digital-acquisition devices often do not satisfy the restrictive hypothesis of the
Shannon-Whittaker sampling theorem. General linear analog-to-discrete conversion
is introduced in Section 3.1.3, showing that a stable uniform discretization is a linear
approximation. A digital conversion also approximates discrete coefficients, with
a given precision, to store them with a limited number of bits. This quantization
aspect is studied in Chapter 10.
3.1.1 Shannon-Whittaker Sampling Theorem
Sampling is first studied from the more classic Shannon-Whittaker point of view,
which tries to recover f (t) from its samples { f (ns)}n∈Z . A discrete signal can
be represented as a sum of Diracs. We associate to any sample f(ns) a Dirac f(ns)δ(t − ns) located at t = ns. A uniform sampling of f thus corresponds to the weighted Dirac sum

    f_d(t) = Σ_{n=−∞}^{+∞} f(ns) δ(t − ns).   (3.1)

The Fourier transform of δ(t − ns) is e^{−insω}, so the Fourier transform of f_d is a Fourier series:

    f̂_d(ω) = Σ_{n=−∞}^{+∞} f(ns) e^{−insω}.   (3.2)

To understand how to compute f(t) from the sample values f(ns) and therefore f from f_d, we relate their Fourier transforms f̂ and f̂_d.
Theorem 3.1. The Fourier transform of the discrete signal obtained by sampling f at interval s is

    f̂_d(ω) = (1/s) Σ_{k=−∞}^{+∞} f̂(ω − 2πk/s).   (3.3)
Proof. Since δ(t − ns) is zero outside t = ns,

    f(ns) δ(t − ns) = f(t) δ(t − ns),

we can rewrite (3.1) as multiplication with a Dirac comb:

    f_d(t) = f(t) Σ_{n=−∞}^{+∞} δ(t − ns) = f(t) c(t).   (3.4)

Computing the Fourier transform yields

    f̂_d(ω) = (1/2π) f̂ ⋆ ĉ(ω).   (3.5)

The Poisson formula (2.4) proves that

    ĉ(ω) = (2π/s) Σ_{k=−∞}^{+∞} δ(ω − 2πk/s).   (3.6)

Since f̂ ⋆ δ(ω − ξ) = f̂(ω − ξ), inserting (3.6) into (3.5) proves (3.3).   ■
Theorem 3.1 proves that sampling f at interval s is equivalent to making its Fourier transform 2π/s periodic by summing all its translations f̂(ω − 2πk/s). The resulting sampling theorem was first proved by Whittaker [482] in 1935 in a book on interpolation theory. Shannon rediscovered it in 1949 for applications to communication theory [429].
Theorem 3.2: Shannon, Whittaker. If the support of f̂ is included in [−π/s, π/s], then

    f(t) = Σ_{n=−∞}^{+∞} f(ns) φ_s(t − ns),   (3.7)

with

    φ_s(t) = sin(πt/s) / (πt/s).   (3.8)

Proof. If n ≠ 0, the support of f̂(ω − 2πn/s) does not intersect the support of f̂(ω) because f̂(ω) = 0 for |ω| > π/s; so (3.3) implies

    f̂_d(ω) = f̂(ω)/s   if |ω| ≤ π/s.   (3.9)

The Fourier transform of φ_s is φ̂_s = s 1_{[−π/s,π/s]}. Since the support of f̂ is in [−π/s, π/s], it results from (3.9) that f̂(ω) = φ̂_s(ω) f̂_d(ω). The inverse Fourier transform of this equality gives

    f(t) = φ_s ⋆ f_d(t) = φ_s ⋆ Σ_{n=−∞}^{+∞} f(ns) δ(t − ns) = Σ_{n=−∞}^{+∞} f(ns) φ_s(t − ns).   ■
The sampling theorem supposes that the support of f̂ is included in [−π/s, π/s], which guarantees that f has no brutal variations between consecutive samples; thus, it can be recovered with a smooth interpolation. Section 3.1.3 shows that one can impose other smoothness conditions to recover f from its samples. Figure 3.1 illustrates the different steps of a sampling and reconstruction from samples, in both the time and Fourier domains.
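The interpolation formula (3.7) can be tried numerically with s = 1 on the band-limited signal f(t) = sinc(t − 0.3), where NumPy's `sinc` is sin(πt)/(πt), so f̂ is supported in [−π, π]. A truncated version of the sum recovers f at off-grid points, up to a small truncation error (an editorial sketch, not part of the text):

```python
import numpy as np

s = 1.0
n = np.arange(-5000, 5001)
samples = np.sinc(n * s - 0.3)       # f(ns) for f(t) = sinc(t - 0.3)

def interpolate(t):
    # Truncated Shannon-Whittaker sum (3.7) with phi_s(t) = sinc(t/s).
    return float(np.sum(samples * np.sinc((t - n * s) / s)))

errs = [abs(interpolate(t) - np.sinc(t - 0.3)) for t in (0.0, 0.45, 2.2)]
print(max(errs))  # small truncation error
```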
3.1.2 Aliasing
The sampling interval s is often imposed by computation or storage constraints, and the support of f̂ is generally not included in [−π/s, π/s]. In this case the interpolation formula (3.7) does not recover f. We analyze the resulting error and a filtering procedure to reduce it.

Theorem 3.1 proves that

    f̂_d(ω) = (1/s) Σ_{k=−∞}^{+∞} f̂(ω − 2πk/s).   (3.10)

Suppose that the support of f̂ goes beyond [−π/s, π/s]. In general, the support of f̂(ω − 2πk/s) intersects [−π/s, π/s] for several k ≠ 0, as shown in Figure 3.2. This folding
FIGURE 3.1
(a) Signal f and its Fourier transform f̂ . (b) A uniform sampling of f makes its Fourier
transform periodic. (c) Ideal low-pass filter. (d) The filtering of (b) with (c) recovers f .
FIGURE 3.2
(a) Signal f and its Fourier transform f̂. (b) Aliasing produced by an overlapping of f̂(ω − 2πk/s) for different k, shown with dashed lines. (c) Ideal low-pass filter. (d) The filtering of (b) with (c) creates a low-frequency signal that is different from f.
of high-frequency components over a low-frequency interval is called aliasing. In the presence of aliasing, the interpolated signal

    φ_s ⋆ f_d(t) = Σ_{n=−∞}^{+∞} f(ns) φ_s(t − ns)

has a Fourier transform

    φ̂_s(ω) f̂_d(ω) = s f̂_d(ω) 1_{[−π/s,π/s]}(ω) = 1_{[−π/s,π/s]}(ω) Σ_{k=−∞}^{+∞} f̂(ω − 2πk/s),   (3.11)

which may be completely different from f̂(ω) over [−π/s, π/s]. The signal φ_s ⋆ f_d may not even be a good approximation of f, as shown by Figure 3.2.
EXAMPLE 3.1
Let us consider a high-frequency oscillation

f(t) = \cos(\omega_0 t) = \frac{e^{i\omega_0 t} + e^{-i\omega_0 t}}{2}.

Its Fourier transform is

\hat f(\omega) = \pi\left(\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right).

If 2π/s > ω₀ > π/s, then (3.11) yields

\hat f_d(\omega)\, \hat\phi_s(\omega) = \mathbf{1}_{[-\pi/s,\pi/s]}(\omega) \sum_{k=-\infty}^{+\infty} \pi\left(\delta\!\left(\omega - \omega_0 - \frac{2k\pi}{s}\right) + \delta\!\left(\omega + \omega_0 - \frac{2k\pi}{s}\right)\right)
 = \pi\left(\delta\!\left(\omega - \frac{2\pi}{s} + \omega_0\right) + \delta\!\left(\omega + \frac{2\pi}{s} - \omega_0\right)\right),

so

f_d \star \phi_s(t) = \cos\!\left(\left(\frac{2\pi}{s} - \omega_0\right) t\right).

The aliasing reduces the high frequency ω₀ to a lower frequency 2π/s − ω₀ ∈ [−π/s, π/s]. The same frequency folding is observed in a film that samples a fast-moving object without enough images per second. A wheel turning rapidly appears to be turning much more slowly in the film.
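The folding in Example 3.1 can be checked numerically: with π/s < ω₀ < 2π/s, the samples cos(ω₀ns) coincide with the samples of the lower frequency 2π/s − ω₀. A minimal sketch; the values of s and ω₀ are illustrative choices:

```python
import numpy as np

# Aliasing check for Example 3.1: with pi/s < w0 < 2*pi/s, the samples
# cos(w0 * n * s) coincide with the samples of the folded frequency
# 2*pi/s - w0. The values of s and w0 are illustrative choices.
s = 1.0
w0 = 1.7 * np.pi / s                 # between pi/s and 2*pi/s
w_alias = 2 * np.pi / s - w0         # folded frequency, inside [-pi/s, pi/s]

n = np.arange(64)
max_diff = np.max(np.abs(np.cos(w0 * n * s) - np.cos(w_alias * n * s)))
```

The two sampled sequences agree to machine precision, even though the underlying analog frequencies differ.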
Removal of Aliasing
To apply the sampling theorem, f is approximated by the closest signal f̃, the Fourier transform of which has a support in [−π/s, π/s]. The Plancherel formula (2.26) proves that

\| f - \tilde f \|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega) - \hat{\tilde f}(\omega)|^2\, d\omega
 = \frac{1}{2\pi} \int_{|\omega| > \pi/s} |\hat f(\omega)|^2\, d\omega + \frac{1}{2\pi} \int_{|\omega| \le \pi/s} |\hat f(\omega) - \hat{\tilde f}(\omega)|^2\, d\omega.
This distance is minimum when the second integral is zero, and therefore

\hat{\tilde f}(\omega) = \hat f(\omega)\, \mathbf{1}_{[-\pi/s,\pi/s]}(\omega) = \frac{1}{s}\, \hat\phi_s(\omega)\, \hat f(\omega).   (3.12)

It corresponds to \tilde f = \frac{1}{s}\, f \star \phi_s.
The filtering of f by φ_s avoids aliasing by removing any frequency larger than π/s. Since f̃ has a support in [−π/s, π/s], the sampling theorem proves that f̃(t) can be recovered from the samples f̃(ns). An analog-to-digital converter is therefore composed of a filter that limits the frequency band to [−π/s, π/s], followed by a uniform sampling at interval s.
3.1.3 General Sampling and Linear Analog Conversions
The Shannon-Whittaker theorem is a particular example of linear discrete-to-analog conversion, which does not apply to all digital acquisition devices. This section describes general analog-to-discrete conversion and the reverse discrete-to-analog conversion, with general linear filtering and uniform sampling. Analog signals are approximated by linear projections on approximation spaces.
Sampling Theorems
We want to recover a stable approximation of f ∈ L²(ℝ) from a filtering and uniform sampling that outputs {f ⋆ φ̄_s(ns)}_{n∈ℤ}, for some real filter φ̄_s(t). These samples can be written as inner products in L²(ℝ):

f \star \bar\phi_s(ns) = \int_{-\infty}^{+\infty} f(t)\, \bar\phi_s(ns - t)\, dt = \langle f(t), \phi_s(t - ns) \rangle,   (3.13)

with φ_s(t) = φ̄_s(−t). Let U_s be the approximation space generated by linear combinations of the {φ_s(t − ns)}_{n∈ℤ}. The approximation f̃ ∈ U_s that minimizes the maximum possible error ‖f − f̃‖ is the orthogonal projection of f on U_s (Exercise 3.5). The calculation of this orthogonal projection is stable if {φ_s(t − ns)}_{n∈ℤ} is a Riesz basis of U_s, as defined in Section 5.1.1.
Following Definition 5.1, a Riesz basis is a family of linearly independent functions that yields an inner product satisfying an energy equivalence: there exist B ≥ A > 0 such that for any f ∈ U_s,

A\, \|f\|^2 \le \sum_{n=-\infty}^{+\infty} |\langle f(t), \phi_s(t - ns) \rangle|^2 \le B\, \|f\|^2.   (3.14)
The basis is orthogonal if and only if A = B. The following generalized sampling theorem computes the orthogonal projection on the approximation space U_s [468].

Theorem 3.3: Linear sampling. Let {φ_s(t − ns)}_{n∈ℤ} be a Riesz basis of U_s and φ̄_s(t) = φ_s(−t). There exists a biorthogonal basis {φ̃_s(t − ns)}_{n∈ℤ} of U_s such that

\forall f \in L^2(\mathbb{R}), \quad P_{U_s} f(t) = \sum_{n=-\infty}^{+\infty} f \star \bar\phi_s(ns)\, \tilde\phi_s(t - ns).   (3.15)
Proof. For any Riesz basis, Section 5.1.2 proves that there exists a biorthogonal basis {φ̃_{s,n}(t)}_{n∈ℤ} that satisfies the biorthogonality relations

\forall (n, m) \in \mathbb{Z}^2, \quad \langle \phi_s(t - ns),\, \tilde\phi_{s,m}(t - ms) \rangle = \delta[n - m].   (3.16)

Since ⟨φ_s(t − (n − m)s), φ̃_{s,0}(t)⟩ = ⟨φ_s(t − ns), φ̃_{s,0}(t − ms)⟩ and since the dual basis is unique, necessarily φ̃_{s,m}(t) = φ̃_{s,0}(t − ms). Section 5.1.2 proves in (5.20) that the orthogonal projection in U_s can be written

P_{U_s} f(t) = \sum_{n=-\infty}^{+\infty} \langle f(t), \phi_s(t - ns) \rangle\, \tilde\phi_s(t - ns),

which proves (3.15). ■
The orthogonal projection (3.15) can be rewritten as an analog filtering of the discrete signal f_d(t) = Σ_{n=−∞}^{+∞} f ⋆ φ̄_s(ns) δ(t − ns):

P_{U_s} f(t) = f_d \star \tilde\phi_s(t).   (3.17)

If f ∈ U_s, then P_{U_s} f = f, so f is exactly reconstructed by filtering the uniformly sampled discrete signal {f ⋆ φ̄_s(ns)}_{n∈ℤ} with the analog filter φ̃_s(t). If f ∉ U_s, then (3.17) recovers the best linear approximation of f in U_s. Section 9.1 shows that the linear approximation error ‖f − P_{U_s} f‖ essentially depends on the uniform regularity of f. Given some prior information on f, optimizing the analog discretization filter φ_s amounts to optimizing the approximation space U_s to minimize this error. The following theorem characterizes the filters φ_s that generate a Riesz basis, and computes the dual filter.
Theorem 3.4. A filter φ_s generates a Riesz basis {φ_s(t − ns)}_{n∈ℤ} of a space U_s if and only if there exist B ≥ A > 0 such that

\forall \omega \in [0, 2\pi/s], \quad A \le \frac{1}{s} \sum_{k=-\infty}^{+\infty} \left|\hat\phi_s\!\left(\omega - \frac{2k\pi}{s}\right)\right|^2 \le B.   (3.18)

The biorthogonal basis {φ̃_s(t − ns)}_{n∈ℤ} is defined by the dual filter φ̃_s, which satisfies

\hat{\tilde\phi}_s(\omega) = \frac{s\, \hat\phi_s^*(\omega)}{\sum_{k=-\infty}^{+\infty} |\hat\phi_s(\omega - 2k\pi/s)|^2}.   (3.19)
Proof. Theorem 5.5 proves that {φ_s(t − ns)}_{n∈ℤ} is a Riesz basis of U_s with Riesz bounds B ≥ A > 0 if and only if it is linearly independent and

\forall a \in \ell^2(\mathbb{Z}), \quad A\, \|a\|^2 \le \Big\| \sum_{n\in\mathbb{Z}} a[ns]\, \phi_s(t - ns) \Big\|^2 \le B\, \|a\|^2,   (3.20)

with ‖a‖² = Σ_{n∈ℤ} |a[ns]|².

Let us first write these conditions in the Fourier domain. The Fourier transform of f(t) = Σ_{n=−∞}^{+∞} a[ns] φ_s(t − ns) is

\hat f(\omega) = \sum_{n=-\infty}^{+\infty} a[ns]\, e^{-ins\omega}\, \hat\phi_s(\omega) = \hat a(\omega)\, \hat\phi_s(\omega),   (3.21)
where â(ω) is the Fourier series â(ω) = Σ_{n=−∞}^{+∞} a[ns] e^{−insω}. Let us relate the norms of f and â. Since â(ω) is 2π/s periodic, inserting (3.21) in the Plancherel formula (2.26) gives

\|f\|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega
 = \frac{1}{2\pi} \int_0^{2\pi/s} \sum_{k=-\infty}^{+\infty} |\hat a(\omega + 2k\pi/s)|^2\, |\hat\phi_s(\omega + 2k\pi/s)|^2\, d\omega   (3.22)
 = \frac{1}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2 \sum_{k=-\infty}^{+\infty} |\hat\phi_s(\omega + 2k\pi/s)|^2\, d\omega.
Section 3.2.2 on Fourier series proves that

\|a\|^2 = \sum_{n=-\infty}^{+\infty} |a[ns]|^2 = \frac{s}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2\, d\omega.   (3.23)
As a consequence of (3.22) and (3.23), the Riesz bound inequalities (3.20) are equivalent to

\forall \hat a \in L^2[0, 2\pi/s], \quad \frac{1}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2 \sum_{k=-\infty}^{+\infty} |\hat\phi_s(\omega + 2k\pi/s)|^2\, d\omega \le \frac{Bs}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2\, d\omega   (3.24)

and

\forall \hat a \in L^2[0, 2\pi/s], \quad \frac{1}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2 \sum_{k=-\infty}^{+\infty} |\hat\phi_s(\omega + 2k\pi/s)|^2\, d\omega \ge \frac{As}{2\pi} \int_0^{2\pi/s} |\hat a(\omega)|^2\, d\omega.   (3.25)

If φ̂_s satisfies (3.18), then clearly (3.24) and (3.25) are valid, which proves (3.20). Conversely, suppose that {φ_s(ns − t)}_{n∈ℤ} is a Riesz basis and that either the upper or the lower bound of (3.18) is not satisfied for ω in a set of nonzero measure. Let â be the indicator function of this set. Then either (3.24) or (3.25) is not valid for this â. This implies that the Riesz bounds (3.20) are not valid for a, and therefore that the family is not a Riesz basis, which contradicts our hypothesis. So (3.18) is indeed valid for almost all ω ∈ [0, 2π/s].
To compute the biorthogonal basis, we look for φ̃_s ∈ U_s such that {φ̃_s(t − ns)}_{n∈ℤ} satisfies the biorthogonality relations (3.16). Since φ̃_s ∈ U_s, we saw in (3.21) that its Fourier transform can be written φ̂̃_s(ω) = â(ω) φ̂_s(ω), where â(ω) is 2π/s periodic. Let us define g(t) = φ̄_s ⋆ φ̃_s(t). Its Fourier transform is

\hat g(\omega) = \hat\phi_s^*(\omega)\, \hat{\tilde\phi}_s(\omega) = \hat a(\omega)\, |\hat\phi_s(\omega)|^2.

The biorthogonal relations (3.16) are satisfied if and only if g(ns) = 0 for n ≠ 0 and g(0) = 1. It results that g_d(t) = Σ_{n=−∞}^{+∞} g(ns) δ(t − ns) = δ(t). Theorem 3.1 derives in (3.3) that

\hat g_d(\omega) = \frac{1}{s} \sum_{k=-\infty}^{+\infty} \hat g\!\left(\omega - \frac{2k\pi}{s}\right) = \frac{\hat a(\omega)}{s} \sum_{k=-\infty}^{+\infty} \left|\hat\phi_s\!\left(\omega - \frac{2k\pi}{s}\right)\right|^2 = 1.
It results that

\hat a(\omega) = s \left( \sum_{k=-\infty}^{+\infty} |\hat\phi_s(\omega - 2k\pi/s)|^2 \right)^{-1},

which proves (3.19). ■
This theorem gives a necessary and sufficient condition on the low-pass filter φ̄_s(t) = φ_s(−t) to recover a stable signal approximation from a uniform sampling at interval s. For varying sampling intervals s, the low-pass filter can be obtained by dilating a single filter: φ_s(t) = s^{−1/2} φ(t/s), and thus φ̂_s(ω) = s^{1/2} φ̂(sω). The necessary and sufficient Riesz basis condition (3.18) is then satisfied if and only if

\forall \omega \in [-\pi, \pi], \quad A \le \sum_{k=-\infty}^{+\infty} |\hat\phi(\omega - 2k\pi)|^2 \le B.   (3.26)

It results from (3.19) that the dual filter satisfies φ̂̃_s(ω) = s^{1/2} φ̂̃(sω), and therefore φ̃_s(t) = s^{−1/2} φ̃(t/s). When A = B = 1, the Riesz basis is an orthonormal basis, which proves Corollary 3.1.
Corollary 3.1. The family {φ_s(t − ns)}_{n∈ℤ} is an orthonormal basis of the space U_s it generates, with φ_s(t) = s^{−1/2} φ(t/s), if and only if

\forall \omega \in [0, 2\pi], \quad \sum_{k=-\infty}^{+\infty} |\hat\phi(\omega - 2k\pi)|^2 = 1,   (3.27)

and the dual filter is φ̃_s = φ_s.
Shannon-Whittaker Revisited
The Shannon-Whittaker Theorem 3.2 is defined with a sine-cardinal perfect low-pass filter φ_s, which we renormalize here to have a unit norm. The following theorem proves that it samples functions on an orthonormal basis.

Theorem 3.5. If φ_s(t) = s^{1/2} sin(πs^{−1}t)/(πt), then {φ_s(t − ns)}_{n∈ℤ} is an orthonormal basis of the space U_s of functions whose Fourier transforms have a support included in [−π/s, π/s]. If f ∈ U_s, then

f(ns) = s^{-1/2}\, f \star \phi_s(ns).   (3.28)
Proof. The filter satisfies φ_s(t) = s^{−1/2} φ(t/s) with φ(t) = sin(πt)/(πt). Its Fourier transform φ̂(ω) = 1_{[−π,π]}(ω) satisfies the condition (3.27) of Corollary 3.1, which proves that {φ_s(t − ns)}_{n∈ℤ} is an orthonormal basis of a space U_s.

Any f(t) = Σ_{n=−∞}^{+∞} a[ns] φ_s(t − ns) ∈ U_s has a Fourier transform that can be written

\hat f(\omega) = \sum_{n=-\infty}^{+\infty} a[ns]\, e^{-ins\omega}\, \hat\phi_s(\omega) = \hat a(\omega)\, s^{1/2}\, \mathbf{1}_{[-\pi/s,\pi/s]}(\omega),   (3.29)

which implies that f ∈ U_s if and only if f has a Fourier transform supported in [−π/s, π/s].
If f ∈ U_s, then decomposing it on the orthonormal basis {φ_s(t − ns)}_{n∈ℤ} gives

f(t) = P_{U_s} f(t) = \sum_{n\in\mathbb{Z}} \langle f(u), \phi_s(u - ns) \rangle\, \phi_s(t - ns).

Since φ_s(ps) = s^{−1/2} δ[ps] and φ_s(−t) = φ_s(t), the result is that

f(ns) = s^{-1/2} \langle f(u), \phi_s(u - ns) \rangle = s^{-1/2}\, f \star \phi_s(ns). ■

This theorem proves that in the particular case of the Shannon-Whittaker sampling theorem, if f ∈ U_s then the sampled low-pass filtered values f ⋆ φ_s(ns) are proportional to the signal samples f(ns). This comes from the fact that the sine cardinal sin(πt/s)/(πt/s) satisfies the interpolation property of vanishing at every t = ns with n ≠ 0 and taking the value 1 at t = 0. A generalization of such multiscale interpolations is studied in Section 7.6.
Shannon-Whittaker sampling approximates signals by restricting their Fourier transform to a low-frequency interval. It is particularly effective for smooth signals whose Fourier transform has energy concentrated at low frequencies. It is also adapted to sound recordings, which are sufficiently approximated by their lower-frequency harmonics.

For discontinuous signals, such as images, a low-frequency restriction produces Gibbs oscillations, as described in Section 2.3.3. The image's visual quality is degraded by these oscillations, which have an infinite total variation (2.65). A piecewise constant approximation has the advantage of creating no such spurious oscillations.
Block Sampler
A block sampler approximates signals with piecewise constant functions. The approximation space U_s is the set of all functions that are constant on the intervals [ns, (n+1)s], for any n ∈ ℤ. Let φ_s(t) = s^{−1/2} 1_{[0,s]}(t). The family {φ_s(t − ns)}_{n∈ℤ} is an orthonormal basis of U_s (Exercise 3.1). If f ∉ U_s, then its orthogonal projection on U_s is calculated with a decomposition in the block orthonormal basis of U_s,

P_{U_s} f(t) = \sum_{n=-\infty}^{+\infty} \langle f(u), \phi_s(u - ns) \rangle\, \phi_s(t - ns),   (3.30)

and each coefficient is proportional to the signal average on [ns, (n+1)s]:

\langle f(u), \phi_s(u - ns) \rangle = f \star \phi_s(ns) = s^{-1/2} \int_{ns}^{(n+1)s} f(u)\, du.

This block analog-to-digital conversion is particularly simple to implement in analog electronics, where the integration is performed by a capacitor.
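The block coefficients can be approximated numerically by a Riemann sum; a minimal sketch, where the test function f(t) = t² and all parameters are illustrative choices:

```python
import numpy as np

# Block-sampler coefficient <f, phi_s(. - ns)> with phi_s = s**(-1/2) 1_[0,s]:
# it equals s**(-1/2) times the integral of f over [ns, (n+1)s]. Here the
# integral is approximated by a Riemann sum on a fine grid.
s = 0.5
f = lambda t: t ** 2
t = np.linspace(0.0, 4.0, 4001)
dt = t[1] - t[0]

n = 3                                            # block interval [1.5, 2.0]
mask = (t >= n * s) & (t < (n + 1) * s)
coeff = s ** (-0.5) * np.sum(f(t[mask])) * dt    # Riemann approximation

# closed-form value of s**(-1/2) * integral of t**2 over [ns, (n+1)s]
exact = s ** (-0.5) * (((n + 1) * s) ** 3 - (n * s) ** 3) / 3.0
```

The Riemann sum converges to the exact coefficient as the grid is refined.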
In domains where f is a regular function, a piecewise constant approximation P_{U_s} f is not very precise and can be significantly improved. More precise approximations are obtained with approximation spaces U_s of higher-order polynomial splines. The resulting approximations can introduce small Gibbs oscillations, but these oscillations have a finite total variation.
Spline Sampling
Block samplers are generalized by spline sampling, with a space U_s of spline functions that are m − 1 times continuously differentiable and equal to a polynomial of degree m on any interval [ns, (n+1)s), for n ∈ ℤ. When m = 1, functions in U_s are piecewise linear and continuous.

A Riesz basis of polynomial splines is constructed with box splines. A box spline φ of degree m is computed by convolving the box window 1_{[0,1]} with itself m + 1 times and centering it at 0 or 1/2. Its Fourier transform is

\hat\phi(\omega) = \left( \frac{\sin(\omega/2)}{\omega/2} \right)^{m+1} \exp\!\left( \frac{-i\varepsilon\omega}{2} \right).   (3.31)

If m is even, then ε = 1 and φ has a support centered at t = 1/2. If m is odd, then ε = 0 and φ(t) is symmetric about t = 0. One can verify that φ̂(ω) satisfies the sampling condition (3.26) using a closed-form expression (7.20) of the resulting series. This means that for any s > 0, the box spline family {φ_s(t − ns)}_{n∈ℤ} defines a Riesz basis of U_s, and thus a stable sampling.
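The Riesz condition (3.26) for box splines can also be checked numerically by truncating the periodized series Σ_k |φ̂(ω − 2kπ)|²; the truncation range and spline degrees below are illustrative choices (the phase factor exp(−iεω/2) has modulus 1 and drops out of the squared modulus):

```python
import numpy as np

# Periodized energy sum_k |phi_hat(w - 2*k*pi)|**2 for a box spline of
# degree m, with |phi_hat(w)| = |sin(w/2)/(w/2)|**(m+1).
def riesz_series(w, m, K=200):
    k = np.arange(-K, K + 1)
    x = w[:, None] - 2 * np.pi * k[None, :]
    # np.sinc(u) = sin(pi*u)/(pi*u), so np.sinc(x/(2*pi)) = sin(x/2)/(x/2)
    phi_hat = np.sinc(x / (2 * np.pi)) ** (m + 1)
    return np.sum(phi_hat ** 2, axis=1)

w = np.linspace(-np.pi, np.pi, 513)
bounds = {m: (riesz_series(w, m).min(), riesz_series(w, m).max())
          for m in (0, 1, 2, 3)}
# For each degree, 0 < A <= B < +inf: the translates form a Riesz basis.
```

For m = 0 the series is constant (an orthonormal basis, A = B = 1); for higher degrees the lower bound A decreases but stays strictly positive.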
3.2 DISCRETE TIME-INVARIANT FILTERS
3.2.1 Impulse Response and Transfer Function
Classic discrete signal-processing algorithms are most generally based on time-invariant linear operators [51, 55]. The time invariance is limited to translations on the sampling grid. To simplify notation, the sampling interval is normalized to s = 1, and we denote by f[n] the sample values. A linear discrete operator L is time-invariant if an input f[n] delayed by p ∈ ℤ, f_p[n] = f[n − p], produces an output also delayed by p:

L f_p[n] = L f[n - p].
Impulse Response
We denote by δ[n] the discrete Dirac

\delta[n] = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \ne 0 \end{cases}.   (3.32)

Any signal f[n] can be decomposed as a sum of shifted Diracs:

f[n] = \sum_{p=-\infty}^{+\infty} f[p]\, \delta[n - p].
Let Lδ[n] = h[n] be the discrete impulse response. Linearity and time invariance imply that

L f[n] = \sum_{p=-\infty}^{+\infty} f[p]\, h[n - p] = f \star h[n].   (3.33)
A discrete linear time-invariant operator is thus computed with a discrete convolution. If h[n] has a finite support, the sum (3.33) is calculated with a finite number of operations. These are called finite impulse response (FIR) filters. Convolutions with infinite impulse response filters may also be calculated with a finite number of operations if they can be rewritten with a recursive equation (3.45).
Causality and Stability
A discrete filter L is causal if L f[p] depends only on the values of f[n] for n ≤ p. The convolution formula (3.33) implies that h[n] = 0 if n < 0.

The filter is stable if any bounded input signal f[n] produces a bounded output signal L f[n]. Since

|L f[n]| \le \sup_{n\in\mathbb{Z}} |f[n]| \sum_{k=-\infty}^{+\infty} |h[k]|,

it is sufficient that Σ_{n=−∞}^{+∞} |h[n]| < +∞, which means that h ∈ ℓ¹(ℤ). One can verify that this sufficient condition is also necessary. Thus, the filter is stable if and only if h ∈ ℓ¹(ℤ) (Exercise 3.6).
Transfer Function
The Fourier transform plays a fundamental role in analyzing discrete time-invariant operators, because the discrete sinusoidal waves e_ω[n] = e^{iωn} are eigenvectors:

L e_\omega[n] = \sum_{p=-\infty}^{+\infty} e^{i\omega(n-p)}\, h[p] = e^{i\omega n} \sum_{p=-\infty}^{+\infty} h[p]\, e^{-i\omega p}.   (3.34)

The eigenvalue is a Fourier series

\hat h(\omega) = \sum_{p=-\infty}^{+\infty} h[p]\, e^{-i\omega p}.   (3.35)

It is the filter transfer function.
EXAMPLE 3.2
The uniform discrete average

L f[n] = \frac{1}{2N+1} \sum_{p=n-N}^{n+N} f[p]

is a time-invariant discrete filter that has an impulse response h = (2N+1)^{−1} 1_{[−N,N]}. Its transfer function is

\hat h(\omega) = \frac{1}{2N+1} \sum_{n=-N}^{+N} e^{-in\omega} = \frac{1}{2N+1}\, \frac{\sin((N + 1/2)\,\omega)}{\sin(\omega/2)}.   (3.36)
3.2.2 Fourier Series
The properties of Fourier series are essentially the same as the properties of the Fourier transform, since Fourier series are particular instances of Fourier transforms for Dirac sums. If f(t) = Σ_{n=−∞}^{+∞} f[n] δ(t − n), then f̂(ω) = Σ_{n=−∞}^{+∞} f[n] e^{−inω}.

For any n ∈ ℤ, e^{−inω} has period 2π, so Fourier series have period 2π. An important issue is to understand whether all functions with period 2π can be written as Fourier series. Such functions are characterized by their restriction to [−π, π]. We therefore consider functions â ∈ L²[−π, π] that are square integrable over [−π, π]. The space L²[−π, π] is a Hilbert space with the inner product

\langle \hat a, \hat b \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat a(\omega)\, \hat b^*(\omega)\, d\omega   (3.37)

and the resulting norm

\|\hat a\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\hat a(\omega)|^2\, d\omega.
Theorem 3.6 proves that any function in L²[−π, π] can be written as a Fourier series.

Theorem 3.6. The family of functions {e^{−ikω}}_{k∈ℤ} is an orthonormal basis of L²[−π, π].

Proof. The orthogonality with respect to the inner product (3.37) is established with a direct integration. To prove that {exp(−ikω)}_{k∈ℤ} is a basis, we must show that linear expansions of these vectors are dense in L²[−π, π].

We first prove that any continuously differentiable function φ̂ with a support included in [−π, π] satisfies

\hat\phi(\omega) = \sum_{k=-\infty}^{+\infty} \langle \hat\phi(\xi), \exp(-ik\xi) \rangle \exp(-ik\omega),   (3.38)

with a pointwise convergence for any ω ∈ [−π, π]. Let us compute the partial sum

S_N(\omega) = \sum_{k=-N}^{N} \langle \hat\phi(\xi), \exp(-ik\xi) \rangle \exp(-ik\omega)
 = \sum_{k=-N}^{N} \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat\phi(\xi) \exp(ik\xi)\, d\xi\, \exp(-ik\omega)
 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat\phi(\xi) \sum_{k=-N}^{N} \exp[ik(\xi - \omega)]\, d\xi.

The Poisson formula (2.37) proves the distribution equality

\lim_{N\to+\infty} \sum_{k=-N}^{N} \exp[ik(\xi - \omega)] = 2\pi \sum_{k=-\infty}^{+\infty} \delta(\xi - \omega - 2k\pi),
and since the support of φ̂ is in [−π, π], we get

\lim_{N\to+\infty} S_N(\omega) = \hat\phi(\omega).

Since φ̂ is continuously differentiable, following the steps (2.38–2.40) in the proof of the Poisson formula shows that S_N(ω) converges uniformly to φ̂(ω) on [−π, π].

To prove that linear expansions of the sinusoidal waves {exp(−ikω)}_{k∈ℤ} are dense in L²[−π, π], let us verify that the distance between â ∈ L²[−π, π] and such a linear expansion is less than ε, for any ε > 0. Continuously differentiable functions with a support included in [−π, π] are dense in L²[−π, π]; thus, there exists φ̂ such that ‖â − φ̂‖ ≤ ε/2. The uniform pointwise convergence proves that there exists N for which

\sup_{\omega \in [-\pi,\pi]} |S_N(\omega) - \hat\phi(\omega)| \le \frac{\varepsilon}{2},

which implies that

\|S_N - \hat\phi\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(\omega) - \hat\phi(\omega)|^2\, d\omega \le \frac{\varepsilon^2}{4}.

It follows that â is approximated by the Fourier series S_N with an error

\|\hat a - S_N\| \le \|\hat a - \hat\phi\| + \|\hat\phi - S_N\| \le \varepsilon. ■
Theorem 3.6 proves that if f ∈ ℓ²(ℤ), the Fourier series

\hat f(\omega) = \sum_{n=-\infty}^{+\infty} f[n]\, e^{-in\omega}   (3.39)

can be interpreted as the decomposition of f̂ in an orthonormal basis of L²[−π, π]. The Fourier series coefficients can thus be written as inner products in L²[−π, π]:

f[n] = \langle \hat f(\omega), e^{-in\omega} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat f(\omega)\, e^{in\omega}\, d\omega.   (3.40)

The energy conservation of orthonormal bases (A.10) yields a Plancherel identity:

\sum_{n=-\infty}^{+\infty} |f[n]|^2 = \|\hat f\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\hat f(\omega)|^2\, d\omega.
Pointwise Convergence
The equality (3.39) is meant in the sense of mean-square convergence:

\lim_{N\to+\infty} \Big\| \hat f(\omega) - \sum_{k=-N}^{N} f[k]\, e^{-ik\omega} \Big\| = 0.   (3.41)

It does not imply a pointwise convergence at all ω ∈ ℝ.
In 1873, du Bois-Reymond constructed a periodic function f̂(ω) that is continuous but whose Fourier series diverges at some point. On the other hand, if f̂(ω) is continuously differentiable, then the proof of Theorem 3.6 shows that its Fourier series converges uniformly to f̂(ω) on [−π, π]. It was only in 1966 that Carleson [149] was able to prove that if f̂ ∈ L²[−π, π], then its Fourier series converges almost everywhere. The proof is very technical.
Convolutions
Since {e^{−ikω}}_{k∈ℤ} are eigenvectors of discrete convolution operators, we also have a discrete convolution theorem.

Theorem 3.7. If f ∈ ℓ¹(ℤ) and h ∈ ℓ¹(ℤ), then g = f ⋆ h ∈ ℓ¹(ℤ) and

\hat g(\omega) = \hat f(\omega)\, \hat h(\omega).   (3.42)

The proof is identical to the proof of the convolution Theorem 2.2, if we replace integrals by discrete sums. The reconstruction formula (3.40) shows that a filtered signal can be written

f \star h[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat h(\omega)\, \hat f(\omega)\, e^{in\omega}\, d\omega.   (3.43)

The transfer function ĥ(ω) amplifies or attenuates the frequency components f̂(ω) of f[n].
EXAMPLE 3.3
An ideal discrete low-pass filter has a 2π periodic transfer function defined by ĥ(ω) = 1_{[−ξ,ξ]}(ω), for ω ∈ [−π, π] and 0 < ξ < π. Its impulse response is computed with (3.40):

h[n] = \frac{1}{2\pi} \int_{-\xi}^{\xi} e^{in\omega}\, d\omega = \frac{\sin(\xi n)}{\pi n}.   (3.44)

It is a uniform sampling of the ideal analog low-pass filter (2.29).
EXAMPLE 3.4
A recursive filter computes g = L f as the solution of a recursive equation

\sum_{k=0}^{K} a_k\, f[n-k] = \sum_{k=0}^{M} b_k\, g[n-k],   (3.45)

with b_0 ≠ 0. If g[n] = 0 and f[n] = 0 for n < 0, then g has a linear and time-invariant dependency on f and thus can be written g = f ⋆ h. The transfer function is obtained by computing the Fourier transform of (3.45). The Fourier transform of f_k[n] = f[n − k] is f̂_k(ω) = f̂(ω) e^{−ikω}, so

\hat h(\omega) = \frac{\hat g(\omega)}{\hat f(\omega)} = \frac{\sum_{k=0}^{K} a_k\, e^{-ik\omega}}{\sum_{k=0}^{M} b_k\, e^{-ik\omega}}.

It is a rational function of e^{−iω}. If b_k ≠ 0 for some k > 0, then one can verify that the impulse response h has an infinite support. The stability of such filters is studied in Exercise 3.18. A direct calculation of the convolution sum g[n] = f ⋆ h[n] would require an infinite number of operations, whereas (3.45) computes g[n] with K + M additions and multiplications from its past values.
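The recursion (3.45) translates directly into code by solving for g[n] from past values; a minimal sketch, where the coefficients a_k and b_k are illustrative choices:

```python
import numpy as np

# Recursive (IIR) filter for (3.45): sum_k a_k f[n-k] = sum_k b_k g[n-k],
# with b_0 != 0 and zero initial conditions (g[n] = f[n] = 0 for n < 0).
def recursive_filter(f, a, b):
    g = np.zeros(len(f))
    for n in range(len(f)):
        acc = sum(a[k] * f[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[k] * g[n - k] for k in range(1, len(b)) if n - k >= 0)
        g[n] = acc / b[0]
    return g

a = [0.5, 0.5]          # feed-forward coefficients (K = 1), illustrative
b = [1.0, -0.8]         # feedback coefficients (M = 1), b0 = 1, illustrative
f = np.zeros(50)
f[0] = 1.0              # discrete Dirac: the output g is the impulse response h
h = recursive_filter(f, a, b)
# h has infinite support in principle: h[n] = 0.9 * 0.8**(n-1) for n >= 1
```

Although the impulse response has infinite support, each output sample costs only K + M additions and multiplications.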
Window Multiplication
An infinite impulse response filter h, such as the ideal low-pass filter (3.44), may be approximated by a finite impulse response filter h̃ by multiplying h with a window g of finite support:

\tilde h[n] = g[n]\, h[n].

One can verify (Exercise 3.6) that a multiplication in time is equivalent to a convolution in the frequency domain:

\hat{\tilde h}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat h(\xi)\, \hat g(\omega - \xi)\, d\xi = \frac{1}{2\pi}\, \hat h \star \hat g(\omega).   (3.46)

Clearly \hat{\tilde h} = \hat h only if ĝ = 2πδ, which would imply that g has an infinite support and g[n] = 1. The approximation \hat{\tilde h} is close to ĥ only if ĝ approximates a Dirac, which means that all its energy is concentrated at low frequencies. In time, g should therefore have smooth variations.

The rectangular window g = 1_{[−N,N]} has a Fourier transform ĝ computed in (3.36). It has important side lobes far away from ω = 0, and the resulting h̃ is a poor approximation of ĥ. The Hanning window

g[n] = \cos^2\!\left( \frac{\pi n}{2N} \right) \mathbf{1}_{[-N,N]}[n]

is smoother and thus has a Fourier transform better concentrated at low frequencies. The spectral properties of other windows are studied in Section 4.2.2.
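A sketch comparing the rectangular and Hanning windows applied to the truncated ideal low-pass impulse response (3.44); the cutoff ξ and the half-length N are illustrative choices:

```python
import numpy as np

# Truncate the ideal low-pass impulse response h[n] = sin(xi*n)/(pi*n)
# (with h[0] = xi/pi) to |n| <= N with two windows, and compare the
# stopband ripple of the resulting transfer functions.
xi, N = np.pi / 2, 16
n = np.arange(-N, N + 1)
safe_n = np.where(n == 0, 1, n)                  # avoid division by zero
h = np.where(n == 0, xi / np.pi, np.sin(xi * n) / (np.pi * safe_n))

rect = np.ones(2 * N + 1)                        # rectangular window
hann = np.cos(np.pi * n / (2 * N)) ** 2          # Hanning window

def transfer(taps, w):
    return np.sum(taps[None, :] * np.exp(-1j * np.outer(w, n)), axis=1)

w_stop = np.linspace(xi + 0.5, np.pi, 200)       # frequencies well above xi
ripple_rect = np.max(np.abs(transfer(h * rect, w_stop)))
ripple_hann = np.max(np.abs(transfer(h * hann, w_stop)))
```

The Hanning-windowed filter attenuates the stopband far more strongly, reflecting the better frequency concentration of ĝ for a smooth window.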
3.3 FINITE SIGNALS
Up to now we have considered discrete signals f[n] defined for all n ∈ ℤ. In practice, f[n] is known over a finite domain, say 0 ≤ n < N. Convolutions must therefore be modified to take into account the border effects at n = 0 and n = N − 1. The Fourier transform must also be redefined over finite sequences for numerical computations. The fast Fourier transform algorithm is explained, as well as its application to fast convolutions.
3.3.1 Circular Convolutions
Let f̃ and h̃ be signals of N samples. To compute the convolution product

\tilde f \star \tilde h[n] = \sum_{p=-\infty}^{+\infty} \tilde f[p]\, \tilde h[n-p] \quad \text{for } 0 \le n < N,

we must know f̃[n] and h̃[n] beyond 0 ≤ n < N. One approach is to extend f̃ and h̃ with a periodization over N samples, and to define

f[n] = \tilde f[n \bmod N], \quad h[n] = \tilde h[n \bmod N].

The circular convolution of two such signals, both with period N, is defined as a sum over their period:

f \circledast h[n] = \sum_{p=0}^{N-1} f[p]\, h[n-p] = \sum_{p=0}^{N-1} f[n-p]\, h[p].

It is also a signal of period N.
The eigenvectors of a circular convolution operator

L f[n] = f \circledast h[n]

are the discrete complex exponentials e_k[n] = exp(i2πkn/N). Indeed,

L e_k[n] = \exp\!\left( \frac{i2\pi kn}{N} \right) \sum_{p=0}^{N-1} h[p] \exp\!\left( \frac{-i2\pi kp}{N} \right),

and the eigenvalue is the discrete Fourier transform of h:

\hat h[k] = \sum_{p=0}^{N-1} h[p] \exp\!\left( \frac{-i2\pi kp}{N} \right).
3.3.2 Discrete Fourier Transform
The space of signals of period N is a Euclidean space of dimension N, and the inner product of two such signals f and g is

\langle f, g \rangle = \sum_{n=0}^{N-1} f[n]\, g^*[n].   (3.47)

Theorem 3.8 proves that any signal with period N can be decomposed as a sum of discrete sinusoidal waves.

Theorem 3.8. The family

\left\{ e_k[n] = \exp\!\left( \frac{i2\pi kn}{N} \right) \right\}_{0 \le k < N}

is an orthogonal basis of the space of signals of period N.
Since the space is of dimension N, any orthogonal family of N vectors is an orthogonal basis. To prove this theorem, it is sufficient to verify that {e_k}_{0≤k<N} is orthogonal with respect to the inner product (3.47) (Exercise 3.8). Any signal f of period N can be decomposed on this basis:

f = \sum_{k=0}^{N-1} \frac{\langle f, e_k \rangle}{\|e_k\|^2}\, e_k.   (3.48)

By definition, the discrete Fourier transform (DFT) of f is

\hat f[k] = \langle f, e_k \rangle = \sum_{n=0}^{N-1} f[n] \exp\!\left( \frac{-i2\pi kn}{N} \right).   (3.49)

Since ‖e_k‖² = N, (3.48) gives an inverse discrete Fourier formula:

f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat f[k] \exp\!\left( \frac{i2\pi kn}{N} \right).   (3.50)

The orthogonality of the basis also implies a Plancherel formula:

\|f\|^2 = \sum_{n=0}^{N-1} |f[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |\hat f[k]|^2.   (3.51)
The discrete Fourier transform (DFT) of a signal f of period N is computed from its values for 0 ≤ n < N. Why, then, is it important to consider it a periodic signal with period N rather than a finite signal of N samples? The answer lies in the interpretation of the Fourier coefficients. The discrete Fourier sum (3.50) defines a signal of period N for which the samples f[0] and f[N − 1] are side by side. If f[0] and f[N − 1] are very different, this produces an abrupt transition in the periodic signal, creating relatively high-amplitude Fourier coefficients at high frequencies. For example, Figure 3.3 shows that the “smooth” ramp f[n] = n for 0 ≤ n < N has sharp transitions at n = 0 and n = N once made periodic.
Circular Convolutions
Since {exp(i2πkn/N)}_{0≤k<N} are eigenvectors of circular convolutions, we derive a convolution theorem.

[Figure 3.3: Signal f[n] = n for 0 ≤ n < N made periodic over N samples.]
Theorem 3.9. If f and h have period N, then the discrete Fourier transform of g = f ⊛ h is

\hat g[k] = \hat f[k]\, \hat h[k].   (3.52)

The proof is similar to the proofs of the two previous convolution theorems, 2.2 and 3.7. This theorem shows that a circular convolution can be interpreted as a discrete frequency filtering. It also opens the door to fast computations of convolutions using the fast Fourier transform.
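Theorem 3.9 can be verified numerically by computing the circular convolution directly and comparing its DFT to the product of DFTs (numpy's `np.fft.fft` implements the DFT (3.49)):

```python
import numpy as np

# Check of Theorem 3.9: DFT(f circ* h) = DFT(f) * DFT(h).
rng = np.random.default_rng(1)
N = 32
f = rng.standard_normal(N)
h = rng.standard_normal(N)

# direct circular convolution: g[n] = sum_p f[p] h[(n - p) mod N]
g = np.array([np.sum(f * h[(n - np.arange(N)) % N]) for n in range(N)])

gap = np.max(np.abs(np.fft.fft(g) - np.fft.fft(f) * np.fft.fft(h)))
```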
3.3.3 Fast Fourier Transform
For a signal f of N points, a direct calculation of the N discrete Fourier sums

\hat f[k] = \sum_{n=0}^{N-1} f[n] \exp\!\left( \frac{-i2\pi kn}{N} \right), \quad \text{for } 0 \le k < N,   (3.53)

requires N² complex multiplications and additions. The FFT algorithm reduces the numerical complexity to O(N log₂ N) by reorganizing the calculations.

When the frequency index is even, we group the terms n and n + N/2:

\hat f[2k] = \sum_{n=0}^{N/2-1} \left( f[n] + f[n + N/2] \right) \exp\!\left( \frac{-i2\pi kn}{N/2} \right).   (3.54)
When the frequency index is odd, the same grouping becomes

\hat f[2k+1] = \sum_{n=0}^{N/2-1} \exp\!\left( \frac{-i2\pi n}{N} \right) \left( f[n] - f[n + N/2] \right) \exp\!\left( \frac{-i2\pi kn}{N/2} \right).   (3.55)

Equation (3.54) proves that the even frequencies are obtained by calculating the DFT of the N/2 periodic signal

f_e[n] = f[n] + f[n + N/2].

The odd frequencies are derived from (3.55) by computing the Fourier transform of the N/2 periodic signal

f_o[n] = \exp\!\left( \frac{-i2\pi n}{N} \right) \left( f[n] - f[n + N/2] \right).

A DFT of size N may thus be calculated with two discrete Fourier transforms of size N/2 plus O(N) operations.

The inverse FFT of f̂ is derived from the forward fast Fourier transform of its complex conjugate f̂* by observing that

f^*[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat f^*[k] \exp\!\left( \frac{-i2\pi kn}{N} \right).   (3.56)
Complexity
Let C(N) be the number of elementary operations needed to compute a DFT with the FFT. Since f is complex, the calculation of f_e and f_o requires N complex additions and N/2 complex multiplications. Let KN be the corresponding number of elementary operations. We have

C(N) = 2\, C(N/2) + KN.   (3.57)

Since the Fourier transform of a single point is equal to itself, C(1) = 0. With the change of variable l = log₂ N and the change of function T(l) = C(N)/N, from (3.57) we derive

T(l) = T(l-1) + K.

Since T(0) = 0, we get T(l) = K l and therefore

C(N) = K\, N \log_2(N).

Several variations of this fast algorithm exist [49, 237]. The goal is to minimize the constant K. The most efficient fast DFT to this date is the split-radix FFT algorithm, which is slightly more complicated than the procedure just described; however, it requires only N log₂ N real multiplications and 3N log₂ N additions. When the input signal f is real, there are half as many parameters to compute, since f̂[−k] = f̂*[k]; the number of multiplications and additions is thus divided by 2.
3.3.4 Fast Convolutions
The low computational complexity of an FFT makes it efficient to compute finite discrete convolutions by using the circular convolution Theorem 3.9. Let f and h be two signals whose samples are nonzero only for 0 ≤ n < M. The causal signal

g[n] = f \star h[n] = \sum_{k=-\infty}^{+\infty} f[k]\, h[n-k]   (3.58)

is nonzero only for 0 ≤ n < 2M. If h and f have M nonzero samples, calculating this convolution product with the sum (3.58) requires M(M + 1) multiplications and additions. When M ≥ 32, the number of computations is reduced by using the FFT [11, 49].

To use the fast Fourier transform with the circular convolution Theorem 3.9, the noncircular convolution (3.58) is written as a circular convolution. We define two signals of period 2M:

a[n] = \begin{cases} f[n] & \text{if } 0 \le n < M \\ 0 & \text{if } M \le n < 2M \end{cases}   (3.59)

b[n] = \begin{cases} h[n] & \text{if } 0 \le n < M \\ 0 & \text{if } M \le n < 2M \end{cases}   (3.60)
80
CHAPTER 3 Discrete Revolution
By letting c = a ⊛ b, one can verify that c[n] = g[n] for 0 ≤ n < 2M. The 2M nonzero coefficients of g are thus obtained by computing â and b̂ from a and b and then calculating the inverse DFT of ĉ = â b̂.

With the fast Fourier transform algorithm, this requires a total of O(M log₂ M) additions and multiplications instead of M(M + 1). A single FFT or inverse FFT of a real signal of size N is calculated with 2^{−1} N log₂ N multiplications, using a split-radix algorithm. The FFT convolution is thus performed with a total of 3M log₂ M + 11M real multiplications. For M ≥ 32, the FFT algorithm is faster than the direct convolution approach; for M ≤ 16, it is faster to use a direct convolution sum.
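A sketch of the zero-padding construction (3.59)–(3.60): the 2M-periodic circular convolution is computed with FFTs and checked against the direct linear convolution:

```python
import numpy as np

# Fast convolution via (3.59)-(3.60): zero-pad both M-sample signals to
# length 2M, multiply their DFTs, and invert; the circular convolution
# of the padded signals agrees with the linear convolution.
def fft_convolve(f, h):
    M = len(f)                              # assumes len(f) == len(h) == M
    a = np.concatenate([f, np.zeros(M)])    # (3.59)
    b = np.concatenate([h, np.zeros(M)])    # (3.60)
    c = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return np.real(c)                       # g[n] for 0 <= n < 2M

rng = np.random.default_rng(2)
M = 64
f = rng.standard_normal(M)
h = rng.standard_normal(M)
g = fft_convolve(f, h)
direct = np.convolve(f, h)                  # direct sum, length 2M - 1
```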
Fast Overlap–Add Convolutions
The convolution of a signal f of L nonzero samples with a smaller causal signal h of M samples is calculated with an overlap–add procedure that is faster than the previous algorithm. The signal f is decomposed as a sum of L/M blocks f_r having M nonzero samples:

f[n] = \sum_{r=0}^{L/M-1} f_r[n - rM], \quad \text{with} \quad f_r[n] = f[n + rM]\, \mathbf{1}_{[0, M-1]}[n].   (3.61)

For each 0 ≤ r < L/M, the 2M nonzero samples of g_r = f_r ⋆ h are computed with the FFT-based convolution algorithm, which requires O(M log₂ M) operations. These L/M convolutions are thus obtained with O(L log₂ M) operations. The block decomposition (3.61) implies that

f \star h[n] = \sum_{r=0}^{L/M-1} g_r[n - rM].   (3.62)

The addition of these L/M translated signals of size 2M is done with 2L additions. The overall convolution is thus performed with O(L log₂ M) operations.
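The decomposition (3.61)–(3.62) can be sketched as follows, with illustrative sizes L and M (M dividing L):

```python
import numpy as np

# Overlap-add for (3.61)-(3.62): split f into L/M blocks of M samples,
# convolve each block with h by FFT, and add the shifted results.
def overlap_add(f, h):
    L, M = len(f), len(h)            # assumes M divides L
    out = np.zeros(L + M - 1)
    for r in range(L // M):
        fr = f[r * M:(r + 1) * M]
        a = np.concatenate([fr, np.zeros(M)])
        b = np.concatenate([h, np.zeros(M)])
        gr = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
        out[r * M:r * M + 2 * M - 1] += gr[:2 * M - 1]   # g_r[n - rM]
    return out

rng = np.random.default_rng(3)
L, M = 256, 32
f = rng.standard_normal(L)
h = rng.standard_normal(M)
g = overlap_add(f, h)
```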
3.4 DISCRETE IMAGE PROCESSING
Two-dimensional signal processing poses many specific geometric and topological problems that do not exist in one dimension [21, 33]. For example, a simple concept such as causality is not well defined in two dimensions. We can avoid the complexity introduced by the second dimension by extending one-dimensional algorithms with a separable approach. This not only simplifies the mathematics but also leads to fast numerical algorithms applied along the rows and columns of images. Section A.5 in the Appendix reviews the properties of tensor products for separable calculations.
3.4.1 Two-Dimensional Sampling Theorems
The light intensity measured by a camera is generally sampled over a rectangular array of picture elements, called pixels. One-dimensional sampling theorems are extended to this two-dimensional sampling array. Other two-dimensional sampling grids, such as hexagonal grids, are also possible, but nonrectangular sampling arrays are not often used.

Let s₁ and s₂ be the sampling intervals along the x₁ and x₂ axes of an infinite rectangular sampling grid. The following renormalizes the axes so that s₁ = s₂ = s. A discrete image obtained by sampling f(x), with x = (x₁, x₂), can be represented as a sum of Diracs located at the grid points:

f_d(x) = \sum_{n \in \mathbb{Z}^2} f(sn)\, \delta(x - ns).

The two-dimensional Fourier transform of δ(x − sn) is e^{−isn·ω} with ω = (ω₁, ω₂) and n·ω = n₁ω₁ + n₂ω₂. Thus, the Fourier transform of f_d is a two-dimensional Fourier series

\hat f_d(\omega) = \sum_{n \in \mathbb{Z}^2} f(sn)\, e^{-isn\cdot\omega}.   (3.63)

It is 2π/s periodic along ω₁ and along ω₂. An extension of Theorem 3.1 relates f̂_d to the two-dimensional Fourier transform f̂ of f.

Theorem 3.10. The Fourier transform of the discrete image f_d(x) is

\hat f_d(\omega) = \frac{1}{s^2} \sum_{k \in \mathbb{Z}^2} \hat f\!\left(\omega - \frac{2k\pi}{s}\right), \quad \text{with } k = (k_1, k_2).   (3.64)
We derive the following two-dimensional sampling theorem, which is analogous to Theorem 3.2.

Theorem 3.11. If f̂ has a support included in [−π/s, π/s]², then

f(x) = s \sum_{n \in \mathbb{Z}^2} f(ns)\, \phi_s(x - ns),   (3.65)

where

\phi_s(x_1, x_2) = \frac{1}{s}\, \frac{\sin(\pi x_1/s)}{\pi x_1/s}\, \frac{\sin(\pi x_2/s)}{\pi x_2/s}.   (3.66)

If the support of f̂ is not included in the low-frequency rectangle [−π/s, π/s]², the interpolation formula (3.65) introduces aliasing errors. Such aliasing is eliminated by prefiltering f with an ideal low-pass separable filter having a Fourier transform equal to 1_{[−π/s, π/s]²}(ω).
General Sampling Theorems
As explained in Section 3.1.3, the Shannon-Whittaker sampling theorem is a
particular case of more general linear sampling theorems with low-pass filters.
The following theorem is a two-dimensional extension of Theorems 3.3 and 3.4; it
characterizes these filters to obtain a stable reconstruction.
CHAPTER 3 Discrete Revolution
Theorem 3.12. If there exist B ≥ A > 0 such that the Fourier transform of
φ_s ∈ L²(R²) satisfies

    ∀ω ∈ [0, 2π/s]²,  A ≤ ĥ(ω) = Σ_{k∈Z²} |φ̂_s(ω − 2πk/s)|² ≤ B,

then {φ_s(x − ns)}_{n∈Z²} is a Riesz basis of a space U_s. The Fourier transform
of the dual filter φ̃_s is φ̂_s*(ω)/ĥ(ω), and the orthogonal projection of
f ∈ L²(R²) in U_s is

    P_{U_s} f(x) = Σ_{n∈Z²} f ⋆ φ̄_s(ns) φ̃_s(x − ns), with φ̄_s(x) = φ_s(−x).    (3.67)
This theorem gives a necessary and sufficient condition for a stable linear
reconstruction from samples computed with a linear filter. The proof is a direct
extension of the proofs of Theorems 3.3 and 3.4. It recovers a signal
approximation as an orthogonal projection by filtering the discrete signal

    f_d(x) = Σ_{n∈Z²} f ⋆ φ̄_s(ns) δ(x − ns):

    P_{U_s} f(x) = f_d ⋆ φ̃_s(x).

As in one dimension, the filter φ_s can be obtained by scaling a single filter:
φ_s(x) = s⁻¹ φ(s⁻¹x). The two-dimensional Shannon-Whittaker theorem is a
particular example, where φ̂_s = s 1_{[−π/s, π/s]²}, which defines an orthonormal
basis of the space U_s of functions having a Fourier transform supported in
[−π/s, π/s]².
3.4.2 Discrete Image Filtering
The properties of two-dimensional space-invariant operators are essentially the
same as in one dimension. The sampling interval s is normalized to 1. A pixel
value located at n = (n₁, n₂) is written f[n]. A linear operator L is
space-invariant if L f_p[n] = L f[n − p] for any f_p[n] = f[n − p], with
p = (p₁, p₂) ∈ Z². A discrete Dirac is defined by δ[n] = 1 if n = (0, 0) and
δ[n] = 0 if n ≠ (0, 0).
Impulse Response
Since f[n] = Σ_{p∈Z²} f[p] δ[n − p], linearity and space invariance imply

    L f[n] = Σ_{p∈Z²} f[p] h[n − p] = f ⋆ h[n],    (3.68)

where h[n] = Lδ[n] is the impulse response. If the impulse response is separable,

    h[n₁, n₂] = h₁[n₁] h₂[n₂],    (3.69)

then the two-dimensional convolution (3.68) is computed as one-dimensional
convolutions along the columns of the image followed by one-dimensional
convolutions along the rows (or vice versa):

    f ⋆ h[n₁, n₂] = Σ_{p₁=−∞}^{+∞} h₁[n₁ − p₁] Σ_{p₂=−∞}^{+∞} h₂[n₂ − p₂] f[p₁, p₂].    (3.70)
This factorization reduces the number of operations. If h₁ and h₂ are finite
impulse response filters of sizes M₁ and M₂, respectively, then the separable
calculation (3.70) requires M₁ + M₂ additions and multiplications per point
(n₁, n₂), as opposed to M₁M₂ for the nonseparable computation (3.68).
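As a numerical sketch (not the book's code; the function names are ours), the separable factorization (3.70) can be checked against a direct two-dimensional convolution with an outer-product kernel:

```python
import numpy as np

def separable_conv(f, h1, h2):
    """Separable 2D convolution (3.70): 1D convolutions along columns, then rows."""
    tmp = np.apply_along_axis(lambda c: np.convolve(c, h1, mode="full"), 0, f)
    return np.apply_along_axis(lambda r: np.convolve(r, h2, mode="full"), 1, tmp)

def nonseparable_conv(f, h):
    """Direct 2D convolution (3.68): g[n] = sum_p h[p] f[n - p]."""
    N1, N2 = f.shape
    M1, M2 = h.shape
    g = np.zeros((N1 + M1 - 1, N2 + M2 - 1))
    for n1 in range(g.shape[0]):
        for n2 in range(g.shape[1]):
            s = 0.0
            for p1 in range(M1):
                for p2 in range(M2):
                    if 0 <= n1 - p1 < N1 and 0 <= n2 - p2 < N2:
                        s += h[p1, p2] * f[n1 - p1, n2 - p2]
            g[n1, n2] = s
    return g

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8))
h1 = rng.standard_normal(3)
h2 = rng.standard_normal(3)
h = np.outer(h1, h2)   # separable kernel h[n1, n2] = h1[n1] h2[n2]
assert np.allclose(separable_conv(f, h1, h2), nonseparable_conv(f, h))
```

The separable version performs M₁ + M₂ multiplications per output point instead of M₁M₂, which is why separable filters are preferred in image processing.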
Transfer Function
The Fourier transform of a discrete image f is defined by the Fourier series

    f̂(ω) = Σ_{n∈Z²} f[n] e^{−iω·n}, with ω·n = n₁ω₁ + n₂ω₂.    (3.71)

The two-dimensional extension of the convolution Theorem 3.7 proves that if
g[n] = L f[n] = f ⋆ h[n], then its Fourier transform is ĝ(ω) = f̂(ω) ĥ(ω), and
ĥ(ω) is the transfer function of the filter. When a filter is separable,
h[n₁, n₂] = h₁[n₁] h₂[n₂], its transfer function is also separable:

    ĥ(ω₁, ω₂) = ĥ₁(ω₁) ĥ₂(ω₂).    (3.72)
3.4.3 Circular Convolutions and Fourier Basis
The discrete convolution of a finite image f̃ raises border problems. As in one
dimension, these border issues are solved by extending the image, making it
periodic along its rows and columns:

    f[n₁, n₂] = f̃[n₁ mod N₁, n₂ mod N₂],

where N = N₁N₂ is the image size. The resulting periodic image f[n₁, n₂] is
defined for all (n₁, n₂) ∈ Z², and each of its rows and columns is a periodic
one-dimensional signal.
A discrete convolution is replaced by a circular convolution over the image
period. If f and h have periods N₁ and N₂ along n₁ and n₂, then

    f ⊛ h[n₁, n₂] = Σ_{p₁=0}^{N₁−1} Σ_{p₂=0}^{N₂−1} f[p₁, p₂] h[n₁ − p₁, n₂ − p₂].    (3.73)
Discrete Fourier Transform
The eigenvectors of circular convolutions are two-dimensional discrete sinusoidal
waves:

    e_k[n] = e^{iω_k·n}, with ω_k = (2πk₁/N₁, 2πk₂/N₂), for 0 ≤ k₁ < N₁, 0 ≤ k₂ < N₂.

This family of N = N₁N₂ discrete vectors is the separable product of the two
one-dimensional discrete Fourier bases {e^{i2πk₁n/N₁}}_{0≤k₁<N₁} and
{e^{i2πk₂n/N₂}}_{0≤k₂<N₂}. Thus, Theorem A.3 proves that the family
{e_k[n]}_{0≤k₁<N₁, 0≤k₂<N₂} is an orthogonal basis of C^N = C^{N₁} ⊗ C^{N₂}
(Exercise 3.23). Any image f ∈ C^N can be decomposed in this orthogonal basis:

    f[n] = (1/N) Σ_{k₁=0}^{N₁−1} Σ_{k₂=0}^{N₂−1} f̂[k] e^{iω_k·n},    (3.74)
where f̂ is the two-dimensional DFT of f:

    f̂[k] = ⟨f, e_k⟩ = Σ_{n₁=0}^{N₁−1} Σ_{n₂=0}^{N₂−1} f[n] e^{−iω_k·n}.    (3.75)
Fast Convolutions
Since the e^{iω_k·n} are eigenvectors of two-dimensional circular convolutions,
the DFT of g = f ⊛ h is

    ĝ[k] = f̂[k] ĥ[k].    (3.76)

A direct computation of f ⊛ h with the summation (3.73) requires O(N²)
multiplications. With the two-dimensional FFT described next, f̂[k] and ĥ[k] as
well as the inverse DFT of their product (3.76) are calculated with O(N log N)
operations. Noncircular convolutions are computed with a fast algorithm by
reducing them to circular convolutions, with the same approach as in Section 3.3.4.
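A minimal sketch of this speedup (assuming NumPy; the function names are ours): the direct summation (3.73) and the DFT product (3.76) give the same result, but the second one costs O(N log N):

```python
import numpy as np

def circular_conv_direct(f, h):
    """Direct 2D circular convolution (3.73): O(N^2) operations for N pixels."""
    N1, N2 = f.shape
    g = np.zeros_like(f, dtype=complex)
    for n1 in range(N1):
        for n2 in range(N2):
            for p1 in range(N1):
                for p2 in range(N2):
                    g[n1, n2] += f[p1, p2] * h[(n1 - p1) % N1, (n2 - p2) % N2]
    return g

def circular_conv_fft(f, h):
    """Fast version via (3.76): pointwise product of the 2D DFTs, O(N log N)."""
    return np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h))

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 8))
h = rng.standard_normal((8, 8))
assert np.allclose(circular_conv_direct(f, h), circular_conv_fft(f, h))
```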
Separable Basis Decomposition
Let B¹ = {e¹_{k₁}}_{0≤k₁<N₁} and B² = {e²_{k₂}}_{0≤k₂<N₂} be two orthogonal bases
of C^{N₁} and C^{N₂}. Suppose that computing the decomposition coefficients of
f₁ ∈ C^{N₁} in the basis B¹ requires C₁(N₁) operations, and that of f₂ ∈ C^{N₂}
in the basis B² requires C₂(N₂) operations. One can verify (Exercise 3.23) that
the family B = {e_k[n] = e¹_{k₁}[n₁] e²_{k₂}[n₂]}_{0≤k₁<N₁, 0≤k₂<N₂} is an
orthogonal basis of the space C^N = C^{N₁} ⊗ C^{N₂} of images f[n₁, n₂] of
N = N₁N₂ pixels. We describe a fast separable algorithm that computes the
decomposition coefficients of an image f in B with N₂C₁(N₁) + N₁C₂(N₂) operations
as opposed to N². A fast two-dimensional FFT is derived.
Two-dimensional inner products are calculated with

    ⟨f, e¹_{k₁} e²_{k₂}⟩ = Σ_{n₁=0}^{N₁−1} Σ_{n₂=0}^{N₂−1} f[n₁, n₂] e¹_{k₁}*[n₁] e²_{k₂}*[n₂]
                         = Σ_{n₁=0}^{N₁−1} e¹_{k₁}*[n₁] Σ_{n₂=0}^{N₂−1} f[n₁, n₂] e²_{k₂}*[n₂].    (3.77)

For 0 ≤ n₁ < N₁, we must compute

    Uf[n₁, k₂] = Σ_{n₂=0}^{N₂−1} f[n₁, n₂] e²_{k₂}*[n₂],

which are the decomposition coefficients of the N₁ image rows of size N₂ in the
basis B². The coefficients {⟨f, e¹_{k₁} e²_{k₂}⟩}_{0≤k₁<N₁, 0≤k₂<N₂} are then
calculated in (3.77) as the inner products of the columns of the transformed
image Uf[n₁, k₂] in the basis B¹. The overall algorithm thus requires performing
N₁ one-dimensional transforms in the basis B² plus N₂ one-dimensional transforms
in the basis B¹; it therefore requires N₂C₁(N₁) + N₁C₂(N₂) operations.
The fast Fourier transform algorithm of Section 3.3.3 decomposes signals of sizes
N₁ and N₂ in the discrete Fourier bases B¹ = {e¹_{k₁}[n₁] = e^{i2πk₁n₁/N₁}}_{0≤k₁<N₁}
and B² = {e²_{k₂}[n₂] = e^{i2πk₂n₂/N₂}}_{0≤k₂<N₂}, with C₁(N₁) = KN₁ log₂ N₁ and
C₂(N₂) = KN₂ log₂ N₂ operations. A separable implementation of a two-dimensional
FFT thus requires N₂C₁(N₁) + N₁C₂(N₂) = KN log₂ N operations, with N = N₁N₂.
A split-radix FFT corresponds to K = 3.
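The separable algorithm above can be sketched with NumPy's one-dimensional FFT as the row and column transform (our own illustration, not the book's code): 1D transforms of the rows, then 1D transforms of the columns, reproduce the full 2D DFT:

```python
import numpy as np

def separable_fft2(f):
    """2D DFT computed separably: 1D FFTs of the N1 rows (size N2),
    then 1D FFTs of the N2 columns (size N1)."""
    rows = np.fft.fft(f, axis=1)     # transform each image row
    return np.fft.fft(rows, axis=0)  # then each column of the result

rng = np.random.default_rng(2)
f = rng.standard_normal((4, 8))
assert np.allclose(separable_fft2(f), np.fft.fft2(f))
```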
3.5 EXERCISES
3.1 ¹ Show that if φ_s(t) = s^{−1/2} 1_{[0,s)}(t), then {φ_s(t − ns)}_{n∈Z} is an
orthonormal basis of the space of piecewise constant functions on intervals
[ns, (n+1)s) for any n ∈ Z.
3.2 ² Prove that if f has a Fourier transform supported in [−π/s, π/s], then

    ∀u ∈ R,  f(u) = (1/s) ⟨f(t), φ_s(t − u)⟩, with φ_s(t) = sin(πt/s)/(πt/s).
3.3 ² An interpolation function f(t) satisfies f(n) = δ[n] for any n ∈ Z.
(a) Prove that Σ_{k=−∞}^{+∞} f̂(ω + 2kπ) = 1 if and only if f is an interpolation
function.
(b) Suppose that f(t) = Σ_{n=−∞}^{+∞} h[n] φ(t − n) with φ ∈ L²(R). Find ĥ(ω) as
a function of φ̂(ω) so that f(t) is an interpolation function. Relate f̂(ω) to
φ̂(ω), and give a sufficient condition on φ̂ to guarantee that f ∈ L²(R).
3.4
2 Prove that if f ∈ L 2 (R) and
⫹⬁
⫹⬁
2
n⫽⫺⬁ f (t ⫺ n) ∈ L [0, 1], then
n⫽⫺⬁
3.5
⫹⬁
f (t ⫺ n) ⫽
f̂ (2k) ei2kt.
k⫽⫺⬁
3.5 ¹ We want to approximate f by a signal f̃ in an approximation space U_s.
Prove that the approximation f̃ that minimizes ‖f̃ − f‖ is the orthogonal
projection of f in U_s.
3.6 ² Prove that the discrete filter L f[n] = f ⋆ h[n] is stable if and only if
h ∈ ℓ¹(Z).
3.7 ² If ĥ(ω) and ĝ(ω) are the Fourier transforms of h[n] and g[n], we write

    ĥ ⊛ ĝ(ω) = ∫_{−π}^{+π} ĥ(ξ) ĝ(ω − ξ) dξ.

Prove that if f[n] = g[n] h[n], then f̂(ω) = (2π)⁻¹ ĥ ⊛ ĝ(ω).
3.8 ¹ Prove that {e^{i2πkn/N}}_{0≤k<N} is an orthogonal family and thus an
orthogonal basis of C^N. What renormalization factor is needed to obtain an
orthonormal basis?
3.9 ² Suppose that f̂ has a support in [−(n+1)π/s, −nπ/s] ∪ [nπ/s, (n+1)π/s] and
that f(t) is real. Find an interpolation formula that recovers f(t) from
{f(ns)}_{n∈Z}.
3.10 ³ Suppose that f̂ has a support in [−π/s, π/s].
(a) Give the filter φ_s(t) such that, for any f,

    ∀n ∈ Z,  f̃(ns) = ∫_{(n−1/2)s}^{(n+1/2)s} f(t) dt = f ⋆ φ_s(ns).

(b) Show that f̃(t) = f ⋆ φ_s(t) can be recovered from {f̃(ns)}_{n∈Z} with an
interpolation formula.
(c) Reconstruct f from f̃ by inverting φ_s.
(d) Prove that the reconstruction of f(t) from {f̃(ns)}_{n∈Z} is stable.
3.11 ² The linear box spline φ(t) is defined in (3.31) for m = 1.
(a) Give an analytical formula for φ(t) and specify its support.
(b) Prove with (7.20) that {φ(t − n)}_{n∈Z} is a Riesz basis of the space of
finite-energy functions that are continuous and linear on the intervals [n, n+1]
for n ∈ Z.
(c) Does the dual filter φ̃(t) have a compact support? Compute its graph
numerically.
3.12 ¹ If f[n] is defined for 0 ≤ n < N, prove that |f̂[k]| ≤ Σ_{n=0}^{N−1} |f[n]|
for any 0 ≤ k < N.
3.13 ² The discrete and periodic total variation is

    ‖f‖_V = Σ_{n=1}^{N−1} |f[n] − f[n−1]| + |f[N−1] − f[0]|.

(a) Prove that ‖f‖_V = Σ_{n=0}^{N−1} |f ⊛ h[n]|, where h[n] is a filter; specify
ĥ[k].
(b) Derive an upper bound of |f̂[k]| as a function of k⁻¹.
3.14 ¹ Let g[n] = (−1)ⁿ h[n]. Relate ĝ(ω) to ĥ(ω). If h is a low-pass filter,
what kind of filter is g?
3.15 ² Prove the convolution Theorem 3.7.
3.16 ² Let h⁻¹ be the inverse of h, defined by h ⋆ h⁻¹[n] = δ[n].
(a) Compute ĥ⁻¹(ω) as a function of ĥ(ω).
(b) Prove that if h has a finite support, then h⁻¹ has a finite support if and
only if h[n] = δ[n − p] for some p ∈ Z.
3.17 ¹ All-pass filters:
(a) Verify that

    ĥ(ω) = Π_{k=1}^{K} (a_k* − e^{−iω}) / (1 − a_k e^{−iω})

is an all-pass filter; that is, |ĥ(ω)| = 1.
(b) Prove that {h[n − m]}_{m∈Z} is an orthonormal basis of ℓ²(Z).
3.18 ² Recursive filters:
(a) Compute the Fourier transform of h[n] = aⁿ 1_{[0,+∞)}[n] for |a| < 1.
Compute the inverse Fourier transform of ĥ(ω) = (1 − a e^{−iω})^{−p}.
(b) Suppose that g = f ⋆ h is calculated by a recursive equation with real
coefficients

    Σ_{k=0}^{K} a_k f[n − k] = Σ_{k=0}^{M} b_k g[n − k].

Write ĥ(ω) as a function of the parameters a_k and b_k.
(c) Show that h is a stable filter if and only if the equation
Σ_{k=0}^{M} b_k z^{−k} = 0 has roots with a modulus strictly smaller than 1.
3.19 ¹ Discrete interpolation. Let f̂[k] be the DFT of a signal f[n] of size N.
We define a signal f̃[n] of size 2N whose DFT satisfies f̂̃[N/2] = f̂̃[3N/2] = f̂[N/2]
and

    f̂̃[k] = { 2f̂[k]        if 0 ≤ k < N/2
            { 0             if N/2 < k < 3N/2
            { 2f̂[k − N]    if 3N/2 < k < 2N.

Prove that f̃ is an interpolation of f that satisfies f̃[2n] = f[n].
3.20 ² Decimation. Let x[n] = y[Mn] with M > 1.
(a) Show that x̂(ω) = M⁻¹ Σ_{k=0}^{M−1} ŷ((ω − 2kπ)/M).
(b) Give a sufficient condition on ŷ(ω) to recover y from x, and give the
interpolation formula that recovers y[n] from x.
3.21 ³ We want to compute numerically the Fourier transform of f(t). Let
f_d[n] = f(ns) and f_p[n] = Σ_{p=−∞}^{+∞} f_d[n − pN].
(a) Prove that the DFT of f_p[n] is related to the Fourier series of f_d[n] and
to the Fourier transform of f(t) by

    f̂_p[k] = f̂_d(2πk/N) = (1/s) Σ_{l=−∞}^{+∞} f̂(2πk/(Ns) − 2πl/s).

(b) Suppose that |f(t)| and |f̂(ω)| are negligible when t ∉ [−t₀, t₀] and
ω ∉ [−ω₀, ω₀]. Relate N and s to t₀ and ω₀ so that one can compute an approximate
value of f̂(ω) for all ω ∈ R by interpolating the samples f̂_p[k]. Is it possible
to compute f̂(ω) exactly with such an interpolation formula?
(c) Let f(t) = (sin(πt)/(πt))⁴. What is the support of f̂? Sample f appropriately
and compute f̂ numerically with an FFT algorithm.
3.22 ² The analytic part f_a[n] of a real discrete signal f[n] of size N is
defined by

    f̂_a[k] = { f̂[k]     if k = 0, N/2
             { 2f̂[k]    if 0 < k < N/2
             { 0         if N/2 < k < N.

(a) Compute f_a[n] for f[n] = cos(2πkn/N) with 0 < k < N/2.
(b) Prove that for any complex signal f, the real part g[n] = Re(f[n]) satisfies
ĝ[k] = (f̂[k] + f̂*[−k])/2.
(c) Prove that Re(f_a) = f.
3.23 ¹ Prove that if {e¹_{k₁}[n₁]}_{0≤k₁<N₁} is an orthonormal basis of C^{N₁}
and {e²_{k₂}[n₂]}_{0≤k₂<N₂} is an orthonormal basis of C^{N₂}, then
{e¹_{k₁}[n₁] e²_{k₂}[n₂]}_{0≤k₁<N₁, 0≤k₂<N₂} is an orthogonal basis of the space
C^N = C^{N₁} ⊗ C^{N₂} of images f[n₁, n₂] of N = N₁N₂ pixels.
3.24 ² Let h[n₁, n₂] be a nonseparable filter that is nonzero for 0 ≤ n₁, n₂ < M.
Let f[n₁, n₂] be a square image of N = (LM)² pixels, defined for 0 ≤ n₁, n₂ < LM.
Describe an overlap-add algorithm to compute g[n₁, n₂] = f ⋆ h[n₁, n₂]. Using an
FFT that requires KP log₂ P operations to compute the Fourier transform of an
image of P pixels, how many operations does your algorithm require? If K = 6, for
what range of M is it better to compute the convolution with a direct summation?
3.25 ² Let f[n₁, n₂, n₃] be a three-dimensional signal of size N = N₁N₂N₃. The
discrete Fourier transform is defined as a decomposition in a separable discrete
Fourier basis. Give a separable algorithm that decomposes f in this basis with
KN log₂ N operations, using a one-dimensional FFT algorithm that requires
KP log₂ P operations for a one-dimensional signal of size P.
CHAPTER 4
Time Meets Frequency
When we listen to music, we clearly “hear” the time variation of the sound
“frequencies.” These localized frequency events are not “pure” tones but packets
of close frequencies. The properties of sounds are revealed by transforms that
decompose signals over elementary functions that have a narrow localization in
time and frequency. Windowed Fourier transforms and wavelet transforms are two
important classes of local time-frequency decompositions. Measuring the time variations of“instantaneous”frequencies illustrates the limitations imposed by the Heisenberg uncertainty. Such frequencies are detected as local maxima in windowed
Fourier and wavelet dictionaries and define a signal-approximation support.
Audio-processing algorithms are implemented by modifying the geometry of this
approximation support.
There is no unique definition of time-frequency energy density; all quadratic
distributions are related through the averaging of a single quadratic form called the
Wigner-Ville distribution. This framework gives another perspective on windowed
Fourier and wavelet transforms.
4.1 TIME-FREQUENCY ATOMS
A linear time-frequency transform correlates the signal with a dictionary of
waveforms that are concentrated in time and in frequency, called time-frequency
atoms. Let us consider a general dictionary of atoms D = {φ_γ}_{γ∈Γ}, where γ
might be a multi-index parameter. We suppose that φ_γ ∈ L²(R) and that ‖φ_γ‖ = 1.
The corresponding linear time-frequency transform of f ∈ L²(R) is defined by

    Φf(γ) = ∫_{−∞}^{+∞} f(t) φ_γ*(t) dt = ⟨f, φ_γ⟩.

The Parseval formula (2.25) proves that

    Φf(γ) = ∫_{−∞}^{+∞} f(t) φ_γ*(t) dt = (1/2π) ∫_{−∞}^{+∞} f̂(ω) φ̂_γ*(ω) dω.    (4.1)
If φ_γ(t) is nearly zero when t is outside a neighborhood of an abscissa u, then
⟨f, φ_γ⟩ depends only on the values of f in this neighborhood. Similarly, if
φ̂_γ(ω) is negligible for ω far from ξ, then the right integral of (4.1) proves
that ⟨f, φ_γ⟩ reveals the properties of f̂ in the neighborhood of ξ.
Heisenberg Boxes
The slice of information provided by ⟨f, φ_γ⟩ is represented in a time-frequency
plane (t, ω) by a rectangle whose position and size depend on the time-frequency
spread of φ_γ. Since

    ‖φ_γ‖² = ∫_{−∞}^{+∞} |φ_γ(t)|² dt = 1,

we interpret |φ_γ(t)|² as a probability distribution centered at

    u_γ = ∫_{−∞}^{+∞} t |φ_γ(t)|² dt.    (4.2)

The spread around u_γ is measured by the variance

    σ_t²(γ) = ∫_{−∞}^{+∞} (t − u_γ)² |φ_γ(t)|² dt.    (4.3)

The Plancherel formula (2.26) proves that

    ∫_{−∞}^{+∞} |φ̂_γ(ω)|² dω = 2π ‖φ_γ‖².

The center frequency of φ̂_γ is therefore defined by

    ξ_γ = (1/2π) ∫_{−∞}^{+∞} ω |φ̂_γ(ω)|² dω,    (4.4)

and its spread around ξ_γ is

    σ_ω²(γ) = (1/2π) ∫_{−∞}^{+∞} (ω − ξ_γ)² |φ̂_γ(ω)|² dω.    (4.5)

The time-frequency resolution of φ_γ is represented in the time-frequency plane
(t, ω) by a Heisenberg box centered at (u_γ, ξ_γ), having a time width equal to
σ_t(γ) and a frequency width equal to σ_ω(γ). Figure 4.1 illustrates this. The
Heisenberg uncertainty Theorem 2.6 proves that the area of the rectangle is at
least one-half:

    σ_t σ_ω ≥ 1/2.    (4.6)

This limits the joint resolution of φ_γ in time and frequency. The time-frequency
plane must be manipulated carefully because a point (t₀, ω₀) is ill defined.
There is no function that is perfectly concentrated at a point t₀ and a frequency
ω₀. Only rectangles with an area of at least one-half may correspond to
time-frequency atoms.
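The definitions (4.2)-(4.5) can be evaluated numerically. A small sketch (our own, assuming NumPy): for a unit-norm Gaussian atom φ(t) = (σ²π)^{−1/4} exp(−t²/(2σ²)), centered at ξ = 0, the uncertainty bound (4.6) is attained; the frequency spread is computed via Plancherel as σ_ω = ‖φ′‖ for a real atom with zero center frequency:

```python
import numpy as np

sigma = 0.7
t = np.linspace(-20, 20, 2**14)
dt = t[1] - t[0]
phi = (sigma**2 * np.pi) ** (-0.25) * np.exp(-t**2 / (2 * sigma**2))
assert abs(np.sum(np.abs(phi)**2) * dt - 1.0) < 1e-6  # unit norm

u = np.sum(t * np.abs(phi)**2) * dt                          # center (4.2)
sigma_t = np.sqrt(np.sum((t - u)**2 * np.abs(phi)**2) * dt)  # time spread (4.3)

# (1/2pi) int w^2 |phi^(w)|^2 dw = int |phi'(t)|^2 dt  (Plancherel, xi = 0)
dphi = np.gradient(phi, dt)
sigma_w = np.sqrt(np.sum(np.abs(dphi)**2) * dt)              # frequency spread (4.5)

assert abs(sigma_t * sigma_w - 0.5) < 1e-3  # Gaussian attains sigma_t*sigma_w = 1/2
```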
FIGURE 4.1
Heisenberg box representing an atom φ_γ, centered at (u_γ, ξ_γ), with time width
σ_t and frequency width σ_ω. [Plot omitted: axes t and ω, showing |φ_γ(t)| and
|φ̂_γ(ω)|.]
Translation-Invariant Dictionaries
For pattern recognition, it can be important to construct signal representations
that are translation invariant: when a pattern is translated, its numerical
descriptors are translated but not modified. Observe that for any φ_γ ∈ D and
any shift u,

    ⟨f(t − u), φ_γ(t)⟩ = ⟨f(t), φ_γ(t + u)⟩.

A translation-invariant representation is thus obtained if φ_γ(t + u) is in D up
to a multiplicative constant. Such a dictionary is said to be translation
invariant. A translation-invariant dictionary is obtained by translating a family
of generators {φ_γ}_{γ∈Γ} and can be written D = {φ_{u,γ}}_{γ∈Γ, u∈R}, with
φ_{u,γ}(t) = φ_γ(t − u). The resulting time-frequency transform of f can then be
written as a convolution:

    Φf(u, γ) = ⟨f, φ_{u,γ}⟩ = ∫_{−∞}^{+∞} f(t) φ_γ*(t − u) dt = f ⋆ φ̃_γ(u),

with φ̃_γ(t) = φ_γ*(−t).

Energy Density
Let us suppose that φ_γ(t) is centered at t = 0, so that φ_{u,γ}(t) is centered
at u. Let ξ_γ be the center frequency of φ̂_γ(ω) defined in (4.4). The
time-frequency box of φ_{u,γ} specifies a neighborhood of (u, ξ_γ), where the
energy of f is measured by

    P_Φ f(u, γ) = |⟨f, φ_{u,γ}⟩|² = |∫_{−∞}^{+∞} f(t) φ_{u,γ}*(t) dt|².    (4.7)

Section 4.5.1 proves that any such energy density is an averaging of the
Wigner-Ville distribution, with a kernel that depends on the atoms φ_{u,γ}.
EXAMPLE 4.1
A windowed Fourier atom is constructed with a window g modulated by the frequency
ξ and translated by u:

    φ_{u,γ}(t) = g_{u,ξ}(t) = e^{iξt} g(t − u).    (4.8)

The resulting windowed Fourier dictionary D = {g_{u,ξ}(t)}_{(u,ξ)∈R²} is
translation invariant since g_{u,ξ} = e^{iξu} g_{0,ξ}(t − u). A windowed Fourier
dictionary is also frequency invariant because e^{iγt} g_{u,ξ}(t) = g_{u,ξ+γ}(t) ∈ D.
This dictionary is thus particularly useful to analyze patterns that are
translated in time and in frequency.

A wavelet atom is a dilation by s and a translation by u of a mother wavelet ψ:

    φ_{u,γ}(t) = ψ_{u,s}(t) = (1/√s) ψ((t − u)/s).    (4.9)

A wavelet dictionary D = {ψ_{u,s}(t)}_{u∈R, s∈R⁺} is translation invariant but
also scale invariant, because scaling any wavelet produces a dilated wavelet that
remains in the dictionary. A wavelet dictionary can be used to analyze patterns
translated and scaled by arbitrary factors.

Wavelets and windowed Fourier atoms have well-localized energy in time, while
their Fourier transform is mostly concentrated in a limited frequency band. The
properties of the resulting transforms are studied in Sections 4.2 and 4.3.
4.2 WINDOWED FOURIER TRANSFORM
In 1946, Gabor [267] introduced windowed Fourier atoms to measure the “frequency
variations” of sounds. A real and symmetric window g(t) = g(−t) is translated by
u and modulated by the frequency ξ:

    g_{u,ξ}(t) = e^{iξt} g(t − u).    (4.10)

It is normalized, ‖g‖ = 1, so that ‖g_{u,ξ}‖ = 1 for any (u, ξ) ∈ R². The
resulting windowed Fourier transform of f ∈ L²(R) is

    Sf(u, ξ) = ⟨f, g_{u,ξ}⟩ = ∫_{−∞}^{+∞} f(t) g(t − u) e^{−iξt} dt.    (4.11)

This transform is also called the short-time Fourier transform, because the
multiplication by g(t − u) localizes the Fourier integral in the neighborhood of
t = u.
As in (4.7), one can define an energy density called a spectrogram, denoted P_S:

    P_S f(u, ξ) = |Sf(u, ξ)|² = |∫_{−∞}^{+∞} f(t) g(t − u) e^{−iξt} dt|².    (4.12)

The spectrogram measures the energy of f in a time-frequency neighborhood of
(u, ξ) specified by the Heisenberg box of g_{u,ξ}.
Heisenberg Boxes
Since g is even, g_{u,ξ}(t) = e^{iξt} g(t − u) is centered at u. The time spread
around u is independent of u and ξ:

    σ_t² = ∫_{−∞}^{+∞} (t − u)² |g_{u,ξ}(t)|² dt = ∫_{−∞}^{+∞} t² |g(t)|² dt.    (4.13)

The Fourier transform ĝ of g is real and symmetric because g is real and
symmetric. The Fourier transform of g_{u,ξ} is

    ĝ_{u,ξ}(ω) = ĝ(ω − ξ) exp[−iu(ω − ξ)].    (4.14)

It is a translation by ξ of the frequency window ĝ, so its center frequency is
ξ. The frequency spread around ξ is

    σ_ω² = (1/2π) ∫_{−∞}^{+∞} (ω − ξ)² |ĝ_{u,ξ}(ω)|² dω = (1/2π) ∫_{−∞}^{+∞} ω² |ĝ(ω)|² dω.    (4.15)

It is independent of u and ξ. Thus, g_{u,ξ} corresponds to a Heisenberg box of
area σ_t σ_ω centered at (u, ξ), as illustrated by Figure 4.2. The size of this
box is independent of (u, ξ), which means that a windowed Fourier transform has
the same resolution across the time-frequency plane.
FIGURE 4.2
Heisenberg boxes of two windowed Fourier atoms, g_{u,ξ} and g_{v,γ}. [Plot
omitted: both boxes have the same width σ_t and height σ_ω, centered at (u, ξ)
and (v, γ).]
EXAMPLE 4.2
A sinusoidal wave f(t) = exp(iω₀t), whose Fourier transform is a Dirac
f̂(ω) = 2πδ(ω − ω₀), has a windowed Fourier transform

    Sf(u, ξ) = ĝ(ξ − ω₀) exp[−iu(ξ − ω₀)].

Its energy is spread over the frequency interval [ω₀ − σ_ω/2, ω₀ + σ_ω/2].

EXAMPLE 4.3
The windowed Fourier transform of a Dirac f(t) = δ(t − u₀) is

    Sf(u, ξ) = g(u₀ − u) exp(−iξu₀).

Its energy is spread in the time interval [u₀ − σ_t/2, u₀ + σ_t/2].
EXAMPLE 4.4
A linear chirp f(t) = exp(iat²) has an “instantaneous” frequency that increases
linearly in time. For a Gaussian window g(t) = (σ²π)^{−1/4} exp[−t²/(2σ²)], the
windowed Fourier transform of f is calculated using the Fourier transform (2.34)
of Gaussian chirps. One can verify that its spectrogram is

    P_S f(u, ξ) = |Sf(u, ξ)|² = (4πσ² / (1 + 4a²σ⁴))^{1/2} exp(−σ²(ξ − 2au)² / (1 + 4a²σ⁴)).    (4.16)

For a fixed time u, P_S f(u, ξ) is a Gaussian that reaches its maximum at the
frequency ξ(u) = 2au. Observe that if we write f(t) = exp[iφ(t)], then ξ(u) is
equal to the instantaneous frequency, defined as the derivative of the phase:
ω(u) = φ′(u) = 2au. Section 4.4.2 explains this result.
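The ridge location ξ(u) = 2au can be checked numerically. In this sketch (our own; parameter values are arbitrary choices), the windowed Fourier integral (4.11) is discretized on a grid of frequencies for a fixed analysis time u:

```python
import numpy as np

a, s = 30.0, 0.3                        # chirp rate and Gaussian window width
t = np.linspace(-5, 5, 2048)
dt = t[1] - t[0]
f = np.exp(1j * a * t**2)               # linear chirp, instantaneous frequency 2at
u = 1.5                                 # fixed analysis time
g_u = (s**2 * np.pi) ** (-0.25) * np.exp(-((t - u) ** 2) / (2 * s**2))

# S f(u, xi) = integral f(t) g(t - u) exp(-i xi t) dt, evaluated on a xi grid
xis = np.arange(0.0, 150.0, 0.25)
S = np.exp(-1j * np.outer(xis, t)) @ (f * g_u) * dt
xi_peak = xis[np.argmax(np.abs(S))]
assert abs(xi_peak - 2 * a * u) < 1.0   # spectrogram maximum near xi(u) = 2au = 90
```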
EXAMPLE 4.5
Figure 4.3 gives the spectrogram of a signal that includes a linear chirp, a
quadratic chirp, and two modulated Gaussians. The spectrogram is computed with a
Gaussian window dilated by s = 0.05. As expected from (4.16), the linear chirp
yields large-amplitude coefficients along the trajectory of its instantaneous
frequency, which is a straight line. The quadratic chirp yields large
coefficients along a parabola. The two modulated Gaussians produce low- and
high-frequency blobs at u = 0.5 and u = 0.87.
4.2.1 Completeness and Stability
When the time-frequency indices (u, ξ) vary across R², the Heisenberg boxes of
the atoms g_{u,ξ} cover the whole time-frequency plane. One can therefore expect
that f
FIGURE 4.3
The signal includes a linear chirp with a frequency that increases, a quadratic
chirp with a frequency that decreases, and two modulated Gaussian functions
located at t = 0.5 and t = 0.87. (a) Spectrogram P_S f(u, ξ); dark points
indicate large-amplitude coefficients. (b) Complex phase of Sf(u, ξ) in the
regions where the modulus P_S f(u, ξ) is nonzero. [Plots omitted: f(t) and the
two spectrogram panels over u ∈ [0, 1] and ξ/2π ∈ [0, 500].]
can be recovered from its windowed Fourier transform Sf(u, ξ). Theorem 4.1 gives
a reconstruction formula and proves that the energy is conserved.

Theorem 4.1. If f ∈ L²(R), then

    f(t) = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Sf(u, ξ) g(t − u) e^{iξt} dξ du    (4.17)

and

    ∫_{−∞}^{+∞} |f(t)|² dt = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} |Sf(u, ξ)|² dξ du.    (4.18)
Proof. The reconstruction formula (4.17) is proved first. Let us apply the
Fourier Parseval formula (2.25) to the integral (4.17) with respect to the
integration in u. The Fourier transform of f_ξ(u) = Sf(u, ξ) with respect to u
is computed by observing that

    Sf(u, ξ) = exp(−iuξ) ∫_{−∞}^{+∞} f(t) g(t − u) exp[iξ(u − t)] dt = exp(−iuξ) f ⋆ g_ξ(u),

where g_ξ(t) = g(t) exp(iξt), because g(t) = g(−t). Its Fourier transform
therefore is

    f̂_ξ(ω) = f̂(ω + ξ) ĝ_ξ(ω + ξ) = f̂(ω + ξ) ĝ(ω).

The Fourier transform of g(t − u) with respect to u is ĝ(ω) exp(−itω). Thus,

    (1/2π) ∫_{−∞}^{+∞} [∫_{−∞}^{+∞} Sf(u, ξ) g(t − u) exp(iξt) du] dξ
        = (1/2π) ∫_{−∞}^{+∞} [(1/2π) ∫_{−∞}^{+∞} f̂(ω + ξ) |ĝ(ω)|² exp[it(ω + ξ)] dω] dξ.

If f̂ ∈ L¹(R), we can apply the Fubini theorem (A.2) to reverse the integration
order. The inverse Fourier transform proves that

    (1/2π) ∫_{−∞}^{+∞} f̂(ω + ξ) exp[it(ω + ξ)] dξ = f(t).

Since (1/2π) ∫_{−∞}^{+∞} |ĝ(ω)|² dω = 1, we derive (4.17). If f̂ ∉ L¹(R), a
density argument is used to verify this formula.

Let us now prove the energy conservation (4.18). Since the Fourier transform in
u of Sf(u, ξ) is f̂(ω + ξ) ĝ(ω), the Plancherel formula (2.26) applied to the
right side of (4.18) gives

    (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} |Sf(u, ξ)|² du dξ
        = (1/2π) ∫_{−∞}^{+∞} [(1/2π) ∫_{−∞}^{+∞} |f̂(ω + ξ) ĝ(ω)|² dω] dξ.

The Fubini theorem applies and the Plancherel formula proves that

    (1/2π) ∫_{−∞}^{+∞} |f̂(ω + ξ)|² dξ = ‖f‖²,

which implies (4.18). ■
The reconstruction formula (4.17) can be rewritten as

    f(t) = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} ⟨f, g_{u,ξ}⟩ g_{u,ξ}(t) dξ du.    (4.19)

It resembles the decomposition of a signal in an orthonormal basis, but it is
not, because the functions {g_{u,ξ}}_{(u,ξ)∈R²} are very redundant in L²(R). The
second equality (4.18) justifies the interpretation of the spectrogram
P_S f(u, ξ) = |Sf(u, ξ)|² as an energy density, because its time-frequency sum
equals the signal energy.
Reproducing Kernel
A windowed Fourier transform represents a one-dimensional signal f(t) by a
two-dimensional function Sf(u, ξ). The energy conservation proves that
Sf ∈ L²(R²). Because Sf(u, ξ) is redundant, it is not true that any Φ ∈ L²(R²)
is the windowed Fourier transform of some f ∈ L²(R). Theorem 4.2 gives a
necessary and sufficient condition for such a function to be a windowed Fourier
transform.

Theorem 4.2. Let Φ ∈ L²(R²). There exists f ∈ L²(R) such that Φ(u, ξ) = Sf(u, ξ)
if and only if

    Φ(u₀, ξ₀) = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Φ(u, ξ) K(u₀, u, ξ₀, ξ) du dξ,    (4.20)

with

    K(u₀, u, ξ₀, ξ) = ⟨g_{u,ξ}, g_{u₀,ξ₀}⟩.    (4.21)
Proof. Suppose that there exists f such that Φ(u, ξ) = Sf(u, ξ). Let us replace
f with its reconstruction integral (4.17) in the windowed Fourier transform
definition:

    Sf(u₀, ξ₀) = ∫_{−∞}^{+∞} [(1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Sf(u, ξ) g_{u,ξ}(t) du dξ] g_{u₀,ξ₀}*(t) dt.    (4.22)

Inverting the integral on t with the integrals on u and ξ yields (4.20). To
prove that the condition (4.20) is sufficient, we define f as in the
reconstruction formula (4.17):

    f(t) = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Φ(u, ξ) g(t − u) exp(iξt) dξ du,

and show that (4.20) implies that Φ(u, ξ) = Sf(u, ξ). ■
Ambiguity Function
The reproducing kernel K(u₀, u, ξ₀, ξ) measures the time-frequency overlap of
the two atoms g_{u,ξ} and g_{u₀,ξ₀}. The amplitude of K(u₀, u, ξ₀, ξ) decays
with u₀ − u and ξ₀ − ξ at a rate that depends on the energy concentration of g
and ĝ. Replacing g_{u,ξ} and g_{u₀,ξ₀} by their expressions and making the
change of variable v = t − (u + u₀)/2 in the inner product integral (4.21)
yields

    K(u₀, u, ξ₀, ξ) = exp[−(i/2)(ξ₀ − ξ)(u + u₀)] Ag(u₀ − u, ξ₀ − ξ),    (4.23)

where

    Ag(τ, γ) = ∫_{−∞}^{+∞} g(v + τ/2) g(v − τ/2) e^{−iγv} dv    (4.24)

is called the ambiguity function of g. Using the Parseval formula to replace
this time integral with a Fourier integral gives

    Ag(τ, γ) = (1/2π) ∫_{−∞}^{+∞} ĝ(ω + γ/2) ĝ(ω − γ/2) e^{iτω} dω.    (4.25)

The decay of the ambiguity function measures the spread of g in time and of ĝ
in frequency. For example, if g has a support included in an interval of size T,
then Ag(τ, γ) = 0 for |τ| ≥ T. The integral (4.25) shows that the same result
applies to the support of ĝ.
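This support property of (4.24) can be illustrated numerically (a sketch of ours, using a rectangle window as an assumed example): the two translated copies g(v + τ/2) and g(v − τ/2) no longer overlap once |τ| reaches the support size T:

```python
import numpy as np

T = 1.0
t = np.linspace(-2, 2, 4001)
dt = t[1] - t[0]
g = np.where(np.abs(t) <= T / 2, 1.0, 0.0)   # rectangle window of support size T

def ambiguity(g, t, tau, gamma):
    """Ag(tau, gamma) = integral g(v + tau/2) g(v - tau/2) exp(-i gamma v) dv."""
    gp = np.interp(t + tau / 2, t, g, left=0.0, right=0.0)
    gm = np.interp(t - tau / 2, t, g, left=0.0, right=0.0)
    return np.sum(gp * gm * np.exp(-1j * gamma * t)) * dt

assert abs(ambiguity(g, t, 0.0, 0.0) - T) < 1e-2   # Ag(0, 0) = ||g||^2 = T
assert abs(ambiguity(g, t, 0.5 * T, 0.0)) > 0.1    # overlap remains for |tau| < T
assert abs(ambiguity(g, t, 1.2 * T, 3.0)) < 1e-9   # vanishes for |tau| >= T
```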
4.2.2 Choice of Window
The resolution in time and frequency of the windowed Fourier transform depends
on the spread of the window in time and frequency. This can be measured from the
decay of the ambiguity function (4.24), or more simply from the area σ_t σ_ω of
the Heisenberg box. The uncertainty Theorem 2.6 proves that this area reaches
the minimum value 1/2 if and only if g is a Gaussian. The ambiguity function
Ag(τ, γ) is then a two-dimensional Gaussian.

Window Scale
The time-frequency localization of g can be modified with a scaling. Suppose
that g has a Heisenberg time width σ_t and frequency width σ_ω. Let
g_s(t) = s^{−1/2} g(t/s) be its dilation by s. A change of variables in the
integrals (4.13) and (4.15) shows that the Heisenberg time and frequency widths
of g_s are, respectively, sσ_t and σ_ω/s. The area of the Heisenberg box is not
modified, but it is dilated by s in time and compressed by s in frequency.
Similarly, a change of variable in the ambiguity integral (4.24) shows that the
ambiguity function is dilated in time and frequency, respectively, by s and 1/s:

    Ag_s(τ, γ) = Ag(τ/s, sγ).

The choice of a particular scale s depends on the desired resolution trade-off
between time and frequency.
FIGURE 4.4
The energy spread of ĝ is measured by its bandwidth Δω and the maximum amplitude
A of the first side lobes, located at ω = ±ω₀. [Plot omitted: |ĝ(ω)| with a main
lobe of width Δω centered at 0 and side lobes of amplitude A at ±ω₀.]
Finite Support
In numerical applications, g must have a compact support. Theorem 2.7 proves
that its Fourier transform ĝ then necessarily has an infinite support. It is a
symmetric function with a main lobe centered at ω = 0, which decays to zero with
oscillations. Figure 4.4 illustrates this behavior. To maximize the frequency
resolution of the transform, we must concentrate the energy of ĝ near ω = 0.
The following three important parameters evaluate the spread of ĝ:

■ The root mean-square bandwidth Δω, which is defined by

    |ĝ(Δω/2)|² / |ĝ(0)|² = 1/2.

■ The maximum amplitude A of the first side lobes, located at ω = ±ω₀ in
Figure 4.4. It is measured in decibels:

    A = 10 log₁₀ (|ĝ(ω₀)|² / |ĝ(0)|²).

■ The polynomial exponent p, which gives the asymptotic decay of |ĝ(ω)| for
large frequencies:

    |ĝ(ω)| = O(ω^{−p−1}).    (4.26)

Table 4.1 gives the values of these three parameters for several windows g
having a support restricted to [−1/2, 1/2] [293]. Figure 4.5 shows the graphs of
these windows.

To interpret the three frequency parameters, let us consider the spectrogram of
a frequency tone f(t) = exp(iξ₀t). If Δω is small, then |Sf(u, ξ)|² = |ĝ(ξ − ξ₀)|²
has an energy concentrated near ξ = ξ₀. The side lobes of ĝ create “shadows” at
ξ = ξ₀ ± ω₀, which can be neglected if A is also small.
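The side-lobe amplitude A can be estimated numerically by zero-padding the sampled window and inspecting its DFT magnitude. The following is our own sketch (the helper name and the main-lobe detection heuristic are ours), comparing the rectangle and Hamming windows, whose side lobes are near −13 db and −43 db, respectively, per Table 4.1:

```python
import numpy as np

def sidelobe_db(g):
    """Estimate the maximum side-lobe amplitude of a window, in dB.
    Zero-pad heavily so the DFT samples |g^(omega)| finely, walk down the
    main lobe to its first local minimum, then take the highest peak beyond it."""
    G = np.abs(np.fft.rfft(g, n=1 << 16))
    G /= G[0]                      # normalize so that g^(0) = 1
    k = 1
    while G[k + 1] < G[k]:         # descend the main lobe
        k += 1
    return 20 * np.log10(G[k:].max())

N = 1024
t = (np.arange(N) - N / 2) / N     # support normalized to [-1/2, 1/2)
rectangle = np.ones(N)
hamming = 0.54 + 0.46 * np.cos(2 * np.pi * t)

assert sidelobe_db(rectangle) > -14   # about -13 db: strong shadows
assert sidelobe_db(hamming) < -40     # about -43 db: shadows largely suppressed
```

The trade-off visible in Table 4.1 also appears here: the Hamming window's lower side lobes come with a wider bandwidth Δω than the rectangle's.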
Table 4.1 Frequency Parameters of Five Windows g

    Name        g(t)                                     Δω      A        p
    Rectangle   1                                        0.89    −13 db   0
    Hamming     0.54 + 0.46 cos(2πt)                     1.36    −43 db   0
    Gaussian    exp(−18t²)                               1.55    −55 db   0
    Hanning     cos²(πt)                                 1.44    −32 db   2
    Blackman    0.42 + 0.5 cos(2πt) + 0.08 cos(4πt)      1.68    −58 db   2

Note: Supports are restricted to [−1/2, 1/2]. The windows are normalized so that
g(0) = 1, but then ‖g‖ ≠ 1.
FIGURE 4.5
Graphs of four windows g with supports [−1/2, 1/2]. [Plots omitted: the Hamming,
Gaussian, Hanning, and Blackman windows.]
If the frequency tone is embedded in a signal that has other components of
much higher energy at different frequencies, the tone can still be detected if
ĝ( ⫺ ) attenuates these components rapidly when | ⫺ | increases. This means
that |ĝ()| has a rapid decay, and Theorem 2.5 proves that this decay depends on
the regularity of g. Property (4.26) is typically satisfied by windows that are p times
differentiable.
4.2.3 Discrete Windowed Fourier Transform
The discretization and fast computation of the windowed Fourier transform follow
the same ideas as the discretization of the Fourier transform described in Section 3.3.
We consider discrete signals of period N. The window g[n] is chosen to be a symmetric discrete signal of period N with unit norm ‖g‖ = 1. Discrete windowed Fourier
atoms are defined by
g_{m,l}[n] = g[n − m] exp(i2πln/N).
The discrete Fourier transform (DFT) of gm,l is
ĝ_{m,l}[k] = ĝ[k − l] exp(−i2πm(k − l)/N).
The discrete windowed Fourier transform of a signal f of period N is
Sf[m, l] = ⟨ f, g_{m,l}⟩ = Σ_{n=0}^{N−1} f[n] g[n − m] exp(−i2πln/N).   (4.27)
For each 0 ≤ m < N, Sf[m, l] is calculated for 0 ≤ l < N with a DFT of f[n]g[n − m]. This is performed with N FFT procedures of size N, and therefore requires a total of O(N² log₂ N) operations. Figure 4.3 is computed with this algorithm.
Inverse Transform
Theorem 4.3 discretizes the reconstruction formula and the energy conservation of
Theorem 4.1.
Theorem 4.3. If f is a signal of period N, then
f[n] = (1/N) Σ_{m=0}^{N−1} Σ_{l=0}^{N−1} Sf[m, l] g[n − m] exp(i2πln/N)   (4.28)
and
Σ_{n=0}^{N−1} | f[n]|² = (1/N) Σ_{l=0}^{N−1} Σ_{m=0}^{N−1} |Sf[m, l]|².   (4.29)
This theorem is proved by applying the Parseval and Plancherel formulas of the
discrete Fourier transform, exactly as in the proof of Theorem 4.1 (Exercise 4.1).
The energy conservation (4.29) proves that this windowed Fourier transform defines
a tight frame, as explained in Chapter 5. The reconstruction formula (4.28) is
rewritten
f[n] = (1/N) Σ_{m=0}^{N−1} g[n − m] Σ_{l=0}^{N−1} Sf[m, l] exp(i2πln/N).
The second sum computes, for each 0 ≤ m < N, the inverse DFT of Sf[m, l] with respect to l. This is calculated with N FFT procedures, requiring a total of O(N² log₂ N) operations.
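A sketch of this inverse (illustrative, not from the book): the inner inverse DFTs recover f exactly for any unit-norm periodic window, since Σ_m g[n − m]² = ‖g‖² = 1.

```python
import numpy as np

def dwft(f, g):
    # forward transform (4.27): one FFT per window position m
    return np.array([np.fft.fft(f * np.roll(g, m)) for m in range(len(f))])

def idwft(S, g):
    """Inverse transform (4.28): for each m, an inverse DFT in l,
    then a sum over window positions weighted by g[n - m]."""
    N = S.shape[0]
    f = np.zeros(N, dtype=complex)
    for m in range(N):
        # g[n - m] * (1/N) sum_l S[m, l] exp(i 2 pi l n / N)
        f += np.roll(g, m) * np.fft.ifft(S[m])
    return f.real
```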
A discrete windowed Fourier transform is an N² image Sf[m, l] that is very redundant, because it is entirely specified by a signal f of size N. The redundancy
is characterized by a discrete reproducing kernel equation, which is the discrete
equivalent of (4.20) (Exercise 4.1).
4.3 WAVELET TRANSFORMS
To analyze signal structures of very different sizes, it is necessary to use time-frequency atoms with different time supports. The wavelet transform decomposes signals over dilated and translated wavelets. A wavelet is a function ψ ∈ L²(R) with a zero average:
∫_{−∞}^{+∞} ψ(t) dt = 0.   (4.30)
It is normalized ‖ψ‖ = 1 and centered in the neighborhood of t = 0. A dictionary of time-frequency atoms is obtained by scaling ψ by s and translating it by u:
D = { ψ_{u,s}(t) = (1/√s) ψ((t − u)/s) }_{u∈R, s∈R⁺}.
These atoms remain normalized: ‖ψ_{u,s}‖ = 1. The wavelet transform of f ∈ L²(R) at time u and scale s is
Wf(u, s) = ⟨ f, ψ_{u,s}⟩ = ∫_{−∞}^{+∞} f(t) (1/√s) ψ*((t − u)/s) dt.   (4.31)
Linear Filtering
The wavelet transform can be rewritten as a convolution product:
Wf(u, s) = ∫_{−∞}^{+∞} f(t) (1/√s) ψ*((t − u)/s) dt = f ⋆ ψ̄_s(u),   (4.32)
with
ψ̄_s(t) = (1/√s) ψ*(−t/s).
The Fourier transform of ψ̄_s(t) is
ψ̄̂_s(ω) = √s ψ̂*(sω).   (4.33)
Since ψ̂(0) = ∫_{−∞}^{+∞} ψ(t) dt = 0, it appears that ψ̂ is the transfer function of a band-pass filter. The convolution (4.32) computes the wavelet transform with dilated band-pass filters.
Analytic Versus Real Wavelets
Like a windowed Fourier transform, a wavelet transform can measure the time
evolution of frequency transients. This requires using a complex analytic wavelet,
which can separate amplitude and phase components.The properties of this analytic
wavelet transform are described in Section 4.3.2,and its application to the measurement of instantaneous frequencies is explained in Section 4.4.3. In contrast, real
wavelets are often used to detect sharp signal transitions. Section 4.3.1 introduces
elementary properties of real wavelets, which are developed in Chapter 6.
4.3.1 Real Wavelets
Suppose that ψ is a real wavelet. Since it has a zero average, the wavelet integral
Wf(u, s) = ∫_{−∞}^{+∞} f(t) (1/√s) ψ((t − u)/s) dt
measures the variation of f in a neighborhood of u proportional to s. Section 6.1.3 proves that when scale s goes to zero, the decay of the wavelet coefficients characterizes the regularity of f in the neighborhood of u. This has important applications
for detecting transients and analyzing fractals. This section concentrates on the
completeness and redundancy properties of real wavelet transforms.
EXAMPLE 4.6
Wavelets equal to the second derivative of a Gaussian are called Mexican hats. They were first used in computer vision to detect multiscale edges [487]. The normalized Mexican hat wavelet is
ψ(t) = (2 / (π^{1/4} √(3σ))) (t²/σ² − 1) exp(−t²/(2σ²)).   (4.34)
For σ = 1, Figure 4.6 plots −ψ and its Fourier transform:
ψ̂(ω) = (−√8 σ^{5/2} π^{1/4} / √3) ω² exp(−σ²ω²/2).   (4.35)
Figure 4.7 shows the wavelet transform of a signal that is piecewise regular on the left and almost everywhere singular on the right. The maximum scale is smaller than 1 because the support of f is normalized to [0, 1]. The minimum scale is limited by the sampling interval of the discretized signal used in numerical calculations. When the scale decreases, the wavelet transform has a rapid decay to zero in the regions where the signal is regular. The isolated singularities on the left create cones of large-amplitude wavelet coefficients that converge to the locations of the singularities. This is further explained in Chapter 6.
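The two defining properties of (4.34), zero average (4.30) and unit L² norm, can be checked numerically; a sketch with σ = 1 (illustrative code, not from the book):

```python
import numpy as np

def mexican_hat(t, s=1.0):
    """Normalized Mexican hat (4.34): second derivative of a Gaussian."""
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0 * s))
    return c * (t**2 / s**2 - 1.0) * np.exp(-t**2 / (2.0 * s**2))

# Riemann sums on a fine grid approximate the continuous integrals well,
# since the wavelet is smooth and decays rapidly.
t = np.linspace(-12.0, 12.0, 200001)
dt = t[1] - t[0]
psi = mexican_hat(t)
average = psi.sum() * dt        # should vanish, eq. (4.30)
norm2 = (psi**2).sum() * dt     # should equal 1
```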
FIGURE 4.6
Mexican-hat wavelet (4.34) for σ = 1 and its Fourier transform.
FIGURE 4.7
Real wavelet transform Wf(u, s) computed with a Mexican-hat wavelet (4.34). The vertical axis represents log₂ s. Black, gray, and white points correspond, respectively, to positive, zero, and negative wavelet coefficients.
A real wavelet transform is complete and maintains an energy conservation
as long as the wavelet satisfies a weak admissibility condition, specified by Theorem 4.4. This theorem was first proved in 1964 by the mathematician Calderón
[132] from a different point of view. Wavelets did not appear as such, but Calderón defined a wavelet transform as a convolution operator that decomposes the identity.
Grossmann and Morlet [288] were not aware of Calderón’s work when they proved
the same formula for signal processing.
Theorem 4.4: Calderón, Grossmann, and Morlet. Let ψ ∈ L²(R) be a real function such that
C_ψ = ∫₀^{+∞} |ψ̂(ω)|²/ω dω < +∞.   (4.36)
Any f ∈ L²(R) satisfies
f(t) = (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} Wf(u, s) (1/√s) ψ((t − u)/s) du (ds/s²),   (4.37)
and
∫_{−∞}^{+∞} | f(t)|² dt = (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} |Wf(u, s)|² du (ds/s²).   (4.38)
Proof. The proof of (4.38) is almost identical to the proof of (4.18). Let us concentrate on the proof of (4.37). The right integral b(t) of (4.37) can be rewritten as a sum of convolutions. Inserting Wf(u, s) = f ⋆ ψ̄_s(u) with ψ_s(t) = s^{−1/2} ψ(t/s) yields
b(t) = (1/C_ψ) ∫₀^{+∞} Wf(·, s) ⋆ ψ_s(t) (ds/s²)   (4.39)
     = (1/C_ψ) ∫₀^{+∞} f ⋆ ψ̄_s ⋆ ψ_s(t) (ds/s²).
The “·” indicates the variable over which the convolution is calculated. We prove that b = f by showing that their Fourier transforms are equal. The Fourier transform of b is
b̂(ω) = (1/C_ψ) ∫₀^{+∞} f̂(ω) √s ψ̂*(sω) √s ψ̂(sω) (ds/s²) = ( f̂(ω)/C_ψ) ∫₀^{+∞} |ψ̂(sω)|² (ds/s).
Since ψ is real we know that |ψ̂(−ω)|² = |ψ̂(ω)|². The change of variable ξ = sω thus proves that
b̂(ω) = (1/C_ψ) f̂(ω) ∫₀^{+∞} |ψ̂(ξ)|²/ξ dξ = f̂(ω).   ■
The theorem hypothesis
C_ψ = ∫₀^{+∞} |ψ̂(ω)|²/ω dω < +∞
is called the wavelet admissibility condition. To guarantee that this integral is finite, we must ensure that ψ̂(0) = 0, which explains why wavelets must have a zero average. This condition is nearly sufficient. If ψ̂(0) = 0 and ψ̂(ω) is continuously differentiable, then the admissibility condition is satisfied. One can verify that ψ̂(ω) is continuously differentiable if ψ has a sufficient time decay:
∫_{−∞}^{+∞} (1 + |t|) |ψ(t)| dt < +∞.
Reproducing Kernel
Like a windowed Fourier transform, a wavelet transform is a redundant representation, with a redundancy characterized by a reproducing kernel equation. Inserting the reconstruction formula (4.37) into the definition of the wavelet transform yields
Wf(u₀, s₀) = ∫_{−∞}^{+∞} [ (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} Wf(u, s) ψ_{u,s}(t) du (ds/s²) ] ψ*_{u₀,s₀}(t) dt.
Interchanging these integrals gives
Wf(u₀, s₀) = (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} K(u, u₀, s, s₀) Wf(u, s) du (ds/s²),   (4.40)
with
K(u₀, u, s₀, s) = ⟨ψ_{u,s}, ψ_{u₀,s₀}⟩.   (4.41)
The reproducing kernel K(u₀, u, s₀, s) measures the correlation of two wavelets, ψ_{u,s} and ψ_{u₀,s₀}. The reader can verify that any function Φ(u, s) is the wavelet transform of some f ∈ L²(R) if and only if it satisfies the reproducing kernel equation (4.40).
Scaling Function
When Wf(u, s) is known only for s < s₀, to recover f we need a complement of information corresponding to Wf(u, s) for s > s₀. This is obtained by introducing a scaling function φ that is an aggregation of wavelets at scales larger than 1. The modulus of its Fourier transform is defined by
|φ̂(ω)|² = ∫₁^{+∞} |ψ̂(sω)|² (ds/s) = ∫_ω^{+∞} |ψ̂(ξ)|²/ξ dξ,   (4.42)
and the complex phase of φ̂(ω) can be arbitrarily chosen. One can verify that ‖φ‖ = 1, and we can derive from the admissibility condition (4.36) that
lim_{ω→0} |φ̂(ω)|² = C_ψ.   (4.43)
The scaling function therefore can be interpreted as the impulse response of a low-pass filter. Let us denote
φ_s(t) = (1/√s) φ(t/s)  and  φ̄_s(t) = φ*_s(−t).
The low-frequency approximation of f at scale s is
Lf(u, s) = ⟨ f(t), (1/√s) φ((t − u)/s)⟩ = f ⋆ φ̄_s(u).   (4.44)
With a minor modification of the proof of Theorem 4.4, it can be shown that (Exercise 4.3)
f(t) = (1/C_ψ) ∫₀^{s₀} Wf(·, s) ⋆ ψ_s(t) (ds/s²) + (1/(C_ψ s₀)) Lf(·, s₀) ⋆ φ_{s₀}(t).   (4.45)
EXAMPLE 4.7
If ψ is the second-order derivative of a Gaussian with a Fourier transform given by (4.35), then the integration (4.42) yields
φ̂(ω) = (2 σ^{3/2} π^{1/4} / √3) √(ω² + 1/σ²) exp(−σ²ω²/2).   (4.46)
Figure 4.8 displays φ and φ̂ for σ = 1.
FIGURE 4.8
Scaling function associated to a Mexican-hat wavelet and its Fourier transform calculated with
(4.46).
4.3.2 Analytic Wavelets
To analyze the time evolution of frequency tones, it is necessary to use an analytic
wavelet to separate the phase and amplitude information of signals. The properties
of the resulting analytic wavelet transform are studied next.
Analytic Signal
A function f_a ∈ L²(R) is said to be analytic if its Fourier transform is zero for negative frequencies:
f̂_a(ω) = 0  if ω < 0.
An analytic function is necessarily complex but is entirely characterized by its real part. Indeed, the Fourier transform of its real part f = Re[ f_a] is
f̂(ω) = ( f̂_a(ω) + f̂_a*(−ω)) / 2,
and this relation can be inverted:
f̂_a(ω) = 2 f̂(ω) if ω ≥ 0,  and  f̂_a(ω) = 0 if ω < 0.   (4.47)
The analytic part f_a(t) of a signal f(t) is the inverse Fourier transform of f̂_a(ω) defined by (4.47).
Discrete Analytic Part
The analytic part f_a[n] of a discrete signal f[n] of size N is also computed by setting the negative-frequency components of its discrete Fourier transform to zero. The Fourier transform values at k = 0 and k = N/2 must be carefully adjusted so that Re[ f_a] = f (Exercise 3.4):
f̂_a[k] = f̂[k] if k = 0, N/2;  f̂_a[k] = 2 f̂[k] if 0 < k < N/2;  f̂_a[k] = 0 if N/2 < k < N.   (4.48)
We obtain f_a[n] by computing the inverse DFT.
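A minimal numpy sketch of (4.48) (illustrative; `scipy.signal.hilbert` builds the analytic signal by the same frequency-domain construction):

```python
import numpy as np

def analytic_part(f):
    """Discrete analytic part (4.48): keep k = 0 and k = N/2, double the
    positive frequencies, zero the negative ones (N assumed even)."""
    N = len(f)
    F = np.fft.fft(f)
    Fa = np.zeros(N, dtype=complex)
    Fa[0] = F[0]
    Fa[1 : N // 2] = 2.0 * F[1 : N // 2]
    Fa[N // 2] = F[N // 2]
    return np.fft.ifft(Fa)
```

For a pure cosine sampled on the DFT grid, the result is the complex exponential of Example 4.8.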
EXAMPLE 4.8
The Fourier transform of
f(t) = a cos(ω₀t + φ) = (a/2) ( exp[i(ω₀t + φ)] + exp[−i(ω₀t + φ)] )
is
f̂(ω) = πa ( exp(iφ) δ(ω − ω₀) + exp(−iφ) δ(ω + ω₀) ).
The Fourier transform of the analytic part computed with (4.47) is f̂_a(ω) = 2πa exp(iφ) δ(ω − ω₀), and therefore
f_a(t) = a exp[i(ω₀t + φ)].   (4.49)
Time-Frequency Resolution
An analytic wavelet transform is calculated with an analytic wavelet ψ:
Wf(u, s) = ⟨ f, ψ_{u,s}⟩ = ∫_{−∞}^{+∞} f(t) (1/√s) ψ*((t − u)/s) dt.   (4.50)
Its time-frequency resolution depends on the time-frequency spread of the wavelet atoms ψ_{u,s}. We suppose that ψ is centered at 0, which implies that ψ_{u,s} is centered at t = u. With the change of variable v = (t − u)/s, we verify that
∫_{−∞}^{+∞} (t − u)² |ψ_{u,s}(t)|² dt = s² σ_t²,   (4.51)
with σ_t² = ∫_{−∞}^{+∞} t² |ψ(t)|² dt. Since ψ̂(ω) is zero at negative frequencies, the center frequency η of ψ̂ is
η = (1/2π) ∫₀^{+∞} ω |ψ̂(ω)|² dω.   (4.52)
The Fourier transform of ψ_{u,s} is a dilation of ψ̂ by 1/s:
ψ̂_{u,s}(ω) = √s ψ̂(sω) exp(−iuω).   (4.53)
Its center frequency therefore is η/s. The energy spread of ψ̂_{u,s} around η/s is
(1/2π) ∫₀^{+∞} (ω − η/s)² |ψ̂_{u,s}(ω)|² dω = σ_ω²/s²,   (4.54)
with
σ_ω² = (1/2π) ∫₀^{+∞} (ω − η)² |ψ̂(ω)|² dω.
Thus, the energy spread of a wavelet time-frequency atom ψ_{u,s} corresponds to a Heisenberg box centered at (u, η/s), of size sσ_t along time and σ_ω/s along frequency. The area of the rectangle remains equal to σ_t σ_ω at all scales, but the resolution in time and frequency depends on s, as illustrated in Figure 4.9.
An analytic wavelet transform defines a local time-frequency energy density P_W f, which measures the energy of f in the Heisenberg box of each wavelet ψ_{u,s} centered at (u, ξ = η/s):
P_W f(u, ξ) = |Wf(u, s)|² = |Wf(u, η/ξ)|².   (4.55)
This energy density is called a scalogram.
FIGURE 4.9
Heisenberg boxes of two wavelets. Smaller scales decrease the time spread but increase the
frequency support, which is shifted toward higher frequencies.
Completeness
An analytic wavelet transform of f depends only on its analytic part f_a. Theorem 4.5
derives a reconstruction formula and proves that energy is conserved for real signals.
Theorem 4.5. For any f ∈ L²(R),
Wf(u, s) = (1/2) W f_a(u, s).   (4.56)
If C_ψ = ∫₀^{+∞} ω^{−1} |ψ̂(ω)|² dω < +∞ and f is real, then
f(t) = (2/C_ψ) Re [ ∫₀^{+∞} ∫_{−∞}^{+∞} Wf(u, s) ψ_s(t − u) du (ds/s²) ],   (4.57)
and
‖ f ‖² = (2/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} |Wf(u, s)|² du (ds/s²).   (4.58)
Proof. Let us first prove (4.56). The Fourier transform with respect to u of
f_s(u) = Wf(u, s) = f ⋆ ψ̄_s(u)
is
f̂_s(ω) = f̂(ω) √s ψ̂*(sω).
Since ψ̂(ω) = 0 at negative frequencies, and f̂_a(ω) = 2 f̂(ω) for ω ≥ 0, we derive that
f̂_s(ω) = (1/2) f̂_a(ω) √s ψ̂*(sω),
which is the Fourier transform of (4.56).
With the same derivations as in the proof of (4.37), one can verify that the inverse wavelet formula reconstructs the analytic part of f :
f_a(t) = (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} W f_a(u, s) ψ_s(t − u) du (ds/s²).   (4.59)
Since f = Re[ f_a], inserting (4.56) proves (4.57).
An energy conservation for the analytic part f_a is proved as in (4.38) by applying the Plancherel formula:
∫_{−∞}^{+∞} | f_a(t)|² dt = (1/C_ψ) ∫₀^{+∞} ∫_{−∞}^{+∞} |W f_a(u, s)|² du (ds/s²).
Since W f_a(u, s) = 2 Wf(u, s) and ‖ f_a‖² = 2 ‖ f ‖², equation (4.58) follows.
■
If f is real, the change of variable ξ = η/s in the energy conservation (4.58) proves that
‖ f ‖² = (2/(C_ψ η)) ∫₀^{+∞} ∫_{−∞}^{+∞} P_W f(u, ξ) du dξ.
It justifies the interpretation of a scalogram as a time-frequency energy density.
Wavelet Modulated Windows
An analytic wavelet can be constructed with a frequency modulation of a real and symmetric window g. The Fourier transform of
ψ(t) = g(t) exp(iηt)   (4.60)
is ψ̂(ω) = ĝ(ω − η). If ĝ(ω) = 0 for |ω| > η, then ψ̂(ω) = 0 for ω < 0. Therefore, ψ is analytic, as shown in Figure 4.10. Since g is real and even, ĝ is also real and symmetric. The center frequency of ψ̂ is therefore η, and
|ψ̂(η)| = sup_{ω∈R} |ψ̂(ω)| = ĝ(0).   (4.61)
A Gabor wavelet ψ(t) = g(t) e^{iηt} is obtained with a Gaussian window:
g(t) = (σ²π)^{−1/4} exp(−t²/(2σ²)).   (4.62)
FIGURE 4.10
Fourier transform ψ̂(ω) of a wavelet ψ(t) = g(t) exp(iηt).
The Fourier transform of this window is ĝ(ω) = (4πσ²)^{1/4} exp(−σ²ω²/2). If η²σ² ≫ 1, then ĝ(ω) ≈ 0 for |ω| > η. Thus, such Gabor wavelets are considered to be approximately analytic.
EXAMPLE 4.9
The wavelet transform of f(t) = a exp(iω₀t) is
Wf(u, s) = a √s ψ̂*(sω₀) exp(iω₀u) = a √s ĝ(sω₀ − η) exp(iω₀u).
Observe that the normalized scalogram is maximum at ξ = ω₀:
(ξ/η) P_W f(u, ξ) = (1/s) |Wf(u, s)|² = a² |ĝ(η(ω₀/ξ − 1))|².
EXAMPLE 4.10
The wavelet transform of a linear chirp f(t) = exp(iat²) = exp[iφ(t)] is computed for a Gabor wavelet with the Gaussian window (4.62). By using the Fourier transform of Gaussian chirps (2.34), one can verify that
|Wf(u, s)|²/s = (4πσ² / (1 + 4a²s⁴σ⁴))^{1/2} exp( −σ²(η − 2asu)² / (1 + 4a²s⁴σ⁴) ).
As long as 4a²s⁴σ⁴ ≪ 1 at a fixed time u, the renormalized scalogram η^{−1}ξ P_W f(u, ξ) is a Gaussian function of s that reaches its maximum at
ξ(u) = η/s(u) = φ′(u) = 2au.   (4.63)
Section 4.4.3 explains why the amplitude is maximum at the instantaneous frequency φ′(u).
EXAMPLE 4.11
Figure 4.11 displays the normalized scalogram η^{−1}ξ P_W f(u, ξ) and the complex phase Θ_W(u, ξ) of Wf(u, s), for the signal f of Figure 4.3. The frequency bandwidth of wavelet atoms is proportional to 1/s = ξ/η. The frequency resolution of the scalogram is therefore finer than that of the spectrogram at low frequencies, but coarser at high frequencies. This explains why the wavelet transform produces interference patterns between the high-frequency Gabor function at the abscissa t = 0.87 and the quadratic chirp at the same location, whereas the spectrogram in Figure 4.3 separates them well.
4.3.3 Discrete Wavelets
Let f̄ (t) be a continuous time signal defined over [0, 1]. Let f [n] be the discrete
signal obtained by a low-pass filtering of f̄ and uniform sampling at intervals N ⫺1 .
FIGURE 4.11
(a) Normalized scalogram η^{−1}ξ P_W f(u, ξ) computed from the signal in Figure 4.3; dark points indicate large-amplitude coefficients. (b) Complex phase Θ_W(u, ξ) of Wf(u, η/ξ), where the modulus is nonzero.
Its discrete wavelet transform can only be calculated at scales N⁻¹ < s < 1, as shown in Figure 4.7. It is calculated for s = a^j, with a = 2^{1/v}, which provides v intermediate scales in each octave [2^j, 2^{j+1}).
Let ψ(t) be a wavelet with a support included in [−K/2, K/2]. For 1 ≤ a^j ≤ N K⁻¹, a discrete wavelet scaled by a^j is defined by
ψ_j[n] = (1/√(a^j)) ψ(n/a^j).
This discrete wavelet has K a^j nonzero values on [−N/2, N/2]. The scale a^j is larger than 1; otherwise, the sampling interval may be larger than the wavelet support.
Fast Transform
To avoid border problems, we treat f[n] and the wavelets ψ_j[n] as periodic signals of period N. The discrete wavelet transform can then be written as a circular convolution with ψ̄_j[n] = ψ_j*[−n]:
Wf[n, a^j] = Σ_{m=0}^{N−1} f[m] ψ_j*[m − n] = f ⊛ ψ̄_j[n].   (4.64)
This circular convolution is calculated with the fast Fourier transform algorithm, which requires O(N log₂ N) operations. If a = 2^{1/v}, there are v log₂(N/(2K)) scales a^j ∈ [2N⁻¹, K⁻¹]. The total number of operations to compute the wavelet transform over all scales therefore is O(vN(log₂ N)²) [408].
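A sketch of (4.64) in numpy (illustrative; the Mexican hat (4.34) stands in for ψ, and the scales are passed explicitly): each scale costs one FFT product, since the DFT of ψ̄_j is the conjugate of the DFT of ψ_j.

```python
import numpy as np

def mexican_hat(t):
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0))
    return c * (t**2 - 1.0) * np.exp(-(t**2) / 2.0)

def wavelet_transform(f, scales):
    """Wf[n, a_j] = (f circularly convolved with psi_bar_j)[n], eq. (4.64),
    computed with FFTs; psi_bar_j[n] = psi_j*[-n] has DFT conj(psi_j_hat)."""
    N = len(f)
    n = np.arange(N)
    t = np.where(n < N // 2, n, n - N)      # periodic time axis around 0
    F = np.fft.fft(f)
    W = np.empty((len(scales), N), dtype=complex)
    for j, a in enumerate(scales):
        psi_j = mexican_hat(t / a) / np.sqrt(a)
        W[j] = np.fft.ifft(F * np.conj(np.fft.fft(psi_j)))
    return W
```

When f is itself the wavelet at one of the scales, the transform at that scale is the wavelet autocorrelation, which peaks at n = 0.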
To compute the scalogram P_W[n, ξ] = |Wf[n, η/ξ]|², we calculate Wf[n, s] at any scale s with a parabola interpolation. Let j be the closest integer to log₂ s/log₂ a, and p(x) be the parabola such that
p( j − 1) = Wf[n, a^{j−1}],  p( j) = Wf[n, a^j],  p( j + 1) = Wf[n, a^{j+1}].
A second-order interpolation computes
Wf[n, s] = p(log₂ s / log₂ a).
Parabolic interpolations are used instead of linear interpolations in order to locate more precisely the ridges defined in Section 4.4.3.
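The parabola through the three neighboring log-scale samples has a closed form; a small sketch (illustrative, not from the book):

```python
def parabola_eval(y_m1, y_0, y_p1, dx):
    """Evaluate the parabola through (-1, y_m1), (0, y_0), (1, y_p1) at
    offset dx; here dx = log2(s)/log2(a) - j for the closest integer j."""
    a2 = 0.5 * (y_p1 + y_m1) - y_0   # quadratic coefficient
    a1 = 0.5 * (y_p1 - y_m1)         # linear coefficient
    return y_0 + a1 * dx + a2 * dx * dx
```

By construction the interpolation is exact whenever the three samples lie on a quadratic.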
Discrete Scaling Filter
A wavelet transform computed up to a scale a^J is not a complete signal representation. It is necessary to add the low frequencies Lf[n, a^J] corresponding to scales larger than a^J. A discrete and periodic scaling filter is computed by sampling the scaling function φ(t) defined in (4.42):
φ_J[n] = (1/√(a^J)) φ(n/a^J)  for  n ∈ [−N/2, N/2].
Let φ̄_J[n] = φ_J*[−n]; the low frequencies are carried by
Lf[n, a^J] = Σ_{m=0}^{N−1} f[m] φ_J*[m − n] = f ⊛ φ̄_J[n].   (4.65)
Reconstruction
An approximate inverse wavelet transform is implemented by discretizing the integral (4.45). Suppose that a^I = 1 is the finest scale. Since ds/s² = d log_e s/s and the discrete wavelet transform is computed along an exponential scale sequence {a^j}_j with a logarithmic increment d log_e s = log_e a, we obtain
f[n] ≈ (log_e a/C_ψ) Σ_{j=I}^{J} (1/a^j) Wf[·, a^j] ⊛ ψ_j[n] + (1/(C_ψ a^J)) Lf[·, a^J] ⊛ φ_J[n].   (4.66)
The “·” indicates the variable over which the convolution is calculated. These circular convolutions are calculated using the FFT, with O(vN(log₂ N)²) operations.
Analytic wavelet transforms are often computed over real signals f[n] that have no energy at low frequencies. The scaling filter component is then negligible. Theorem 4.5 shows that
f[n] ≈ (2 log_e a/C_ψ) Re [ Σ_{j=I}^{J} (1/a^j) Wf[·, a^j] ⊛ ψ_j[n] ].   (4.67)
The error introduced by the discretization of scales decreases when the number
v of voices per octave increases. However, the approximation of continuous time
convolutions with discrete convolutions also creates high-frequency errors. Perfect
reconstructions are obtained with a more careful design of the reconstruction filters
(Exercise 4.3). Section 5.2.2 describes an exact inverse wavelet transform computed at dyadic scales a^j = 2^j.
4.4 TIME-FREQUENCY GEOMETRY OF INSTANTANEOUS
FREQUENCIES
When listening to music, we perceive several frequencies that change with time.
In music, this is associated with the geometric perception of “movements.” This notion of instantaneous frequency remains to be defined. The time variation of several instantaneous frequencies is measured with local maxima of windowed Fourier transforms and wavelet transforms. They define a geometric time-frequency support
from which signal approximations are recovered. Audio processing is implemented
by modifying this time-frequency support.
4.4.1 Analytic Instantaneous Frequency
The notion of instantaneous frequency is not well defined. It can, however, be
uniquely specified with the signal analytic part. A cosine modulation
f(t) = a cos(ω₀t + φ₀) = a cos φ(t)
has a frequency ω₀ that is the derivative of the phase φ(t) = ω₀t + φ₀. To generalize this notion, real signals f are written as an amplitude a(t) modulated with a time-varying phase φ(t):
f(t) = a(t) cos φ(t),  with  a(t) ≥ 0.   (4.68)
The instantaneous frequency can be defined as a positive derivative of the phase:
ω(t) = φ′(t) ≥ 0.
The derivative is chosen to be positive by adapting the sign of φ(t). However, for a given f(t), there are many possible choices of a(t) and φ(t) satisfying (4.68), so ω(t) is not uniquely defined relative to f.
A particular decomposition (4.68) is obtained from the analytic part f_a of f, which has a Fourier transform defined in (4.47) by
f̂_a(ω) = 2 f̂(ω) if ω ≥ 0,  and  f̂_a(ω) = 0 if ω < 0.   (4.69)
This complex signal is represented by separating the modulus and the complex phase:
f_a(t) = a(t) exp[iφ(t)].   (4.70)
Since f = Re[ f_a], it follows that
f(t) = a(t) cos φ(t).
We call a(t) the analytic amplitude of f(t) and φ′(t) its instantaneous frequency; they are uniquely defined.
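For discrete signals, both quantities follow from the discrete analytic part (4.48); a numpy sketch (illustrative, not from the book) recovers a(t) as |f_a(t)| and φ′(t) by differentiating the unwrapped phase:

```python
import numpy as np

def amplitude_and_frequency(f, dt):
    """Analytic amplitude a(t) = |fa(t)| and instantaneous frequency
    phi'(t), from the discrete analytic part (4.48)."""
    N = len(f)
    F = np.fft.fft(f)
    Fa = np.zeros(N, dtype=complex)
    Fa[0] = F[0]
    Fa[1 : N // 2] = 2.0 * F[1 : N // 2]
    Fa[N // 2] = F[N // 2]
    fa = np.fft.ifft(Fa)
    amplitude = np.abs(fa)
    phase = np.unwrap(np.angle(fa))
    return amplitude, np.gradient(phase, dt)

# A tone whose frequency falls exactly on a DFT bin (avoids leakage):
N, dt = 1024, 0.01
t = np.arange(N) * dt
omega0 = 2.0 * np.pi * 50 / (N * dt)
a_t, omega_t = amplitude_and_frequency(2.0 * np.cos(omega0 * t + 0.5), dt)
```

For signals whose frequencies do not fall on the DFT grid, spectral leakage perturbs the estimate near the borders.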
EXAMPLE 4.12
If f(t) = a(t) cos(ω₀t + φ₀), then
f̂(ω) = (1/2) ( exp(iφ₀) â(ω − ω₀) + exp(−iφ₀) â(ω + ω₀) ).
If the variations of a(t) are slow compared to the period 2π/ω₀, which is achieved by requiring that the support of â be included in [−ω₀, ω₀], then
f̂_a(ω) = â(ω − ω₀) exp(iφ₀),
so f_a(t) = a(t) exp[i(ω₀t + φ₀)].
If a signal f is the sum of two sinusoidal waves,
f(t) = a cos(ω₁t) + a cos(ω₂t),
then
f_a(t) = a exp(iω₁t) + a exp(iω₂t) = 2a cos((ω₁ − ω₂)t/2) exp(i(ω₁ + ω₂)t/2).
The instantaneous frequency is φ′(t) = (ω₁ + ω₂)/2 and the amplitude is
a(t) = 2a |cos((ω₁ − ω₂)t/2)|.
This result is not satisfying because it does not reveal that the signal includes two sinusoidal waves of the same amplitude; it measures an average frequency value.
The next sections explain how to measure the instantaneous frequencies of several
spectral components by separating them with a windowed Fourier transform or a
wavelet transform. We first describe two important applications of instantaneous
frequencies.
Frequency Modulation
In signal communications, information can be transmitted through the amplitude
a(t) (amplitude modulation) or the instantaneous frequency φ′(t) (frequency modulation) [60]. Frequency modulation is more robust in the presence of additive Gaussian white noise. In addition, it better resists multipath interferences, which destroy the amplitude information. A frequency modulation sends a message m(t) through a signal
f(t) = a cos φ(t)  with  φ′(t) = ω₀ + k m(t).
The frequency bandwidth of f is proportional to k. This constant is adjusted depending on the transmission noise and available bandwidth. At reception, the message m(t) is restored with a frequency demodulation that computes the instantaneous frequency φ′(t) [120].
Additive Sound Models
Musical sounds and voiced speech segments can be modeled with sums of sinusoidal partials:
f(t) = Σ_{k=1}^{K} f_k(t) = Σ_{k=1}^{K} a_k(t) cos φ_k(t),   (4.71)
where a_k and φ′_k vary slowly [413, 414]. Such decompositions are useful for pattern recognition and for modifying sound properties [339]. Sections 4.4.2 and 4.4.3 explain how to compute a_k and the instantaneous frequency φ′_k, and reconstruct signals from this information.
Reducing or increasing the duration of a sound f by a factor α in time is used in radio broadcasting to adjust recorded sequences to a precise time schedule. A scaling f(αt) transforms each φ_k(t) in φ_k(αt), and therefore φ′_k(t) in αφ′_k(αt). For a sound reduction with α > 1, all frequencies are thus increased. To avoid modifying the values of φ′_k and a_k, a new sound is synthesized:
f_α(t) = Σ_{k=1}^{K} a_k(αt) cos( α⁻¹ φ_k(αt) ).   (4.72)
The partials of f_α at t = t₀/α and the partials of f at t = t₀ have the same amplitudes and the same instantaneous frequencies; the properties of these sounds are therefore perceived as identical.
A frequency transposition with the same duration is calculated by dividing each phase by a constant α in order to shift the sound harmonics:
f_α(t) = Σ_{k=1}^{K} b_k(t) cos( φ_k(t)/α ).   (4.73)
The instantaneous frequency of each partial is now φ′_k(t)/α. To maintain the sound properties, the amplitudes b_k(t) must be adjusted so as not to modify the global frequency envelope F(t, ω) of the harmonics:
a_k(t) = F(t, φ′_k(t))  and  b_k(t) = F(t, φ′_k(t)/α).   (4.74)
Many types of sounds—musical instruments or speech—are produced by an excitation that propagates across a wave guide. Locally, F(t, ω) is the transfer function of the wave guide. In speech processing, it is called a formant. This transfer function is often approximated with an autoregressive filter of order M, in which case
F(t, ω) = C / | Σ_{m=0}^{M−1} c_m e^{−imω} |.   (4.75)
The parameters c_m are identified with (4.74) from the a_k, and the b_k are then derived with (4.74) and (4.75).
4.4.2 Windowed Fourier Ridges
The spectrogram P_S f(u, ξ) = |Sf(u, ξ)|² measures the energy of f in a time-frequency neighborhood of (u, ξ). The ridge algorithm computes the signal instantaneous frequencies and amplitudes from the local maxima of P_S f(u, ξ). These local maxima define a geometric support in the time-frequency plane. Modifications of sound durations or frequency transpositions are computed with time or frequency dilations of the ridge support.
Time-frequency ridges were introduced by Delprat, Escudié, Guillemain, Kronland-Martinet, Tchamitchian, and Torrésani [66, 204] to analyze musical sounds and are used to represent time-varying frequency tones for a wide range of signals [66, 289].
The windowed Fourier transform is computed with a symmetric window g(t) = g(−t) that has a support equal to [−1/2, 1/2]. The Fourier transform ĝ is a real symmetric function. We suppose that |ĝ(ω)| ≤ ĝ(0) for all ω ∈ R, and that ĝ(0) = ∫_{−1/2}^{1/2} g(t) dt is on the order of 1. Table 4.1 listed several examples of such windows. The window g is normalized so that ‖g‖ = 1. For a fixed scale s, g_s(t) = s^{−1/2} g(t/s) has a support of size s and a unit norm. The corresponding windowed Fourier atoms are
g_{s,u,ξ}(t) = g_s(t − u) e^{iξt},
and the resulting windowed Fourier transform is
Sf(u, ξ) = ⟨ f, g_{s,u,ξ}⟩ = ∫_{−∞}^{+∞} f(t) g_s(t − u) e^{−iξt} dt.   (4.76)
Theorem 4.6 relates Sf(u, ξ) to the instantaneous frequency of f.
Theorem 4.6. Let f(t) = a(t) cos φ(t). If ξ ≥ 0, then
⟨ f, g_{s,u,ξ}⟩ = (√s/2) a(u) exp(i[φ(u) − ξu]) ( ĝ(s[ξ − φ′(u)]) + ε(u, ξ) ).   (4.77)
The corrective term satisfies
|ε(u, ξ)| ≤ ε_{a,1} + ε_{a,2} + ε_{φ,2} + sup_{|ω|≥sφ′(u)} |ĝ(ω)|,   (4.78)
with
ε_{a,1} ≤ s|a′(u)|/|a(u)|,   ε_{a,2} ≤ sup_{|t−u|≤s/2} s²|a″(t)|/|a(u)|,   (4.79)
and if s|a′(u)| |a(u)|⁻¹ ≤ 1, then
ε_{φ,2} ≤ sup_{|t−u|≤s/2} s²|φ″(t)|.   (4.80)
If ξ = φ′(u), then
ε_{a,1} = (s|a′(u)|/|a(u)|) |ĝ′(2sφ′(u))|.   (4.81)
Proof. Observe that
⟨ f, g_{s,u,ξ}⟩ = ∫_{−∞}^{+∞} a(t) cos φ(t) g_s(t − u) exp(−iξt) dt
= (1/2) ∫_{−∞}^{+∞} a(t) ( exp[iφ(t)] + exp[−iφ(t)] ) g_s(t − u) exp(−iξt) dt
= I(ξ) + I(−ξ).
We first concentrate on
I(ξ) = (1/2) ∫_{−∞}^{+∞} a(t) exp[iφ(t)] g_s(t − u) exp(−iξt) dt
= (1/2) ∫_{−∞}^{+∞} a(t + u) exp[iφ(t + u)] g_s(t) exp[−iξ(t + u)] dt.
This integral is computed by using second-order Taylor expansions:
a(t + u) = a(u) + t a′(u) + (t²/2) α(t)  with  |α(t)| ≤ sup_{h∈[u,t+u]} |a″(h)|,
φ(t + u) = φ(u) + t φ′(u) + (t²/2) β(t)  with  |β(t)| ≤ sup_{h∈[u,t+u]} |φ″(h)|.
We get
2 exp(−i[φ(u) − ξu]) I(ξ)
= ∫_{−∞}^{+∞} a(u) g_s(t) exp(−it[ξ − φ′(u)]) exp(i t²β(t)/2) dt
+ ∫_{−∞}^{+∞} a′(u) t g_s(t) exp(−it[ξ − φ′(u)]) exp(i t²β(t)/2) dt
+ (1/2) ∫_{−∞}^{+∞} α(t) t² g_s(t) exp(−i[ξt + φ(u) − φ(t + u)]) dt.
A first-order Taylor expansion of exp(ix) gives
exp(i t²β(t)/2) = 1 + (t²/2) β(t) γ(t),  with  |γ(t)| ≤ 1.   (4.82)
Since
∫_{−∞}^{+∞} g_s(t) exp(−it[ξ − φ′(u)]) dt = √s ĝ(s[ξ − φ′(u)]),
inserting (4.82) in the expression of I(ξ) yields
| I(ξ) − (√s/2) a(u) exp(i[φ(u) − ξu]) ĝ(s[ξ − φ′(u)]) | ≤ (√s |a(u)|/4) (ε⁺_{a,1} + ε_{a,2} + ε_{φ,2}),   (4.83)
with
ε⁺_{a,1} = (2|a′(u)|/|a(u)|) | ∫_{−∞}^{+∞} t (1/√s) g_s(t) exp(−it[ξ − φ′(u)]) dt |,   (4.84)
ε_{a,2} = (1/|a(u)|) ∫_{−∞}^{+∞} t² |α(t)| (1/√s) |g_s(t)| dt,   (4.85)
ε_{φ,2} = ∫_{−∞}^{+∞} t² |β(t)| (1/√s) |g_s(t)| dt + (|a′(u)|/|a(u)|) ∫_{−∞}^{+∞} |t|³ |β(t)| (1/√s) |g_s(t)| dt.   (4.86)
Applying (4.83) to I(−ξ) gives
| I(−ξ) | ≤ (√s |a(u)|/2) |ĝ(s[ξ + φ′(u)])| + (√s |a(u)|/4) (ε⁻_{a,1} + ε_{a,2} + ε_{φ,2}),
with
ε⁻_{a,1} = (2|a′(u)|/|a(u)|) | ∫_{−∞}^{+∞} t (1/√s) g_s(t) exp(−it[ξ + φ′(u)]) dt |.   (4.87)
Since ξ ≥ 0 and φ′(u) ≥ 0, we derive that
|ĝ(s[ξ + φ′(u)])| ≤ sup_{|ω|≥sφ′(u)} |ĝ(ω)|;
therefore
I(ξ) + I(−ξ) = (√s/2) a(u) exp(i[φ(u) − ξu]) ( ĝ(s[ξ − φ′(u)]) + ε(u, ξ) ),
with
ε(u, ξ) = (ε⁺_{a,1} + ε⁻_{a,1})/2 + ε_{a,2} + ε_{φ,2} + sup_{|ω|≥s|φ′(u)|} |ĝ(ω)|.
Let us now verify the upper bound (4.79) for ε_{a,1} = (ε⁺_{a,1} + ε⁻_{a,1})/2. Since g_s(t) = s^{−1/2} g(t/s), a simple calculation shows that for n ≥ 0,
∫_{−∞}^{+∞} |t|ⁿ (1/√s) |g_s(t)| dt = sⁿ ∫_{−1/2}^{1/2} |t|ⁿ |g(t)| dt ≤ (sⁿ/2ⁿ) ‖g‖² = sⁿ/2ⁿ.   (4.88)
Inserting this for n = 1 in (4.84) and (4.87) gives
ε_{a,1} = (ε⁺_{a,1} + ε⁻_{a,1})/2 ≤ s|a′(u)|/|a(u)|.
The upper bounds (4.79) and (4.80) of the second-order terms ε_{a,2} and ε_{φ,2} are obtained by observing that the remainders α(t) and β(t) of the Taylor expansions of a(t + u) and φ(t + u) satisfy
sup_{|t|≤s/2} |α(t)| ≤ sup_{|t−u|≤s/2} |a″(t)|,   sup_{|t|≤s/2} |β(t)| ≤ sup_{|t−u|≤s/2} |φ″(t)|.   (4.89)
Inserting this in (4.85) yields
ε_{a,2} ≤ sup_{|t−u|≤s/2} s²|a″(t)|/|a(u)|.
When s|a′(u)| |a(u)|⁻¹ ≤ 1, replacing |β(t)| by its upper bound in (4.86) gives
ε_{φ,2} ≤ (1/2)(1 + s|a′(u)|/|a(u)|) sup_{|t−u|≤s/2} s²|φ″(t)| ≤ sup_{|t−u|≤s/2} s²|φ″(t)|.
Let us finally compute ε_{a,1} when ξ = φ′(u). Since g(t) = g(−t), we derive from (4.84) that
ε⁺_{a,1} = (2|a′(u)|/|a(u)|) | ∫_{−∞}^{+∞} t (1/√s) g_s(t) dt | = 0.
We also derive from (2.22) that the Fourier transform of t (1/√s) g_s(t) is i s ĝ′(sω), so (4.87) gives
ε_{a,1} = (1/2) ε⁻_{a,1} = (s|a′(u)|/|a(u)|) |ĝ′(2sφ′(u))|.   ■
Delprat et al. [204] give a different proof of a similar result when g(t) is a Gaussian, using a stationary phase approximation. If we can neglect the corrective term ε(u, ξ), we will see that (4.77) enables us to measure a(u) and φ′(u) from Sf(u, ξ). This implies that the decomposition f(t) = a(t) cos φ(t) is uniquely defined. By reviewing the proof of Theorem 4.6, one can verify that a and φ′ are the analytic amplitude and instantaneous frequency of f.
The expressions (4.79) and (4.80) show that the three corrective terms ε_{a,1}, ε_{a,2}, and ε_{φ,2} are small if a(t) and φ′(t) have small relative variations over the support of the window g_s. Let Δω be the bandwidth of ĝ, defined by
|ĝ(ω)| ≪ 1  for  |ω| ≥ Δω.   (4.90)
The term sup_{|ω|≥s|φ′(u)|} |ĝ(ω)| of ε(u, ξ) is negligible if
φ′(u) ≥ Δω/s.
Ridge Points
Let us suppose that a(t) and φ′(t) have small variations over intervals of size s and that φ′(t) ≥ Δω/s, so that the corrective term ε(u,ξ) in (4.77) can be neglected. Since |ĝ(ω)| is maximum at ω = 0, (4.77) shows that for each u the spectrogram |Sf(u,ξ)|² = |⟨f, g_{s,u,ξ}⟩|² is maximum at ξ(u) = φ′(u). The corresponding time-frequency points (u, ξ(u)) are called ridges. At ridge points, (4.77) becomes

Sf(u,ξ) = (√s/2) a(u) exp(i[φ(u) − ξu]) ( ĝ(0) + ε(u,ξ) ) .   (4.91)

Theorem 4.6 proves that ε(u,ξ) is smaller at a ridge point because the first-order term ε_{a,1} becomes negligible in (4.81). This is shown by verifying that |ĝ′(2sφ′(u))| is negligible when sφ′(u) ≥ Δω. At ridge points, the second-order terms ε_{a,2} and ε_{φ,2} are predominant in ε(u,ξ).
The ridge frequency gives the instantaneous frequency ξ(u) = φ′(u), and the amplitude is calculated by

a(u) = 2 |Sf(u, ξ(u))| / (√s |ĝ(0)|) .   (4.92)

Let Θ_S(u,ξ) be the complex phase of Sf(u,ξ). If we neglect the corrective term, then (4.91) proves that ridges are also stationary phase points:

∂Θ_S(u,ξ)/∂u = φ′(u) − ξ = 0 .

Testing the stationarity of the phase locates the ridges more precisely.
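This ridge computation can be sketched numerically. The snippet below is a minimal illustration, not the book's implementation: the scipy-based spectrogram, the relative amplitude threshold, and the chirp parameters are assumptions, and the stationary-phase test is omitted.

```python
import numpy as np
from scipy.signal import stft

def spectrogram_ridges(f, fs, nperseg=256, threshold=0.1):
    """Ridge points: local maxima of the spectrogram |Sf(u, xi)|^2
    along frequency for each fixed time u, keeping only maxima above
    a relative amplitude threshold."""
    freqs, times, S = stft(f, fs=fs, nperseg=nperseg)
    P = np.abs(S) ** 2
    ridges = []
    for j, u in enumerate(times):
        col = P[:, j]
        # interior local maxima in xi for this fixed u
        peaks = np.where((col[1:-1] > col[:-2]) & (col[1:-1] > col[2:]))[0] + 1
        ridges.extend((u, freqs[i]) for i in peaks
                      if col[i] >= threshold * P.max())
    return ridges

# Linear chirp with instantaneous frequency 50 + 100 t Hz
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
f = np.cos(2 * np.pi * (50 * t + 50 * t ** 2))
ridges = spectrogram_ridges(f, fs)
```

The detected ridge frequencies track the chirp's instantaneous frequency up to the spectrogram's frequency resolution fs/nperseg.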
4.4 Time-Frequency Geometry of Instantaneous Frequencies
Multiple Frequencies
When the signal contains several spectral lines with frequencies that are sufficiently far apart, the windowed Fourier transform separates each of these components, and the ridges detect the evolution in time of each spectral component. Let us consider

f(t) = a₁(t) cos φ₁(t) + a₂(t) cos φ₂(t),

where a_k(t) and φ_k′(t) have small variations over intervals of size s and sφ_k′(t) ≥ Δω. Since the windowed Fourier transform is linear, we apply (4.77) to each spectral component and neglect the corrective terms:

Sf(u,ξ) = (√s/2) a₁(u) ĝ(s[ξ − φ₁′(u)]) exp(i[φ₁(u) − ξu])
        + (√s/2) a₂(u) ĝ(s[ξ − φ₂′(u)]) exp(i[φ₂(u) − ξu]) .   (4.93)

The two spectral components are discriminated if for all u

ĝ(s |φ₁′(u) − φ₂′(u)|) ≪ 1 ,   (4.94)

which means that the frequency difference is larger than the bandwidth of ĝ(sω):

|φ₁′(u) − φ₂′(u)| ≥ Δω/s .   (4.95)

In this case, when ξ = φ₁′(u), the second term of (4.93) can be neglected and the first term generates a ridge point from which we may recover φ₁′(u) and a₁(u) by using (4.92). Similarly, if ξ = φ₂′(u), the first term can be neglected and we have a second ridge point that characterizes φ₂′(u) and a₂(u). The ridge points are distributed along two time-frequency lines, ξ(u) = φ₁′(u) and ξ(u) = φ₂′(u). This result is valid for any number of time-varying spectral components, as long as the distance between any two instantaneous frequencies satisfies (4.95). If two spectral lines are too close, they interfere, which destroys the ridge pattern.
Time-Frequency Ridge Support
The number of instantaneous frequencies is typically unknown. The ridge support Λ therefore is defined as the set of all (u,ξ) that are local maxima of |Sf(u,ξ)|² for u fixed and ξ varying, and points of stationary phase ∂Θ_S(u,ξ)/∂u ≈ 0. This support is often reduced by removing small ridge amplitudes |Sf(u,ξ)| that are mostly dominated by noise, or because smaller ridges may be "shadows" of other instantaneous frequencies created by the side lobes of ĝ(ω).
Let {g_{s,u,ξ}}_{(u,ξ)∈Λ} be the set of ridge atoms. For discrete signals, there is a finite number of ridge points, which define a frame of the space V_Λ they generate. A ridge signal approximation is computed as an orthogonal projection of f on V_Λ. Section 5.1.3 shows that it is obtained with the dual frame {g̃_{Λ,u,ξ}}_{(u,ξ)∈Λ} of {g_{s,u,ξ}}_{(u,ξ)∈Λ} in V_Λ:

f_Λ = Σ_{(u,ξ)∈Λ} Sf(u,ξ) g̃_{Λ,u,ξ} .   (4.96)
FIGURE 4.12
Support of larger-amplitude ridges calculated from the spectrogram in Figure 4.3. These ridges give the instantaneous frequencies of the linear and quadratic chirps and of the low- and high-frequency transients at t = 0.5 and t = 0.87.
The dual-synthesis algorithm of Section 5.1.3 computes this orthogonal projection by inverting the symmetric operator

Lh = Σ_{(u,ξ)∈Λ} ⟨h, g_{s,u,ξ}⟩ g_{s,u,ξ} .   (4.97)

The inversion requires applying this operator many times. If there are only a few ridge points, then (4.97) is efficiently computed by evaluating the inner products and the sum for just (u,ξ) ∈ Λ. If there are many ridge points, it is more efficient to compute the full windowed Fourier transform Sh(u,ξ) = ⟨h, g_{s,u,ξ}⟩ with the FFT algorithm (described in Section 4.2.3), set all coefficients to zero for (u,ξ) ∉ Λ, and apply the fast inverse windowed Fourier transform over all coefficients. The normalization factor N^{−1} in (4.28) must be removed (set to 1) to implement (4.97).
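The FFT-based procedure — analyze, zero the coefficients off the ridge support, resynthesize — can be sketched with scipy's STFT. This is an illustrative approximation: scipy's istft applies its own dual-window normalization, so it computes a least-squares inverse rather than the unnormalized transform described above, and the 0.3 relative threshold used to build a crude ridge support is an assumption.

```python
import numpy as np
from scipy.signal import stft, istft

def ridge_mask_operator(h, mask, fs, nperseg=256):
    """Analyze h, zero every windowed Fourier coefficient outside the
    ridge support (mask is True on Lambda), and resynthesize. scipy's
    istft carries its own normalization, so this approximates the
    projection rather than implementing (4.97) exactly."""
    _, _, S = stft(h, fs=fs, nperseg=nperseg)
    _, h_out = istft(S * mask, fs=fs, nperseg=nperseg)
    return h_out[: len(h)]

# A pure tone: keeping only its dominant coefficients nearly reproduces it
fs = 1000.0
t = np.arange(1024) / fs
h = np.cos(2 * np.pi * 125.0 * t)           # 125 Hz sits on an exact bin
_, _, S = stft(h, fs=fs, nperseg=256)
mask = np.abs(S) >= 0.3 * np.abs(S).max()   # crude ridge support
Lh = ridge_mask_operator(h, mask, fs)
```

For signals concentrated on the ridge support, the masked resynthesis is close to the original signal.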
Figure 4.12 displays the ridge support computed from the modulus and phase of the windowed Fourier transform shown in Figure 4.3. For t ∈ [0.4, 0.5], the instantaneous frequencies of the linear and quadratic chirps are close, and the frequency resolution of the window is not sufficient to discriminate them. As a result, the ridges detect a single average instantaneous frequency.
Time-Scaling and Frequency Transpositions
A reduction of sound duration by a factor α is implemented, according to the deformation model (4.72), by dilating the ridge support Λ in time:

Λ_α = {(u,ξ) : (αu, ξ) ∈ Λ} .   (4.98)

The windowed Fourier coefficients c(v,ξ) in Λ_α are derived from the modulus and phase of the ridge coefficients:

∀(v,ξ) ∈ Λ_α ,   c(v,ξ) = |Sf(αv,ξ)| e^{iΘ_S(αv,ξ)/α} .   (4.99)

The scaled signal is reconstructed from these coefficients, with the dual-synthesis algorithm of Section 5.1.3, as in (4.96):

f_α = Σ_{(v,ξ)∈Λ_α} c(v,ξ) g̃_{Λ_α,v,ξ} .

Similarly, a sound transposition is implemented according to the transposition model (4.73) by dilating the ridge support Λ in frequency:

Λ^α = {(u,ξ) : (u, αξ) ∈ Λ} .   (4.100)

The transposed coefficient amplitudes |c(u,γ)| in Λ^α are calculated with (4.74). At any fixed time u₀, the ridge amplitudes at all frequencies {a(u₀,ξ) = |Sf(u₀,ξ)|}_{(u₀,ξ)∈Λ} are mapped to transposed amplitudes {b(u₀,γ)}_{(u₀,γ)∈Λ^α} at frequencies γ = ξ/α by computing a frequency envelope. The resulting ridge coefficients are

∀(u,γ) ∈ Λ^α ,   c(u,γ) = b(u,γ) e^{iΘ_S(u,αγ)/α} .   (4.101)

The transposed signal is reconstructed with the dual-synthesis algorithm of Section 5.1.3:

f^α = Σ_{(u,γ)∈Λ^α} c(u,γ) g̃_{Λ^α,u,γ} .
Choice of Window
The measurement of instantaneous frequencies at ridge points is valid only if the size s of the window g_s is sufficiently small so that the second-order terms ε_{a,2} and ε_{φ,2} in (4.79) and (4.80) are small:

sup_{|t−u| ≤ s/2} s² |a_k″(t)| / |a_k(u)| ≪ 1   and   sup_{|t−u| ≤ s/2} s² |φ_k″(t)| ≪ 1 .   (4.102)

On the other hand, the frequency bandwidth Δω/s must also be sufficiently small to discriminate consecutive spectral components in (4.95). The window scale s therefore must be adjusted as a trade-off between both constraints.
Table 4.1 lists the spectral parameters of several windows of compact support. For instantaneous frequency detection, it is particularly important to ensure that ĝ has negligible side lobes at ±ω₀, as illustrated by Figure 4.4. The reader can verify with (4.77) that these side lobes "react" to an instantaneous frequency φ′(u) by creating shadow maxima of |Sf(u,ξ)|² at frequencies ξ = φ′(u) ± ω₀. The ratio of the amplitude of these shadow maxima to the amplitude of the main local maximum at ξ = φ′(u) is |ĝ(ω₀)|² |ĝ(0)|^{−2}. They can be removed by thresholding or by testing the stationarity of the phase.
EXAMPLE 4.13
The sum of two parallel linear chirps

f(t) = a₁ cos(bt² + ct) + a₂ cos(bt²)   (4.103)

has two instantaneous frequencies φ₁′(t) = 2bt + c and φ₂′(t) = 2bt. Figure 4.13 gives a numerical example. The window g_s has enough frequency resolution to discriminate both chirps if

|φ₁′(t) − φ₂′(t)| = |c| ≥ Δω/s .   (4.104)

Its time support is small enough if

s² |φ₁″(u)| = s² |φ₂″(u)| = 2bs² ≪ 1 .   (4.105)

Conditions (4.104) and (4.105) prove that there exists an appropriate window g if and only if

c/√b ≫ Δω .   (4.106)

Since g is a smooth window with a support [−1/2, 1/2], its frequency bandwidth Δω is on the order of 1. The linear chirps in Figure 4.13 satisfy (4.106). Ridges are computed with the truncated Gaussian window of Table 4.1, with s = 0.5.
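The two constraints of this example can be turned into a numeric feasibility check. The function below is illustrative: the bound `small` standing in for "≪ 1" and the default Δω = 1 are assumptions, and the chirp parameters are not those of Figure 4.13.

```python
import numpy as np

def admissible_scales(b, c, dw=1.0, small=0.1):
    """Range of window scales s satisfying both constraints:
    frequency resolution |c| >= dw / s  (4.104), and a time support
    small enough that 2 b s^2 <= small, a proxy for (4.105).
    Returns (s_min, s_max), or None when no scale works."""
    s_min = dw / abs(c)                  # resolution lower bound on s
    s_max = np.sqrt(small / (2.0 * b))   # time-support upper bound on s
    return (s_min, s_max) if s_min <= s_max else None

# c / sqrt(b) = 25 >> dw = 1: condition (4.106) holds, a range exists
print(admissible_scales(b=100.0, c=250.0))
# c / sqrt(b) = 0.05: (4.106) fails, no admissible scale
print(admissible_scales(b=100.0, c=0.5))
```

When c/√b is large relative to Δω, the two bounds leave a nonempty interval of usable scales; otherwise the interval is empty, matching (4.106).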
EXAMPLE 4.14
The hyperbolic chirp

f(t) = cos( α/(β − t) )

for 0 ≤ t < β has an instantaneous frequency

φ′(t) = α/(β − t)² ,

which varies quickly when t is close to β. The instantaneous frequency of hyperbolic chirps goes from 0 to +∞ in a finite time interval. This is particularly useful for radars. Such chirps are also emitted by the cruise sonars of bats [204].
The instantaneous frequency of hyperbolic chirps cannot be estimated with a windowed Fourier transform because for any fixed window size the instantaneous frequency varies too quickly at high frequencies. When u is close enough to β, then (4.102) is not satisfied because

s² |φ″(u)| = 2αs²/(β − u)³ > 1 .
Figure 4.14 shows a signal that is a sum of two hyperbolic chirps:

f(t) = a₁ cos( α₁/(β₁ − t) ) + a₂ cos( α₂/(β₂ − t) ) ,   (4.107)
FIGURE 4.13
Sum of two parallel linear chirps: (a) spectrogram P_S f(u,ξ) = |Sf(u,ξ)|²; (b) ridge support calculated from the spectrogram.
FIGURE 4.14
Sum of two hyperbolic chirps: (a) spectrogram P_S f(u,ξ); (b) ridge support calculated from the spectrogram.
with β₁ = 0.68 and β₂ = 0.72. At the beginning of the signal, the two chirps have close instantaneous frequencies that are discriminated by the windowed Fourier ridges computed with a large window. When getting close to β₁ and β₂, the instantaneous frequencies vary too quickly relative to the window size. The resulting ridges cannot follow the instantaneous frequencies.
4.4.3 Wavelet Ridges
Windowed Fourier atoms have a fixed scale and thus cannot follow the instantaneous frequency of rapidly varying events such as hyperbolic chirps. In contrast, an analytic wavelet transform modifies the scale of its time-frequency atoms. The ridge algorithm of Delprat et al. [204] is extended to analytic wavelet transforms to accurately measure frequency tones that are rapidly changing at high frequencies.
An approximately analytic wavelet is constructed in (4.60) by multiplying a window g with a sinusoidal wave:

ψ(t) = g(t) exp(iηt) .

As in the previous section, g is a symmetric window with a support equal to [−1/2, 1/2] and a unit norm ∥g∥ = 1. Let Δω be the bandwidth of ĝ defined in (4.90). If η > Δω, then

∀ω < 0 ,   ψ̂(ω) = ĝ(ω − η) ≪ 1 .

The wavelet ψ is not strictly analytic because its Fourier transform is not exactly equal to zero at negative frequencies.
Dilated and translated wavelets can be rewritten as

ψ_{u,s}(t) = (1/√s) ψ((t − u)/s) = g_{s,u,ξ}(t) exp(−iξu) ,

with ξ = η/s and

g_{s,u,ξ}(t) = (1/√s) g((t − u)/s) exp(iξt) .

The resulting wavelet transform uses time-frequency atoms similar to those of a windowed Fourier transform (4.76); however, in this case the scale s varies over R⁺ while ξ = η/s:

W f(u,s) = ⟨f, ψ_{u,s}⟩ = ⟨f, g_{s,u,ξ}⟩ exp(iξu) .

Theorem 4.6 computes ⟨f, g_{s,u,ξ}⟩ when f(t) = a(t) cos φ(t), which gives

W f(u,s) = (√s/2) a(u) exp[iφ(u)] ( ĝ(s[ξ − φ′(u)]) + ε(u,ξ) ) .   (4.108)

The corrective term ε(u,ξ) is negligible if a(t) and φ′(t) have small variations over the support of ψ_{u,s} and if φ′(u) ≥ Δω/s.
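This transform can be sketched in the Fourier domain with a Gaussian window g (a Morlet-type wavelet); the value η = 6 and the discretization below are illustrative assumptions, not the book's construction.

```python
import numpy as np

def analytic_wavelet_transform(f, scales, eta=6.0):
    """W f(u, s) for psi(t) = g(t) exp(i eta t) with a Gaussian window g,
    computed in the Fourier domain: the transform of psi_{u,s} is
    sqrt(s) * psihat(s w) * exp(-i w u), with psihat(w) = ghat(w - eta)."""
    N = len(f)
    w = 2 * np.pi * np.fft.fftfreq(N)      # frequencies in radians/sample
    f_hat = np.fft.fft(f)
    W = np.empty((len(scales), N), dtype=complex)
    for i, s in enumerate(scales):
        psi_hat = np.exp(-0.5 * (s * w - eta) ** 2)
        W[i] = np.fft.ifft(f_hat * np.sqrt(s) * psi_hat)
    return W

# For a pure tone at frequency w0, the normalized scalogram |W|^2 / s
# is maximum at the scale s = eta / w0, i.e., at xi = eta / s = w0
N = 1024
eta = 6.0
w0 = 2 * np.pi * 50 / N                    # 50 cycles over N samples
f = np.cos(w0 * np.arange(N))
xi = 2 * np.pi * np.arange(20, 100) / N    # candidate ridge frequencies
W = analytic_wavelet_transform(f, eta / xi, eta)
scalogram = np.abs(W) ** 2 / (eta / xi)[:, None]
```

Normalizing by the scale, as in the scalogram of the next section, removes the √s bias of the raw coefficients so that the maximum falls at ξ = φ′(u).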
Ridge Detection
Instantaneous frequency is measured from ridges defined over the wavelet transform. The normalized scalogram defined by

(ξ/η) P_W f(u,ξ) = |W f(u,s)|²/s   for ξ = η/s

is calculated with (4.108):

(ξ/η) P_W f(u,ξ) = (1/4) a²(u) | ĝ(η[1 − φ′(u)/ξ]) + ε(u,ξ) |² .

Since |ĝ(ω)| is maximum at ω = 0, if we neglect ε(u,ξ), this expression shows that the scalogram is maximum at

η/s(u) = ξ(u) = φ′(u) .   (4.109)

The corresponding points (u, ξ(u)) are called wavelet ridges. The analytic amplitude is given by

a(u) = 2 ( ξ η^{−1} P_W f(u,ξ) )^{1/2} / |ĝ(0)| .   (4.110)

The complex phase of W f(u,s) in (4.108) is Θ_W(u,ξ) = φ(u); at ridge points,

∂Θ_W(u,ξ)/∂u = φ′(u) = ξ .   (4.111)

When ξ = φ′(u), the first-order term ε_{a,1} calculated in (4.81) becomes negligible. The corrective term is then dominated by ε_{a,2} and ε_{φ,2}. To simplify the expression, we approximate the supremum of a″ and φ″ in the neighborhood of u by their value at u. Since s = η/ξ = η/φ′(u), (4.79) and (4.80) imply that these second-order terms become negligible if

η² |a″(u)| / (|φ′(u)|² |a(u)|) ≪ 1   and   η² |φ″(u)| / |φ′(u)|² ≪ 1 .   (4.112)

The presence of φ′ in the denominators proves that a′ and φ′ must have slow variations if φ′ is small but may vary much more quickly for large instantaneous frequencies.
Multispectral Estimation
Suppose that f is a sum of two spectral components:

f(t) = a₁(t) cos φ₁(t) + a₂(t) cos φ₂(t) .

As in (4.94), we verify that the second instantaneous frequency φ₂′ does not interfere with the ridge of φ₁′ if the dilated window has a sufficient spectral resolution at the ridge scale s = η/ξ = η/φ₁′(u):

ĝ(s |φ₁′(u) − φ₂′(u)|) ≪ 1 .   (4.113)

Since the bandwidth of ĝ(ω) is Δω, this means that

|φ₁′(u) − φ₂′(u)| / φ₁′(u) ≥ Δω/η .   (4.114)

Similarly, the first spectral component does not interfere with the second ridge located at s = η/ξ = η/φ₂′(u) if

|φ₁′(u) − φ₂′(u)| / φ₂′(u) ≥ Δω/η .   (4.115)

To separate spectral lines that have close instantaneous frequencies, these conditions prove that the wavelet must have a small octave bandwidth Δω/η. The bandwidth Δω is a fixed constant on the order of 1. The frequency η is a free parameter chosen as a trade-off between the time-resolution condition (4.112) and the frequency bandwidth conditions (4.114) and (4.115).
Figure 4.15 displays the ridges computed from the normalized scalogram and the wavelet phase shown in Figure 4.11. The ridges of the high-frequency transient located at t = 0.87 have oscillations because of interference with the linear chirp above: the frequency-separation condition (4.114) is not satisfied. This is also the case in the time interval [0.35, 0.55], where the instantaneous frequencies of the linear and quadratic chirps are too close.
Ridge Support and Processing
The wavelet ridge support Λ of f is the set of all ridge points (u,s) in the time-scale plane, or (u, ξ = η/s) in the time-frequency plane, corresponding to local maxima of |W f(u,s)|²/s for a fixed u and s varying, where the complex phase Θ_W(u,s) nearly satisfies (4.111).

FIGURE 4.15
Ridge support calculated from the scalogram shown in Figure 4.11; compare with the windowed Fourier ridges in Figure 4.12.
As in the windowed Fourier case, an orthogonal projection is computed over the space V_Λ generated by the ridge wavelets {ψ_{u,s}}_{(u,s)∈Λ} by using the dual wavelet frame {ψ̃_{Λ,u,s}}_{(u,s)∈Λ}:

f_Λ = Σ_{(u,s)∈Λ} W f(u,s) ψ̃_{Λ,u,s} .   (4.116)

It is implemented with the dual-synthesis algorithm of Section 5.1.3 by inverting the symmetric operator

Lh = Σ_{(u,s)∈Λ} ⟨h, ψ_{u,s}⟩ ψ_{u,s} ,   (4.117)

which requires applying this operator many times. When there are many ridge points, instead of computing this sum only for (u,s) ∈ Λ, it may require fewer operations to compute Wh(u,s) with the fast wavelet transform algorithm of Section 4.3.3. All coefficients (u,s) ∉ Λ are set to zero, and the fast inverse wavelet transform algorithm is applied. The inverse wavelet transform formula (4.66) must be modified by removing the renormalization factors a^{−j} and a^{−J} in the sum (setting them to 1) to implement the operator (4.117).
The same as in the windowed Fourier case, modifications of sound duration or frequency transpositions are computed by modifying the ridge support. A reduction of sound duration by a factor α transforms the ridge support Λ into

Λ_α = {(u,s) : (αu, s) ∈ Λ} .   (4.118)

A sound transposition is implemented by modifying the scales of the time-scale ridge support Λ, which defines

Λ^α = {(u,s) : (u, s/α) ∈ Λ} .   (4.119)

The wavelet coefficients over these supports are derived from the deformation model (4.72) or (4.74), similar to (4.99) and (4.101) for the windowed Fourier transform. Processed signals are recovered from the modified wavelet coefficients and modified supports with the dual-synthesis algorithm of Section 5.1.3.
EXAMPLE 4.15
The instantaneous frequencies of two linear chirps

f(t) = a₁ cos(bt² + ct) + a₂ cos(bt²)

are not precisely measured by wavelet ridges. Indeed,

|φ₂′(u) − φ₁′(u)| / φ₁′(u) = c/(2bu + c)

converges to zero when u increases. When it is smaller than Δω/η, the two chirps interact and create interference patterns like those shown in Figure 4.16. The ridges follow these interferences and do not properly estimate the two instantaneous frequencies, as opposed to the windowed Fourier ridges shown in Figure 4.13.
FIGURE 4.16
(a) Normalized scalogram ξ η^{−1} P_W f(u,ξ) of two parallel linear chirps shown in Figure 4.13. (b) Wavelet ridges.
EXAMPLE 4.16
The instantaneous frequency of a hyperbolic chirp

f(t) = cos( α/(β − t) )

is φ′(t) = α(β − t)^{−2}. Wavelet ridges can measure this instantaneous frequency if the time-resolution condition (4.112) is satisfied:

φ′(t)² / |φ″(t)| = α/(2|t − β|) ≫ η² .

This is the case if |t − β| is not too large.
Figure 4.17 displays the scalogram and the ridges of two hyperbolic chirps

f(t) = a₁ cos( α₁/(β₁ − t) ) + a₂ cos( α₂/(β₂ − t) ) ,

with β₁ = 0.68 and β₂ = 0.72. As opposed to the windowed Fourier ridges shown in Figure 4.14, the wavelet ridges follow the rapid time modification of both instantaneous frequencies. This is particularly useful in analyzing the returns of hyperbolic chirps emitted by radar or sonar. Several techniques have been developed to detect chirps with wavelet ridges in the presence of noise [151, 455].
Better Is More Sparse
The linear and hyperbolic chirp examples show that the best transform depends on the signal's time-frequency properties. All examples also show that when the time-frequency transform has a resolution adapted to the signal's time-frequency properties, the number of ridge points is reduced. Indeed, if signal structures do not match the dictionary's time-frequency atoms, then their energy is diffused over many more atoms, which produces more local maxima. Sparsity therefore appears as a natural criterion to adjust the resolution of time-frequency transforms.
Section 12.3.3 studies sparse time-frequency decompositions in very redundant Gabor time-frequency dictionaries, including windowed Fourier and wavelet atoms, with a computationally more intense matching pursuit algorithm.
4.5 QUADRATIC TIME-FREQUENCY ENERGY
Wavelet and windowed-Fourier transforms are computed by correlating the signal
with families of time-frequency atoms. The time and frequency resolution of these
transforms is limited by the time-frequency resolution of the corresponding atoms.
Ideally, one would like to define a density of energy in a time-frequency plane with
no loss of resolution.
FIGURE 4.17
(a) Normalized scalogram ξ η^{−1} P_W f(u,ξ) of two hyperbolic chirps shown in Figure 4.14. (b) Wavelet ridges.
The Wigner-Ville distribution is a time-frequency energy density computed by correlating f with a time and frequency translation of itself. Despite its remarkable properties, the application of the Wigner-Ville distribution is limited by the existence of interference terms. Such interferences can be attenuated by time-frequency averaging, but this results in a loss of resolution. It will be proved that the spectrogram, the scalogram, and all squared time-frequency decompositions can be written as a
time-frequency averaging of the Wigner-Ville distribution, which gives a common
framework to relate the transforms.
4.5.1 Wigner-Ville Distribution
To analyze time-frequency structures, in 1948 Ville [475] introduced in signal processing a quadratic form that had been studied by Wigner [484] in a 1932 article on quantum thermodynamics:

P_V f(u,ξ) = ∫_{−∞}^{+∞} f(u + τ/2) f*(u − τ/2) e^{−iτξ} dτ .   (4.120)

The Wigner-Ville distribution remains real because it is the Fourier transform of f(u + τ/2) f*(u − τ/2), which has a Hermitian symmetry in τ; time and frequency have a symmetric role. This distribution can also be rewritten as a frequency integration by applying the Parseval formula:

P_V f(u,ξ) = (1/2π) ∫_{−∞}^{+∞} f̂(ξ + γ/2) f̂*(ξ − γ/2) e^{iγu} dγ .   (4.121)
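A crude discrete sketch of (4.120) can be written in a few lines; the integer-lag, periodized discretization below — which doubles the frequency axis — is an illustrative assumption, not the book's implementation.

```python
import numpy as np

def wigner_ville(f):
    """Discrete Wigner-Ville distribution from (4.120): for each time u,
    Fourier-transform the product f(u + tau) f*(u - tau) over the lag tau
    (integer lags with periodic indexing; the frequency axis is doubled)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    n = np.arange(N)
    W = np.empty((N, N))
    for u in range(N):
        r = f[(u + n) % N] * np.conj(f[(u - n) % N])
        W[u] = np.real(np.fft.fft(r))
    return W

# A complex exponential at bin 10 concentrates at doubled bin 20, for all u
f = np.exp(2j * np.pi * 10 * np.arange(64) / 64)
W = wigner_ville(f)
```

For this pure frequency, the distribution is a discrete analog of the Dirac ridge of (4.124): one nonzero column at the (doubled) tone frequency.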
Time-Frequency Support
The Wigner-Ville transform localizes the time-frequency structures of f. If the energy of f is concentrated in time around u₀ and in frequency around ξ₀, then P_V f has its energy centered at (u₀, ξ₀), with a spread equal to the time and frequency spread of f. This property is illustrated by Theorem 4.7, which relates the time and frequency support of P_V f to the support of f and f̂.

Theorem 4.7.
■ If the support of f is [u₀ − T/2, u₀ + T/2], then for all ξ the support in u of P_V f(u,ξ) is included in this interval.
■ If the support of f̂ is [ξ₀ − Δ/2, ξ₀ + Δ/2], then for all u the support in ξ of P_V f(u,ξ) is included in this interval.
Proof. Let f̄(t) = f(−t); the Wigner-Ville distribution is rewritten as

P_V f(u,ξ) = ∫_{−∞}^{+∞} f((τ + 2u)/2) f̄*((τ − 2u)/2) e^{−iτξ} dτ .   (4.122)

Suppose that f has a support equal to [u₀ − T/2, u₀ + T/2]. The supports of f(τ/2 + u) and f̄(τ/2 − u), as functions of τ, are then, respectively,

[2(u₀ − u) − T, 2(u₀ − u) + T]   and   [−2(u₀ − u) − T, −2(u₀ − u) + T] .

The Wigner-Ville integral (4.122) shows that P_V f(u,ξ) is nonzero if these two intervals overlap, which is the case only if |u₀ − u| < T/2. The support of P_V f(u,ξ) along u is therefore included in the support of f. If the support of f̂ is an interval, then the same derivation based on (4.121) shows that the support of P_V f(u,ξ) along ξ is included in the support of f̂.
■
EXAMPLE 4.17
Theorem 4.7 proves that the Wigner-Ville distribution does not spread the time or frequency support of Diracs or sinusoids, unlike windowed Fourier and wavelet transforms. Direct calculations yield

f(t) = δ(t − u₀)  ⟹  P_V f(u,ξ) = δ(u − u₀) ,   (4.123)

f(t) = exp(iξ₀t)  ⟹  P_V f(u,ξ) = 2π δ(ξ − ξ₀) .   (4.124)
EXAMPLE 4.18
If f is a smooth and symmetric window, then its Wigner-Ville distribution P_V f(u,ξ) is concentrated in a neighborhood of u = ξ = 0. A Gaussian f(t) = (σ²π)^{−1/4} exp(−t²/(2σ²)) is transformed into a two-dimensional Gaussian because its Fourier transform is also a Gaussian (2.32), and one can verify that

P_V f(u,ξ) = 2 exp( −u²/σ² − σ²ξ² ) .   (4.125)

In this particular case, P_V f(u,ξ) = |f(u)|² |f̂(ξ)|².
The Wigner-Ville distribution has important invariance properties. A phase shift does not modify its value:

f(t) = e^{iθ} g(t)  ⟹  P_V f(u,ξ) = P_V g(u,ξ) .   (4.126)

When f is translated in time or frequency, its Wigner-Ville transform is also translated:

f(t) = g(t − u₀)  ⟹  P_V f(u,ξ) = P_V g(u − u₀, ξ) ,   (4.127)

f(t) = exp(iξ₀t) g(t)  ⟹  P_V f(u,ξ) = P_V g(u, ξ − ξ₀) .   (4.128)

If f is scaled by s, and thus f̂ is scaled by 1/s, then the time and frequency parameters of P_V f are also scaled, respectively, by s and 1/s:

f(t) = (1/√s) g(t/s)  ⟹  P_V f(u,ξ) = P_V g(u/s, sξ) .   (4.129)
EXAMPLE 4.19
If g is a smooth and symmetric window, then P_V g(u,ξ) has its energy concentrated in the neighborhood of (0,0). The time-frequency atom

f₀(t) = (a/√s) exp(iθ₀) g((t − u₀)/s) exp(iξ₀t)

has a Wigner-Ville distribution that is calculated with (4.126), (4.127), and (4.128):

P_V f₀(u,ξ) = |a|² P_V g((u − u₀)/s, s(ξ − ξ₀)) .   (4.130)
Its energy therefore is concentrated in the neighborhood of (u₀, ξ₀), on an ellipse with axes that are proportional to s in time and 1/s in frequency.
Instantaneous Frequency
Ville's original motivation for studying time-frequency decompositions was to compute the instantaneous frequency of a signal [475]. Let f_a be the analytic part of f obtained in (4.69) by setting f̂(ω) to zero for ω < 0. We write f_a(t) = a(t) exp[iφ(t)] to define the instantaneous frequency ω(t) = φ′(t). Theorem 4.8 proves that φ′(t) is the "average" frequency computed relative to the Wigner-Ville distribution P_V f_a.
Theorem 4.8. If f_a(t) = a(t) exp[iφ(t)], then

φ′(u) = ∫_{−∞}^{+∞} ξ P_V f_a(u,ξ) dξ / ∫_{−∞}^{+∞} P_V f_a(u,ξ) dξ .   (4.131)

Proof. To prove this result, we verify that any function g satisfies

∫∫ ξ g(u + τ/2) g*(u − τ/2) exp(−iτξ) dτ dξ = −iπ ( g′(u) g*(u) − g(u) g*′(u) ) .   (4.132)

This identity is obtained by observing that the Fourier transform of ξ is the derivative of a Dirac, which gives an equality in the sense of distributions:

∫_{−∞}^{+∞} ξ exp(−iτξ) dξ = i 2π δ′(τ) .

Since ∫_{−∞}^{+∞} δ′(τ) h(τ) dτ = −h′(0), inserting h(τ) = g(u + τ/2) g*(u − τ/2) proves (4.132).
If g(u) = f_a(u) = a(u) exp[iφ(u)], then (4.132) gives

∫_{−∞}^{+∞} ξ P_V f_a(u,ξ) dξ = 2π a²(u) φ′(u) .

We will see in (4.136) that |f_a(u)|² = (2π)^{−1} ∫_{−∞}^{+∞} P_V f_a(u,ξ) dξ, and since |f_a(u)|² = a²(u), we derive (4.131).
■
This theorem shows that for a fixed u the mass of P_V f_a(u,ξ) is typically concentrated in the neighborhood of the instantaneous frequency ξ = φ′(u). For example, a linear chirp f(t) = exp(iat²) is transformed into a Dirac located along the instantaneous frequency ξ = φ′(u) = 2au:

P_V f(u,ξ) = 2π δ(ξ − 2au) .

Similarly, the multiplication of f by a linear chirp exp(iat²) makes a frequency translation of P_V f by the instantaneous frequency 2au:

f(t) = exp(iat²) g(t)  ⟹  P_V f(u,ξ) = P_V g(u, ξ − 2au) .   (4.133)
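This concentration along the instantaneous frequency of a linear chirp can be checked on a discrete signal; the periodized, integer-lag discretization (which doubles the frequency axis) and the parameter choices below are illustrative assumptions.

```python
import numpy as np

# Analytic linear chirp f(n) = exp(i a n^2): instantaneous frequency 2 a n
N = 256
a = 2 * np.pi / 4096         # chosen so the peak falls on an exact FFT bin
n = np.arange(N)
f = np.exp(1j * a * n ** 2)

# Discrete P_V f(u, .) at mid-signal, built from integer lags; this
# discretization doubles the frequency axis, so the peak sits at 2 phi'(u)
u = N // 2
r = f[(u + n) % N] * np.conj(f[(u - n) % N])
P = np.real(np.fft.fft(r))

k = int(np.argmax(P))
xi_est = (2 * np.pi * k / N) / 2   # undo the frequency doubling
# xi_est equals phi'(u) = 2 a u for this chirp
```

For this exactly bin-aligned chirp, the discrete distribution is a single nonzero bin, the discrete analog of the Dirac ridge above.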
Energy Density
The Moyal [379] formula proves that the Wigner-Ville transform is unitary, which
implies energy-conservation properties.
Theorem 4.9: Moyal. For any f and g in L²(R),

| ∫_{−∞}^{+∞} f(t) g*(t) dt |² = (1/2π) ∫∫ P_V f(u,ξ) P_V g(u,ξ) du dξ .   (4.134)
Proof. Let us compute the integral

I = ∫∫ P_V f(u,ξ) P_V g(u,ξ) du dξ
  = ∫∫∫∫ f(u + τ/2) f*(u − τ/2) g(u + τ′/2) g*(u − τ′/2) exp[−i(τ + τ′)ξ] dτ dτ′ du dξ .

The Fourier transform of h(t) = 1 is ĥ(ω) = 2πδ(ω), which means that we have a distribution equality ∫ exp[−i(τ + τ′)ξ] dξ = 2π δ(τ + τ′). As a result,

I = 2π ∫∫∫ f(u + τ/2) f*(u − τ/2) g(u + τ′/2) g*(u − τ′/2) δ(τ + τ′) dτ dτ′ du
  = 2π ∫∫ f(u + τ/2) f*(u − τ/2) g(u − τ/2) g*(u + τ/2) dτ du .

The change of variables t = u + τ/2 and t′ = u − τ/2 yields (4.134).
■
One can consider |f(t)|² and |f̂(ω)|²/(2π) as energy densities in time and frequency that satisfy a conservation equation:

∥f∥² = ∫_{−∞}^{+∞} |f(t)|² dt = (1/2π) ∫_{−∞}^{+∞} |f̂(ω)|² dω .
Theorem 4.10 shows that these time and frequency densities are recovered with
marginal integrals over the Wigner-Ville distribution.
Theorem 4.10. For any f ∈ L²(R),

∫_{−∞}^{+∞} P_V f(u,ξ) du = |f̂(ξ)|²   (4.135)

and

(1/2π) ∫_{−∞}^{+∞} P_V f(u,ξ) dξ = |f(u)|² .   (4.136)
Proof. The frequency integral (4.121) proves that the one-dimensional Fourier transform of g_ξ(u) = P_V f(u,ξ), with respect to u, is

ĝ_ξ(γ) = f̂(ξ + γ/2) f̂*(ξ − γ/2) .

We derive (4.135) from the fact that

ĝ_ξ(0) = ∫_{−∞}^{+∞} g_ξ(u) du .

Similarly, (4.120) shows that P_V f(u,ξ) is the one-dimensional Fourier transform of f(u + τ/2) f*(u − τ/2) with respect to τ, where ξ is the Fourier variable. Its integral in ξ, divided by 2π, therefore gives the value for τ = 0, which is the identity (4.136).
■
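The time marginal (4.136) has an exact discrete analog under a crude discretization of the distribution; the integer-lag, periodized form below is an illustrative assumption (in this normalization, averaging over frequency bins recovers |f(u)|²).

```python
import numpy as np

def wigner_ville(f):
    """Discrete Wigner-Ville distribution with integer lags and
    periodic indexing (frequency axis doubled)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    n = np.arange(N)
    W = np.empty((N, N))
    for u in range(N):
        r = f[(u + n) % N] * np.conj(f[(u - n) % N])
        W[u] = np.real(np.fft.fft(r))
    return W

rng = np.random.default_rng(0)
f = rng.standard_normal(64) + 1j * rng.standard_normal(64)
W = wigner_ville(f)

# Discrete analog of (4.136): averaging over frequency bins recovers
# the time density |f(u)|^2, for any signal
time_marginal = W.mean(axis=1)
```

The identity holds exactly here because summing the DFT over all bins picks out the zero-lag product f(u) f*(u).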
This theorem suggests interpreting the Wigner-Ville distribution as a joint time-frequency energy density. However, the Wigner-Ville distribution misses one fundamental property of an energy density: positivity. Let us compute, for example, the Wigner-Ville distribution of f = 1_{[−T,T]} with the integral (4.120):

P_V f(u,ξ) = ( 2 sin(2(T − |u|)ξ) / ξ ) 1_{[−T,T]}(u) .

It is an oscillating function that takes negative values. In fact, one can prove that translated and frequency-modulated Gaussians are the only functions with positive Wigner-Ville distributions. As we will see in the next section, to obtain positive energy distributions for all signals, it is necessary to average the Wigner-Ville transform, thus losing some time-frequency resolution.
4.5.2 Interferences and Positivity
At this point, the Wigner-Ville distribution may seem to be an ideal tool for analyzing
the time-frequency structures of a signal. This, however, is not the case because of
interferences created by the transform’s quadratic properties. Interference can be
removed by averaging the Wigner-Ville distribution with appropriate kernels that
yield positive time-frequency densities. However, this reduces time-frequency resolution. Spectrograms and scalograms are examples of positive quadratic distributions
obtained by smoothing the Wigner-Ville distribution.
Cross Terms
Let f = f₁ + f₂ be a composite signal. Since the Wigner-Ville distribution is a quadratic form,

P_V f = P_V f₁ + P_V f₂ + P_V[f₁, f₂] + P_V[f₂, f₁] ,   (4.137)

where P_V[h, g] is the cross Wigner-Ville distribution of two signals:

P_V[h, g](u,ξ) = ∫_{−∞}^{+∞} h(u + τ/2) g*(u − τ/2) e^{−iτξ} dτ .   (4.138)

The interference term

I[f₁, f₂] = P_V[f₁, f₂] + P_V[f₂, f₁]
is a real function that creates nonzero values at unexpected locations of the (u,ξ) plane.
Let us consider two time-frequency atoms defined by

f₁(t) = a₁ e^{iθ₁} g(t − u₁) e^{iξ₁t}   and   f₂(t) = a₂ e^{iθ₂} g(t − u₂) e^{iξ₂t} ,

where g is a time window centered at t = 0. Their Wigner-Ville distributions computed in (4.130) are

P_V f₁(u,ξ) = a₁² P_V g(u − u₁, ξ − ξ₁)   and   P_V f₂(u,ξ) = a₂² P_V g(u − u₂, ξ − ξ₂) .

Since the energy of P_V g is centered at (0,0), the energies of P_V f₁ and P_V f₂ are concentrated in the neighborhoods of (u₁, ξ₁) and (u₂, ξ₂), respectively. A direct calculation verifies that the interference term is

I[f₁, f₂](u,ξ) = 2 a₁a₂ P_V g(u − u₀, ξ − ξ₀) cos( (u − u₀)Δξ − (ξ − ξ₀)Δu + Δθ )

with

u₀ = (u₁ + u₂)/2 ,   ξ₀ = (ξ₁ + ξ₂)/2 ,
Δu = u₁ − u₂ ,   Δξ = ξ₁ − ξ₂ ,
Δθ = θ₁ − θ₂ + u₀ Δξ .

The interference term is an oscillatory waveform centered at the middle point (u₀, ξ₀). This is quite counterintuitive because f and f̂ have very little energy in the neighborhood of u₀ and ξ₀. The frequency of the oscillations is proportional to the Euclidean distance √(Δξ² + Δu²) between (u₁, ξ₁) and (u₂, ξ₂). The direction of the oscillations is perpendicular to the line that joins (u₁, ξ₁) and (u₂, ξ₂). Figure 4.18 displays the Wigner-Ville distribution of two atoms obtained with a Gaussian window g. Oscillating interference appears at the middle time-frequency point.
This figure's example shows that the interference I[f₁, f₂](u,ξ) has some energy in regions where |f(u)|² ≈ 0 and |f̂(ξ)|² ≈ 0. Such interferences can have a complicated structure [26, 302], but they are necessarily oscillatory because the marginal integrals (4.135) and (4.136) vanish:

∫_{−∞}^{+∞} P_V f(u,ξ) dξ = 2π |f(u)|² ,   ∫_{−∞}^{+∞} P_V f(u,ξ) du = |f̂(ξ)|² .
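The oscillating cross term between two Gabor atoms can be observed numerically. The discretization and the atom parameters below are illustrative assumptions; the point is that the distribution carries energy at the midpoint time u₀, where the signal itself is negligible.

```python
import numpy as np

def wigner_ville(f):
    """Discrete Wigner-Ville distribution (integer lags, periodic
    indexing; the frequency axis is doubled)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    n = np.arange(N)
    W = np.empty((N, N))
    for u in range(N):
        r = f[(u + n) % N] * np.conj(f[(u - n) % N])
        W[u] = np.real(np.fft.fft(r))
    return W

N = 256
t = np.arange(N)

def atom(u0, k, width=8.0):
    """Gabor atom centered at time u0 and frequency bin k."""
    return np.exp(-((t - u0) ** 2) / (2 * width ** 2)) \
        * np.exp(2j * np.pi * k * t / N)

f = atom(64, 30) + atom(192, 70)
W = wigner_ville(f)

# The signal is negligible at the midpoint u0 = 128, yet |W[128]| carries
# an oscillating cross term centered between the two atoms
mid_energy = float(np.abs(W[128]).max())
```

The cross term oscillates in sign, consistent with the vanishing marginals above, so it disappears under time-frequency averaging.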
Analytic Part
Interference terms also exist in a real signal f with a single instantaneous frequency component. Let f_a(t) = a(t) exp[iφ(t)] be its analytic part:

f = Re[f_a] = (1/2)(f_a + f_a*) .

Theorem 4.8 proves that for fixed u, P_V f_a(u,ξ) and P_V f_a*(u,ξ) have an energy concentrated, respectively, in the neighborhood of ξ₁ = φ′(u) and ξ₂ = −φ′(u). Both
FIGURE 4.18
Wigner-Ville distribution P_V f(u,ξ) of two Gabor atoms (top); the oscillating interferences are centered at the middle time-frequency location.
components create an interference term at the intermediate zero frequency ξ₀ = (ξ₁ + ξ₂)/2 = 0. To avoid this low-frequency interference, we often compute P_V f_a as opposed to P_V f.
Figure 4.19 displays P_V f_a for a real signal f that includes a linear chirp, a quadratic chirp, and two isolated time-frequency atoms. The linear and quadratic chirps are localized along narrow time-frequency lines, which are spread over wider bands by the spectrogram shown in Figure 4.3 and the scalogram shown in Figure 4.4, respectively. However, interference terms create complex oscillatory patterns that make it difficult to detect the existence of the two time-frequency transients at t = 0.5 and t = 0.87, which clearly appear in the spectrogram and the scalogram.
Positivity
Since interference terms include positive and negative oscillations, they can be partially removed by smoothing $P_V f$ with a kernel $K$:
$$P_K f(u, \xi) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} P_V f(u', \xi')\, K(u, u', \xi, \xi')\, du'\, d\xi'. \quad (4.139)$$
FIGURE 4.19
The Wigner-Ville distribution $P_V f_a(u, \xi)$ (bottom) of the analytic part of the signal (top).
The time-frequency resolution of this distribution depends on the spread of the kernel $K$ in the neighborhood of $(u, \xi)$. Since interferences take negative values, one can guarantee that all interferences are removed by imposing that this time-frequency distribution remain positive: $P_K f(u, \xi) \ge 0$ for all $(u, \xi) \in \mathbb{R}^2$.
The spectrogram (4.12) and scalogram (4.55) are examples of positive time-frequency energy distributions. In general, let us consider a family of time-frequency atoms $\{\phi_\gamma\}_{\gamma\in\Gamma}$. Suppose that for any $(u, \xi)$ there exists a unique atom $\phi_{(u,\xi)}$ centered in time and frequency at $(u, \xi)$. The resulting time-frequency energy density is
$$P f(u, \xi) = |\langle f, \phi_{(u,\xi)}\rangle|^2.$$
The Moyal formula (4.134) proves that this energy density can be written as a time-frequency averaging of the Wigner-Ville distribution:
$$P f(u, \xi) = \frac{1}{2\pi} \int\!\!\int P_V f(u', \xi')\, P_V \phi_{(u,\xi)}(u', \xi')\, du'\, d\xi'. \quad (4.140)$$
The smoothing kernel is the Wigner-Ville distribution of the atoms:
$$K(u, u', \xi, \xi') = \frac{1}{2\pi}\, P_V \phi_{(u,\xi)}(u', \xi').$$
The loss of time-frequency resolution depends on the spread of the distribution $P_V \phi_{(u,\xi)}(u', \xi')$ in the neighborhood of $(u, \xi)$.
EXAMPLE 4.20
A spectrogram is computed with windowed Fourier atoms:
$$\phi_{(u,\xi)}(t) = g(t - u)\, e^{i\xi t}.$$
The Wigner-Ville distribution calculated in (4.130) yields
$$K(u, u', \xi, \xi') = \frac{1}{2\pi}\, P_V \phi_{(u,\xi)}(u', \xi') = \frac{1}{2\pi}\, P_V g(u' - u, \xi' - \xi). \quad (4.141)$$
For a spectrogram, the Wigner-Ville averaging (4.140) is therefore a two-dimensional convolution with $P_V g$. If $g$ is a Gaussian window, then $P_V g$ is a two-dimensional Gaussian. This proves that averaging $P_V f$ with a sufficiently wide Gaussian defines a positive energy density. The general class of time-frequency distributions obtained by convolving $P_V f$ with an arbitrary kernel $K$ is studied in Section 4.5.3.
EXAMPLE 4.21
Let $\psi$ be an analytic wavelet whose center frequency is $\eta$. The wavelet atom $\psi_{u,s}(t) = s^{-1/2}\, \psi((t - u)/s)$ is centered at $(u, \xi = \eta/s)$, and the scalogram is defined by
$$P_W f(u, \xi) = |\langle f, \psi_{u,s}\rangle|^2 \quad \text{for } \xi = \eta/s.$$
Properties (4.127, 4.129) prove that the averaging kernel is
$$K(u, u', \xi, \xi') = \frac{1}{2\pi}\, P_V \psi\!\left(\frac{u' - u}{s},\, s\xi'\right) = \frac{1}{2\pi}\, P_V \psi\!\left(\frac{\xi}{\eta}(u' - u),\, \frac{\eta\, \xi'}{\xi}\right).$$
Positive time-frequency distributions totally remove the interference terms but produce a loss of resolution. This is emphasized by Theorem 4.11, proved by Wigner [485].
Theorem 4.11: Wigner. There is no positive quadratic energy distribution $P f$ that satisfies
$$\int_{-\infty}^{+\infty} P f(u, \xi)\, d\xi = 2\pi\, |f(u)|^2 \quad \text{and} \quad \int_{-\infty}^{+\infty} P f(u, \xi)\, du = |\hat f(\xi)|^2. \quad (4.142)$$
Proof. Suppose that $P f$ is a positive quadratic distribution that satisfies these marginals. Since $P f(u, \xi) \ge 0$, the integrals (4.142) imply that if the support of $f$ is included in an interval $I$, then $P f(u, \xi) = 0$ for $u \notin I$. We can associate to the quadratic form $P f$ a bilinear distribution defined for any $f$ and $g$ by
$$P[f, g] = \frac{1}{4}\,\big(P(f + g) - P(f - g)\big).$$
Let $f_1$ and $f_2$ be two nonzero signals having their supports in two disjoint intervals $I_1$ and $I_2$, so that $f_1 f_2 = 0$. Let $f = a f_1 + b f_2$:
$$P f = |a|^2\, P f_1 + a b^*\, P[f_1, f_2] + a^* b\, P[f_2, f_1] + |b|^2\, P f_2.$$
Since $I_1$ does not intersect $I_2$, $P f_1(u, \xi) = 0$ for $u \in I_2$. Remember that $P f(u, \xi) \ge 0$ for all $a$ and $b$, so necessarily $P[f_1, f_2](u, \xi) = P[f_2, f_1](u, \xi) = 0$ for $u \in I_2$. Similarly, we can prove that these cross terms are zero for $u \in I_1$; therefore
$$P f(u, \xi) = |a|^2\, P f_1(u, \xi) + |b|^2\, P f_2(u, \xi).$$
Integrating this equation and inserting (4.142) yields
$$|\hat f(\xi)|^2 = |a|^2\, |\hat f_1(\xi)|^2 + |b|^2\, |\hat f_2(\xi)|^2.$$
Since $\hat f(\xi) = a \hat f_1(\xi) + b \hat f_2(\xi)$, it follows that $\hat f_1(\xi)\, \hat f_2^*(\xi) = 0$. But this is not possible because $f_1$ and $f_2$ have compact support in time, and Theorem 2.7 proves that $\hat f_1$ and $\hat f_2$ are $C^\infty$ functions that cannot vanish on a whole interval. Thus, we conclude that one cannot construct a positive quadratic distribution $P f$ that satisfies the marginals (4.142). ■
4.5.3 Cohen’s Class
While attenuating the interference terms with a smoothing kernel K , we may want
to retain certain important properties. Cohen [177] introduced a general class of
quadratic time-frequency distributions that satisfy the time translation and frequency
modulation invariance properties (4.127) and (4.128). If a signal is translated in
time or frequency, its energy distribution is translated just by the corresponding
amount. This was the beginning of a systematic study of quadratic time-frequency
distributions obtained as a weighted average of a Wigner-Ville distribution [8, 26,
178, 301].
Section 2.1 proves that linear translation-invariant operators are convolution products. The translation-invariance properties (4.127, 4.128) are therefore equivalent to imposing that the smoothing kernel in (4.139) be a convolution kernel:
$$K(u, u', \xi, \xi') = K(u - u', \xi - \xi'); \quad (4.143)$$
therefore
$$P_K f(u, \xi) = P_V f \star K(u, \xi) = \int\!\!\int K(u - u', \xi - \xi')\, P_V f(u', \xi')\, du'\, d\xi'. \quad (4.144)$$
The spectrogram is an example of a Cohen's class distribution, with a kernel (4.141) that is the Wigner-Ville distribution of the window:
$$K(u, \xi) = \frac{1}{2\pi}\, P_V g(u, \xi) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} g\!\left(u + \frac{\tau}{2}\right) g^*\!\left(u - \frac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau. \quad (4.145)$$
Ambiguity Function
The properties of the convolution (4.144) are more easily studied by calculating the two-dimensional Fourier transform of $P_V f(u, \xi)$ with respect to $u$ and $\xi$. We denote by $A f(\tau, \gamma)$ this Fourier transform:
$$A f(\tau, \gamma) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} P_V f(u, \xi)\, \exp[-i(u\gamma + \xi\tau)]\, du\, d\xi.$$
Note that the Fourier variables $\tau$ and $\gamma$ are inverted with respect to the usual Fourier notation. Since the one-dimensional Fourier transform of $P_V f(u, \xi)$ with respect to $u$ is $\hat f(\xi + \gamma/2)\, \hat f^*(\xi - \gamma/2)$, applying the one-dimensional Fourier transform with respect to $\xi$ gives
$$A f(\tau, \gamma) = \int_{-\infty}^{+\infty} \hat f\!\left(\xi + \frac{\gamma}{2}\right) \hat f^*\!\left(\xi - \frac{\gamma}{2}\right) e^{-i\tau\xi}\, d\xi. \quad (4.146)$$
The Parseval formula yields
$$A f(\tau, \gamma) = \int_{-\infty}^{+\infty} f\!\left(u + \frac{\tau}{2}\right) f^*\!\left(u - \frac{\tau}{2}\right) e^{-iu\gamma}\, du. \quad (4.147)$$
We recognize the ambiguity function encountered in (4.24) when studying the time-frequency resolution of a windowed Fourier transform. It measures the energy concentration of $f$ in time and in frequency.
Kernel Properties
The Fourier transform of $K(u, \xi)$ is
$$\hat K(\tau, \gamma) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} K(u, \xi)\, \exp[-i(u\gamma + \xi\tau)]\, du\, d\xi.$$
As in the definition of the ambiguity function (4.146), the Fourier parameters $\tau$ and $\gamma$ of $\hat K$ are inverted. Theorem 4.12 gives necessary and sufficient conditions to ensure that $P_K$ satisfies marginal energy properties such as those of the Wigner-Ville distribution. Wigner's Theorem 4.11 proves that in this case $P_K f(u, \xi)$ takes negative values.
Theorem 4.12. For all $f \in L^2(\mathbb{R})$,
$$\int_{-\infty}^{+\infty} P_K f(u, \xi)\, d\xi = 2\pi\, |f(u)|^2, \qquad \int_{-\infty}^{+\infty} P_K f(u, \xi)\, du = |\hat f(\xi)|^2, \quad (4.148)$$
if and only if
$$\forall (\tau, \gamma) \in \mathbb{R}^2, \quad \hat K(\tau, 0) = \hat K(0, \gamma) = 1. \quad (4.149)$$
Proof. Let $A_K f(\tau, \gamma)$ be the two-dimensional Fourier transform of $P_K f(u, \xi)$. The Fourier integral at $(0, \gamma)$ gives
$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} P_K f(u, \xi)\, e^{-iu\gamma}\, d\xi\, du = A_K f(0, \gamma). \quad (4.150)$$
Since the ambiguity function $A f(\tau, \gamma)$ is the Fourier transform of $P_V f(u, \xi)$, the two-dimensional convolution (4.144) gives
$$A_K f(\tau, \gamma) = A f(\tau, \gamma)\, \hat K(\tau, \gamma). \quad (4.151)$$
The Fourier transform of $2\pi\, |f(u)|^2$ is $\hat f \star \bar{\hat f}(\gamma)$, with $\bar{\hat f}(\gamma) = \hat f^*(-\gamma)$. The relation (4.150) shows that (4.148) is satisfied if and only if
$$A_K f(0, \gamma) = A f(0, \gamma)\, \hat K(0, \gamma) = \hat f \star \bar{\hat f}(\gamma). \quad (4.152)$$
Since $P_V f$ satisfies the marginal property (4.135), we similarly prove that
$$A f(0, \gamma) = \hat f \star \bar{\hat f}(\gamma).$$
Requiring (4.152) to be valid for any $\hat f(\xi)$ is equivalent to requiring that $\hat K(0, \gamma) = 1$ for all $\gamma \in \mathbb{R}$. The same derivation applied to the other marginal integral yields $\hat K(\tau, 0) = 1$. ■
In addition to requiring time-frequency translation invariance, it may be useful to guarantee that $P_K$ satisfies the same scaling property as the Wigner-Ville distribution:
$$g(t) = \frac{1}{\sqrt{s}}\, f\!\left(\frac{t}{s}\right) \;\Longrightarrow\; P_K g(u, \xi) = P_K f\!\left(\frac{u}{s},\, s\xi\right).$$
Such a distribution $P_K$ is affine invariant. One can verify that affine invariance is equivalent to imposing that
$$\forall s \in \mathbb{R}^+, \quad K\!\left(su, \frac{\xi}{s}\right) = K(u, \xi), \quad (4.153)$$
and therefore
$$K(u, \xi) = K(u\xi, 1) = \beta(u\xi).$$
EXAMPLE 4.22
The Rihaczek distribution is an affine invariant distribution whose convolution kernel is
$$\hat K(\tau, \gamma) = \exp\!\left(\frac{i\tau\gamma}{2}\right). \quad (4.154)$$
A direct calculation shows that
$$P_K f(u, \xi) = f(u)\, \hat f^*(\xi)\, \exp(-iu\xi). \quad (4.155)$$
EXAMPLE 4.23
The kernel of the Choi-Williams distribution is [161]
$$\hat K(\tau, \gamma) = \exp(-\sigma^2 \tau^2 \gamma^2). \quad (4.156)$$
It is symmetric and thus corresponds to a real function $K(u, \xi)$. This distribution satisfies the marginal conditions (4.149). Since $\lim_{\sigma \to 0} \hat K(\tau, \gamma) = 1$, when $\sigma$ is small the Choi-Williams distribution is close to a Wigner-Ville distribution. Increasing $\sigma$ attenuates the interference terms but spreads $K(u, \xi)$, which reduces the time-frequency resolution of the distribution.

Figure 4.20 shows that the interference terms of two modulated Gaussians nearly disappear when the Wigner-Ville distribution of Figure 4.18 is averaged by a Choi-Williams kernel having a sufficiently large $\sigma$. Figure 4.21 gives the Choi-Williams distribution corresponding to the Wigner-Ville distribution shown in Figure 4.19. The energy of the linear and quadratic chirps is spread over wider time-frequency bands, and the interference terms are attenuated, although not totally removed. It remains difficult to isolate the two modulated Gaussians at $t = 0.5$ and $t = 0.87$, which appear clearly in the spectrogram of Figure 4.3.
FIGURE 4.20
Choi-Williams distribution $P_K f(u, \xi)$ of two Gabor atoms (top); the interference term that appears in the Wigner-Ville distribution of Figure 4.18 has nearly disappeared.
FIGURE 4.21
Choi-Williams distribution $P_K f_a(u, \xi)$ (bottom) of the signal's analytic part (top); the interferences remain visible.
4.5.4 Discrete Wigner-Ville Computations
The Wigner integral (4.120) is the Fourier transform of $f(u + \tau/2)\, f^*(u - \tau/2)$ with respect to $\tau$:
$$P_V f(u, \xi) = \int_{-\infty}^{+\infty} f\!\left(u + \frac{\tau}{2}\right) f^*\!\left(u - \frac{\tau}{2}\right) e^{-i\tau\xi}\, d\tau. \quad (4.157)$$
For a discrete signal $f[n]$ defined over $0 \le n < N$, the integral is replaced by a discrete sum:
$$P_V f[n, k] = \sum_{p=-N}^{N-1} f\!\left[n + \frac{p}{2}\right] f^*\!\left[n - \frac{p}{2}\right] \exp\!\left(\frac{-i2\pi kp}{N}\right). \quad (4.158)$$
When $p$ is odd, this calculation requires knowing the value of $f$ at half integers. Such values are computed by interpolating $f$, by adding zeros to its Fourier transform. This is necessary to avoid the aliasing produced by the discretization of the Wigner-Ville integral [165].
The interpolation $\tilde f$ of $f$ is a signal of size $2N$ that has a DFT $\hat{\tilde f}$ defined from the discrete Fourier transform $\hat f$ of $f$ by
$$\hat{\tilde f}[k] = \begin{cases} 2\hat f[k] & \text{if } 0 \le k < N/2, \\ 0 & \text{if } N/2 < k < 3N/2, \\ 2\hat f[k - N] & \text{if } 3N/2 < k < 2N, \\ \hat f[N/2] & \text{if } k = N/2 \text{ or } k = 3N/2. \end{cases}$$
Computing the inverse DFT shows that $\tilde f[2n] = f[n]$ for $n \in [0, N-1]$. When $n \notin [0, 2N-1]$, we set $\tilde f[n] = 0$. The Wigner summation (4.158) is calculated from $\tilde f$:
$$P_V f[n, k] = \sum_{p=-N}^{N-1} \tilde f[2n + p]\, \tilde f^*[2n - p]\, \exp\!\left(\frac{-i2\pi kp}{N}\right) = \sum_{p=0}^{2N-1} \tilde f[2n + p - N]\, \tilde f^*[2n - p + N]\, \exp\!\left(\frac{-i2\pi(2k)p}{2N}\right).$$
For fixed $0 \le n < N$, $P_V f[n, k]$ is the discrete Fourier transform of size $2N$ of $g[p] = \tilde f[2n + p - N]\, \tilde f^*[2n - p + N]$, evaluated at the frequency $2k$. Thus, the discrete Wigner-Ville distribution is calculated with $N$ FFT procedures of size $2N$, which requires $O(N^2 \log N)$ operations. To compute the Wigner-Ville distribution of the analytic part $f_a$ of $f$, we use (4.48).
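The procedure above can be sketched in NumPy (a minimal illustration, not an optimized implementation; the function names `interpolate2` and `wigner_ville` are ours):

```python
import numpy as np

def interpolate2(f):
    """Size-2N interpolation f~ of f, by zero-padding its DFT according to
    the piecewise definition of the interpolated spectrum."""
    N = len(f)
    F = np.fft.fft(f)
    G = np.zeros(2 * N, dtype=complex)
    G[:N // 2] = 2 * F[:N // 2]
    G[N // 2] = F[N // 2]
    G[3 * N // 2] = F[N // 2]
    G[3 * N // 2 + 1:] = 2 * F[N // 2 + 1:]
    return np.fft.ifft(G)

def wigner_ville(f):
    """Discrete Wigner-Ville distribution P_V f[n, k] of (4.158),
    computed with N FFTs of size 2N on the interpolated signal."""
    N = len(f)
    # Pad with zeros so that indices outside [0, 2N-1] read f~[n] = 0.
    ft = np.concatenate([interpolate2(f), np.zeros(2 * N)])
    P = np.zeros((N, 2 * N), dtype=complex)
    p = np.arange(2 * N)
    for n in range(N):
        # g[p] = f~[2n + p - N] f~*[2n - p + N] for 0 <= p < 2N
        g = ft[(2 * n + p - N) % (4 * N)] * np.conj(ft[(2 * n - p + N) % (4 * N)])
        P[n] = np.fft.fft(g)       # DFT of size 2N; P_V f[n, k] is bin 2k
    return P[:, ::2].real          # keep even bins: frequency index k

N = 64
f = np.exp(2j * np.pi * 16 * np.arange(N) / N)   # complex exponential
P = wigner_ville(f)
k_max = np.argmax(P.sum(axis=0))
print(k_max)   # energy concentrated at frequency bin 16
```

For the complex exponential at frequency bin 16, the distribution concentrates its energy on that frequency line, as expected for a signal with a single instantaneous frequency.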
Cohen’s Class
A Cohen’s class distribution is calculated with a circular convolution of the discrete
Wigner-Ville distribution with a kernel K [ p, q]:
PK [n, k] ⫽ PV K [n, k].
(4.159)
Its two-dimensional discrete Fourier transform is therefore
AK [ p, q] ⫽ A f [ p, q] K̂ [ p, q].
(4.160)
The signal Af [ p, q] is the discrete ambiguity function, calculated with a twodimensional FFT of the discreteWigner-Ville distribution PV f [n, k].As in the case of
continuous time, we have inverted the index p and q of the usual two-dimensional
Fourier transform. The Cohen’s class distribution (4.159) is obtained by calculating
the inverse Fourier transform of (4.160). This also requires a total of O(N 2 log N )
operations.
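These two steps can be sketched with NumPy's 2D FFTs (an illustration under our own naming; the kernel array `K_hat` plays the role of $\hat K[p, q]$):

```python
import numpy as np

def cohen_distribution(P_V, K_hat):
    """Cohen's class distribution (4.159)-(4.160): take the 2D FFT of the
    discrete Wigner-Ville distribution (the discrete ambiguity function),
    multiply it by K^[p, q], and invert. The FFT product implements the
    circular convolution of (4.159)."""
    A = np.fft.fft2(P_V)                      # discrete ambiguity function
    return np.real(np.fft.ifft2(A * K_hat))   # P_K f[n, k]

# Sanity check: with K^ identically 1, the smoothing is the identity.
rng = np.random.default_rng(0)
P = rng.standard_normal((32, 32))
print(np.allclose(cohen_distribution(P, np.ones((32, 32))), P))  # True
```

A Choi-Williams distribution, for instance, would be obtained by evaluating the kernel (4.156) on the discrete $(p, q)$ frequency grid and passing it as `K_hat`.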
4.6 EXERCISES
4.1 (level 2) Instantaneous frequency. Let $f(t) = \exp[i\phi(t)]$.
(a) Prove that $\int_{-\infty}^{+\infty} |S f(u, \xi)|^2\, d\xi = 2\pi$. Hint: $S f(u, \xi)$ is a Fourier transform; use the Parseval formula.
(b) Similarly, show that
$$\int_{-\infty}^{+\infty} \xi\, |S f(u, \xi)|^2\, d\xi = 2\pi \int_{-\infty}^{+\infty} \phi'(t)\, |g(t - u)|^2\, dt,$$
and interpret this result.
4.2 (level 1) When $g(t) = (\sigma^2 \pi)^{-1/4} \exp(-t^2/(2\sigma^2))$, compute the ambiguity function $A g(\tau, \gamma)$.

4.3 (level 1) Prove that the approximate reconstruction formula (4.66) is exact if and only if
$$\frac{\log_e a}{C_\psi} \sum_{j=I}^{J} \frac{1}{a^j}\, \hat\psi_j[k] + \frac{1}{C_\psi\, a^J}\, \hat\phi_J[k] = 1.$$
Compute numerically the value of the left-hand side for different $a$ when $\psi_j[n]$ is constructed from the Gabor wavelet (4.60) and (4.62) with $\sigma = 1$ and $2^J < N/4$.
4.4 (level 1) Let $g[n]$ be a window with $L$ nonzero coefficients. For signals of size $N$, describe a fast algorithm that computes the discrete windowed Fourier transform (4.27) with $O(N \log_2 L)$ operations.
4.5 (level 3) Let $K$ be the reproducing kernel (4.21) of a windowed Fourier transform:
$$K(u_0, u, \xi_0, \xi) = \langle g_{u,\xi}, g_{u_0,\xi_0}\rangle.$$
(a) For any $\Phi \in L^2(\mathbb{R}^2)$ we define
$$P\Phi(u_0, \xi_0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \Phi(u, \xi)\, K(u_0, u, \xi_0, \xi)\, du\, d\xi.$$
Prove that $P$ is an orthogonal projector on the space of functions $\Phi(u, \xi)$ that are windowed Fourier transforms of functions in $L^2(\mathbb{R})$.
(b) Suppose that for all $(u, \xi) \in \mathbb{R}^2$ we are given $\tilde S f(u, \xi) = Q\big(S f(u, \xi)\big)$, which is a quantization of the windowed Fourier coefficients. How can we reduce the $L^2(\mathbb{R}^2)$ norm of the quantification error $\epsilon(u, \xi) = S f(u, \xi) - Q\big(S f(u, \xi)\big)$?
4.6 (level 3) Prove the wavelet reconstruction formula (4.45).

4.7 (level 3) Prove that a scaling function $\phi$ defined by (4.42) satisfies $\|\phi\| = 1$.
4.8 (level 2) Let $\psi$ be a real and even wavelet such that $C = \int_0^{+\infty} \omega^{-1}\, \hat\psi(\omega)\, d\omega < +\infty$. Prove that
$$\forall f \in L^2(\mathbb{R}), \quad f(t) = \frac{1}{C} \int_0^{+\infty} W f(t, s)\, \frac{ds}{s^{3/2}}. \quad (4.161)$$

4.9 (level 3) Analytic continuation. Let $f \in L^2(\mathbb{R})$ be a function such that $\hat f(\omega) = 0$ for $\omega < 0$. For any complex $z \in \mathbb{C}$ such that $\mathrm{Im}(z) \ge 0$, we define
$$f^{(p)}(z) = \frac{1}{2\pi} \int_0^{+\infty} (i\omega)^p\, \hat f(\omega)\, e^{iz\omega}\, d\omega.$$
(a) Verify that if $f$ is $C^p$, then $f^{(p)}(t)$ is the derivative of order $p$ of $f(t)$.
(b) Prove that if $\mathrm{Im}(z) > 0$, then $f^{(p)}(z)$ is differentiable relative to the complex variable $z$. Such a function is said to be analytic on the upper-half complex plane.
(c) Prove that this analytic extension can be written as a wavelet transform
$$f^{(p)}(x + iy) = y^{-p-1/2}\, W f(x, y),$$
calculated with an analytic wavelet $\psi$ that you will specify.

4.10 (level 2) Let $f(t) = \cos(a \cos bt)$. We want to compute precisely the instantaneous frequency of $f$ from the ridges of its windowed Fourier transform. Find a necessary condition on the window support as a function of $a$ and $b$. If $f(t) = \cos(a \cos bt) + \cos(a \cos bt + ct)$, find a condition on $a$, $b$, and $c$ in order to measure both instantaneous frequencies with the ridges of a windowed Fourier transform. Verify your calculations with a numerical implementation.
4.11 (level 4) Noise removal. We want to suppress noise from audio signals by thresholding ridge coefficients. Implement a dual-synthesis algorithm that reconstructs audio signal approximations from windowed Fourier ridge points (4.96) or wavelet ridge points (4.116), with the conjugate-gradient inverse frame algorithm of Theorem 5.8. Study the SNR of audio denoising obtained by thresholding the ridge coefficients. Try to improve this SNR by averaging the spectrogram values along ridges. Compare the SNR with that of a linear filtering estimator.
4.12 (level 4) Sound duration. Write a program that modifies the sound duration with the formula (4.72), by modifying the ridges of a windowed Fourier transform with (4.99) or of a wavelet transform with (4.118), and by reconstructing a signal with a dual synthesis.
4.13 (level 4) Sound transposition. Implement a sound transposition with windowed Fourier or wavelet ridges, using the transposition model (4.73). The resulting modifications of the ridge supports are specified by (4.100) and (4.119). The amplitude of the transposed harmonics can be computed with the autoregressive model (4.75). A signal is restored with a dual-synthesis algorithm.
4.14 (level 4) The sinusoidal model (4.71) is improved for speech signals by adding a nonharmonic component $B(t)$ to the partials [339]:
$$F(t) = \sum_{k=1}^{K} a_k(t)\, \cos\phi_k(t) + B(t). \quad (4.162)$$
Given a signal $f(t)$ that is considered to be a realization of $F(t)$, compute the ridges of a windowed Fourier transform, find the "main" partials, and compute their amplitude $a_k$ and phase $\phi_k$. These partials are subtracted from the signal. Over intervals of fixed size, the residue is modeled as the realization of an autoregressive process $B(t)$ of order 10 to 15. Use a standard algorithm to compute the parameters of this autoregressive process [57]. Evaluate the audio quality of the sound restored from the calculated model (4.162). Study an application to audio compression by quantizing and coding the parameters of the model.
4.15 (level 2) Prove that $P f(u, \xi) = \|f\|^{-2}\, |f(u)|^2\, |\hat f(\xi)|^2$ satisfies the marginal properties (4.135) and (4.136). Why can't we apply the Wigner Theorem 4.11?

4.16 (level 1) Let $g_\sigma$ be a Gaussian of variance $\sigma^2$. Prove that $P f(u, \xi) = P_V f \star \theta(u, \xi)$ is a positive distribution if $\theta(u, \xi) = g_\sigma(u)\, g_\beta(\xi)$ with $\sigma\beta \ge 1/2$. Hint: consider a spectrogram calculated with a Gaussian window.
4.17 (level 3) Let $\{g_n(t)\}_{n\in\mathbb{N}}$ be an orthonormal basis of $L^2(\mathbb{R})$. Prove that
$$\forall (t, \omega) \in \mathbb{R}^2, \quad \sum_{n=0}^{+\infty} P_V g_n(t, \omega) = 1.$$
4.18 (level 2) Let $f_a(t) = a(t)\, \exp[i\phi(t)]$ be the analytic part of $f(t)$. Prove that
$$\int_{-\infty}^{+\infty} \big(\xi - \phi'(t)\big)^2\, P_V f_a(t, \xi)\, d\xi = -\pi\, a^2(t)\, \frac{d^2 \log a(t)}{dt^2}.$$

4.19 (level 4) To avoid the time-frequency resolution limitations of a windowed Fourier transform, we want to adapt the window size to the signal content. Let $g(t)$ be a window supported in $[-\frac{1}{2}, \frac{1}{2}]$. We denote by $S^j f(u, \xi)$ the windowed Fourier transform calculated with the dilated window $g_j(t) = 2^{-j/2}\, g(2^{-j} t)$. Find a procedure that computes a single map of ridges by choosing a "best" window size at each $(u, \xi)$. One approach is to choose the scale $2^l$ for each $(u, \xi)$ such that $|S^l f(u, \xi)|^2 = \sup_j |S^j f(u, \xi)|^2$. Test your algorithm on the linear and hyperbolic chirp signals (4.103, 4.107). Test it on the Tweet and Greasy signals in WAVELAB.
CHAPTER 5
Frames
A signal representation may provide "analysis" coefficients that are inner products with a family of vectors, or "synthesis" coefficients that compute an approximation by recombining a family of vectors. Frames are families of vectors for which the "analysis" and "synthesis" representations are stable. Signal reconstructions are computed with a dual frame. Frames are potentially redundant and thus more general than bases, with a redundancy measured by frame bounds. They provide the flexibility needed to build signal representations with unstructured families of vectors.

Complete and stable wavelet and windowed Fourier transforms are constructed with frames of wavelets and windowed Fourier atoms. In two dimensions, frames of directional wavelets and curvelets are introduced to analyze and process multiscale image structures.
5.1 FRAMES AND RIESZ BASES
5.1.1 Stable Analysis and Synthesis Operators
The frame theory was originally developed by Duffin and Schaeffer [235] to reconstruct band-limited signals from irregularly spaced samples. They established general conditions to recover a vector $f$ in a Hilbert space $H$ from its inner products with a family of vectors $\{\phi_n\}_{n\in\Gamma}$. The index set $\Gamma$ might be finite or infinite. The following frame definition gives an energy equivalence to invert the operator $\Phi$ defined by
$$\forall n \in \Gamma, \quad \Phi f[n] = \langle f, \phi_n\rangle. \quad (5.1)$$
Definition 5.1: Frame and Riesz Basis. The sequence $\{\phi_n\}_{n\in\Gamma}$ is a frame of $H$ if there exist two constants $B \ge A > 0$ such that
$$\forall f \in H, \quad A\|f\|^2 \le \sum_{n\in\Gamma} |\langle f, \phi_n\rangle|^2 \le B\|f\|^2. \quad (5.2)$$
When $A = B$ the frame is said to be tight. If the $\{\phi_n\}_{n\in\Gamma}$ are linearly independent, then the frame is not redundant and is called a Riesz basis.
If the frame condition is satisfied, then $\Phi$ is called a frame analysis operator. Section 5.1.2 proves that (5.2) is a necessary and sufficient condition guaranteeing that $\Phi$ is invertible on its image space, with a bounded inverse. Thus, a frame defines a complete and stable signal representation, which may also be redundant.
Frame Synthesis
Let us consider the space of finite-energy coefficients
$$\ell^2(\Gamma) = \Big\{a : \|a\|^2 = \sum_{n\in\Gamma} |a[n]|^2 < +\infty\Big\}.$$
The adjoint $\Phi^*$ of $\Phi$ is defined over $\ell^2(\Gamma)$ and satisfies, for any $f \in H$ and $a \in \ell^2(\Gamma)$,
$$\langle \Phi^* a, f\rangle = \langle a, \Phi f\rangle = \sum_{n\in\Gamma} a[n]\, \langle f, \phi_n\rangle^*.$$
It is therefore the synthesis operator
$$\Phi^* a = \sum_{n\in\Gamma} a[n]\, \phi_n. \quad (5.3)$$
The frame condition (5.2) can be rewritten as
$$\forall f \in H, \quad A\|f\|^2 \le \|\Phi f\|^2 = \langle \Phi^*\Phi f, f\rangle \le B\|f\|^2, \quad (5.4)$$
with
$$\Phi^*\Phi f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \phi_n.$$
It results that $A$ and $B$ are the infimum and supremum values of the spectrum of the symmetric operator $\Phi^*\Phi$, which correspond to the smallest and largest eigenvalues in finite dimension. The eigenvalues are also called singular values of $\Phi$, or the singular spectrum. Theorem 5.1 derives that the frame synthesis operator is also stable.
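In finite dimension this characterization is directly computable: for a family of $P$ vectors stored as the rows of the analysis matrix $\Phi$, the frame bounds are the squared smallest and largest singular values of $\Phi$. A minimal NumPy sketch with a random example (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 4))        # row n is phi_n; Phi f = inner products
s = np.linalg.svd(Phi, compute_uv=False)
A, B = s[-1]**2, s[0]**2                 # frame bounds: extreme eigenvalues of Phi* Phi

f = rng.standard_normal(4)
energy = np.sum((Phi @ f)**2)            # sum_n |<f, phi_n>|^2
print(A * (f @ f) - 1e-9 <= energy <= B * (f @ f) + 1e-9)  # True: inequality (5.2)
```

The inequality holds for every $f$ by definition of the extreme singular values, which is exactly the frame condition (5.2) in finite dimension.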
Theorem 5.1. The family $\{\phi_n\}_{n\in\Gamma}$ is a frame with bounds $0 < A \le B$ if and only if
$$\forall a \in \mathrm{Im}\,\Phi, \quad A\|a\|^2 \le \Big\|\sum_{n\in\Gamma} a[n]\, \phi_n\Big\|^2 \le B\|a\|^2. \quad (5.5)$$
Proof. Since $\Phi^* a = \sum_{n\in\Gamma} a[n]\, \phi_n$, it results that
$$\Big\|\sum_{n\in\Gamma} a[n]\, \phi_n\Big\|^2 = \langle \Phi\Phi^* a, a\rangle.$$
The operator $\Phi$ is a frame if and only if the spectrum of $\Phi^*\Phi$ is bounded by $A$ and $B$. The inequality (5.5) states that the spectrum of $\Phi\Phi^*$ over $\mathrm{Im}\,\Phi$ is also bounded by $A$ and $B$. Both statements are proved to be equivalent by verifying that the supremum and infimum of the spectrum of $\Phi^*\Phi$ are equal to the supremum and infimum of the spectrum of $\Phi\Phi^*$ over $\mathrm{Im}\,\Phi$.

In finite dimension, if $\lambda$ is an eigenvalue of $\Phi^*\Phi$ with eigenvector $f$, then $\lambda$ is also an eigenvalue of $\Phi\Phi^*$ with eigenvector $\Phi f$. Indeed, $\Phi^*\Phi f = \lambda f$, so $\Phi\Phi^*(\Phi f) = \lambda\, \Phi f$, and $\Phi f \neq 0$ because the left frame inequality (5.2) implies that $\|\Phi f\|^2 \ge A\|f\|^2$. It results that the maximum and minimum eigenvalues of $\Phi^*\Phi$ and of $\Phi\Phi^*$ on $\mathrm{Im}\,\Phi$ are identical.

In a Hilbert space of infinite dimension, we prove that the supremum and infimum of the spectrum of both operators remain identical by growing the space dimension and computing the limit of the largest and smallest eigenvalues when the space dimension tends to infinity. ■
This theorem proves that linear combinations of frame vectors define a stable signal representation. Section 5.1.2 proves that synthesis coefficients are computed with a dual frame. The operator $\Phi\Phi^*$ is the Gram matrix $\{\langle \phi_n, \phi_p\rangle\}_{(n,p)\in\Gamma^2}$:
$$\Phi\Phi^* a[p] = \sum_{n\in\Gamma} a[n]\, \langle \phi_n, \phi_p\rangle. \quad (5.6)$$
One must be careful because (5.5) is only valid for $a \in \mathrm{Im}\,\Phi$. If it is valid for all $a \in \ell^2(\Gamma)$ with $A > 0$, then the family is linearly independent and is thus a Riesz basis.
Redundancy
When the frame vectors are normalized, $\|\phi_n\| = 1$, Theorem 5.2 shows that the frame redundancy is measured by the frame bounds $A$ and $B$.

Theorem 5.2. In a space of finite dimension $N$, a frame of $P \ge N$ normalized vectors has frame bounds $A$ and $B$, which satisfy
$$A \le \frac{P}{N} \le B. \quad (5.7)$$
For a tight frame, $A = B = P/N$.

Proof. It results from (5.4) that all eigenvalues of $\Phi^*\Phi$ are between $A$ and $B$. The trace of $\Phi^*\Phi$ thus satisfies
$$A\,N \le \mathrm{tr}(\Phi^*\Phi) \le B\,N.$$
But since the trace is not modified by commuting matrices (Exercise 5.4), and $\|\phi_n\| = 1$,
$$\mathrm{tr}(\Phi^*\Phi) = \mathrm{tr}(\Phi\Phi^*) = \sum_{n=1}^{P} \|\phi_n\|^2 = P,$$
so $A\,N \le P \le B\,N$, which implies (5.7). ■

If $\{\phi_n\}_{n\in\Gamma}$ is a normalized Riesz basis, and is therefore linearly independent, then (5.7) proves that $A \le 1 \le B$. This result remains valid in infinite dimension. Inserting $f = \phi_n$ in the frame inequality (5.2) proves that the frame is orthonormal if and only if $B = 1$, in which case $A = 1$.
EXAMPLE 5.1
Let $\{g_1, g_2\}$ be an orthonormal basis of a two-dimensional plane $H$ ($N = 2$). The $P = 3$ normalized vectors
$$\phi_1 = g_1, \quad \phi_2 = -\frac{g_1}{2} + \frac{\sqrt{3}}{2}\, g_2, \quad \phi_3 = -\frac{g_1}{2} - \frac{\sqrt{3}}{2}\, g_2$$
have equal angles of $2\pi/3$ between each other. For any $f \in H$,
$$\sum_{n=1}^{3} |\langle f, \phi_n\rangle|^2 = \frac{3}{2}\, \|f\|^2.$$
Thus, these three vectors define a tight frame with $A = B = 3/2$.
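This example can be checked numerically (a small sketch; the array `phi` stores the three unit vectors as rows):

```python
import numpy as np

# Three unit vectors at angles 2*pi/3 in the plane: a tight frame, A = B = 3/2.
phi = np.array([[1.0, 0.0],
                [-0.5,  np.sqrt(3) / 2],
                [-0.5, -np.sqrt(3) / 2]])

f = np.array([0.3, -1.7])                              # arbitrary vector
print(np.isclose(np.sum((phi @ f)**2), 1.5 * (f @ f)))  # True: tight frame identity
```

Equivalently, $\Phi^*\Phi = \tfrac{3}{2}\,\mathrm{Id}$, which is the defining property of a tight frame with $A = B = P/N = 3/2$.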
EXAMPLE 5.2
For any $0 \le k < K$, suppose that $\{\phi_{k,n}\}_{n\in\Gamma}$ is an orthonormal basis of $H$. The union of these $K$ orthonormal bases, $\{\phi_{k,n}\}_{n\in\Gamma,\, 0\le k<K}$, is a tight frame with $A = B = K$. Indeed, the energy conservation in an orthonormal basis implies that for any $f \in H$,
$$\sum_{n\in\Gamma} |\langle f, \phi_{k,n}\rangle|^2 = \|f\|^2;$$
therefore,
$$\sum_{k=0}^{K-1} \sum_{n\in\Gamma} |\langle f, \phi_{k,n}\rangle|^2 = K\, \|f\|^2.$$
One can verify (Exercise 5.3) that a finite set of $N$ vectors $\{\phi_n\}_{1\le n\le N}$ is always a frame of the space $V$ generated by linear combinations of these vectors. When $N$ increases, the frame bounds $A$ and $B$ may go respectively to $0$ and $+\infty$. This illustrates the fact that in infinite-dimensional spaces, a family of vectors may be complete and not yield a stable signal representation.
Irregular Sampling
Let $U_s$ be the space of $L^2(\mathbb{R})$ functions having a Fourier transform support included in $[-\pi/s, \pi/s]$. For a uniform sampling, $t_n = ns$, Theorem 3.5 proves that if $\phi_s(t) = s^{1/2}\, \sin(\pi s^{-1} t)/(\pi t)$, then $\{\phi_s(t - ns)\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $U_s$. The reconstruction of $f$ from its samples is then given by the sampling Theorem 3.2.

The irregular sampling conditions of Duffin and Schaeffer [235] for constructing a frame were later refined by several researchers [81, 102, 500]. Gröchenig proved [285] that if $\lim_{n\to+\infty} t_n = +\infty$ and $\lim_{n\to-\infty} t_n = -\infty$, and if the maximum sampling distance $\delta$ satisfies
$$\delta = \sup_{n\in\mathbb{Z}} |t_{n+1} - t_n| < s, \quad (5.8)$$
then
$$\{\lambda_n\, \phi_s(t - t_n)\}_{n\in\mathbb{Z}} \quad \text{with} \quad \lambda_n = \frac{t_{n+1} - t_{n-1}}{2s}$$
is a frame with frame bounds $A \ge (1 - \delta/s)^2$ and $B \le (1 + \delta/s)^2$. The amplitude factor $\lambda_n$ compensates for the increase of sample density relative to $s$. The reconstruction of $f$ requires inverting the frame operator $\Phi f[n] = \langle f(u), \lambda_n\, \phi_s(u - t_n)\rangle$.
5.1.2 Dual Frame and Pseudo Inverse
The reconstruction of $f$ from its frame coefficients $\Phi f[n]$ is calculated with a pseudo inverse, also called the Moore-Penrose pseudo inverse. This pseudo inverse is a bounded operator that implements a dual-frame reconstruction. For Riesz bases, this dual frame is a biorthogonal basis.

For any operator $U$, we denote by $\mathrm{Im}\,U$ the image space of all $U f$, and by $\mathrm{Null}\,U$ the null space of all $h$ such that $U h = 0$.
Theorem 5.3. If $\{\phi_n\}_{n\in\Gamma}$ is a frame but not a Riesz basis, then $\Phi$ admits an infinite number of left inverses.

Proof. We know that $\mathrm{Null}\,\Phi^* = (\mathrm{Im}\,\Phi)^\perp$ is the orthogonal complement of $\mathrm{Im}\,\Phi$ in $\ell^2(\Gamma)$ (Exercise 5.7). If $\Phi$ is a frame and not a Riesz basis, then $\{\phi_n\}_{n\in\Gamma}$ is linearly dependent, so there exists $a \in \mathrm{Null}\,\Phi^* = (\mathrm{Im}\,\Phi)^\perp$ with $a \neq 0$.

A frame operator $\Phi$ is injective (one to one). Indeed, the frame inequality (5.2) guarantees that $\Phi f = 0$ implies $f = 0$. Its restriction to $\mathrm{Im}\,\Phi$ is thus invertible, which means that $\Phi$ admits a left inverse. There is an infinite number of left inverses since the restriction of a left inverse to $(\mathrm{Im}\,\Phi)^\perp \neq \{0\}$ may be any arbitrary linear operator. ■
The more redundant the frame $\{\phi_n\}_{n\in\Gamma}$, the larger the orthogonal complement $(\mathrm{Im}\,\Phi)^\perp$ of $\mathrm{Im}\,\Phi$ in $\ell^2(\Gamma)$. The pseudo inverse, written $\Phi^+$, is defined as the left inverse that is zero on $(\mathrm{Im}\,\Phi)^\perp$:
$$\forall f \in H, \; \Phi^+\Phi f = f \quad \text{and} \quad \forall a \in (\mathrm{Im}\,\Phi)^\perp, \; \Phi^+ a = 0. \quad (5.9)$$
Theorem 5.4 computes this pseudo inverse.
Theorem 5.4: Pseudo Inverse. If $\Phi$ is a frame operator, then $\Phi^*\Phi$ is invertible and the pseudo inverse satisfies
$$\Phi^+ = (\Phi^*\Phi)^{-1}\Phi^*. \quad (5.10)$$
Proof. The frame condition in (5.4) is rewritten as
$$\forall f \in H, \quad A\|f\|^2 \le \langle \Phi^*\Phi f, f\rangle \le B\|f\|^2.$$
The result is that $\Phi^*\Phi$ is an injective self-adjoint operator: $\Phi^*\Phi f = 0$ if and only if $f = 0$. It is therefore invertible. For all $f \in H$,
$$\Phi^+\Phi f = (\Phi^*\Phi)^{-1}\Phi^*\Phi f = f,$$
so $\Phi^+$ is a left inverse. Since $(\mathrm{Im}\,\Phi)^\perp = \mathrm{Null}\,\Phi^*$, it results that $\Phi^+ a = 0$ for any $a \in (\mathrm{Im}\,\Phi)^\perp = \mathrm{Null}\,\Phi^*$. Since this left inverse vanishes on $(\mathrm{Im}\,\Phi)^\perp$, it is the pseudo inverse. ■
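For a finite frame stored as a matrix, formula (5.10) can be checked against the Moore-Penrose pseudo inverse computed by a linear algebra library (a sketch assuming $\Phi$ has full column rank, so that $\Phi^*\Phi$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((8, 4))               # analysis matrix, full column rank
Phi_plus = np.linalg.inv(Phi.T @ Phi) @ Phi.T   # (Phi* Phi)^{-1} Phi*, as in (5.10)

print(np.allclose(Phi_plus, np.linalg.pinv(Phi)))  # True: same as Moore-Penrose
print(np.allclose(Phi_plus @ Phi, np.eye(4)))      # True: left inverse property
```

In large or ill-conditioned problems one would not form $(\Phi^*\Phi)^{-1}$ explicitly; iterative solvers such as the conjugate-gradient algorithm of Theorem 5.8 are used instead.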
Dual Frame
The pseudo inverse of a frame operator implements a reconstruction with a dual frame, which is specified by Theorem 5.5.

Theorem 5.5. Let $\{\phi_n\}_{n\in\Gamma}$ be a frame with bounds $0 < A \le B$. The dual operator defined by
$$\forall n \in \Gamma, \quad \tilde\Phi f[n] = \langle f, \tilde\phi_n\rangle \quad \text{with} \quad \tilde\phi_n = (\Phi^*\Phi)^{-1}\phi_n \quad (5.11)$$
satisfies $\tilde\Phi^* = \Phi^+$, and thus
$$f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \tilde\phi_n = \sum_{n\in\Gamma} \langle f, \tilde\phi_n\rangle\, \phi_n. \quad (5.12)$$
It defines a dual frame, as
$$\forall f \in H, \quad \frac{1}{B}\|f\|^2 \le \sum_{n\in\Gamma} |\langle f, \tilde\phi_n\rangle|^2 \le \frac{1}{A}\|f\|^2. \quad (5.13)$$
If the frame is tight (i.e., $A = B$), then $\tilde\phi_n = A^{-1}\phi_n$.
Proof. The dual operator can be written as $\tilde\Phi = \Phi(\Phi^*\Phi)^{-1}$. Indeed,
$$\tilde\Phi f[n] = \langle f, \tilde\phi_n\rangle = \langle f, (\Phi^*\Phi)^{-1}\phi_n\rangle = \langle (\Phi^*\Phi)^{-1} f, \phi_n\rangle = \Phi(\Phi^*\Phi)^{-1} f\,[n].$$
Thus, we derive from (5.10) that its adjoint is the pseudo inverse of $\Phi$:
$$\tilde\Phi^* = (\Phi^*\Phi)^{-1}\Phi^* = \Phi^+.$$
It results that $\Phi^+\Phi = \tilde\Phi^*\Phi = \mathrm{Id}$, and thus that $\Phi^*\tilde\Phi = \mathrm{Id}$, which proves (5.12).

Let us now prove the frame bounds (5.13). The frame conditions are rewritten in (5.4):
$$\forall f \in H, \quad A\|f\|^2 \le \langle \Phi^*\Phi f, f\rangle \le B\|f\|^2. \quad (5.14)$$
Lemma 5.1 applied to $L = \Phi^*\Phi$ proves that
$$\forall f \in H, \quad B^{-1}\|f\|^2 \le \langle (\Phi^*\Phi)^{-1} f, f\rangle \le A^{-1}\|f\|^2. \quad (5.15)$$
Since for any $f \in H$
$$\|\tilde\Phi f\|^2 = \langle \Phi(\Phi^*\Phi)^{-1} f, \Phi(\Phi^*\Phi)^{-1} f\rangle = \langle f, (\Phi^*\Phi)^{-1} f\rangle,$$
the dual-frame bounds (5.13) are derived from (5.15).

If $A = B$, then $\langle \Phi^*\Phi f, f\rangle = A\|f\|^2$. Thus, the spectrum of $\Phi^*\Phi$ is reduced to $A$, and therefore $\Phi^*\Phi = A\,\mathrm{Id}$. As a result, $\tilde\phi_n = (\Phi^*\Phi)^{-1}\phi_n = A^{-1}\phi_n$.
Lemma 5.1. If $L$ is a self-adjoint operator such that there exist $B \ge A > 0$ satisfying
$$\forall f \in H, \quad A\|f\|^2 \le \langle L f, f\rangle \le B\|f\|^2, \quad (5.16)$$
then $L$ is invertible and
$$\forall f \in H, \quad \frac{1}{B}\|f\|^2 \le \langle L^{-1} f, f\rangle \le \frac{1}{A}\|f\|^2. \quad (5.17)$$
In finite dimensions, since $L$ is self-adjoint, we know that it is diagonalized in an orthonormal basis. The inequality (5.16) proves that its eigenvalues are between $A$ and $B$. It is therefore invertible, with eigenvalues between $B^{-1}$ and $A^{-1}$, which proves (5.17). In a Hilbert space of infinite dimension, we prove the same result on the supremum and infimum of the spectrum by growing the space dimension and computing the limit of the largest and smallest eigenvalues when the space dimension tends to infinity. ■
This theorem proves that $f$ is reconstructed from the frame coefficients $\Phi f[n] = \langle f, \phi_n\rangle$ with the dual frame $\{\tilde\phi_n\}_{n\in\Gamma}$. The synthesis coefficients of $f$ in $\{\phi_n\}_{n\in\Gamma}$ are the dual-frame coefficients $\tilde\Phi f[n] = \langle f, \tilde\phi_n\rangle$. If the frame is tight, then both decompositions are identical:
$$f = \frac{1}{A} \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \phi_n. \quad (5.18)$$
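A minimal numerical sketch of the dual-frame formulas (5.11) and (5.12) for a finite frame (variable names are ours; the rows of `Phi` are the frame vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.standard_normal((8, 4))                 # rows are frame vectors phi_n

# Dual frame (5.11): phi~_n = (Phi* Phi)^{-1} phi_n, stored as rows of `dual`.
dual = (np.linalg.inv(Phi.T @ Phi) @ Phi.T).T

# Reconstruction (5.12): f = sum_n <f, phi_n> phi~_n.
f = rng.standard_normal(4)
coeffs = Phi @ f                                  # analysis coefficients <f, phi_n>
f_rec = dual.T @ coeffs
print(np.allclose(f_rec, f))                      # True: perfect reconstruction
```

For a tight frame, the matrix inverse reduces to a division by $A$, recovering (5.18).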
Biorthogonal Bases
A Riesz basis is a frame of vectors that are linearly independent, which implies that $\mathrm{Im}\,\Phi = \ell^2(\Gamma)$, so its dual frame is also linearly independent. Inserting $f = \phi_p$ in (5.12) yields
$$\phi_p = \sum_{n\in\Gamma} \langle \phi_p, \tilde\phi_n\rangle\, \phi_n,$$
and the linear independence implies that
$$\langle \phi_p, \tilde\phi_n\rangle = \delta[p - n].$$
Thus, dual Riesz bases are biorthogonal families of vectors. If the basis is normalized (i.e., $\|\phi_n\| = 1$), then
$$A \le 1 \le B. \quad (5.19)$$
This is proved by inserting $f = \phi_p$ in the frame inequality (5.13):
$$\frac{1}{B}\|\phi_p\|^2 \le \sum_{n\in\Gamma} |\langle \phi_p, \tilde\phi_n\rangle|^2 = 1 \le \frac{1}{A}\|\phi_p\|^2.$$
5.1.3 Dual-Frame Analysis and Synthesis Computations
Suppose that $\{\phi_n\}_{n\in\Gamma}$ is a frame of a subspace $V$ of the whole signal space. The best linear approximation of $f$ in $V$ is the orthogonal projection of $f$ on $V$. Theorem 5.6 shows that this orthogonal projection is computed with the dual frame. Two iterative numerical algorithms are described to implement such computations.
Theorem 5.6. Let $\{\phi_n\}_{n\in\Gamma}$ be a frame of $V$, and $\{\tilde\phi_n\}_{n\in\Gamma}$ its dual frame in $V$. The orthogonal projection of $f \in H$ on $V$ is
$$P_V f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \tilde\phi_n = \sum_{n\in\Gamma} \langle f, \tilde\phi_n\rangle\, \phi_n. \quad (5.20)$$
Proof. Since both frames are dual in $V$, if $f \in V$, then (5.12) proves that the operator $P_V$ defined in (5.20) satisfies $P_V f = f$. To prove that it is an orthogonal projection, it is sufficient to verify that if $f \in H$, then $\langle f - P_V f, \phi_p\rangle = 0$ for all $p \in \Gamma$. Indeed,
$$\langle f - P_V f, \phi_p\rangle = \langle f, \phi_p\rangle - \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \langle \tilde\phi_n, \phi_p\rangle = 0,$$
because the dual-frame property implies that $\sum_{n\in\Gamma} \langle \tilde\phi_n, \phi_p\rangle\, \phi_n = \phi_p$. ■
If Γ is finite, then {φ_n}_{n∈Γ} is necessarily a frame of the space V it generates, and (5.20) reconstructs the best linear approximation of f in V. This result is particularly important for approximating signals from a finite set of vectors.

Since Φ is not a frame of the whole signal space H but only of a subspace V, Φ is invertible only on this subspace, and the pseudo-inverse definition becomes

    ∀f ∈ V, Φ⁺Φf = f,  and  ∀a ∈ (ImΦ)⊥, Φ⁺a = 0.    (5.21)

Let Φ_V be the restriction of Φ to V. The operator Φ*Φ_V is invertible on V, and we write (Φ*Φ_V)⁻¹ for its inverse. Similar to (5.10), we verify that Φ⁺ = (Φ*Φ_V)⁻¹ Φ*.
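The subspace pseudo-inverse (5.21) can be illustrated numerically; `numpy.linalg.pinv` computes exactly this operator. A minimal sketch with a hypothetical frame of the plane V = {z = 0} inside R³:

```python
import numpy as np

# Pseudo-inverse on a subspace, (5.21): a hypothetical frame of the plane
# V = {z = 0} inside R^3. numpy.linalg.pinv computes Phi^+; Phi^+ Phi is then
# the orthogonal projection on V, and Phi^+ vanishes on (Im Phi)^perp.
Phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0]])        # rows phi_n all lie in V

Phi_pinv = np.linalg.pinv(Phi)           # Phi^+ = (Phi* Phi_V)^{-1} Phi* on V

f = np.array([2.0, -1.0, 5.0])           # f is not in V (nonzero z-component)
PV_f = Phi_pinv @ (Phi @ f)              # Phi^+ Phi f = P_V f = (2, -1, 0)

a_perp = np.array([1.0, 1.0, -1.0])      # coefficients orthogonal to Im(Phi)
perp_norm = np.linalg.norm(Phi_pinv @ a_perp)   # Phi^+ a = 0 on (Im Phi)^perp
```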
Dual Synthesis
In a dual-synthesis problem, the orthogonal projection is computed from the frame coefficients {Φf[n] = ⟨f, φ_n⟩}_{n∈Γ} with the dual-frame synthesis operator:

    P_V f = Φ̃*Φf = Σ_{n∈Γ} ⟨f, φ_n⟩ φ̃_n.    (5.22)

If the frame {φ_n}_{n∈Γ} does not depend on the signal f, then the dual-frame vectors are precomputed with (5.11):

    ∀n ∈ Γ,  φ̃_n = (Φ*Φ_V)⁻¹ φ_n,    (5.23)

and the dual synthesis is solved directly with (5.22). In many applications, the frame vectors {φ_n}_{n∈Γ} depend on the signal f, in which case the dual-frame vectors φ̃_n cannot be computed in advance, and it is highly inefficient to compute them. This is the case when the coefficients {⟨f, φ_n⟩}_{n∈Γ} are selected in a redundant transform to build a sparse signal representation. For example, the time-frequency ridge vectors in Sections 4.4.2 and 4.4.3 are selected from the local maxima of f in highly redundant windowed Fourier or wavelet transforms.

The transform coefficients Φf are known, and we must compute

    P_V f = Φ̃*Φf = (Φ*Φ_V)⁻¹ Φ*Φf.

A dual-synthesis algorithm first computes

    y = Φ*Φf = Σ_{n∈Γ} ⟨f, φ_n⟩ φ_n ∈ V,

and then derives P_V f = z = L⁻¹y by applying to y the inverse of the symmetric operator L = Φ*Φ_V, with

    ∀h ∈ V,  Lh = Σ_{n∈Γ} ⟨h, φ_n⟩ φ_n.    (5.24)

The eigenvalues of L are between A and B.
Dual Analysis
In a dual analysis, the orthogonal projection P_V f is computed from the frame vectors {φ_n}_{n∈Γ} with the dual-frame analysis operator Φ̃f[n] = ⟨f, φ̃_n⟩:

    P_V f = Φ*Φ̃f = Σ_{n∈Γ} ⟨f, φ̃_n⟩ φ_n.    (5.25)

If {φ_n}_{n∈Γ} does not depend upon f, then {φ̃_n}_{n∈Γ} is precomputed with (5.23). The {φ_n}_{n∈Γ} may also be selected adaptively from a larger dictionary to provide a sparse approximation of f. Computing the orthogonal projection P_V f is then called a backprojection. In Section 12.3, matching pursuits implement this backprojection.

When {φ_n}_{n∈Γ} depends on f, computing the dual frame is inefficient. The dual coefficients a[n] = Φ̃f[n] are calculated directly, as well as

    P_V f = Φ*a = Σ_{n∈Γ} a[n] φ_n.    (5.26)

Since ΦP_V f = Φf, we have ΦΦ*a = Φf. Let Φ*_{ImΦ} be the restriction of Φ* to ImΦ. Since ΦΦ*_{ImΦ} is invertible on ImΦ,

    a = (ΦΦ*_{ImΦ})⁻¹ Φf.

Thus, the dual-analysis algorithm computes y = Φf = {⟨f, φ_n⟩}_{n∈Γ} and derives the dual coefficients a = z = L⁻¹y by applying the inverse of the Gram operator L = ΦΦ*_{ImΦ} to y, with

    Lh[n] = Σ_{p∈Γ} h[p] ⟨φ_p, φ_n⟩.    (5.27)

The eigenvalues of L are also between A and B. The orthogonal projection of f is recovered with (5.26).
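A minimal sketch of this dual-analysis computation with a hypothetical random frame: y = Φf is computed first, the Gram operator L = ΦΦ* of (5.27) is inverted on ImΦ (here with a pseudo-inverse rather than one of the iterative algorithms below), and (5.26) backprojects.

```python
import numpy as np

# Dual-analysis sketch (hypothetical random frame): compute y = Phi f, invert
# the Gram operator L = Phi Phi* of (5.27) on Im(Phi) with a pseudo-inverse,
# and backproject with (5.26). Here the 5 frame vectors span V = R^3, so the
# backprojection P_V f recovers f itself.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 3))        # rows are the frame vectors phi_n

f = rng.standard_normal(3)
y = Phi @ f                              # frame coefficients <f, phi_n>
gram = Phi @ Phi.T                       # Gram matrix <phi_p, phi_n>
a = np.linalg.pinv(gram) @ y             # dual coefficients a = L^{-1} y on Im(Phi)
PV_f = Phi.T @ a                         # (5.26): P_V f = sum_n a[n] phi_n
backprojection_err = np.linalg.norm(PV_f - f)
```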
Richardson Inversion of Symmetric Operators
The key computational step of a dual-analysis or a dual-synthesis problem is to compute z = L⁻¹y, where L is a symmetric operator with eigenvalues between A and B. Theorems 5.7 and 5.8 describe two iterative algorithms with exponential convergence. The Richardson iteration procedure is simpler but requires knowing the frame bounds A and B. Conjugate-gradient iterations converge more quickly when B/A is large and do not require knowing the values of A and B.
Theorem 5.7. To compute z = L⁻¹y, let z₀ be an initial value and γ > 0 a relaxation parameter. For any k > 0, define

    z_k = z_{k−1} + γ (y − L z_{k−1}).    (5.28)

If

    δ = max{|1 − γA|, |1 − γB|} < 1,    (5.29)

then

    ‖z − z_k‖ ≤ δ^k ‖z − z₀‖,    (5.30)

and therefore lim_{k→+∞} z_k = z.

Proof. The induction equation (5.28) can be rewritten as

    z − z_k = z − z_{k−1} − γ L(z − z_{k−1}).

Let R = Id − γL; then

    z − z_k = R(z − z_{k−1}) = R^k (z − z₀).    (5.31)

Since the eigenvalues of L are between A and B,

    A ‖z‖² ≤ ⟨Lz, z⟩ ≤ B ‖z‖².

This implies that R = Id − γL satisfies

    |⟨Rz, z⟩| ≤ δ ‖z‖²,

where δ is given by (5.29). Since R is symmetric, this inequality proves that ‖R‖ ≤ δ. Thus, we derive (5.30) from (5.31). The error ‖z − z_k‖ clearly converges to zero if δ < 1. ■
The convergence is guaranteed for all initial values z₀. If an estimate z₀ of the solution z is known, then this estimate can be chosen as the starting point; otherwise, z₀ is often set to 0. For frame inversion, the Richardson iteration algorithm is sometimes called the frame algorithm [19]. The convergence rate is maximized when δ is minimum:

    δ = (B − A)/(B + A) = (1 − A/B)/(1 + A/B),

which corresponds to the relaxation parameter

    γ = 2/(A + B).    (5.32)

The algorithm converges quickly if A/B is close to 1. If A/B is small, then

    δ ≈ 1 − 2A/B.    (5.33)
The inequality (5.30) proves that we obtain a relative error smaller than ε,

    ‖z − z_k‖ / ‖z − z₀‖ ≤ δ^k = ε,

for a number k of iterations. Inserting (5.33) gives

    k ≈ log_e ε / log_e(1 − 2A/B) ≈ −(B/(2A)) log_e ε.    (5.34)

Therefore, the number of iterations increases proportionally to the frame-bound ratio B/A.

The exact values of A and B are often not known, and A is generally more difficult to compute. The upper frame bound is B = ‖ΦΦ*‖_S = ‖Φ*Φ‖_S. If we choose

    γ < 2 ‖ΦΦ*‖_S^{−1},    (5.35)

then (5.29) shows that the algorithm is guaranteed to converge, but the convergence rate depends on A. Since 0 < A ≤ B, the optimal relaxation parameter γ in (5.32) is in the range ‖ΦΦ*‖_S^{−1} ≤ γ < 2 ‖ΦΦ*‖_S^{−1}.
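The Richardson iteration (5.28) with the optimal relaxation parameter (5.32) can be sketched as follows; L is a hypothetical small symmetric positive definite matrix standing in for the frame operator, its extreme eigenvalues playing the role of A and B.

```python
import numpy as np

# Richardson iteration (5.28) for z = L^{-1} y with the optimal relaxation
# parameter gamma = 2/(A + B) of (5.32). L is a hypothetical symmetric
# positive definite matrix standing in for the frame operator.
L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # eigenvalues 1, 2, 4, so A = 1, B = 4
y = np.array([1.0, -2.0, 0.5])

A, B = 1.0, 4.0
gamma = 2.0 / (A + B)
delta = max(abs(1 - gamma * A), abs(1 - gamma * B))   # (5.29): here 0.6 < 1

z = np.zeros(3)
for _ in range(100):
    z = z + gamma * (y - L @ z)      # z_k = z_{k-1} + gamma (y - L z_{k-1})

richardson_err = np.linalg.norm(z - np.linalg.solve(L, y))
```

After k iterations the error is bounded by δ^k times the initial error, so 100 iterations with δ = 0.6 reach machine precision.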
Conjugate-Gradient Inversion
The conjugate-gradient algorithm computes z = L⁻¹y with a gradient descent along orthogonal directions with respect to the norm induced by the symmetric operator L:

    ‖z‖²_L = ⟨Lz, z⟩.    (5.36)

This L norm is used to estimate the error. Gröchenig's [287] implementation of the conjugate-gradient algorithm is given in Theorem 5.8.
Theorem 5.8: Conjugate Gradient. To compute z = L⁻¹y, we initialize

    z₀ = 0,  r₀ = p₀ = y,  p₋₁ = 0.    (5.37)

For any k ≥ 0, we define by induction:

    λ_k = ⟨r_k, p_k⟩ / ⟨p_k, Lp_k⟩    (5.38)
    z_{k+1} = z_k + λ_k p_k    (5.39)
    r_{k+1} = r_k − λ_k Lp_k    (5.40)
    p_{k+1} = Lp_k − ( ⟨Lp_k, Lp_k⟩ / ⟨p_k, Lp_k⟩ ) p_k − ( ⟨Lp_k, Lp_{k−1}⟩ / ⟨p_{k−1}, Lp_{k−1}⟩ ) p_{k−1}.    (5.41)

If σ = (√B − √A)/(√B + √A), then

    ‖z − z_k‖_L ≤ ( 2σ^k / (1 + σ^{2k}) ) ‖z‖_L,    (5.42)

and therefore lim_{k→+∞} z_k = z.
Proof. We give the main steps of the proof as outlined by Gröchenig [287].

Step 1. Let U_k be the subspace generated by {L^j z}_{1≤j≤k}. By induction on k, we derive from (5.41) that p_j ∈ U_k for j < k.

Step 2. We prove by induction that {p_j}_{0≤j<k} is an orthogonal basis of U_k with respect to the inner product ⟨z, h⟩_L = ⟨z, Lh⟩. Assuming that ⟨p_k, Lp_j⟩ = 0 for j ≤ k − 1, it can be shown that ⟨p_{k+1}, Lp_j⟩ = 0 for j ≤ k.

Step 3. We verify that z_k is the orthogonal projection of z onto U_k with respect to ⟨·, ·⟩_L, which means that

    ∀g ∈ U_k,  ‖z − g‖_L ≥ ‖z − z_k‖_L.

Since z_k ∈ U_k, this requires proving that ⟨z − z_k, p_j⟩_L = 0 for j < k.

Step 4. We compute the orthogonal projection of z in embedded spaces U_k of dimension k, and one can verify that lim_{k→+∞} ‖z − z_k‖_L = 0. The exponential convergence (5.42) is proved in [287]. ■
As opposed to the Richardson algorithm, the initial value z₀ must be set to 0. As in the Richardson iteration algorithm, the convergence is slower when A/B is small. In this case,

    σ = (1 − √(A/B)) / (1 + √(A/B)) ≈ 1 − 2√(A/B).

The upper bound (5.42) proves that we obtain a relative error

    ‖z − z_k‖_L / ‖z‖_L ≤ ε

for a number of iterations

    k ≈ log_e(ε/2) / log_e σ ≈ −(√B / (2√A)) log_e(ε/2).

Comparing this result with (5.34) shows that when A/B is small, the conjugate-gradient algorithm needs far fewer iterations than the Richardson iteration algorithm to compute z = L⁻¹y at a fixed precision.
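The recursions (5.37) through (5.41) translate directly into code. This is a sketch on a hypothetical small symmetric positive definite matrix; in exact arithmetic the iterates reach z = L⁻¹y after dim(L) steps.

```python
import numpy as np

# Conjugate-gradient inversion z = L^{-1} y following (5.37)-(5.41).
# L is a hypothetical symmetric positive definite 3x3 matrix; in exact
# arithmetic the iterates converge after dim(L) = 3 steps.
L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
y = np.array([1.0, -2.0, 0.5])

z = np.zeros(3)                          # z_0 = 0 (mandatory initialization)
r = y.copy()                             # r_0 = y
p = y.copy()                             # p_0 = y
p_prev = np.zeros(3)                     # p_{-1} = 0

for _ in range(3):
    Lp = L @ p
    lam = (r @ p) / (p @ Lp)                         # (5.38)
    z = z + lam * p                                  # (5.39)
    r = r - lam * Lp                                 # (5.40)
    p_next = Lp - ((Lp @ Lp) / (p @ Lp)) * p         # (5.41), first two terms
    denom = p_prev @ (L @ p_prev)
    if denom > 1e-12:                                # skip the p_{-1} = 0 term
        p_next = p_next - ((Lp @ (L @ p_prev)) / denom) * p_prev
    p, p_prev = p_next, p

cg_err = np.linalg.norm(z - np.linalg.solve(L, y))
```

Each λ_k is the L-orthogonal projection coefficient of z − z_k on the new direction p_k, which is why the iterates minimize the error in the L norm.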
5.1.4 Frame Projector and Reproducing Kernel
Frame redundancy is useful in reducing noise added to the frame coefficients. The vector computed with noisy frame coefficients is projected on the image of Φ to reduce the amplitude of the noise. This technique is used for high-precision analog-to-digital conversion based on oversampling. The following theorem specifies the orthogonal projector on ImΦ.

Theorem 5.9: Reproducing Kernel. Let {φ_n}_{n∈Γ} be a frame of H or of a subspace V. The orthogonal projection from ℓ²(Γ) onto ImΦ is

    Pa[n] = ΦΦ⁺a[n] = Σ_{p∈Γ} a[p] ⟨φ̃_p, φ_n⟩.    (5.43)
Coefficients a ∈ ℓ²(Γ) are frame coefficients (a ∈ ImΦ) if and only if they satisfy the reproducing kernel equation

    a[n] = ΦΦ⁺a[n] = Σ_{p∈Γ} a[p] ⟨φ̃_p, φ_n⟩.    (5.44)

Proof. If a ∈ ImΦ, then a = Φf and

    Pa = ΦΦ⁺Φf = Φf = a.

If a ∈ (ImΦ)⊥, then Pa = 0 because Φ⁺a = 0. This proves that P is an orthogonal projector on ImΦ. Since Φf[n] = ⟨f, φ_n⟩ and Φ⁺a = Σ_{p∈Γ} a[p] φ̃_p, we derive (5.43). A vector a ∈ ℓ²(Γ) belongs to ImΦ if and only if a = Pa, which proves (5.44). ■

The reproducing kernel equation (5.44) expresses the redundancy of frame coefficients. If the frame is not redundant and is a Riesz basis, then ⟨φ̃_p, φ_n⟩ = δ[p − n], so (5.44) reduces to the trivial identity a[n] = a[n].
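The projector P = ΦΦ⁺ of (5.43) and the reproducing kernel property (5.44) can be checked numerically with a hypothetical random frame; `pinv` computes Φ⁺.

```python
import numpy as np

# Frame projector P = Phi Phi^+ of (5.43): an arbitrary coefficient sequence
# is projected onto Im(Phi); the result satisfies the reproducing kernel
# equation (5.44), i.e., it is left unchanged by P. Hypothetical random frame.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((6, 3))        # redundant frame: 6 vectors in R^3

P = Phi @ np.linalg.pinv(Phi)            # 6x6 orthogonal projector on Im(Phi)

a = rng.standard_normal(6)               # generic coefficients, not in Im(Phi)
Pa = P @ a
kernel_residual = np.linalg.norm(P @ Pa - Pa)    # (5.44) holds for Pa
symmetry_residual = np.linalg.norm(P - P.T)      # orthogonal projectors are symmetric
```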
Noise Reduction
Suppose that each frame coefficient Φf[n] is contaminated by an additive noise W[n], which is a random variable. Applying the projector P gives

    P(Φf + W) = Φf + PW,

with

    PW[n] = Σ_{p∈Γ} W[p] ⟨φ̃_p, φ_n⟩.

Since P is an orthogonal projector, ‖PW‖ ≤ ‖W‖. This projector removes the component of W that is in (ImΦ)⊥. Increasing the redundancy of the frame reduces the size of ImΦ and thus increases (ImΦ)⊥, so a larger portion of the noise is removed. If W is a white noise, its energy is uniformly distributed in the space ℓ²(Γ). Theorem 5.10 proves that its energy is reduced by at least A if the frame vectors are normalized.

Theorem 5.10. Suppose that ‖φ_n‖ = C for all n ∈ Γ. If W is a zero-mean white noise of variance E{|W[n]|²} = σ², then

    E{|PW[n]|²} ≤ σ²C²/A.

If the frame is tight, then this inequality is an equality.
Proof. Let us compute

    E{|PW[n]|²} = E{ ( Σ_{p∈Γ} W[p] ⟨φ̃_p, φ_n⟩ ) ( Σ_{l∈Γ} W*[l] ⟨φ̃_l, φ_n⟩* ) }.    (5.45)

Since W is white,

    E{W[p] W*[l]} = σ² δ[p − l],

and therefore,

    E{|PW[n]|²} = σ² Σ_{p∈Γ} |⟨φ̃_p, φ_n⟩|² ≤ σ² ‖φ_n‖² / A = σ²C²/A.

The last inequality is an equality if the frame is tight. ■
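A Monte Carlo sketch of Theorem 5.10, with a hypothetical tight frame of N = 8 unit vectors in R² (so C = 1 and A = N/2 = 4), for which the bound is an equality:

```python
import numpy as np

# Monte Carlo check of Theorem 5.10 for a tight frame: N = 8 unit vectors
# phi_n = (cos(n pi/8), sin(n pi/8)) in R^2 form a tight frame with A = N/2,
# and here C = ||phi_n|| = 1, so E{|PW[n]|^2} = sigma^2 C^2 / A = 0.25.
rng = np.random.default_rng(3)
N = 8
angles = np.arange(N) * np.pi / N
Phi = np.stack([np.cos(angles), np.sin(angles)], axis=1)

P = Phi @ np.linalg.pinv(Phi)                # orthogonal projector on Im(Phi)
A = N / 2.0
sigma = 1.0

W = sigma * rng.standard_normal((6000, N))   # white noise realizations (rows)
PW = W @ P.T                                 # project each realization
empirical = np.mean(PW ** 2)                 # average of E{|PW[n]|^2} over n
```

The empirical mean approaches σ²C²/A = 0.25: three quarters of the noise energy lies in (ImΦ)⊥ and is removed by the projector.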
Oversampling
This noise-reduction strategy is used by high-precision analog-to-digital converters.
After a low-pass filter, a band-limited analog signal f (t) is uniformly sampled and
quantized. In hardware, it is often easier to increase the sampling rate rather than
the quantization precision. Increasing the sampling rate introduces a redundancy
between the sample values of the band-limited signal. Thus, these samples can be
interpreted as frame coefficients. For a wide range of signals it has been shown
that the quantization error is nearly a white noise [277]. Thus, it can be significantly
reduced by a frame projector, which in this case is a low-pass convolution operator
(Exercise 5.16).
The noise can be further reduced if it is not white and if its energy is better
concentrated in (Im⌽)⊥ . This can be done by transforming the quantization noise
into a noise that has energy mostly concentrated at high frequencies. Sigma–delta
modulators produce such quantization noises by integrating the signal before its quantization [89]. To compensate for the integration, the quantized signal is differentiated. This differentiation increases the energy of the quantization noise at high frequencies and reduces its energy at low frequencies [456].
5.1.5 Translation-Invariant Frames
Section 4.1 introduces translation-invariant dictionaries obtained by translating a family of generators {ψ_n}_{n∈Γ}, which are used to construct translation-invariant signal representations. In multiple dimensions, for ψ_n ∈ L²(R^d), the resulting dictionary can be written D = {ψ_{u,n}(x)}_{n∈Γ, u∈R^d}, with ψ_{u,n}(x) = ψ_n(x − u). In a translation-invariant wavelet dictionary, the generators are obtained by dilating a wavelet ψ with scales s_n: ψ_n(x) = s_n^{−1/2} ψ(x/s_n). In a windowed Fourier dictionary, the generators are obtained by modulating a window g(x) at frequencies ξ_n: ψ_n(x) = e^{iξ_n x} g(x).

The decomposition coefficients of f in D are the convolution products

    Φf(u, n) = ⟨f, ψ_{u,n}⟩ = f ⋆ ψ̄_n(u)  with  ψ̄_n(x) = ψ*_n(−x).    (5.46)
Suppose that Γ is a countable set. The overall index set R^d × Γ is not countable, so the dictionary D cannot, strictly speaking, be considered as a frame. However, we may consider the overall energy of the dictionary coefficients, calculated with a sum and a multidimensional integral,

    ‖Φf(u, n)‖² = Σ_{n∈Γ} ∫ |Φf(u, n)|² du;

if there exist two constants A > 0 and B > 0 such that for all f ∈ L²(R^d),

    A ‖f‖² ≤ ‖Φf(u, n)‖² ≤ B ‖f‖²,    (5.47)

then the frame theory results of the previous section apply. Thus, with an abuse of language, such translation-invariant dictionaries will also be called frames. Theorem 5.11 proves that the frame condition (5.47) is equivalent to a condition on the Fourier transforms ψ̂_n(ω) of the generators.
Theorem 5.11. If there exist two constants B ≥ A > 0 such that for almost all ω in R^d

    A ≤ Σ_{n∈Γ} |ψ̂_n(ω)|² ≤ B,    (5.48)

then the frame inequality (5.47) is valid for all f ∈ L²(R^d). Any family {ψ̃_n}_{n∈Γ} that satisfies, for almost all ω in R^d,

    Σ_{n∈Γ} ψ̂*_n(ω) ψ̃̂_n(ω) = 1,    (5.49)

defines a left inverse:

    f(x) = Σ_{n∈Γ} Φf(·, n) ⋆ ψ̃_n(x).    (5.50)

The pseudo inverse (dual frame) is implemented by

    ψ̃̂_n(ω) = ψ̂_n(ω) / Σ_{p∈Γ} |ψ̂_p(ω)|².    (5.51)
Proof. The frame condition (5.47) means that Φ*Φ has a spectrum bounded by A and B. It results from (5.46) that

    Φ*Φf(x) = Σ_{n∈Γ} f ⋆ ψ̄_n ⋆ ψ_n(x).    (5.52)

The spectrum of this convolution operator is given by the Fourier transform of Σ_{n∈Γ} ψ̄_n ⋆ ψ_n(x), which is Σ_{n∈Γ} |ψ̂_n(ω)|². Thus, the frame inequality (5.47) is equivalent to condition (5.48).

Equation (5.50) is proved by taking the Fourier transform on both sides and inserting (5.49).

Theorem 5.5 proves that the dual-frame vectors implementing the pseudo inverse are ψ̃_{u,n} = (Φ*Φ)⁻¹ ψ_{u,n}. Since Φ*Φ is the convolution operator (5.52), its inverse is calculated by inverting its transfer function, which yields (5.51). ■
For wavelet or windowed Fourier translation-invariant dictionaries, the condition (5.48) of the theorem becomes a condition on the Fourier transform of the wavelet ψ̂(ω) or on the Fourier transform of the window ĝ(ω). As explained in Sections 5.3 and 5.4, more conditions are needed to obtain a frame when the translation parameter u is discretized.
Discrete Translation-Invariant Frames
For finite-dimensional signals f[n] ∈ C^N, a circular translation-invariant frame is obtained with a periodic shift modulo N of a finite number of generators {ψ_m[n]}_{0≤m<M}:

    D = { ψ_{m,p}[n] = ψ_m[(n − p) mod N] }_{0≤m<M, 0≤p<N}.

Such translation-invariant frames appear in Section 11.2.3 to define translation-invariant thresholding estimators for noise removal. Similar to Theorem 5.11, Theorem 5.12 gives a necessary and sufficient condition on the discrete Fourier transforms ψ̂_m[k] = Σ_{n=0}^{N−1} ψ_m[n] e^{−i2πkn/N} of the generators ψ_m[n] to obtain a frame.

Theorem 5.12. A circular translation-invariant dictionary D = {ψ_{m,p}[n]}_{0≤m<M, 0≤p<N} is a frame with frame bounds 0 < A ≤ B if and only if

    ∀ 0 ≤ k < N,  A ≤ Σ_{m=0}^{M−1} |ψ̂_m[k]|² ≤ B.    (5.53)

The proof proceeds essentially like the proof of Theorem 5.11 and is left as Exercise 5.8.
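Condition (5.53) is easy to check with an FFT. The following sketch uses a hypothetical Haar-like pair of generators, which turns out to form a tight frame because the sum of their squared spectra is constant.

```python
import numpy as np

# Frame condition (5.53) for a circular translation-invariant dictionary with
# two hypothetical Haar-like generators. The frame bounds are the min and max
# over k of sum_m |psi_m_hat[k]|^2.
N = 16
psi0 = np.zeros(N)
psi0[0] = psi0[1] = 1 / np.sqrt(2)                   # low-pass generator
psi1 = np.zeros(N)
psi1[0], psi1[1] = 1 / np.sqrt(2), -1 / np.sqrt(2)   # high-pass generator

spectrum = np.abs(np.fft.fft(psi0)) ** 2 + np.abs(np.fft.fft(psi1)) ** 2
A_frame, B_frame = spectrum.min(), spectrum.max()    # here A = B = 2: tight
```

Indeed |ψ̂₀[k]|² = 1 + cos(2πk/N) and |ψ̂₁[k]|² = 1 − cos(2πk/N), so their sum is 2 for every k.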
5.2 TRANSLATION-INVARIANT DYADIC WAVELET TRANSFORM
The continuous wavelet transform of Section 4.3 decomposes one-dimensional signals f ∈ L²(R) over a dictionary of translated and dilated wavelets,

    ψ_{u,s}(t) = (1/√s) ψ( (t − u)/s ).

Translation-invariant wavelet dictionaries are constructed by sampling the scale parameter s along an exponential sequence {α^j}_{j∈Z}, while keeping all translation parameters u. We choose α = 2 to simplify computer implementations:

    D = { ψ_{u,2^j}(t) = (1/√(2^j)) ψ( (t − u)/2^j ) }_{u∈R, j∈Z}.

The resulting dyadic wavelet transform of f ∈ L²(R) is defined by

    Wf(u, 2^j) = ⟨f, ψ_{u,2^j}⟩ = ∫_{−∞}^{+∞} f(t) (1/√(2^j)) ψ( (t − u)/2^j ) dt = f ⋆ ψ̄_{2^j}(u),    (5.54)
with

    ψ̄_{2^j}(t) = ψ_{2^j}(−t) = (1/√(2^j)) ψ(−t/2^j).

Translation-invariant dyadic wavelet transforms are used in pattern-recognition applications and for denoising with translation-invariant wavelet thresholding estimators, as explained in Section 11.3.1. Fast computations with filter banks are presented in the next two sections.
Theorem 5.11 on translation-invariant dictionaries can be applied to the multiscale wavelet generators ψ_n(t) = 2^{−j/2} ψ_{2^j}(t). Since ψ̂_n(ω) = ψ̂(2^j ω), the Fourier condition (5.48) means that there exist two constants A > 0 and B > 0 such that

    ∀ω ∈ R − {0},  A ≤ Σ_{j=−∞}^{+∞} |ψ̂(2^j ω)|² ≤ B,    (5.55)

and since Φf(u, n) = 2^{−j/2} Wf(u, 2^j), Theorem 5.11 proves the frame inequality

    A ‖f‖² ≤ Σ_{j=−∞}^{+∞} (1/2^j) ‖Wf(u, 2^j)‖² ≤ B ‖f‖².    (5.56)
This shows that if the frequency axis is completely covered by dilated dyadic wavelets, as shown in Figure 5.1, then a dyadic wavelet transform defines a complete and stable representation. Moreover, if ψ̃ satisfies

    ∀ω ∈ R − {0},  Σ_{j=−∞}^{+∞} ψ̂*(2^j ω) ψ̃̂(2^j ω) = 1,    (5.57)

then (5.50) applied to ψ̃_n(t) = 2^{−j} ψ̃(2^{−j} t) proves that

    f(t) = Σ_{j=−∞}^{+∞} (1/2^j) Wf(·, 2^j) ⋆ ψ̃_{2^j}(t).    (5.58)
FIGURE 5.1
Scaled Fourier transforms |ψ̂(2^j ω)|² computed with (5.69), for 1 ≤ j ≤ 5 and ω ∈ [−π, π].
FIGURE 5.2
Dyadic wavelet transform Wf(u, 2^j) computed at scales 2^{−7} ≤ 2^j ≤ 2^{−3} with the filter bank algorithm of Section 5.2.2, for a signal defined over [0, 1]. The top curve is the signal; the bottom curve is the remaining approximation, which carries the lower frequencies corresponding to scales larger than 2^{−3}.
Figure 5.2 gives a dyadic wavelet transform computed over five scales with the
quadratic spline wavelet shown later in Figure 5.3.
5.2.1 Dyadic Wavelet Design
A discrete dyadic wavelet transform can be computed with a fast filter bank algorithm
if the wavelet is appropriately designed. The synthesis of these dyadic wavelets is
similar to the construction of biorthogonal wavelet bases, explained in Section 7.4.
All technical issues related to the convergence of infinite cascades of filters are
avoided in this section. Reading Chapter 7 first is necessary for understanding the
main results.
Let h and g be a pair of finite impulse-response filters. Suppose that h is a low-pass filter with a transfer function that satisfies ĥ(0) = √2. As in the case of orthogonal and biorthogonal wavelet bases, we construct a scaling function with a Fourier transform

    φ̂(ω) = Π_{p=1}^{+∞} ( ĥ(2^{−p}ω) / √2 ) = (1/√2) ĥ(ω/2) φ̂(ω/2).    (5.59)

We suppose here that this Fourier transform is a finite-energy function, so that φ ∈ L²(R). The corresponding wavelet ψ has a Fourier transform defined by

    ψ̂(ω) = (1/√2) ĝ(ω/2) φ̂(ω/2).    (5.60)
Theorem 7.5 proves that both φ and ψ have compact support because h and g have a finite number of nonzero coefficients. The number of vanishing moments of ψ is equal to the number of zeros of ψ̂(ω) at ω = 0. Since φ̂(0) = 1, (5.60) implies that it is also equal to the number of zeros of ĝ(ω) at ω = 0.
Reconstructing Wavelets
Reconstructing wavelets that satisfy (5.49) are calculated with a pair of finite impulse response dual filters h̃ and g̃. We suppose that the following Fourier transform has finite energy:

    φ̃̂(ω) = Π_{p=1}^{+∞} ( h̃̂(2^{−p}ω) / √2 ) = (1/√2) h̃̂(ω/2) φ̃̂(ω/2).    (5.61)

Let us define

    ψ̃̂(ω) = (1/√2) g̃̂(ω/2) φ̃̂(ω/2).    (5.62)

Theorem 5.13 gives a sufficient condition to guarantee that ψ̃̂ is the Fourier transform of a reconstruction wavelet.
Theorem 5.13. If the filters satisfy

    ∀ω ∈ [−π, π],  h̃̂(ω) ĥ*(ω) + g̃̂(ω) ĝ*(ω) = 2,    (5.63)

then

    ∀ω ∈ R − {0},  Σ_{j=−∞}^{+∞} ψ̂*(2^j ω) ψ̃̂(2^j ω) = 1.    (5.64)
Proof. The Fourier transform expressions (5.60) and (5.62) prove that

    ψ̃̂(ω) ψ̂*(ω) = (1/2) g̃̂(ω/2) ĝ*(ω/2) φ̃̂(ω/2) φ̂*(ω/2).

Equation (5.63) implies

    ψ̃̂(ω) ψ̂*(ω) = (1/2) [ 2 − h̃̂(ω/2) ĥ*(ω/2) ] φ̃̂(ω/2) φ̂*(ω/2)
                 = φ̃̂(ω/2) φ̂*(ω/2) − φ̃̂(ω) φ̂*(ω).

Therefore,

    Σ_{j=−l}^{k} ψ̃̂(2^j ω) ψ̂*(2^j ω) = φ̃̂(2^{−l−1} ω) φ̂*(2^{−l−1} ω) − φ̃̂(2^k ω) φ̂*(2^k ω).

Since ĝ(0) = 0, (5.63) implies h̃̂(0) ĥ*(0) = 2. We also impose that ĥ(0) = √2, so one can derive from (5.59) and (5.61) that φ̃̂(0) φ̂*(0) = 1. Since φ and φ̃ belong to L¹(R), φ̂ and φ̃̂ are continuous, and the Riemann–Lebesgue lemma (Exercise 2.8) proves that |φ̂(ω)| and |φ̃̂(ω)| decrease to zero when ω goes to ∞. For ω ≠ 0, letting k and l go to +∞ yields (5.64). ■
Observe that (5.63) is the same as the unit gain condition (7.117) for biorthogonal wavelets. The aliasing cancellation condition (7.116) of biorthogonal wavelets is not required because the wavelet transform is not sampled in time.

Finite Impulse Response Solution
Let us shift h and g to obtain causal filters. The resulting transfer functions ĥ(ω) and ĝ(ω) are polynomials in e^{−iω}. We suppose that these polynomials have no common zeros. The Bezout theorem (7.8) on polynomials proves that if P(z) and Q(z) are two polynomials of degrees n and l, with no common zeros, then there exists a unique pair of polynomials P̃(z) and Q̃(z) of degrees l − 1 and n − 1 such that

    P(z) P̃(z) + Q(z) Q̃(z) = 1.    (5.65)

This guarantees the existence of h̃̂(ω) and g̃̂(ω), which are polynomials in e^{−iω} and satisfy (5.63). These are the Fourier transforms of the finite impulse response filters h̃ and g̃. One must be careful, however, because the resulting scaling function φ̃̂ in (5.61) does not necessarily have finite energy.
Spline Dyadic Wavelets
A box spline of degree m is a translation of m + 1 convolutions of 1_{[0,1]} with itself. It is centered at t = 1/2 if m is even and at t = 0 if m is odd. Its Fourier transform is

    φ̂(ω) = ( sin(ω/2) / (ω/2) )^{m+1} exp( −iεω/2 )  with  ε = 1 if m is even, ε = 0 if m is odd,    (5.66)

so

    ĥ(ω) = √2 φ̂(2ω) / φ̂(ω) = √2 ( cos(ω/2) )^{m+1} exp( −iεω/2 ).    (5.67)

We construct a wavelet that has one vanishing moment by choosing ĝ(ω) = O(ω) in the neighborhood of ω = 0. For example,

    ĝ(ω) = −i √2 sin(ω/2) exp( −iω/2 ).    (5.68)

The Fourier transform of the resulting wavelet is

    ψ̂(ω) = (1/√2) ĝ(ω/2) φ̂(ω/2) = (−iω/4) ( sin(ω/4) / (ω/4) )^{m+2} exp( −i(1 + ε)ω/4 ).    (5.69)

It is the first derivative of a box spline of degree m + 1 centered at t = (1 + ε)/4. For m = 2, Figure 5.3 shows the resulting quadratic splines φ and ψ. The dyadic admissibility condition (5.48) is verified numerically for A = 0.505 and B = 0.522.

FIGURE 5.3
Quadratic spline wavelet ψ(t) and scaling function φ(t).

Table 5.1 Coefficients of the Filters Computed from the Transfer Functions (5.67, 5.68, 5.70) for m = 2

    n      h[n]/√2    g[n]/√2    g̃[n]/√2     h̃[n]/√2
    −2                           −0.03125
    −1     0.125                 −0.21875     0.125
     0     0.375     −0.5        −0.6875      0.375
     1     0.375      0.5         0.6875      0.375
     2     0.125                  0.21875     0.125
     3                            0.03125

Note: These filters generate the quadratic spline scaling functions and wavelets shown in Figure 5.3.
To design dual-scaling functions φ̃ and wavelets ψ̃ that are splines, we choose h̃̂ = ĥ (i.e., h̃ = h). As a consequence, φ̃ = φ, and the reconstruction condition (5.63) implies that

    g̃̂(ω) = ( 2 − |ĥ(ω)|² ) / ĝ*(ω) = −i √2 exp(−iω/2) sin(ω/2) Σ_{n=0}^{m} ( cos(ω/2) )^{2n}.    (5.70)

Table 5.1 gives the corresponding filters for m = 2.
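The reconstruction condition (5.63) can be verified numerically for the filters of Table 5.1 (with h̃ = h); the deviation from 2 is at the level of rounding errors.

```python
import numpy as np

# Check of the reconstruction condition (5.63),
#   h~^(w) h^*(w) + g~^(w) g^*(w) = 2,
# for the quadratic spline filters of Table 5.1 (m = 2, h~ = h).
sqrt2 = np.sqrt(2.0)
h = sqrt2 * np.array([0.125, 0.375, 0.375, 0.125])   # n = -1 .. 2
h_n = np.arange(-1, 3)
g = sqrt2 * np.array([-0.5, 0.5])                    # n = 0, 1
g_n = np.arange(0, 2)
gt = sqrt2 * np.array([-0.03125, -0.21875, -0.6875,
                       0.6875, 0.21875, 0.03125])    # n = -2 .. 3
gt_n = np.arange(-2, 4)

def dtft(c, ns, w):
    """Transfer function sum_n c[n] exp(-i w n) on a frequency grid."""
    return (c[None, :] * np.exp(-1j * np.outer(w, ns))).sum(axis=1)

w = np.linspace(-np.pi, np.pi, 257)
H, G, Gt = dtft(h, h_n, w), dtft(g, g_n, w), dtft(gt, gt_n, w)
lhs = H * np.conj(H) + Gt * np.conj(G)     # h~ = h, so h~^ h^* = |h^|^2
max_dev = np.max(np.abs(lhs - 2.0))
```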
5.2.2 Algorithme à Trous
Suppose that the scaling functions and wavelets φ, ψ, φ̃, and ψ̃ are designed with the filters h, g, h̃, and g̃. A fast dyadic wavelet transform is calculated with a filter bank algorithm, called the algorithme à trous, introduced by Holschneider et al. [303]. It is similar to a fast biorthogonal wavelet transform, without subsampling [367, 433].

Fast Dyadic Transform
The samples a₀[n] of the input discrete signal are written as a low-pass filtering of an analog signal f with φ, in the neighborhood of t = n:

    a₀[n] = f ⋆ φ̄(n) = ⟨f(t), φ(t − n)⟩ = ∫_{−∞}^{+∞} f(t) φ(t − n) dt.

This is further justified in Section 7.3.1. For any j ≥ 0, we denote

    a_j[n] = ⟨f(t), φ_{2^j}(t − n)⟩  with  φ_{2^j}(t) = (1/√(2^j)) φ(t/2^j).

The dyadic wavelet coefficients are computed for j > 0 over the integer grid

    d_j[n] = Wf(n, 2^j) = ⟨f(t), ψ_{2^j}(t − n)⟩.

For any filter x[n], we denote by x_j[n] the filter obtained by inserting 2^j − 1 zeros between each sample of x[n]. Its Fourier transform is x̂(2^j ω). Inserting zeros in the filters creates holes (trous in French). Let x̄_j[n] = x_j[−n]. Theorem 5.14 gives convolution formulas that are cascaded to compute a dyadic wavelet transform and its inverse.

Theorem 5.14. For any j ≥ 0,

    a_{j+1}[n] = a_j ⋆ h̄_j[n]  and  d_{j+1}[n] = a_j ⋆ ḡ_j[n],    (5.71)

and

    a_j[n] = (1/2) ( a_{j+1} ⋆ h̃_j[n] + d_{j+1} ⋆ g̃_j[n] ).    (5.72)
Proof of (5.71). Since

    a_{j+1}[n] = f ⋆ φ̄_{2^{j+1}}(n)  and  d_{j+1}[n] = f ⋆ ψ̄_{2^{j+1}}(n),

we verify with (3.3) that their Fourier transforms are, respectively,

    â_{j+1}(ω) = Σ_{k=−∞}^{+∞} f̂(ω + 2kπ) φ̂*_{2^{j+1}}(ω + 2kπ)

and

    d̂_{j+1}(ω) = Σ_{k=−∞}^{+∞} f̂(ω + 2kπ) ψ̂*_{2^{j+1}}(ω + 2kπ).

The properties (5.59) and (5.60) imply that

    φ̂_{2^{j+1}}(ω) = √(2^{j+1}) φ̂(2^{j+1} ω) = ĥ(2^j ω) √(2^j) φ̂(2^j ω),
    ψ̂_{2^{j+1}}(ω) = √(2^{j+1}) ψ̂(2^{j+1} ω) = ĝ(2^j ω) √(2^j) φ̂(2^j ω).

Since j ≥ 0, both ĥ(2^j ω) and ĝ(2^j ω) are 2π-periodic, so

    â_{j+1}(ω) = ĥ*(2^j ω) â_j(ω)  and  d̂_{j+1}(ω) = ĝ*(2^j ω) â_j(ω).    (5.73)

These two equations are the Fourier transforms of (5.71).

Proof of (5.72). Equation (5.73) implies

    â_{j+1}(ω) h̃̂(2^j ω) + d̂_{j+1}(ω) g̃̂(2^j ω) = â_j(ω) ĥ*(2^j ω) h̃̂(2^j ω) + â_j(ω) ĝ*(2^j ω) g̃̂(2^j ω).

Inserting the reconstruction condition (5.63) proves that

    â_{j+1}(ω) h̃̂(2^j ω) + d̂_{j+1}(ω) g̃̂(2^j ω) = 2 â_j(ω),

which is the Fourier transform of (5.72). ■
The dyadic wavelet representation of a₀ is defined as the set of wavelet coefficients up to a scale 2^J, plus the remaining low-frequency information a_J:

    [ {d_j}_{1≤j≤J}, a_J ].    (5.74)

It is computed from a₀ by cascading the convolutions (5.71) for 0 ≤ j < J, as illustrated in Figure 5.4(a). The dyadic wavelet transform of Figure 5.2 is calculated with this filter bank algorithm. The original signal a₀ is recovered from its wavelet representation (5.74) by iterating (5.72) for J > j ≥ 0, as illustrated in Figure 5.4(b).

If the input signal a₀[n] has a finite size of N samples, the convolutions (5.71) are replaced by circular convolutions. The maximum scale 2^J is then limited to N, and for J = log₂ N, one can verify that a_J[n] is constant and equal to N^{−1/2} Σ_{n=0}^{N−1} a₀[n]. Suppose that h and g have, respectively, K_h and K_g nonzero samples. The "dilated" filters h_j and g_j have the same number of nonzero coefficients. Therefore, the number of multiplications needed to compute a_{j+1} and d_{j+1} from a_j, or the reverse, is equal to (K_h + K_g)N. Thus, for J = log₂ N, the dyadic wavelet representation (5.74) and its inverse are calculated with (K_h + K_g)N log₂ N multiplications and additions.

FIGURE 5.4
(a) The dyadic wavelet coefficients are computed by cascading convolutions with the dilated filters h̄_j and ḡ_j. (b) The original signal is reconstructed through convolutions with h̃_j and g̃_j. A multiplication by 1/2 is necessary to recover the next finer-scale signal a_j.
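Since the recursions (5.73) multiply by transfer functions evaluated at 2^j ω, which is exactly what the zero-padded filters h_j, g_j implement, the algorithm can be sketched entirely in the Fourier domain with circular convolutions. This minimal illustration (not an optimized implementation) uses the quadratic spline filters (5.67), (5.68), and (5.70) with h̃ = h:

```python
import numpy as np

# Algorithme a trous sketch in the Fourier domain: the recursions (5.73)
# multiply by transfer functions evaluated at 2^j * omega, which is what the
# zero-padded filters h_j, g_j implement. Quadratic spline filters
# (5.67), (5.68), (5.70), with h~ = h; convolutions are circular.
N, J = 64, 3
w = 2 * np.pi * np.fft.fftfreq(N)                   # omega_k = 2 pi k / N

def h_hat(w):                                       # (5.67), m = 2, eps = 1
    return np.sqrt(2) * np.cos(w / 2) ** 3 * np.exp(-0.5j * w)

def g_hat(w):                                       # (5.68)
    return -1j * np.sqrt(2) * np.sin(w / 2) * np.exp(-0.5j * w)

def gt_hat(w):                                      # (5.70), m = 2
    c2 = np.cos(w / 2) ** 2
    return g_hat(w) * (1 + c2 + c2 ** 2)

rng = np.random.default_rng(4)
a0 = rng.standard_normal(N)

# Forward transform: cascade (5.71) in the Fourier domain.
a_hat = np.fft.fft(a0)
details = []
for j in range(J):
    details.append(np.conj(g_hat(2 ** j * w)) * a_hat)   # d_{j+1}
    a_hat = np.conj(h_hat(2 ** j * w)) * a_hat           # a_{j+1}

# Inverse transform: iterate (5.72) from coarse to fine scales.
for j in reversed(range(J)):
    a_hat = 0.5 * (h_hat(2 ** j * w) * a_hat + gt_hat(2 ** j * w) * details[j])

rec_err = np.max(np.abs(np.fft.ifft(a_hat).real - a0))
```

Because (5.63) holds identically for these filters, the reconstruction is exact up to rounding errors.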
5.3 SUBSAMPLED WAVELET FRAMES
Wavelet frames are constructed by sampling the scale parameter and also the translation parameter of a wavelet dictionary. A real continuous wavelet transform of f ∈ L²(R) is defined in Section 4.3 by

    Wf(u, s) = ⟨f, ψ_{u,s}⟩  with  ψ_{u,s}(t) = (1/√s) ψ( (t − u)/s ),

where ψ is a real wavelet. Imposing ‖ψ‖ = 1 implies that ‖ψ_{u,s}‖ = 1.

Intuitively, to construct a frame we need to cover the time-frequency plane with the Heisenberg boxes of the corresponding discrete wavelet family. A wavelet ψ_{u,s} has an energy in time that is centered at u over a domain proportional to s. For positive frequencies, its Fourier transform ψ̂_{u,s} has a support centered at a frequency η/s, with a spread proportional to 1/s. To obtain a full cover, we sample s along an exponential sequence {a^j}_{j∈Z}, with a sufficiently small dilation step a > 1. The time translation u is sampled uniformly at intervals proportional to the scale a^j, as illustrated in Figure 5.5. Let us denote

    ψ_{j,n}(t) = (1/√(a^j)) ψ( (t − n u₀ a^j) / a^j ).

FIGURE 5.5
The Heisenberg box of a wavelet ψ_{j,n} scaled by s = a^j has time and frequency widths proportional to a^j and a^{−j}, respectively. The time-frequency plane is covered by these boxes if u₀ and a are sufficiently small.
In the following, we give without proof some necessary conditions and some sufficient conditions on ψ, a, and u₀ so that {ψ_{j,n}}_{(j,n)∈Z²} is a frame of L²(R).

Necessary Conditions
We suppose that ψ is real, normalized, and satisfies the admissibility condition of Theorem 4.4:

    C_ψ = ∫₀^{+∞} ( |ψ̂(ω)|² / ω ) dω < +∞.    (5.75)

Theorem 5.15: Daubechies. If {ψ_{j,n}}_{(j,n)∈Z²} is a frame of L²(R), then the frame bounds satisfy

    A ≤ C_ψ / (u₀ log_e a) ≤ B,    (5.76)

    ∀ω ∈ R − {0},  A ≤ (1/u₀) Σ_{j=−∞}^{+∞} |ψ̂(a^j ω)|² ≤ B.    (5.77)

This theorem is proved in [19, 163]. Condition (5.77) is equivalent to the frame condition (5.55) for a translation-invariant dyadic wavelet transform, for which the parameter u is not sampled. It requires that the Fourier axis be covered by wavelets dilated by {a^j}_{j∈Z}. The inequality (5.76), which relates the sampling density u₀ log_e a to the frame bounds, is proved in [19]. It shows that the frame is an orthonormal basis if and only if

    A = B = C_ψ / (u₀ log_e a) = 1.

Chapter 7 constructs wavelet orthonormal bases of L²(R) with regular wavelets of compact support.

Sufficient Conditions
Theorem 5.16, proved by Daubechies [19], provides a lower and an upper bound for the frame bounds A and B, depending on ψ, u₀, and a.

Theorem 5.16: Daubechies. Let us define

    β(ξ) = sup_{1≤|ω|≤a} Σ_{j=−∞}^{+∞} |ψ̂(a^j ω)| |ψ̂(a^j ω + ξ)|

and

    Δ = Σ_{k=−∞, k≠0}^{+∞} [ β(2πk/u₀) β(−2πk/u₀) ]^{1/2}.    (5.78)
If u₀ and a are such that

    A₀ = (1/u₀) inf_{1≤|ω|≤a} ( Σ_{j=−∞}^{+∞} |ψ̂(a^j ω)|² − Δ ) > 0    (5.79)

and

    B₀ = (1/u₀) sup_{1≤|ω|≤a} ( Σ_{j=−∞}^{+∞} |ψ̂(a^j ω)|² + Δ ) < +∞,    (5.80)

then {ψ_{j,n}}_{(j,n)∈Z²} is a frame of L²(R). The constants A₀ and B₀ are, respectively, lower and upper bounds of the frame bounds A and B.

The sufficient conditions (5.79) and (5.80) are similar to the necessary condition (5.77). If Δ is small relative to inf_{1≤|ω|≤a} Σ_{j=−∞}^{+∞} |ψ̂(a^j ω)|², then A₀ and B₀ are close to the optimal frame bounds A and B. For a fixed dilation step a, the value of Δ decreases when the time-sampling interval u₀ decreases.
Dual Frame
Theorem 5.5 gives a general formula for computing the dual-wavelet frame vectors:

    ψ̃_{j,n} = (Φ*Φ)⁻¹ ψ_{j,n}.    (5.81)

One could reasonably hope that the dual functions ψ̃_{j,n} would be obtained by scaling and translating a dual wavelet ψ̃. The unfortunate reality is that this is generally not true. In general, the operator Φ*Φ does not commute with dilations by a^j, so (Φ*Φ)⁻¹ does not commute with these dilations either. On the other hand, one can prove that (Φ*Φ)⁻¹ commutes with translations by n a^j u₀, which means that

    ψ̃_{j,n}(t) = ψ̃_{j,0}(t − n a^j u₀).    (5.82)

Thus, the dual frame {ψ̃_{j,n}}_{(j,n)∈Z²} is obtained by calculating each elementary function ψ̃_{j,0} with (5.81) and translating it with (5.82). The situation is much simpler for tight frames, where the dual frame is equal to the original wavelet frame.
Mexican Hat Wavelet
The normalized second derivative of a Gaussian is

    ψ(t) = ( 2 / (√3 π^{1/4}) ) (t² − 1) exp( −t²/2 ).    (5.83)

Its Fourier transform is

    ψ̂(ω) = −( √8 π^{1/4} / √3 ) ω² exp( −ω²/2 ).

The graphs of these functions are shown in Figure 4.6.
Table 5.2 Estimated Frame Bounds for the Mexican Hat Wavelet

    a        u₀      A₀       B₀       B₀/A₀
    2        0.25    13.091   14.183    1.083
    2        0.5      6.546    7.092    1.083
    2        1.0      3.223    3.596    1.116
    2        1.5      0.325    4.221   12.986
    2^{1/2}  0.25    27.273   27.278    1.0002
    2^{1/2}  0.5     13.637   13.639    1.0002
    2^{1/2}  1.0      6.768    6.870    1.015
    2^{1/2}  1.75     0.517    7.276   14.061
    2^{1/4}  0.25    54.552   54.552    1.0000
    2^{1/4}  0.5     27.276   27.276    1.0000
    2^{1/4}  1.0     13.586   13.690    1.007
    2^{1/4}  1.75     2.928   12.659    4.324

Source: Computed with Theorem 5.16 [19].
The dilation step a is generally set to a = 2^{1/v}, where v is the number of intermediate scales (voices) for each octave. Table 5.2 gives the estimated frame bounds A₀ and B₀ computed by Daubechies [19] with the formula of Theorem 5.16. For v ≥ 2 voices per octave, the frame is nearly tight when u₀ ≤ 0.5, in which case the dual frame can be approximated by the original wavelet frame. As expected from (5.76), when A ≈ B,

    A ≈ B ≈ C_ψ / (u₀ log_e a) = (v/u₀) C_ψ log₂ e.

The frame bounds increase proportionally to v/u₀. For a = 2, we see that A₀ decreases brutally from u₀ = 1 to u₀ = 1.5. For u₀ = 1.75, the wavelet family is no longer a frame. For a = 2^{1/2}, the same transition appears at a larger u₀.
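The bounds A₀ and B₀ of Theorem 5.16 can be reproduced approximately by truncating the sums over j and k. A rough sketch for the Mexican hat wavelet with a = 2 and u₀ = 0.5 (Table 5.2 gives A₀ = 6.546 and B₀ = 7.092 for these parameters):

```python
import numpy as np

# Rough numerical evaluation of A0, B0 from Theorem 5.16 for the Mexican hat
# wavelet, a = 2, u0 = 0.5. Sums over j and k are truncated, and only
# positive omega are scanned (|psi_hat| is even). Values should approach the
# Table 5.2 entries 6.546 and 7.092.
a, u0 = 2.0, 0.5
scales = 2.0 ** np.arange(-25, 26)            # a^j for |j| <= 25

def psi_hat_sq(w):
    # |psi_hat(w)|^2 for the normalized Mexican hat (5.83)
    return (8.0 * np.sqrt(np.pi) / 3.0) * w ** 4 * np.exp(-w ** 2)

omega = np.linspace(1.0, a, 401)
s = psi_hat_sq(omega[:, None] * scales[None, :]).sum(axis=1)

def beta(xi):
    w = omega[:, None] * scales[None, :]
    prod = np.sqrt(psi_hat_sq(w) * psi_hat_sq(w + xi))
    return prod.sum(axis=1).max()

delta = 2.0 * sum(np.sqrt(beta(2 * np.pi * k / u0) * beta(-2 * np.pi * k / u0))
                  for k in range(1, 30))

A0 = (s.min() - delta) / u0
B0 = (s.max() + delta) / u0
```

For this wavelet Δ is negligible at u₀ = 0.5, so A₀ and B₀ are essentially the extrema of the dilation sum divided by u₀.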
5.4 WINDOWED FOURIER FRAMES
Frame theory gives conditions for discretizing the windowed Fourier transform
while retaining a complete and stable representation. The windowed Fourier
transform of f ∈ L 2 (R) is defined in Section 4.2 by
S f(u, ξ) = ⟨f, g_{u,ξ}⟩,   with   g_{u,ξ}(t) = g(t − u) e^{iξt}.
CHAPTER 5 Frames
FIGURE 5.6
A windowed Fourier frame is obtained by covering the time-frequency plane with a regular grid of windowed Fourier atoms, translated by u_n = n u0 in time and by ξ_k = k ξ0 in frequency.
Setting ‖g‖ = 1 implies that ‖g_{u,ξ}‖ = 1. A discrete windowed Fourier transform representation

{S f(u_n, ξ_k) = ⟨f, g_{u_n,ξ_k}⟩}_{(n,k)∈Z²}

is complete and stable if {g_{u_n,ξ_k}}_{(n,k)∈Z²} is a frame of L²(R).
Intuitively, one can expect that the discrete windowed Fourier transform is complete if the Heisenberg boxes of all atoms {g_{u_n,ξ_k}}_{(n,k)∈Z²} fully cover the time-frequency plane. Section 4.2 shows that the Heisenberg box of g_{u_n,ξ_k} is centered in the time-frequency plane at (u_n, ξ_k). Its size is independent of u_n and ξ_k. It depends on the time-frequency spread of the window g. Thus, a complete cover of the plane is obtained by translating these boxes over a uniform rectangular grid, as illustrated in Figure 5.6. The time and frequency parameters (u, ξ) are discretized over a rectangular grid with time and frequency intervals of size u0 and ξ0. Let us denote

g_{n,k}(t) = g(t − nu0) exp(ikξ0 t).
The sampling intervals (u0, ξ0) must be adjusted to the time-frequency spread of g.
Window Scaling
Suppose that {gn,k }(n,k)∈Z2 is a frame of L 2 (R) with frame bounds A and B. Let
us dilate the window: g_s(t) = s^{−1/2} g(t/s). This increases by s the time width of the Heisenberg box of g and reduces by s its frequency width. The same cover of the time-frequency plane is thus obtained by increasing u0 by s and reducing ξ0 by s. Let

g_{s,n,k}(t) = g_s(t − n s u0) exp(i k (ξ0/s) t).   (5.84)
We prove that {g_{s,n,k}}_{(n,k)∈Z²} satisfies the same frame inequalities as {g_{n,k}}_{(n,k)∈Z²}, with the same frame bounds A and B, with the change of variable t′ = t/s in the inner product integrals.
5.4.1 Tight Frames
Tight frames are easier to manipulate numerically since the dual frame is equal to the
original frame. Daubechies, Grossmann, and Meyer [197] give sufficient conditions
for building a window of compact support that generates a tight frame.
Theorem 5.17: Daubechies, Grossmann, Meyer. Let g be a window that has a support
included in [−π/ξ0, π/ξ0]. If

∀t ∈ R,   (2π/ξ0) ∑_{n=−∞}^{+∞} |g(t − nu0)|² = A > 0,   (5.85)

then {g_{n,k}(t) = g(t − nu0) e^{ikξ0 t}}_{(n,k)∈Z²} is a tight frame of L²(R) with a frame bound equal to A.
Proof. The function g(t − nu0) f(t) has a support in [nu0 − π/ξ0, nu0 + π/ξ0]. Since {e^{ikξ0 t}}_{k∈Z} is an orthogonal basis of this space, we have

∫_{−∞}^{+∞} |g(t − nu0)|² |f(t)|² dt = ∫_{nu0−π/ξ0}^{nu0+π/ξ0} |g(t − nu0)|² |f(t)|² dt
   = (ξ0/2π) ∑_{k=−∞}^{+∞} |⟨g(u − nu0) f(u), e^{ikξ0 u}⟩|².

Since g_{n,k}(t) = g(t − nu0) e^{ikξ0 t}, we get

∫_{−∞}^{+∞} |g(t − nu0)|² |f(t)|² dt = (ξ0/2π) ∑_{k=−∞}^{+∞} |⟨f, g_{n,k}⟩|².

Summing over n and inserting (5.85) proves that A ‖f‖² = ∑_{k,n=−∞}^{+∞} |⟨f, g_{n,k}⟩|² and, therefore, that {g_{n,k}}_{(n,k)∈Z²} is a tight frame of L²(R). ■
Since g has a support in [−π/ξ0, π/ξ0], condition (5.85) implies that

2π/(u0 ξ0) ≥ 1,

so that there is no hole between consecutive windows g(t − nu0) and g(t − (n + 1)u0). If we impose that 1 ≤ 2π/(u0 ξ0) ≤ 2, then only consecutive windows have supports that overlap. The square root of a Hanning window,

g(t) = √(ξ0/π) cos(ξ0 t/2) 1_{[−π/ξ0, π/ξ0]}(t),

is a positive normalized window that satisfies (5.85) with u0 = π/ξ0 and a redundancy factor of A = 2. The design of other windows is studied in Section 8.4.2 for local cosine bases.
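A direct numerical check of (5.85) for this window (a sketch; the value of ξ0 is an arbitrary choice):

```python
import numpy as np

xi0 = 2.0                      # frequency sampling interval (arbitrary choice)
u0 = np.pi / xi0               # time sampling interval from the text

def g(t):
    # square root of a Hanning window, supported on [-pi/xi0, pi/xi0]
    return (np.sqrt(xi0 / np.pi) * np.cos(xi0 * t / 2)
            * (np.abs(t) <= np.pi / xi0))

t = np.linspace(0, u0, 201)    # the sum is u0-periodic in t
S = sum(np.abs(g(t - n * u0))**2 for n in range(-5, 6))
A = 2 * np.pi / xi0 * S        # left-hand side of (5.85)
print(A.min(), A.max())        # constant: the redundancy factor A = 2
```

At every t exactly two consecutive windows contribute, and cos² + sin² = 1 makes the sum constant.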
183
184
CHAPTER 5 Frames
Discrete Windowed Fourier Tight Frames
To construct a windowed Fourier tight frame of C^N, the Fourier basis {e^{ikξ0 t}}_{k∈Z} of L²[−π/ξ0, π/ξ0] is replaced by the discrete Fourier basis {e^{i2πkn/K}}_{0≤k<K} of C^K. Theorem 5.18 is a discrete equivalent of Theorem 5.17.
Theorem 5.18. Let g[n] be an N periodic discrete window with a support, restricted to [−N/2, N/2], that is included in [−K/2, K/2 − 1]. If M divides N and

∀ 0 ≤ n < N,   K ∑_{m=0}^{N/M−1} |g[n − mM]|² = A > 0,   (5.86)

then {g_{m,k}[n] = g[n − mM] e^{i2πkn/K}}_{0≤k<K, 0≤m<N/M} is a tight frame of C^N with a frame bound equal to A.
The proof of this theorem follows the same steps as the proof of Theorem 5.17; it is left as Exercise 5.10. There are N/M translated windows and thus NK/M windowed Fourier coefficients. For a fixed window position indexed by m, the discrete windowed Fourier coefficients are the discrete Fourier coefficients of the windowed signal:

S f[m, k] = ⟨f, g_{m,k}⟩ = ∑_{n=−K/2}^{K/2−1} f[n] g[n − mM] e^{−i2πkn/K}   for 0 ≤ k < K.

They are computed with O(K log2 K) operations with an FFT. Over all windows, this requires a total of O((NK/M) log2 K) operations. We generally choose 1 < K/M ≤ 2 so that only consecutive windows overlap. The square root of a Hanning window, g[n] = √(2/K) cos(πn/K), satisfies (5.86) for M = K/2 with a redundancy factor A = 2.
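The tight frame property of Theorem 5.18 can be verified on a small example (a sketch; the sizes N, K, and M are arbitrary choices satisfying the hypotheses, and the coefficients are evaluated directly rather than with an FFT for clarity):

```python
import numpy as np

N, K = 64, 16
M = K // 2                        # K/M = 2: only consecutive windows overlap
rng = np.random.default_rng(0)
f = rng.standard_normal(N)        # an N-periodic test signal

# square root of a Hanning window, N-periodized, support [-K/2, K/2 - 1]
g = np.zeros(N)
for i in range(-K // 2, K // 2):
    g[i % N] = np.sqrt(2 / K) * np.cos(np.pi * i / K)

# discrete windowed Fourier coefficients S f[m, k]
n = np.arange(N)
S = np.array([[np.sum(f * np.roll(g, m * M) * np.exp(-2j * np.pi * k * n / K))
               for k in range(K)] for m in range(N // M)])

print(np.sum(np.abs(S)**2) / np.sum(f**2))    # tight frame bound A = 2
```

The coefficient energy equals A ‖f‖² with A = 2, the redundancy factor of the square root Hanning window.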
Figure 5.7 shows the log spectrogram log |S f [m, k]|2 of the windowed Fourier
frame coefficients computed with a square root Hanning window for a musical
recording.
5.4.2 General Frames
For general windowed Fourier frames of L²(R), Daubechies [19] proved several necessary conditions on g, u0, and ξ0 to guarantee that {g_{n,k}}_{(n,k)∈Z²} is a frame of L²(R). We do not reproduce the proofs but summarize the main results.
Theorem 5.19: Daubechies. The windowed Fourier family { gn,k }(n,k)∈Z2 is a frame
only if
2π/(u0 ξ0) ≥ 1.   (5.87)

The frame bounds A and B necessarily satisfy

A ≤ 2π/(u0 ξ0) ≤ B,   (5.88)
FIGURE 5.7
(a) Musical recording. (b) Log spectrogram log |S f [m, k]|2 computed with a square root
Hanning window.
∀t ∈ R,   A ≤ (2π/ξ0) ∑_{n=−∞}^{+∞} |g(t − nu0)|² ≤ B,   (5.89)

∀ω ∈ R,   A ≤ (1/u0) ∑_{k=−∞}^{+∞} |ĝ(ω − kξ0)|² ≤ B.   (5.90)
The ratio 2π/(u0 ξ0) measures the density of windowed Fourier atoms in the time-frequency plane. The first condition (5.87) ensures that this density is greater than 1 because the covering ability of each atom is limited. Inequalities (5.89) and (5.90) are proved in full generality by Chui and Shi [163]. They show that the uniform time translations of g must completely cover the time axis, and the frequency translations of its Fourier transform ĝ must similarly cover the frequency axis.

Since all windowed Fourier vectors are normalized, the frame is an orthogonal basis only if A = B = 1. The frame bound condition (5.88) shows that this is possible only at the critical sampling density u0 ξ0 = 2π. The Balian-Low Theorem 5.20 [93] proves that g is then either nonsmooth or has a slow time decay.
Theorem 5.20: Balian-Low. If {g_{n,k}}_{(n,k)∈Z²} is a windowed Fourier frame with u0 ξ0 = 2π, then

∫_{−∞}^{+∞} t² |g(t)|² dt = +∞   or   ∫_{−∞}^{+∞} ω² |ĝ(ω)|² dω = +∞.   (5.91)
This theorem proves that we cannot construct an orthogonal windowed Fourier basis with a differentiable window g of compact support. On the other hand, one can verify that the discontinuous rectangular window

g = (1/√u0) 1_{[−u0/2, u0/2]}

yields an orthogonal windowed Fourier basis for u0 ξ0 = 2π. This basis is rarely used because of the poor frequency localization of ĝ.
Sufficient Conditions
Theorem 5.21, proved by Daubechies [195], gives sufficient conditions on u0, ξ0, and g for constructing a windowed Fourier frame.

Theorem 5.21: Daubechies. Let us define

β(u) = sup_{0≤t≤u0} ∑_{n=−∞}^{+∞} |g(t − nu0)| |g(t − nu0 + u)|   (5.92)

and

Δ = ∑_{k=−∞, k≠0}^{+∞} [β(2πk/ξ0) β(−2πk/ξ0)]^{1/2}.   (5.93)

If u0 and ξ0 satisfy

A0 = (2π/ξ0) ( inf_{0≤t≤u0} ∑_{n=−∞}^{+∞} |g(t − nu0)|² − Δ ) > 0   (5.94)

and

B0 = (2π/ξ0) ( sup_{0≤t≤u0} ∑_{n=−∞}^{+∞} |g(t − nu0)|² + Δ ) < +∞,   (5.95)

then {g_{n,k}}_{(n,k)∈Z²} is a frame. The constants A0 and B0 are, respectively, lower and upper bounds of the frame bounds A and B.

Observe that the only difference between the sufficient conditions (5.94, 5.95) and the necessary condition (5.89) is the addition and subtraction of Δ. If Δ is small compared to inf_{0≤t≤u0} ∑_{n=−∞}^{+∞} |g(t − nu0)|², then A0 and B0 are close to the optimal frame bounds A and B.
Dual Frame
Theorem 5.5 proves that the dual windowed Fourier frame vectors are

g̃_{n,k} = (Φ*Φ)^{−1} g_{n,k}.   (5.96)
Theorem 5.22 shows that this dual frame is also a windowed Fourier frame, which
means that its vectors are time and frequency translations of a new window g̃.
Theorem 5.22. Dual windowed Fourier vectors can be rewritten as

g̃_{n,k}(t) = g̃(t − nu0) exp(ikξ0 t),

where g̃ is the dual window

g̃ = (Φ*Φ)^{−1} g.   (5.97)

Proof. This result is proved by showing first that Φ*Φ commutes with time and frequency translations proportional to u0 and ξ0. If h ∈ L²(R) and h_{m,l}(t) = h(t − mu0) exp(ilξ0 t), we verify that

Φ*Φ h_{m,l}(t) = exp(ilξ0 t) Φ*Φ h(t − mu0).

Indeed,

Φ*Φ h_{m,l} = ∑_{(n,k)∈Z²} ⟨h_{m,l}, g_{n,k}⟩ g_{n,k},

and a change of variable yields

⟨h_{m,l}, g_{n,k}⟩ = ⟨h, g_{n−m,k−l}⟩.

Consequently,

Φ*Φ h_{m,l}(t) = ∑_{(n,k)∈Z²} ⟨h, g_{n−m,k−l}⟩ exp(ilξ0 t) g_{n−m,k−l}(t − mu0)
   = exp(ilξ0 t) Φ*Φ h(t − mu0).

Since Φ*Φ commutes with these translations and frequency modulations, one verifies that (Φ*Φ)^{−1} necessarily commutes with the same group operations. Thus,

g̃_{n,k}(t) = (Φ*Φ)^{−1} g_{n,k}(t) = exp(ikξ0 t) (Φ*Φ)^{−1} g_{0,0}(t − nu0) = exp(ikξ0 t) g̃(t − nu0). ■
Gaussian Window
The Gaussian window

g(t) = π^{−1/4} exp(−t²/2)   (5.98)

has a Fourier transform ĝ that is a Gaussian with the same variance. The time and frequency spreads of this window are identical; therefore, let us choose equal sampling intervals in time and frequency: u0 = ξ0. For the same product u0 ξ0, other
Table 5.3 Frame Bounds for the Gaussian Window (5.98) and u0 = ξ0

  u0 ξ0    A0      B0     B0/A0
  π/2      3.9     4.1    1.05
  3π/4     2.5     2.8    1.1
  π        1.6     2.4    1.5
  4π/3     0.58    2.1    3.6
  1.9π     0.09    2.0    22
choices would degrade the frame bounds. If g is dilated by s, then the time and frequency sampling intervals must become s u0 and ξ0/s.

If the time-frequency sampling density is above the critical value, that is, 2π/(u0 ξ0) > 1, then Daubechies [195] proves that {g_{n,k}}_{(n,k)∈Z²} is a frame. When u0 ξ0 tends to 2π, the frame bound A tends to 0. For u0 ξ0 = 2π, the family {g_{n,k}}_{(n,k)∈Z²} is complete in L²(R), which means that any f ∈ L²(R) is entirely characterized by the inner products {⟨f, g_{n,k}⟩}_{(n,k)∈Z²}. However, the Balian-Low Theorem 5.20 proves that it cannot be a frame, and one can indeed verify that A = 0 [195]. This means that the reconstruction of f from these inner products is unstable.

Table 5.3 gives the estimated frame bounds A0 and B0 calculated with Theorem 5.21 for different values of u0 = ξ0. For u0 ξ0 = π/2, which corresponds to time and frequency sampling intervals that are half the critical sampling rate, the frame is nearly tight. As expected, A ≈ B ≈ 4, which verifies that the redundancy factor is 4 (2 in time and 2 in frequency). Since the frame is almost tight, the dual frame is approximately equal to the original frame, which means that g̃ ≈ g. When u0 ξ0 increases, we see that A decreases to zero and g̃ deviates more and more from a Gaussian. In the limit u0 ξ0 = 2π, the dual window g̃ is a discontinuous function that does not belong to L²(R). These results can be extended to discrete windowed Fourier transforms computed with a discretized Gaussian window [501].
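The first row of Table 5.3 can be reproduced from Theorem 5.21. The sketch below (discretization ranges are my choices) evaluates β, Δ, A0, and B0 for the Gaussian window with u0 = ξ0 = √(π/2):

```python
import numpy as np

g = lambda t: np.pi**-0.25 * np.exp(-t**2 / 2)     # Gaussian window (5.98)
u0 = xi0 = np.sqrt(np.pi / 2)                       # so that u0 * xi0 = pi/2

t = np.linspace(0, u0, 256)[:, None]                # t in [0, u0]
n = np.arange(-30, 31)[None, :]                     # enough terms for a Gaussian

def beta(u):
    # beta(u) of (5.92): sup over t of sum_n |g(t - n u0)| |g(t - n u0 + u)|
    return np.max(np.sum(g(t - n * u0) * g(t - n * u0 + u), axis=1))

S = np.sum(g(t - n * u0)**2, axis=1)                # sum_n |g(t - n u0)|^2
delta = sum(np.sqrt(beta(2 * np.pi * k / xi0) * beta(-2 * np.pi * k / xi0))
            for k in range(-20, 21) if k != 0)

A0 = 2 * np.pi / xi0 * (S.min() - delta)            # (5.94)
B0 = 2 * np.pi / xi0 * (S.max() + delta)            # (5.95)
print(A0, B0)    # consistent with the bounds 3.9 and 4.1 of Table 5.3
```

The computed pair lies inside the rounded interval [3.9, 4.1] of the table, confirming that the frame is nearly tight at this sampling density.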
5.5 MULTISCALE DIRECTIONAL FRAMES FOR IMAGES
To reveal geometric image properties, wavelet frames are constructed with mother
wavelets having a direction selectivity, providing information on the direction
of sharp transitions such as edges and textures. Directional wavelet frames are
described in Section 5.5.1.
Wavelet frames yield high-amplitude coefficients in the neighborhood of edges,
and cannot take advantage of their geometric regularity to improve the sparsity
of the representation. Curvelet frames, described in Section 5.5.2, are constructed
with elongated waveforms that follow directional image structures and improve the
representation sparsity.
5.5.1 Directional Wavelet Frames
A directional wavelet transform decomposes images over directional wavelets that
are translated, rotated, and dilated at dyadic scales. Such transforms appear in many
image-processing applications and physiological models. Applications to texture
discrimination are also discussed.
A directional wavelet ψ^α(x), with x = (x1, x2) ∈ R², of angle α is a wavelet having p directional vanishing moments along any one-dimensional line of direction α + π/2 in the plane,

∀ρ ∈ R,   ∫ ψ^α(ρ cos α − u sin α, ρ sin α + u cos α) u^k du = 0   for 0 ≤ k < p,   (5.99)

but does not have directional vanishing moments along the direction α. Such a wavelet oscillates in the direction of α + π/2 but not in the direction α. It is orthogonal to any two-dimensional polynomial of degree strictly smaller than p (Exercise 5.21).
The set Θ of chosen directions is typically uniform in [0, π): Θ = {α = kπ/K for 0 ≤ k < K}. Dilating these directional wavelets by factors 2^j and translating them by any u ∈ R² yields a translation-invariant directional wavelet family

{ψ^α_{2^j}(x − u)}_{u∈R², j∈Z, α∈Θ}   with   ψ^α_{2^j}(x) = 2^{−j} ψ^α(2^{−j} x).   (5.100)
Directional wavelets may be derived by rotating a single mother wavelet ψ(x1, x2) having vanishing moments in the horizontal direction, with a rotation operator R_α of angle α in R².
A dyadic directional wavelet transform of f computes the inner product with each wavelet:

W f(u, 2^j, α) = ⟨f, ψ^α_{2^j,u}⟩   where   ψ^α_{2^j,u}(x) = ψ^α_{2^j}(x − u).

This dyadic wavelet transform can also be written as convolutions with directional wavelets:

W f(u, 2^j, α) = f ⋆ ψ̄^α_{2^j}(u)   where   ψ̄^α_{2^j}(x) = ψ^α_{2^j}(−x).
A wavelet ψ^α_{2^j}(x − u) has a support dilated by 2^j, located in the neighborhood of u, and it oscillates in the direction α + π/2. If f(x) is constant over the support of ψ^α_{2^j,u} along lines of direction α + π/2, then ⟨f, ψ^α_{2^j,u}⟩ = 0 because of the directional vanishing moments. In particular, this coefficient vanishes in the neighborhood of an edge having a tangent in the direction α + π/2. If the edge angle deviates from α + π/2, then it produces large-amplitude coefficients, with a maximum typically reached when the edge has a direction α. Thus, the amplitude of wavelet coefficients depends on the local orientation of the image structures.
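This cancellation is easy to observe numerically. The sketch below (my construction, not the book's) takes ψ^α to be a second directional derivative of a Gaussian, which has two directional vanishing moments along α + π/2, and correlates it with two profiles: one constant along the lines of direction α + π/2, and one oscillating across them:

```python
import numpy as np

alpha = np.pi / 6
s, c = np.sin(alpha), np.cos(alpha)

x = np.linspace(-8, 8, 257)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing='ij')
theta = np.exp(-(X1**2 + X2**2) / 2)          # Gaussian window

# second derivative of theta along n = (-sin a, cos a)
w = -s * X1 + c * X2
psi = (w**2 - 1) * theta

f1 = np.cos(3 * (c * X1 + s * X2))   # constant along the direction alpha + pi/2
f2 = np.cos(3 * (-s * X1 + c * X2))  # oscillates along the direction alpha + pi/2

c1 = np.sum(f1 * psi) * dx**2        # inner product <f1, psi>: vanishes
c2 = np.sum(f2 * psi) * dx**2        # inner product <f2, psi>: large
print(abs(c1), abs(c2))
```

The first coefficient is zero up to discretization error, while the second is of order one: the wavelet responds only to variations across the direction α + π/2.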
Theorem 5.11 proves that the translation-invariant wavelet family is a frame if there exist B ≥ A > 0 such that the Fourier transforms ψ̂^α(ω) satisfy

∀ω = (ω1, ω2) ∈ R² − {(0, 0)},   A ≤ ∑_{α∈Θ} ∑_{j=−∞}^{+∞} |ψ̂^α(2^j ω)|² ≤ B.   (5.101)

It results from Theorem 5.11 that there exists a dual family of reconstructing wavelets {ψ̃^α}_{α∈Θ} that have Fourier transforms that satisfy

∑_{j=−∞}^{+∞} ∑_{α∈Θ} ψ̂̃^α(2^j ω) [ψ̂^α(2^j ω)]* = 1,   (5.102)

which yields

f(x) = ∑_{j=−∞}^{+∞} 2^{−2j} ∑_{α∈Θ} W f(·, 2^j, α) ⋆ ψ̃^α_{2^j}(x).   (5.103)
Examples of directional wavelets obtained by rotating a single mother wavelet are
constructed with Gabor functions and steerable derivatives.
Gabor Wavelets
In the cat’s visual cortex, Hubel and Wiesel [306] discovered a class of cells, called
simple cells, having a response that depends on the frequency and direction of
the visual stimuli. Numerous physiological experiments [401] have shown that
these cells can be modeled as linear filters with impulse responses that have been
measured at different locations of the visual cortex. Daugman [200] showed that these impulse responses can be approximated by Gabor wavelets, obtained with a Gaussian window g(x1, x2) = (2π)^{−1} e^{−(x1²+x2²)/2} multiplied by a sinusoidal wave:

ψ^α(x1, x2) = g(x1, x2) exp[−iη(−x1 sin α + x2 cos α)].   (5.104)
These findings suggest the existence of some sort of wavelet transform in the visual
cortex,combined with subsequent nonlinearities [403].The“physiological”wavelets
have a frequency resolution on the order of 1 to 1.5 octaves, and are thus similar to
dyadic wavelets.
The Fourier transform of g(x1, x2) is ĝ(ω1, ω2) = e^{−(ω1²+ω2²)/2}. It results from (5.104) that

ψ̂^α_{2^j}(ω1, ω2) = 2^j ĝ(2^j ω1 + η sin α, 2^j ω2 − η cos α).

In the Fourier plane, the energy of this Gabor wavelet is mostly concentrated around (−2^{−j} η sin α, 2^{−j} η cos α), in a neighborhood proportional to 2^{−j}.
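This localization can be checked with a discrete Fourier transform. In the sketch below (grid sizes and η are my choices, and since the sign of the modulation depends on the Fourier convention, the check is made convention independent), the energy centroid of ψ̂^α lies at radius η in the direction ±(−sin α, cos α):

```python
import numpy as np

N, dx, eta, alpha = 64, 0.25, 2.0, np.pi / 3
x = (np.arange(N) - N // 2) * dx
X1, X2 = np.meshgrid(x, x, indexing='ij')

g = np.exp(-(X1**2 + X2**2) / 2) / (2 * np.pi)    # Gaussian window
psi = g * np.exp(-1j * eta * (-X1 * np.sin(alpha) + X2 * np.cos(alpha)))

P = np.abs(np.fft.fft2(psi))**2                    # energy in the Fourier plane
wf = 2 * np.pi * np.fft.fftfreq(N, d=dx)
W1, W2 = np.meshgrid(wf, wf, indexing='ij')
ctr = np.array([(W1 * P).sum(), (W2 * P).sum()]) / P.sum()

print(np.hypot(*ctr))                              # radius: close to eta = 2
print(ctr @ [np.cos(alpha), np.sin(alpha)])        # ~0: orthogonal to alpha
```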
FIGURE 5.8
Each circle represents the frequency domain, in a direction α + π/2, where the amplitude of a Gabor wavelet Fourier transform |ψ̂^α_{2^j}(ω)| is large. It is proportional to 2^{−j} and its position rotates with α.
The directions are chosen to be uniform: α = lπ/K for −K < l ≤ K. Figure 5.8 shows a cover of the frequency plane by dyadic Gabor wavelets (5.105) with K = 6. If K ≥ 4 and η is of the order of 1, then (5.101) is satisfied with stable bounds. Since images f(x) are real, f̂(−ω) = f̂*(ω), and f can be reconstructed by covering only half of the frequency plane, with −K < l ≤ 0. This is a two-dimensional equivalent of the one-dimensional analytic wavelet transform, studied in Section 4.3.2, with wavelets having a Fourier transform support restricted to positive frequencies. For texture analysis, Gabor wavelets provide information on the local image frequencies.
Texture Discrimination
Despite many attempts, there are no appropriate mathematical models for “homogeneous image textures.”The notion of texture homogeneity is still defined with
respect to our visual perception. A texture is said to be homogeneous if it is
preattentively perceived as being homogeneous by a human observer.
The texton theory of Julesz [322] was a first important step in understanding
the different parameters that influence the perception of textures. The direction
of texture elements and their frequency content seem to be important clues for
discrimination. This motivated early researchers to study the repartition of texture energy in the Fourier domain [92]. For segmentation purposes it is necessary
to localize texture measurements over neighborhoods of varying sizes. Thus, the
Fourier transform was replaced by localized energy measurements at the output
of filter banks that compute a wavelet transform [315, 338, 402, 467]. Besides
the algorithmic efficiency of this approach, this model is partly supported by
physiological studies of the visual cortex.
Since W f(u, 2^j, α) = f ⋆ ψ̄^α_{2^j}(u), Gabor wavelet coefficients measure the energy of f in a spatial neighborhood of u of size 2^j, and in a frequency neighborhood of (−2^{−j} η sin α, 2^{−j} η cos α) of size 2^{−j}, where the support of ψ̂^α_{2^j}(ω) is located, as illustrated in Figure 5.8. Varying the scale 2^j and the angle α modifies the frequency channel [119]. The wavelet transform energy |W f(u, 2^j, α)|² is large when the angle α and scale 2^j match the direction and scale of high-energy texture components in the neighborhood of u. Thus, the amplitude of |W f(u, 2^j, α)|² can be used to discriminate textures. Figure 5.9 shows the dyadic wavelet transform of two textures, computed along horizontal and vertical directions, at the scales 2^{−4} and 2^{−5} (the image support is normalized to [0, 1]²). The central texture is regular vertically and has more energy along horizontal high frequencies than the peripheral texture. These two textures are therefore discriminated by the wavelet of angle α = π/2, whereas the other wavelet, with α = 0, produces similar responses for both textures.
For segmentation, one must design an algorithm that aggregates the wavelet
responses at all scales and directions in order to find the boundaries of homogeneous textured regions. Both clustering procedures and detection of sharp transitions over wavelet energy measurements have been used to segment the image
[315, 402, 467]. These algorithms work well experimentally but rely on ad hoc
parameter settings.
FIGURE 5.9
Directional Gabor wavelet transform |W f(u, 2^j, α)|² of a texture patch, at the scales 2^j = 2^{−4}, 2^{−5}, along two directions, α = 0 and α = π/2. The darker the pixel, the larger the wavelet coefficient amplitude.
Steerable Wavelets
Steerable directional wavelets along any angle α can be written as a linear expansion of a few mother wavelets [441]. For example, a steerable wavelet in the direction α can be defined as the partial derivative of order p of a window θ(x) in the direction of the vector n = (−sin α, cos α):

ψ^α(x) = ∂^p θ(x)/∂n^p = (−sin α ∂/∂x1 + cos α ∂/∂x2)^p θ(x).   (5.105)
Let R_α be the planar rotation by an angle α. If the window is invariant under rotations, θ(x) = θ(R_α x), then these wavelets are generated by the rotation of a single mother wavelet: ψ^α(x) = ψ(R_α x), with ψ = ∂^p θ/∂x2^p.
Furthermore, the expansion of the derivatives in (5.105) proves that each ψ^α can be expressed as a linear combination of the p + 1 partial derivatives

ψ^i(x) = ∂^p θ(x)/(∂x1^i ∂x2^{p−i})   for 0 ≤ i ≤ p:

ψ^α(x) = ∑_{i=0}^{p} a_i(α) ψ^i(x)   with   a_i(α) = (p choose i) (−sin α)^i (cos α)^{p−i}.   (5.106)

The waveforms ψ^i(x) can also be considered as wavelets with vanishing moments. It results from (5.106) that the directional wavelet transform at any angle α can be calculated from p + 1 convolutions of f with the dilated wavelets ψ^i_{2^j}:

W f(u, 2^j, α) = ∑_{i=0}^{p} a_i(α) (f ⋆ ψ̄^i_{2^j})(u)   for   ψ̄^i_{2^j}(x) = 2^{−j} ψ^i(−2^{−j} x).
Exercise 5.22 gives conditions on θ so that, for a set Θ of p + 1 angles α = kπ/(p + 1) with 0 ≤ k ≤ p, the resulting oriented wavelets ψ^α define a family of dyadic wavelets that satisfy (5.101). Section 6.3 uses such directional wavelets, with p = 1, to detect multiscale edges in images.
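The steering relation (5.106) can be tested numerically. The sketch below (my construction) uses a Gaussian window θ with p = 2, whose second partial derivatives are known in closed form:

```python
import numpy as np
from math import comb

p, alpha = 2, 0.7                      # order and an arbitrary steering angle
x = np.linspace(-5, 5, 201)
X1, X2 = np.meshgrid(x, x, indexing='ij')
theta = np.exp(-(X1**2 + X2**2) / 2)

# the p + 1 = 3 partial derivatives of theta of total order p
psi = {0: (X2**2 - 1) * theta,         # d2 theta / dx2^2
       1: X1 * X2 * theta,             # d2 theta / dx1 dx2
       2: (X1**2 - 1) * theta}         # d2 theta / dx1^2

s, c = np.sin(alpha), np.cos(alpha)
steered = sum(comb(p, i) * (-s)**i * c**(p - i) * psi[i] for i in range(p + 1))

# direct second derivative of theta along n = (-sin a, cos a)
w = -s * X1 + c * X2
direct = (w**2 - 1) * theta

print(np.max(np.abs(steered - direct)))    # numerically zero
```

Three fixed filters thus suffice to synthesize the oriented wavelet at any angle α, which is the point of steerability.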
Discretization of the Translation
A translation-invariant wavelet transform W f(u, 2^j, α), for all scales 2^j and angles α, requires a large amount of memory. To reduce computation and memory storage, the translation parameter is discretized. As in the one-dimensional case, a frame is obtained by uniformly sampling the translation parameter u with intervals u0 2^j n, where n = (n1, n2) ∈ Z², proportional to the scale 2^j. The discretized wavelet family derived from the translation-invariant wavelet family (5.100) is

{ψ^α_{2^j}(x − 2^j u0 n)}_{n∈Z², j∈Z, α∈Θ}   with   ψ^α_{2^j}(x) = 2^{−j} ψ^α(2^{−j} x).   (5.107)
Necessary and sufficient conditions similar to Theorems 5.15 and 5.16 can be
established to guarantee that such a wavelet family defines a frame of L 2 (R2 ).
To decompose images in such wavelet frames with a fast filter bank, directional wavelets can be synthesized as a product of discrete filters. The steerable pyramid of Simoncelli et al. [441] decomposes images in such a directional wavelet frame, with a cascade of convolutions with low-pass filters and directional band-pass filters. The filter coefficients are optimized to yield wavelets that approximately satisfy the steerability condition (5.106) and produce a tight frame. The sampling interval is u0 = 1/2.
Figure 5.10 shows an example of decomposition on such a steerable pyramid
with K = 4 directions. For discrete images of N pixels, the finest scale is 2^j = 2N^{−1}. Since u0 = 1/2, wavelet coefficients at the finest scale define an image of N pixels for each direction. The wavelet image size then decreases as the scale 2^j increases.
The total number of wavelet coefficients is 4KN /3 and the tight frame factor is 4K /3
[441]. Steerable wavelet frames are used to remove noise with wavelet thresholding
estimators [404] and for texture analysis and synthesis [438].
Chapter 9 explains that sparse image representations can be obtained by keeping large-amplitude coefficients above a threshold. Large-amplitude wavelet coefficients appear where the image has a sharp transition, when the wavelet oscillates in a direction approximately perpendicular to the direction of the edge. However, even when directions are not perfectly aligned, wavelet coefficients remain nonnegligible in the neighborhood of edges. Thus, the number of large-amplitude wavelet coefficients is typically proportional to the length of the edges in an image. Reducing the number of large coefficients requires waveforms that are more sensitive to direction, as shown in the next section.
5.5.2 Curvelet Frames
Curvelet frames were introduced by Candès and Donoho [134] to construct sparse representations of images whose edges are geometrically regular. Similar to directional wavelets, curvelet frames are obtained by rotating, dilating, and translating elementary waveforms. However, curvelets have a highly elongated support
obtained with a parabolic scaling using different scaling factors along the curvelet
width and length. These anisotropic waveforms have a much better direction
sensitivity than directional wavelets. Section 9.3.2 studies applications to sparse
approximations of geometrically regular images.
Dyadic Curvelet Transform
A curvelet is a function c(x) having vanishing moments along the horizontal direction, like a wavelet. However, as opposed to wavelets, dilated curvelets are obtained with a parabolic scaling law that produces highly elongated waveforms at fine scales:

c_{2^j}(x1, x2) ≈ 2^{−3j/4} c(2^{−j/2} x1, 2^{−j} x2).   (5.108)

Their width is proportional to the square of their length. Dilated curvelets are then rotated, c^α_{2^j}(x) = c_{2^j}(R_α x), where R_α is the planar rotation of angle α, and translated like wavelets:
FIGURE 5.10
Decomposition of an image in a frame of steerable directional wavelets [441] along four directions, α = 0, π/4, π/2, 3π/4, at two consecutive scales, 2^j and 2^{j+1}. Black, gray, and white pixels correspond, respectively, to wavelet coefficients of negative, zero, and positive values.
c^α_{2^j,u}(x) = c^α_{2^j}(x − u). The resulting translation-invariant dyadic curvelet transform of f ∈ L²(R²) is defined by

C f(u, 2^j, α) = ⟨f, c^α_{2^j,u}⟩ = f ⋆ c̄^α_{2^j}(u)   with   c̄^α_{2^j}(x) = c^α_{2^j}(−x).
To obtain a tight frame, the Fourier transform of a curvelet at a scale 2^j is defined by

ĉ_{2^j}(ω) = 2^{3j/4} ψ̂(2^j r) φ̂(2θ/2^{j/2}),   with   ω = r (cos θ, sin θ),   (5.109)
FIGURE 5.11
(a) Example of curvelet c^α_{2^j,u}(x). (b) The frequency support of ĉ^α_{2^j,u}(ω) is a wedge obtained as the product of a radial window and an angular window.
where ψ̂ is the Fourier transform of a one-dimensional wavelet and φ̂ is a one-dimensional angular window that localizes the frequency support of ĉ_{2^j} in a polar parabolic wedge, illustrated in Figure 5.11. The wavelet ψ̂ is chosen to have a compact support in [1/2, 2] and satisfies the dyadic frequency covering

∀r ∈ R*,   ∑_{j=−∞}^{+∞} |ψ̂(2^j r)|² = 1.   (5.110)
One may, for example, choose a Meyer wavelet as defined in (7.82). The angular window φ̂ is chosen to be supported in [−1, 1] and satisfies

∀u ∈ R,   ∑_{k=−∞}^{+∞} |φ̂(u − k)|² = 1.   (5.111)
As a result of these two properties, one can verify that for uniformly distributed angles,

Θ_j = {α = kπ 2^{⌈j/2⌉−1}   for   0 ≤ k < 2^{−⌈j/2⌉+2}},

curvelets cover the frequency plane:

∀ω ∈ R² − {0},   ∑_{j∈Z} ∑_{α∈Θ_j} 2^{−3j/2} |ĉ^α_{2^j}(ω)|² = 1.   (5.112)
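The radial covering (5.110) can be verified for a Meyer-type profile. The construction below is one standard choice of such a profile (an assumption on my part, not the specific window of (7.82)):

```python
import numpy as np

def nu(t):
    # smooth ramp with nu(t) + nu(1 - t) = 1, nu = 0 for t <= 0, 1 for t >= 1
    t = np.clip(t, 0.0, 1.0)
    return t**2 * (3 - 2 * t)

def psi_hat_sq(r):
    # |psi_hat(r)|^2, compactly supported in [1/2, 2]
    r = np.abs(r)
    out = np.zeros_like(r)
    up = (r >= 0.5) & (r < 1)                  # rising edge
    dn = (r >= 1) & (r <= 2)                   # falling edge
    out[up] = nu(np.log2(2 * r[up]))
    out[dn] = 1 - nu(np.log2(r[dn]))
    return out

r = np.linspace(0.3, 20, 2000)
total = sum(psi_hat_sq(2.0**j * r) for j in range(-8, 8))
print(total.min(), total.max())    # both 1: the dyadic covering (5.110) holds
```

At every radius exactly two consecutive scales contribute, and the rising edge of one compensates the falling edge of the other; the same mechanism yields the angular partition (5.111).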
Real-valued curvelets are obtained with a symmetric version of (5.109): ĉ_{2^j}(ω) + ĉ_{2^j}(−ω). Applying Theorem 5.11 proves that a translation-invariant dyadic curvelet dictionary {c^α_{2^j,u}}_{α∈Θ_j, j∈Z, u∈R²} is a dyadic translation-invariant tight frame that defines a complete and stable signal representation [142].

Theorem 5.23: Candès, Donoho. For any f ∈ L²(R²),

‖f‖² = ∑_{j∈Z} 2^{−3j/2} ∑_{α∈Θ_j} ‖C f(·, 2^j, α)‖²

and

f(x) = ∑_{j∈Z} 2^{−3j/2} ∑_{α∈Θ_j} C f(·, 2^j, α) ⋆ c^α_{2^j}(x).
Curvelet Properties
Since ĉ_{2^j}(ω) is a smooth function with a support included in a rectangle of size proportional to 2^{−j/2} × 2^{−j}, the spatial curvelet c_{2^j}(x) is a regular function with a fast decay outside a rectangle of size 2^{j/2} × 2^j. The rotated and translated curvelet c^α_{2^j,u} is supported around the point u in an elongated rectangle along the direction α; its shape has a parabolic ratio, width = length², as shown in Figure 5.11.

Since the Fourier transform ĉ_{2^j}(ω1, ω2) is zero in the neighborhood of the vertical axis ω1 = 0, the curvelet c_{2^j}(x1, x2) has an infinite number of vanishing moments in the horizontal direction:

∀q ≥ 0, ∀ω2,   (∂^q ĉ_{2^j}/∂ω1^q)(0, ω2) = 0   ⟹   ∀q ≥ 0, ∀x2,   ∫ c_{2^j}(x1, x2) x1^q dx1 = 0.

A rotated curvelet c^α_{2^j,u} has vanishing moments in the direction α + π/2, and therefore oscillates in the direction α + π/2, whereas its support is elongated in the direction α.
Discretization of Translation
Curvelet tight frames are constructed by sampling the translation parameter u [134]. These tight frames provide sparse representations of signals including regular geometric structures.

The curvelet sampling grid depends on the scale 2^j and on the angle α. Sampling intervals are proportional to the curvelet width 2^j in the direction α + π/2 and to its length 2^{j/2} in the direction α:

∀m = (m1, m2) ∈ Z²,   u_m^{(j,α)} = R_α (2^{j/2} m1, 2^j m2).   (5.113)

Figure 5.12 illustrates this sampling grid. The resulting dictionary of translated curvelets is

{c^α_{j,m}(x) = c^α_{2^j}(x − u_m^{(j,α)})}_{j∈Z, α∈Θ_j, m∈Z²}.
FIGURE 5.12
(a) Curvelet polar tiling of the frequency plane with parabolic wedges. (b) Curvelet spatial sampling grid u_m^{(j,α)} at a scale 2^j and direction α.
Theorem 5.24 proves that this curvelet family is a tight frame of L²(R²). The proof is not given here, but can be found in [142].

Theorem 5.24: Candès, Donoho. For any f ∈ L²(R²),

‖f‖² = ∑_{j∈Z} ∑_{α∈Θ_j} ∑_{m∈Z²} |⟨f, c^α_{j,m}⟩|²   (5.114)

and

f(x) = ∑_{j∈Z} ∑_{α∈Θ_j} ∑_{m∈Z²} ⟨f, c^α_{j,m}⟩ c^α_{j,m}(x).
Wavelet versus Curvelet Coefficients
In the neighborhood of an edge having a tangent in a given direction, large-amplitude coefficients are created by the curvelets and wavelets aligned with that direction, which have their vanishing moments in the perpendicular direction. These curvelets have a support elongated along the edge direction, as illustrated in Figure 5.13. In this direction, the sampling grid of a curvelet frame has an interval 2^{j/2}, which is much larger than the sampling interval 2^j of a wavelet frame. Thus, an edge is covered by fewer curvelets than wavelets having a direction equal to the edge direction. If the angle α of the curvelet deviates from the edge direction, then the curvelet coefficients decay quickly because of the narrow frequency localization of curvelets. This gives a high directional selectivity to curvelets.
FIGURE 5.13
(a) Directional wavelets: a regular edge creates more large-amplitude wavelet coefficients than
curvelet coefficients. (b) Curvelet coefficients have a large amplitude when their support is
aligned with the edge direction, but there are few such curvelets.
Even though wavelet coefficients vanish when the wavelet direction is perpendicular to the edge direction, wavelets have a smaller directional selectivity than curvelets, and wavelet coefficient amplitudes decay more slowly as α deviates from the edge direction. Indeed, the frequency support of wavelets is much wider and their spatial support is nearly isotropic. As a consequence, an edge produces large-amplitude wavelet coefficients in several directions. This is illustrated by Figure 5.10, where Lena's shoulder edge creates large coefficients in three directions, and small coefficients only when the wavelet and the edge directions are aligned.
As a result, edges and image structures with some directional regularity create fewer large-amplitude curvelet coefficients than wavelet coefficients. A theorem derived in Section 9.3.3 proves that for images having regular edges, curvelet tight frames are asymptotically more efficient than wavelet bases for building sparse representations.
Fast Curvelet Decomposition Algorithm
To compute the curvelet transform of a discrete image f[n₁, n₂] uniformly sampled
over N pixels, one must take into account the discretization grid, which imposes
constraints on curvelet angles. The fast curvelet transform [140] replaces the polar
tiling of the Fourier domain by a recto-polar tiling, illustrated in Figure 5.14(b). The
directions α are uniformly discretized so that the slopes of the wedges containing
the support of the curvelets are uniformly distributed in each of the north, south,
west, and east Fourier quadrants. Each wedge is the support of the two-dimensional
DFT ĉ_j^α[k₁, k₂] of a discrete curvelet c_j^α[n₁, n₂]. The curvelet translation parameters
are not chosen according to (5.113), but remain on a subgrid of the original image
sampling grid. At a scale 2^j, there is one sampling grid (2^{j/2} m₁, 2^j m₂) for curvelets
in the east and west quadrants. In these directions, curvelet coefficients are
⟨f[n₁, n₂], c_j^α[n₁ − 2^{j/2} m₁, n₂ − 2^j m₂]⟩ = f ⋆ c̄_j^α[2^{j/2} m₁, 2^j m₂]   (5.115)
CHAPTER 5 Frames
FIGURE 5.14
(a) Curvelet frequency plane tiling. The dark gray area is a wedge obtained as the product of a
radial window and an angular window. (b) Discrete curvelet frequency tiling. The radial and
angular windows define trapezoidal wedges as shown in dark gray.
with c̄_j^α[n₁, n₂] = c_j^α[−n₁, −n₂]. For curvelets in the north and south quadrants the
translation grid is (2^j m₁, 2^{j/2} m₂), which corresponds to curvelet coefficients
⟨f[n₁, n₂], c_j^α[n₁ − 2^j m₁, n₂ − 2^{j/2} m₂]⟩ = f ⋆ c̄_j^α[2^j m₁, 2^{j/2} m₂].   (5.116)
The discrete curvelet transform computes the curvelet filtering and sampling
with a two-dimensional FFT. The two-dimensional discrete Fourier transforms of
f[n] and c̄_j^α[n] are f̂[k] and ĉ_j^α[−k]. The algorithm proceeds as:
■ Computation of the two-dimensional DFT f̂[k] of f[n].
■ For each scale 2^j and each of the corresponding 2^{−j/2+2} angles α, calculation of f̂[k] ĉ_j^α[−k].
■ Computation of the inverse Fourier transform of f̂[k] ĉ_j^α[−k] on the smallest possible warped frequency rectangle including the wedge support of ĉ_j^α[−k].
The critical step is the last inverse Fourier transform. A computationally more expensive approach would compute f ⋆ c̄_j^α[n] for all n = (n₁, n₂) and then subsample this
convolution along the grids (5.115) and (5.116).
Instead, the subsampled curvelet coefficients are calculated directly by restricting the FFT to a bounding box that contains the support of ĉ_j^α[−k]. A horizontal or
vertical warping maps this bounding box to an elongated rectangular frequency box
on which the inverse FFT is calculated. One can verify that the resulting coefficients
correspond to the subsampled curvelet coefficients. The overall complexity of the
algorithm is then O(N log₂(N)), as detailed in [140]. The tight frame redundancy
bound obtained with this discrete algorithm is A ≈ 5.
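The restriction of the inverse FFT to a small frequency box is what makes this kind of transform fast. The following Python sketch is not the actual curvelet code of [140]: a plain rectangular block of Fourier coefficients stands in for a trapezoidal wedge, and the warping step is omitted. It only checks the key trick, namely that an inverse FFT computed on the small box alone, corrected by a modulation accounting for the frequency offset, agrees with the full-size inverse FFT followed by subsampling:

```python
import numpy as np

# Toy illustration of the restrict-then-inverse-FFT step of fast directional
# transforms. A rectangular frequency block replaces the trapezoidal wedge
# and no warping is performed; both simplifications are assumptions.
N = 64
rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
F = np.fft.fft2(f)

# "Wedge": a small rectangular block of Fourier coefficients.
k1, k2, h, w = 5, 3, 8, 16          # block position and size (hypothetical)
block = F[k1:k1 + h, k2:k2 + w]

# Direct route: zero everything outside the block, full-size inverse FFT,
# then subsample the result on a coarse grid (steps N/h and N/w).
Fw = np.zeros_like(F)
Fw[k1:k1 + h, k2:k2 + w] = block
slow = np.fft.ifft2(Fw)[::N // h, ::N // w]

# Fast route: inverse FFT on the small box only. Since the block does not
# start at frequency zero, a modulation on the coarse grid restores the
# frequency offset (k1, k2), and a constant rescales the FFT normalization.
n1 = np.arange(h)[:, None] * (N // h)
n2 = np.arange(w)[None, :] * (N // w)
phase = np.exp(2j * np.pi * (k1 * n1 / N + k2 * n2 / N))
fast = np.fft.ifft2(block) * phase * (h * w) / (N * N)

print(np.allclose(slow, fast))  # the two routes agree
```

The small-box route costs O(hw log(hw)) per wedge instead of O(N log N), which is what yields the overall O(N log₂ N) complexity quoted above.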
An orthogonal curvelet type transform has been developed by Do and Vetterli
[212]. The resulting contourlets are not redundant but do not have the appropriate
time and frequency localization needed to obtain asymptotic approximation results
similar to curvelets.
5.6 EXERCISES
5.1 ¹ Prove that if K ∈ Z − {0}, then {φ_p[n] = exp(i2πpn/(KN))}_{0≤p<KN} is a tight frame of C^N. Compute the frame bound.
5.2 ¹ Prove that if K ∈ R − {0}, then {φ_p(t) = exp(i2πpt/K)}_{p∈Z} is a tight frame of L²[0, 1]. Compute the frame bound.
5.3 ² Prove that a finite set of N vectors {φ_n}_{1≤n≤N} is always a frame of the space V generated by linear combinations of these vectors.
5.4 ¹ If U₁ and U₂ are two operators from C^N to C^N, prove that the trace (sum of diagonal values) satisfies tr(U₁U₂) = tr(U₂U₁).
5.5 ² Prove that
{φ_p[n] = δ[(n − p) mod N] − δ[(n − p − 1) mod N]}_{0≤p<N}
is a translation-invariant frame of the space V = {f ∈ R^N : Σ_{n=0}^{N−1} f[n] = 0}. Compute the frame bounds. Is it a numerically stable frame when N is large?
5.6 ² Construct a Riesz basis in C^N with a lower frame bound A that tends to zero and an upper frame bound that tends to +∞ as N increases.
5.7 ¹ If U is an operator from C^N to C^P, prove that Null U* is the orthogonal complement of Im U in C^P.
5.8 ³ Prove Theorem 5.12.
5.9 ¹ Let ĝ = 1_{[−ξ₀/2, ξ₀/2]}. Prove that {g(t − 2pπ/ξ₀) exp(ikξ₀t)}_{(k,p)∈Z²} is an orthonormal basis of L²(R).
5.10 ² Let g_{m,k}[n] = g[n − mM] exp(i2πkn/K), where g[n] is a window with a support included in [−K/2, K/2 − 1].
(a) Prove that Σ_{n=mM−M/2}^{mM+M/2−1} |g[n − mM]|² |f[n]|² = K^{−1} Σ_{k=0}^{K−1} |⟨f, g_{m,k}⟩|².
(b) Prove Theorem 5.18 with arguments similar to those of Theorem 5.17.
5.11 ² Compute the trigonometric polynomials h̃(ω) and g̃(ω) of minimum degree that satisfy (5.63) for the spline filters (5.67, 5.68) with m = 2. Compute numerically the graphs of φ̃ and ψ̃. Are they finite-energy functions?
5.12 ¹ Compute a cubic spline dyadic wavelet with two vanishing moments using the filter h defined by (5.67) for m = 3, with a filter g having three nonzero coefficients. Compute in WAVELAB the dyadic wavelet transform of the Lady signal with this new wavelet. Calculate g̃[n] if h̃[n] = h[n].
5.13 ² Prove the tight-frame energy conservation (4.29) of a discrete windowed Fourier transform. Derive (4.28) from general tight-frame properties. Compute the resulting discrete windowed Fourier transform reproducing kernel.
5.14 ¹ Let {g(t − nu₀) exp(ikξ₀t)}_{(n,k)∈Z²} be a windowed Fourier frame defined by g(t) = π^{−1/4} exp(−t²/2) with u₀ = ξ₀ and u₀ξ₀ < 2π. With the conjugate gradient algorithm of Theorem 5.8, compute numerically the window g̃(t) that generates the dual frame, for the values of u₀ξ₀ in Table 5.3. Compare g̃ with g and explain why they are progressively more different as u₀ξ₀ tends to 2π.
5.15 ² Sigma-Delta converter. A signal f(t) is sampled and quantized. We suppose that f̂ has a support in [−π/T, π/T].
(a) Let x[n] = f(nT/K). Show that if ω ∈ [−π, π], then x̂(ω) ≠ 0 only if ω ∈ [−π/K, π/K].
(b) Let x̃[n] = Q(x[n]) be the quantized samples. We now consider x[n] as a random vector, and we model the error x[n] − x̃[n] = W[n] as a white noise process of variance σ². Find the filter h[n] that minimizes ε = E{‖x̃ ⋆ h − x‖²}, and compute this minimum as a function of σ² and K.
(c) Let ĥ_p(ω) = (1 − e^{−iω})^{−p} be the transfer function of a discrete integration of order p. We quantize x̃[n] = Q(x ⋆ h_p[n]). Find the filter h[n] that minimizes ε = E{‖x̃ ⋆ h − x‖²}, and compute this minimum as a function of σ², K, and p. For a fixed oversampling factor K, how can we reduce this error?
5.16 ³ Oversampled analog-to-digital conversion. Let φ_s(t) = s^{1/2} sin(πt/s)/(πt).
(a) Prove that the following family is a union of orthogonal bases:
{φ_s(t − ks/K − ns)}_{1≤k≤K, n∈Z}.
Compute the tight-frame bound.
(b) Prove that the frame projector P defined in Proposition 5.9 is a discrete convolution. Compute its impulse response h[n].
(c) Characterize the signals a[n] that belong to the image space Im Φ of this frame.
(d) Let f(t) be a signal with a Fourier transform supported in [−π/s, π/s]. Prove that f ⋆ φ_s(ns) = s^{−1/2} f(ns).
(e) Let s₀ = s/K. For all n ∈ Z, we measure the oversampled noisy signal Y[n] = f ⋆ φ_s(ns₀) + W[n], where W[n] is a Gaussian white noise of variance σ². With the frame projector P, compute the error E{|PY[Kn] − s^{−1/2} f(ns)|²} and show that it decreases when K increases.
5.17 ¹ Let ψ be a dyadic wavelet that satisfies (5.48). Let ℓ²(L²(R)) be the space of sequences {g_j(u)}_{j∈Z} such that Σ_{j=−∞}^{+∞} ‖g_j‖² < +∞.
(a) Verify that if f ∈ L²(R), then {Wf(u, 2^j)}_{j∈Z} ∈ ℓ²(L²(R)). Next, let ψ̃ be the wavelet with Fourier transform
ψ̂(ω) / Σ_{j=−∞}^{+∞} |ψ̂(2^jω)|²,
and W^{−1} be the operator defined by
W^{−1}{g_j(u)}_{j∈Z} = Σ_{j=−∞}^{+∞} (1/2^j) g_j ⋆ ψ̃_{2^j}(t).
Prove that W^{−1} is the pseudo inverse of W in ℓ²(L²(R)).
(b) Verify that ψ̃ has the same number of vanishing moments as ψ.
(c) Let V be the subspace of ℓ²(L²(R)) that regroups all the dyadic wavelet transforms of functions in L²(R). Compute the orthogonal projection of {g_j(u)}_{j∈Z} in V.
5.18 ¹ Prove that if there exist A > 0 and B ≥ 0 such that
A (2 − |ĥ(ω)|²) ≤ |ĝ(ω)|² ≤ B (2 − |ĥ(ω)|²),
and if φ defined in (5.59) belongs to L²(R), then the wavelet ψ given by (5.60) satisfies the dyadic wavelet condition (5.55).
5.19 ³ Zak transform. The Zak transform associates to any f ∈ L²(R)
Zf(u, ξ) = Σ_{l=−∞}^{+∞} e^{i2πlξ} f(u − l).
(a) Prove that it is a unitary operator from L²(R) to L²[0, 1]²:
∫_{−∞}^{+∞} f(t) g*(t) dt = ∫₀¹ ∫₀¹ Zf(u, ξ) Zg*(u, ξ) du dξ,
by verifying that for g = 1_{[0,1]} it transforms the orthonormal basis {g_{n,k}(t) = g(t − n) exp(i2πkt)}_{(n,k)∈Z²} of L²(R) into an orthonormal basis of L²[0, 1]².
(b) Prove that the inverse Zak transform is defined by
∀h ∈ L²[0, 1]², Z^{−1}h(u) = ∫₀¹ h(u, ξ) dξ.
(c) Prove that if g ∈ L²(R), then {g(t − n) exp(i2πkt)}_{(n,k)∈Z²} is a frame of L²(R) if and only if there exist A > 0 and B such that
∀(u, ξ) ∈ [0, 1]², A ≤ |Zg(u, ξ)|² ≤ B,   (5.117)
where A and B are the frame bounds.
(d) Prove that if (5.117) holds, then the window g̃ that generates the dual frame is defined by Zg̃(u, ξ) = 1/Zg*(u, ξ).
5.20 ³ Suppose that f̂ has a support in [−π/T, π/T]. Let {f(t_n)}_{n∈Z} be irregular samples that satisfy (5.8). With the conjugate gradient Theorem 5.8, implement numerically a procedure that computes a uniform sampling {f(nT)}_{n∈Z} (from which f can be recovered with the sampling Theorem 3.2). Analyze the convergence rate of the conjugate-gradient algorithm as a function of δ. What happens if the condition (5.8) is not satisfied?
5.21 ² Prove that if ψ(x₁, x₂) is a directional wavelet having p vanishing moments in a direction α + π/2 as defined in (5.99), then it is orthogonal to any two-dimensional polynomial of degree p − 1.
5.22 ² Let n_k = (cos(2kπ/K), sin(2kπ/K)) ∈ R².
(a) Prove that {n_k}_{0≤k<K} is a tight frame of K vectors and that for any ξ = (ξ₁, ξ₂) ∈ R², it satisfies Σ_{k=0}^{K−1} |ξ · n_k|² = (K/2)|ξ|², where ξ · n is the inner product in R².
(b) Let ψ^k = ∂θ(x)/∂n_k be the derivative of θ(x) in the direction of n_k with x ∈ R². If θ(x) is rotationally invariant (not modified by a rotation of x), then prove that the frame condition (5.101) is equivalent to
2A/K ≤ Σ_{j=−∞}^{+∞} 2^{2j} |ξ|² |θ̂(2^jξ)|² ≤ 2B/K.
5.23 ⁴ Develop a texture classification algorithm with a two-dimensional Gabor wavelet transform using four oriented wavelets. The classification procedure can be based on "feature vectors" that provide local averages of the wavelet transform amplitude at several scales, along these four orientations [315, 338, 402, 467].
CHAPTER 6
Wavelet Zoom
A wavelet transform can focus on localized signal structures with a zooming
procedure that progressively reduces the scale parameter. Singularities and irregular
structures often carry essential information in a signal. For example, discontinuities
in images may correspond to occlusion contours of objects in a scene. The wavelet
transform amplitude across scales is related to the local signal regularity and Lipschitz exponents. Singularities and edges are detected from wavelet transform local
maxima at multiple scales. These maxima define a geometric scale–space support
from which signal and image approximations are recovered.
Nonisolated singularities appear in highly irregular signals such as multifractals.
The wavelet transform takes advantage of multifractal self-similarities to compute the
distribution of their singularities. This singularity spectrum characterizes multifractal
properties. Throughout this chapter wavelets are real functions.
6.1 LIPSCHITZ REGULARITY
To characterize singular structures, it is necessary to precisely quantify the local
regularity of a signal f(t). Lipschitz exponents provide uniform regularity measurements over time intervals, but also at any point v. If f has a singularity at v, which
means that it is not differentiable at v, then the Lipschitz exponent at v characterizes
this singular behavior.
Section 6.1.1 relates the uniform Lipschitz regularity of f over R to the asymptotic decay of the amplitude of its Fourier transform. This global regularity measurement is useless in analyzing the signal properties at particular locations. Section 6.1.3
studies zooming procedures that measure local Lipschitz exponents from the decay
of the wavelet transform amplitude at fine scales.
6.1.1 Lipschitz Definition and Fourier Analysis
The Taylor formula relates the differentiability of a signal to local polynomial
approximations. Suppose that f is m times differentiable in [v ⫺ h, v ⫹ h]. Let pv be
the Taylor polynomial in the neighborhood of v:
p_v(t) = Σ_{k=0}^{m−1} (f^{(k)}(v) / k!) (t − v)^k.   (6.1)
The Taylor formula proves that the approximation error
ε_v(t) = f(t) − p_v(t)
satisfies
∀t ∈ [v − h, v + h],   |ε_v(t)| ≤ (|t − v|^m / m!) sup_{u∈[v−h,v+h]} |f^{(m)}(u)|.   (6.2)
The mth-order differentiability of f in the neighborhood of v yields an upper
bound on the error ε_v(t) when t tends to v. The Lipschitz regularity refines this
upper bound with noninteger exponents. Lipschitz exponents are also called Hölder
exponents in the mathematics literature.
Definition 6.1: Lipschitz.
■ A function f is pointwise Lipschitz α ≥ 0 at v if there exist K > 0 and a polynomial p_v of degree m = ⌊α⌋ such that
∀t ∈ R,   |f(t) − p_v(t)| ≤ K |t − v|^α.   (6.3)
■ A function f is uniformly Lipschitz α over [a, b] if it satisfies (6.3) for all v ∈ [a, b] with a constant K that is independent of v.
■ The Lipschitz regularity of f at v or over [a, b] is the supremum of the α such that f is Lipschitz α.
At each v the polynomial p_v(t) is uniquely defined. If f is m = ⌊α⌋ times continuously differentiable in a neighborhood of v, then p_v is the Taylor expansion of f at
v. Pointwise Lipschitz exponents may vary arbitrarily from abscissa to abscissa. One
can construct multifractal functions with nonisolated singularities, where f has a
different Lipschitz regularity at each point. In contrast, uniform Lipschitz exponents
provide a more global measurement of regularity, which applies to a whole interval.
If f is uniformly Lipschitz α > m in the neighborhood of v, then one can verify that
f is necessarily m times continuously differentiable in this neighborhood.
If 0 ≤ α < 1, then p_v(t) = f(v) and the Lipschitz condition (6.3) becomes
∀t ∈ R,   |f(t) − f(v)| ≤ K |t − v|^α.
A function that is bounded but discontinuous at v is Lipschitz 0 at v. If the Lipschitz
regularity is α < 1 at v, then f is not differentiable at v and α characterizes the
singularity type.
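As a minimal numerical illustration (not taken from the text), consider f(t) = |t|^{1/2} at v = 0, where p_v(t) = f(0) = 0. The ratio |f(t) − f(v)|/|t − v|^α stays bounded for α = 1/2 but diverges near the singularity for any larger exponent, so the Lipschitz regularity at 0 is exactly 1/2:

```python
import numpy as np

# Definition 6.1 in action for f(t) = |t|^(1/2) at v = 0, p_v = 0.
t = np.logspace(-12, 0, 200)          # approach the singularity at 0
f = np.sqrt(t)

ratio_ok = f / t**0.5                  # alpha = 0.5: constant, equal to 1
ratio_bad = f / t**0.6                 # alpha = 0.6: blows up as t -> 0

print(ratio_ok.max())                  # 1.0: (6.3) holds with K = 1
print(ratio_bad.max())                 # ~ t^(-0.1) at t = 1e-12, unbounded
```

The supremum of admissible exponents, here 1/2, is the Lipschitz regularity of the definition's third item.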
Fourier Condition
The uniform Lipschitz regularity of f over R is related to the asymptotic decay
of its Fourier transform. Theorem 6.1 can be interpreted as a generalization of
Theorem 2.5.
Theorem 6.1. A function f is bounded and uniformly Lipschitz α over R if
∫_{−∞}^{+∞} |f̂(ω)| (1 + |ω|^α) dω < +∞.   (6.4)
Proof. To prove that f is bounded, we use the inverse Fourier integral (2.8) and (6.4),
which shows that
|f(t)| ≤ (1/2π) ∫_{−∞}^{+∞} |f̂(ω)| dω < +∞.
Let us now verify the Lipschitz condition (6.3) when 0 ≤ α ≤ 1. In this case, p_v(t) = f(v) and the uniform Lipschitz regularity means that there exists K > 0 such that for all (t, v) ∈ R²,
|f(t) − f(v)| / |t − v|^α ≤ K.
Since
f(t) = (1/2π) ∫_{−∞}^{+∞} f̂(ω) exp(iωt) dω,
|f(t) − f(v)| / |t − v|^α ≤ (1/2π) ∫_{−∞}^{+∞} |f̂(ω)| (|exp(iωt) − exp(iωv)| / |t − v|^α) dω.   (6.5)
For |t − v|^{−1} ≤ |ω|,
|exp(iωt) − exp(iωv)| / |t − v|^α ≤ 2 / |t − v|^α ≤ 2 |ω|^α.
For |t − v|^{−1} ≥ |ω|,
|exp(iωt) − exp(iωv)| / |t − v|^α ≤ |ω| |t − v| / |t − v|^α ≤ |ω|^α.
Cutting the integral (6.5) in two for |ω| < |t − v|^{−1} and |ω| ≥ |t − v|^{−1} yields
|f(t) − f(v)| / |t − v|^α ≤ (1/2π) ∫_{−∞}^{+∞} 2 |f̂(ω)| |ω|^α dω = K.
If (6.4) is satisfied, then K < +∞, so f is uniformly Lipschitz α.
Let us extend this result for m = ⌊α⌋ > 0. We proved in (2.42) that (6.4) implies that f is m times continuously differentiable. One can verify that f is uniformly Lipschitz α over R if and only if f^{(m)} is uniformly Lipschitz α − m over R. The Fourier transform of f^{(m)} is (iω)^m f̂(ω). Since 0 ≤ α − m < 1, we can use our previous result, which proves that f^{(m)} is uniformly Lipschitz α − m, and thus that f is uniformly Lipschitz α. ■
The Fourier transform is a powerful tool for measuring the minimum global
regularity of functions. However, it is not possible to analyze the regularity of f at a
particular point v from the decay of |f̂(ω)| at high frequencies ω. In contrast, since
wavelets are well localized in time, the wavelet transform gives Lipschitz regularity
over intervals and at points.
6.1.2 Wavelet Vanishing Moments
To measure the local regularity of a signal, it is not so important to use a wavelet with
a narrow frequency support, but vanishing moments are crucial. If the wavelet has
n vanishing moments, then we show that the wavelet transform can be interpreted
as a multiscale differential operator of order n. This yields a first relation between
the differentiability of f and its wavelet transform decay at fine scales.
Polynomial Suppression
The Lipschitz property (6.3) approximates f with a polynomial p_v in the neighborhood of v:
f(t) = p_v(t) + ε_v(t)   with   |ε_v(t)| ≤ K |t − v|^α.   (6.6)
A wavelet transform estimates the exponent α by ignoring the polynomial p_v. For
this purpose, we use a wavelet that has n > α vanishing moments:
∫_{−∞}^{+∞} t^k ψ(t) dt = 0   for 0 ≤ k < n.
A wavelet with n vanishing moments is orthogonal to polynomials of degree n − 1.
Since α < n, the polynomial p_v has degree at most n − 1. With the change of variable
t′ = (t − u)/s, we verify that
W p_v(u, s) = ∫_{−∞}^{+∞} p_v(t) (1/√s) ψ((t − u)/s) dt = 0.   (6.7)
Since f = p_v + ε_v,
W f(u, s) = W ε_v(u, s).   (6.8)
Section 6.1.3 explains how to measure α from |Wf(u, s)| when u is in the
neighborhood of v.
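A short numerical check of (6.7) and (6.8) can be made with the Mexican hat wavelet (1 − t²)exp(−t²/2), which has exactly two vanishing moments; the wavelet choice and the singular function ε below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# (6.7)-(6.8): a wavelet with n vanishing moments annihilates polynomials of
# degree < n, so the transform of f = p + eps only sees eps.
def wavelet_coeff(g, u, s, t, dt):
    """W g(u, s) = integral of g(t) s^(-1/2) psi((t - u)/s) dt, Riemann sum."""
    x = (t - u) / s
    psi = (1 - x**2) * np.exp(-x**2 / 2)   # Mexican hat, n = 2
    return np.sum(g * psi) * dt / np.sqrt(s)

t = np.linspace(-40, 40, 400001)
dt = t[1] - t[0]
p = 3.0 + 2.0 * t                     # affine polynomial, degree 1 < n = 2
eps = np.abs(t - 1.0) ** 0.7          # singular part around v = 1
u, s = 1.3, 0.5

wp = wavelet_coeff(p, u, s, t, dt)    # ~ 0: the polynomial is suppressed
wf = wavelet_coeff(p + eps, u, s, t, dt)
we = wavelet_coeff(eps, u, s, t, dt)
print(abs(wp) < 1e-8, np.isclose(wf, we))
```

Because W p_v(u, s) vanishes, the coefficients of f and of its singular part ε coincide, which is exactly what lets the decay of |Wf(u, s)| reveal the exponent α.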
Multiscale Differential Operator
Theorem 6.2 proves that a wavelet with n vanishing moments can be written as the
nth-order derivative of a function ; the resulting wavelet transform is a multiscale
differential operator. We suppose that has a fast decay, which means that for any
decay exponent m ∈ N there exists Cm such that
᭙t ∈ R,
|(t)| ⭐
Cm
.
1 ⫹ |t|m
(6.9)
Theorem 6.2. A wavelet with a fast decay has n vanishing moments if and only if there
exists with a fast decay such that
d n (t)
.
dt n
(6.10)
dn
( f ¯ s )(u),
dun
(6.11)
(t) ⫽ (⫺1)n
As a consequence
W f (u, s) ⫽ sn
6.1 Lipschitz Regularity
with ¯ s(t) ⫽ s⫺1/2 (⫺t/s). Moreover, has no more than n vanishing moments if and
⫹⬁
only if ⫺⬁ (t) dt ⫽ 0.
Proof. The fast decay of ψ implies that ψ̂ is C^∞. This is proved by setting f = ψ̂ in Theorem 2.5.
The integral of a function is equal to its Fourier transform evaluated at ω = 0. The derivative
property (2.22) implies that for any k < n,
∫_{−∞}^{+∞} t^k ψ(t) dt = (i)^k ψ̂^{(k)}(0) = 0.   (6.12)
We can therefore make the factorization
ψ̂(ω) = (−iω)^n θ̂(ω),   (6.13)
and θ̂(ω) is bounded. The fast decay of θ is proved with an induction on n. For n = 1,
θ(t) = −∫_{−∞}^t ψ(u) du = ∫_t^{+∞} ψ(u) du,
and the fast decay of θ is derived from (6.9). We then similarly verify that increasing the
order of integration by 1 up to n maintains the fast decay of θ.
Conversely, |θ̂(ω)| ≤ ∫_{−∞}^{+∞} |θ(t)| dt < +∞ because θ has a fast decay. The Fourier transform of (6.10) yields (6.13), which implies that ψ̂^{(k)}(0) = 0 for k < n. It follows from
(6.12) that ψ has n vanishing moments.
To test whether ψ has more than n vanishing moments, we compute with (6.13)
∫_{−∞}^{+∞} t^n ψ(t) dt = (i)^n ψ̂^{(n)}(0) = n! θ̂(0).
Clearly, ψ has no more than n vanishing moments if and only if θ̂(0) = ∫_{−∞}^{+∞} θ(t) dt ≠ 0.
The wavelet transform (4.32) can be written
W f(u, s) = f ⋆ ψ̄_s(u)   with   ψ̄_s(t) = (1/√s) ψ(−t/s).   (6.14)
We derive from (6.10) that ψ̄_s(t) = s^n d^nθ̄_s(t)/dt^n. Commuting the convolution and differentiation operators yields
W f(u, s) = s^n (f ⋆ d^nθ̄_s/dt^n)(u) = s^n (d^n/du^n)(f ⋆ θ̄_s)(u).   ■
If K = ∫_{−∞}^{+∞} θ(t) dt ≠ 0, then the convolution f ⋆ θ̄_s(t) can be interpreted as a
weighted average of f with a kernel dilated by s. So (6.11) proves that Wf(u, s)
is an nth-order derivative of an averaging of f over a domain proportional to s.
Figure 6.1 shows a wavelet transform calculated with ψ = −θ′, where θ is a Gaussian. The resulting Wf(u, s) is the derivative of f averaged in the neighborhood of
u with a Gaussian kernel dilated by s.
Since θ has a fast decay, one can verify that
lim_{s→0} (1/√s) θ̄_s = K δ,
FIGURE 6.1
Wavelet transform Wf(u, s) calculated with ψ = −θ′, where θ is a Gaussian, for the signal
f shown in (a). Position parameter u and scale s vary, respectively, along the horizontal and
vertical axes. (b) Black, gray, and white points correspond to positive, zero, and negative
wavelet coefficients. Singularities create large-amplitude coefficients in their cone of influence.
in the sense of the weak convergence (A.30). This means that for any φ that is
continuous at u,
lim_{s→0} φ ⋆ (1/√s) θ̄_s(u) = K φ(u).
If f is n times continuously differentiable in the neighborhood of u, then (6.11)
implies that
lim_{s→0} Wf(u, s) / s^{n+1/2} = lim_{s→0} f^{(n)} ⋆ (1/√s) θ̄_s(u) = K f^{(n)}(u).   (6.15)
In particular, if f is C^n with a bounded nth-order derivative, then |Wf(u, s)| =
O(s^{n+1/2}). This is a first relation between the decay of |Wf(u, s)| when s decreases
and the uniform regularity of f. Finer relations are studied in the next section.
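The limit (6.15) can be observed numerically. The sketch below is an illustration, not code from the book: it uses ψ = −θ′ with θ(t) = exp(−t²/2), so n = 1 and K = ∫θ = √(2π), and takes f = sin as the smooth signal:

```python
import numpy as np

# Checking (6.15): W f(u, s) / s^(3/2) -> K f'(u) as s -> 0, with
# psi = -theta', theta a Gaussian, K = sqrt(2*pi), f = sin, u = 0.7.
t = np.linspace(-30, 30, 600001)
dt = t[1] - t[0]
f = np.sin(t)
u = 0.7

def W(s):
    x = (t - u) / s
    psi = x * np.exp(-x**2 / 2)           # psi = -theta'
    return np.sum(f * psi) * dt / np.sqrt(s)

target = np.sqrt(2 * np.pi) * np.cos(u)   # K f'(u)
for s in (0.5, 0.1, 0.02):
    print(s, W(s) / s**1.5)               # approaches the target as s shrinks
print(target)
```

At each scale the normalized coefficient behaves like a smoothed derivative of f, and the convergence rate here is O(s²) because the next nonzero moment of ψ is the third one.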
6.1.3 Regularity Measurements with Wavelets
The decay of the wavelet transform amplitude across scales is related to the uniform
and pointwise Lipschitz regularity of the signal. Measuring this asymptotic decay
is equivalent to zooming into signal structures with a scale that goes to zero. We
suppose that the wavelet ψ has n vanishing moments and is C^n with derivatives that
have a fast decay. This means that for any 0 ≤ k ≤ n and m ∈ N there exists C_m such
that
∀t ∈ R,   |ψ^{(k)}(t)| ≤ C_m / (1 + |t|^m).   (6.16)
Theorem 6.3 relates the uniform Lipschitz regularity of f on an interval to the
amplitude of its wavelet transform at fine scales.
Theorem 6.3. If f ∈ L²(R) is uniformly Lipschitz α ≤ n over [a, b], then there exists A > 0
such that
∀(u, s) ∈ [a, b] × R⁺,   |Wf(u, s)| ≤ A s^{α+1/2}.   (6.17)
Conversely, suppose that f is bounded and that Wf(u, s) satisfies (6.17) for an α < n
that is not an integer. Then f is uniformly Lipschitz α on [a + ε, b − ε], for any ε > 0.
Proof. This theorem is proved with minor modifications in the proof of Theorem 6.4.
Since f is Lipschitz α at any v ∈ [a, b], Theorem 6.4 shows in (6.20) that
∀(u, s) ∈ R × R⁺,   |Wf(u, s)| ≤ A s^{α+1/2} (1 + |(u − v)/s|^α).
For u ∈ [a, b], we can choose v = u, which implies that |Wf(u, s)| ≤ A s^{α+1/2}. We verify
from the proof of (6.20) that the constant A does not depend on v because the Lipschitz
regularity is uniform over [a, b].
To prove that f is uniformly Lipschitz α over [a + ε, b − ε], we must verify that there
exists K such that for all v ∈ [a + ε, b − ε] we can find a polynomial p_v of degree ⌊α⌋ such
that
∀t ∈ R,   |f(t) − p_v(t)| ≤ K |t − v|^α.   (6.18)
When t ∉ [a + ε/2, b − ε/2], then |t − v| ≥ ε/2, and since f is bounded, (6.18) is verified
with a constant K that depends on ε. For t ∈ [a + ε/2, b − ε/2], the proof follows the same
derivations as the proof of pointwise Lipschitz regularity from (6.21) in Theorem 6.4. The
upper bounds (6.26) and (6.27) are replaced by
∀t ∈ [a + ε/2, b − ε/2],   |Δ_j^{(k)}(t)| ≤ K 2^{(α−k)j}   for 0 ≤ k ≤ ⌊α⌋ + 1.   (6.19)
This inequality is verified by computing an upper-bound integral similar to (6.25), divided
in two: u ∈ [a, b] and u ∉ [a, b]. When u ∈ [a, b], the condition (6.21)
is replaced by |Wf(u, s)| ≤ A s^{α+1/2} in (6.25). When u ∉ [a, b], we just use the fact that
|Wf(u, s)| ≤ ‖f‖ ‖ψ‖ and derive (6.19) from the fast decay of |ψ^{(k)}(t)|, by observing
that |t − u| ≥ ε/2 for t ∈ [a + ε/2, b − ε/2]. The constant K depends on A and ε but not
on v. The proof then proceeds like the proof of Theorem 6.4, and since the resulting
constant K in (6.29) does not depend on v, the Lipschitz regularity is uniform over
[a + ε, b − ε]. ■
The inequality (6.17) is really a condition on the asymptotic decay of |Wf(u, s)|
when s goes to zero. At large scales it does not introduce any constraint since the
Cauchy-Schwarz inequality guarantees that the wavelet transform is bounded:
|Wf(u, s)| = |⟨f, ψ_{u,s}⟩| ≤ ‖f‖ ‖ψ‖.
When the scale s decreases, Wf(u, s) measures fine-scale variations in the neighborhood of u. Theorem 6.3 proves that |Wf(u, s)| decays like s^{α+1/2} over intervals
where f is uniformly Lipschitz α.
Observe that the upper bound (6.17) is similar to the sufficient Fourier condition of Theorem 6.1, which supposes that |f̂(ω)| decays faster than |ω|^{−α}. The
wavelet scale s plays the role of a "localized" inverse frequency ω^{−1}. As opposed to
the Fourier transform Theorem 6.1, the wavelet transform gives a Lipschitz regularity condition that is localized over any finite interval and it provides a necessary
condition that is nearly sufficient. When [a, b] = R, then (6.17) is a necessary and
sufficient condition for f to be uniformly Lipschitz α on R.
If ψ has exactly n vanishing moments, then the wavelet transform decay gives
no information concerning the Lipschitz regularity of f for α > n. If f is uniformly
Lipschitz α > n, then it is C^n and (6.15) proves that lim_{s→0} s^{−n−1/2} Wf(u, s) =
K f^{(n)}(u) with K ≠ 0. This proves that |Wf(u, s)| ∼ s^{n+1/2} at fine scales despite
the higher regularity of f.
If the Lipschitz exponent α is an integer, then (6.17) is not sufficient to prove
that f is uniformly Lipschitz α. When [a, b] = R, if α = 1 and ψ has two vanishing
moments, then the class of functions that satisfy (6.17) is called the Zygmund class
[44]. It is slightly larger than the set of functions that are uniformly Lipschitz 1. For
example, f(t) = t log_e t belongs to the Zygmund class although it is not Lipschitz 1
at t = 0.
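The decay law (6.17) can also be used in the other direction, to estimate α from the slope of log|Wf| versus log s. The following sketch is illustrative (the wavelet and the test signal are assumptions, not from the text): it recovers α = 1/2 for f(t) = |t|^{1/2} at its singularity, using a Mexican hat with n = 2 > α vanishing moments as required:

```python
import numpy as np

# Estimate the Lipschitz exponent at v = 0 from the wavelet transform decay:
# |W f(v, s)| ~ s^(alpha + 1/2), so the log-log slope should be 1 here.
t = np.linspace(-50, 50, 500001)
dt = t[1] - t[0]
f = np.abs(t) ** 0.5                       # Lipschitz 1/2 at t = 0

def W(u, s):
    x = (t - u) / s
    psi = (1 - x**2) * np.exp(-x**2 / 2)   # Mexican hat wavelet, n = 2
    return np.sum(f * psi) * dt / np.sqrt(s)

scales = np.geomspace(0.05, 1.0, 10)
coeffs = np.array([abs(W(0.0, s)) for s in scales])
slope = np.polyfit(np.log(scales), np.log(coeffs), 1)[0]
print(slope)          # close to 1.0, i.e. alpha + 1/2 with alpha = 1/2
```

Staying at u = v keeps the measurement inside the cone of influence of the singularity, where (6.17) governs the decay.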
Pointwise Lipschitz Regularity
The study of pointwise Lipschitz exponents with the wavelet transform is a delicate and beautiful topic that finds its mathematical roots in the characterization of
Sobolev spaces by Littlewood and Paley in the 1930s. Characterizing the regularity of
f at a point v can be difficult because f may have very different types of singularities
that are aggregated in the neighborhood of v. In 1984, Bony [118] introduced the
“two-microlocalization”theory, which refines the Littlewood-Paley approach to provide pointwise characterization of singularities that he used to study the solution of
hyperbolic partial differential equations. These technical results became much simpler through the work of Jaffard [312] who proved that the two-microlocalization
properties are equivalent to specific decay conditions on the wavelet transform
amplitude. Theorem 6.4 gives a necessary and a sufficient condition on the wavelet
transform for estimating the Lipschitz regularity of f at a point v. Remember that
the wavelet has n vanishing moments and n derivatives having a fast decay.
Theorem 6.4: Jaffard. If f ∈ L²(R) is Lipschitz α ≤ n at v, then there exists A such that
∀(u, s) ∈ R × R⁺,   |Wf(u, s)| ≤ A s^{α+1/2} (1 + |(u − v)/s|^α).   (6.20)
Conversely, if α < n is not an integer and there exist A and α′ < α such that
∀(u, s) ∈ R × R⁺,   |Wf(u, s)| ≤ A s^{α+1/2} (1 + |(u − v)/s|^{α′}),   (6.21)
then f is Lipschitz α at v.
Proof. The necessary condition is relatively simple to prove but the sufficient condition is
much more difficult.
Proof of (6.20). Since f is Lipschitz α at v, there exists a polynomial p_v of degree ⌊α⌋ < n
and K such that |f(t) − p_v(t)| ≤ K |t − v|^α. Since ψ has n vanishing moments, we saw in
(6.7) that W p_v(u, s) = 0, and thus
|Wf(u, s)| = |∫_{−∞}^{+∞} (f(t) − p_v(t)) (1/√s) ψ((t − u)/s) dt|
≤ ∫_{−∞}^{+∞} K |t − v|^α (1/√s) |ψ((t − u)/s)| dt.
The change of variable x = (t − u)/s gives
|Wf(u, s)| ≤ √s ∫_{−∞}^{+∞} K |sx + u − v|^α |ψ(x)| dx.
Since |a + b|^α ≤ 2^α (|a|^α + |b|^α),
|Wf(u, s)| ≤ K 2^α √s (s^α ∫_{−∞}^{+∞} |x|^α |ψ(x)| dx + |u − v|^α ∫_{−∞}^{+∞} |ψ(x)| dx),
which proves (6.20).
Proof of (6.21). The wavelet reconstruction formula (4.37) proves that f can be decomposed in a Littlewood-Paley–type sum
f(t) = Σ_{j=−∞}^{+∞} Δ_j(t)   (6.22)
with
Δ_j(t) = (1/C_ψ) ∫_{2^j}^{2^{j+1}} ∫_{−∞}^{+∞} Wf(u, s) (1/√s) ψ((t − u)/s) du (ds/s²).   (6.23)
Let Δ_j^{(k)} be its kth-order derivative. To prove that f is Lipschitz α at v, we shall
approximate f with a polynomial that generalizes the Taylor polynomial
p_v(t) = Σ_{k=0}^{⌊α⌋} (Σ_{j=−∞}^{+∞} Δ_j^{(k)}(v)) (t − v)^k / k!.   (6.24)
If f is n times differentiable at v, then p_v corresponds to the Taylor polynomial; however,
this is not necessarily true. We shall first prove that Σ_{j=−∞}^{+∞} Δ_j^{(k)}(v) is finite by getting
upper bounds on |Δ_j^{(k)}(t)|. These sums may be thought of as a generalization of pointwise
derivatives.
To simplify the notation, we denote by K a generic constant that may change value
from one line to the next but that does not depend on j and t. The hypothesis (6.21) and
the asymptotic decay condition (6.16) imply that
|Δ_j(t)| ≤ (1/C_ψ) ∫_{2^j}^{2^{j+1}} ∫_{−∞}^{+∞} A s^α (1 + |(u − v)/s|^{α′}) (C_m / (1 + |(t − u)/s|^m)) du (ds/s²)
≤ K 2^{αj} ∫_{−∞}^{+∞} (1 + |(u − v)/2^j|^{α′}) / (1 + |(t − u)/2^j|^m) (du/2^j).   (6.25)
Since |u − v|^{α′} ≤ 2^{α′} (|u − t|^{α′} + |t − v|^{α′}), the change of variable u′ = 2^{−j}(u − t) yields
|Δ_j(t)| ≤ K 2^{αj} ∫_{−∞}^{+∞} (1 + |u′|^{α′} + |(v − t)/2^j|^{α′}) / (1 + |u′|^m) du′.
Choosing m = α′ + 2 yields
|Δ_j(t)| ≤ K 2^{αj} (1 + |(v − t)/2^j|^{α′}).   (6.26)
The same derivations applied to the derivatives of Δ_j(t) yield
∀k ≤ ⌊α⌋ + 1,   |Δ_j^{(k)}(t)| ≤ K 2^{(α−k)j} (1 + |(v − t)/2^j|^{α′}).   (6.27)
At t = v, it follows that
∀k ≤ ⌊α⌋,   |Δ_j^{(k)}(v)| ≤ K 2^{(α−k)j}.   (6.28)
This guarantees a fast decay of |Δ_j^{(k)}(v)| when 2^j goes to zero, because α is not an integer
so α > ⌊α⌋. At large scales 2^j, since |Wf(u, s)| ≤ ‖f‖ ‖ψ‖, the change of variable
u′ = (t − u)/s in (6.23) gives
|Δ_j^{(k)}(v)| ≤ (‖f‖ ‖ψ‖ / C_ψ) ∫_{−∞}^{+∞} |ψ^{(k)}(u′)| du′ ∫_{2^j}^{2^{j+1}} ds / s^{3/2+k},
therefore |Δ_j^{(k)}(v)| ≤ K 2^{−(k+1/2)j}. Together with (6.28), this proves that the polynomial
p_v defined in (6.24) has coefficients that are finite.
With the Littlewood-Paley decomposition (6.22), we compute
|f(t) − p_v(t)| = |Σ_{j=−∞}^{+∞} (Δ_j(t) − Σ_{k=0}^{⌊α⌋} Δ_j^{(k)}(v) (t − v)^k / k!)|.
The sum over scales is divided in two at 2^J such that 2^J ≥ |t − v| ≥ 2^{J−1}. For j ≥ J, we can
use the classical Taylor theorem to bound the Taylor expansion of Δ_j:
I = Σ_{j=J}^{+∞} |Δ_j(t) − Σ_{k=0}^{⌊α⌋} Δ_j^{(k)}(v) (t − v)^k / k!|
≤ Σ_{j=J}^{+∞} (|t − v|^{⌊α⌋+1} / (⌊α⌋ + 1)!) sup_{h∈[t,v]} |Δ_j^{(⌊α⌋+1)}(h)|.
Inserting (6.27) yields
I ≤ K |t − v|^{⌊α⌋+1} Σ_{j=J}^{+∞} 2^{−j(⌊α⌋+1−α)} (1 + |(v − t)/2^j|^{α′}),
and since 2^J ≥ |t − v| ≥ 2^{J−1}, we get I ≤ K |v − t|^α.
Let us now consider the case j < J:
II = Σ_{j=−∞}^{J−1} |Δ_j(t) − Σ_{k=0}^{⌊α⌋} Δ_j^{(k)}(v) (t − v)^k / k!|
≤ K Σ_{j=−∞}^{J−1} (2^{αj} (1 + |(v − t)/2^j|^{α′}) + Σ_{k=0}^{⌊α⌋} (|t − v|^k / k!) 2^{(α−k)j})
≤ K (2^{αJ} + 2^{(α−α′)J} |t − v|^{α′} + Σ_{k=0}^{⌊α⌋} (|t − v|^k / k!) 2^{(α−k)J}),
and since 2^J ≥ |t − v| ≥ 2^{J−1}, we get II ≤ K |v − t|^α. As a result,
|f(t) − p_v(t)| ≤ I + II ≤ K |v − t|^α,   (6.29)
which proves that f is Lipschitz α at v. ■
Cone of Influence
To interpret more easily the necessary condition (6.20) and the sufficient condition
(6.21), we shall suppose that ψ has a compact support equal to [−C, C]. The cone
of influence of v in the scale–space plane is the set of points (u, s) such that v is
included in the support of ψ_{u,s}(t) = s^{−1/2} ψ((t − u)/s). Since the support of ψ((t −
u)/s) is equal to [u − Cs, u + Cs], the cone of influence of v is defined by
|u − v| ≤ Cs.   (6.30)
It is illustrated in Figure 6.2. If u is in the cone of influence of v, then Wf(u, s) =
⟨f, ψ_{u,s}⟩ depends on the value of f in the neighborhood of v. Since |u − v|/s ≤ C,
the conditions (6.20, 6.21) can be written as
|Wf(u, s)| ≤ A′ s^{α+1/2},
FIGURE 6.2
The cone of influence of an abscissa v consists of the scale–space points (u, s) for which the
support of ψ_{u,s} intersects t = v.
which is identical to the uniform Lipschitz condition (6.17) given by Theorem 6.3.
In Figure 6.1, the high-amplitude wavelet coefficients are in the cone of influence
of each singularity.
Oscillating Singularities
It may seem surprising that (6.20) and (6.21) also impose a condition on the wavelet
transform outside the cone of influence of v. Indeed, this corresponds to wavelets
of which the support does not intersect v. For |u ⫺ v| ⬎ Cs, we get
|W f (u, s)| ⭐ A⬘ s␣⫺␣⬘⫹1/2 |u ⫺ v|␣ .
(6.31)
We shall see that it is indeed necessary to impose this decay when u tends to v in
order to control the oscillations of f that might generate singularities.
Let us consider the generic example of a highly oscillatory function
1
f (t) ⫽ sin ,
t
which is discontinuous at v ⫽ 0 because of the acceleration of its oscillations. Since
ψ is a smooth C^n function, if it is centered close to zero, then the rapid oscillations of
sin t^{−1} produce a correlation integral ⟨sin t^{−1}, ψ_{u,s}⟩ that is very small. With an integration by parts, one can verify that if (u, s) is in the cone of influence of v = 0, then
|W f (u, s)| ⭐ A s2⫹1/2 . This looks as if f is Lipschitz 2 at 0. However, Figure 6.3
shows high-energy wavelet coefficients outside the cone of influence of v ⫽ 0,
which are responsible for the discontinuity. To guarantee that f is Lipschitz ␣, the
amplitude of such coefficients is controlled by the upper bound (6.31).
To explain why high-frequency oscillations appear outside the cone of influence
of v, we use the results of Section 4.4.3 on the estimation of instantaneous frequencies with wavelet ridges. The instantaneous frequency of sin t^{−1} = sin φ(t) is
|φ′(t)| = t^{−2}. Let ψ^a be the analytic part of ψ, defined in (4.47). The corresponding
6.1 Lipschitz Regularity

FIGURE 6.3
Wavelet transform of f(t) = sin(a t^{−1}) calculated with ψ = −θ′, where θ is a Gaussian.
High-amplitude coefficients are along a parabola outside the cone of influence of t = 0.
complex analytic wavelet transform is W^a f(u, s) = ⟨f, ψ^a_{u,s}⟩. It was proved in (4.109)
that for a fixed time u, the maximum of s^{−1/2} |W^a f(u, s)| is located at the scale

s(u) = η / φ′(u) = η u²,

where η is the center frequency of ψ̂^a(ω). When u varies, the set of points (u, s(u))
defines a ridge that is a parabola located outside the cone of influence of v = 0 in
the plane (u, s). Since ψ = Re[ψ^a], the real wavelet transform is

W f(u, s) = Re[W^a f(u, s)].
The high-amplitude values of W f (u, s) are thus located along the same parabola
ridge curve in the scale–space plane, which clearly appears in Figure 6.3. Real
wavelet coefficients W f (u, s) change signs along the ridge because of the variations
of the complex phase of W a f (u, s).
The example of f(t) = sin t^{−1} can be extended to general oscillating singularities
[32]. A function f has an oscillating singularity at v if there exist α ≥ 0 and β > 0
such that for t in a neighborhood of v,

f(t) ∼ |t − v|^α g(1 / |t − v|^β),
where g(t) is a C^∞ oscillating function that has primitives bounded at any order.
The function g(t) = sin t^{−1} is a typical example. The oscillations have an instantaneous frequency φ′(t) that increases to infinity faster than |t|^{−1} when t goes to v.
High-energy wavelet coefficients are located along the ridge s(u) = η/φ′(u), and this
curve is necessarily outside the cone of influence |u − v| ≤ C s.
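The parabolic ridge of sin t^{−1} can be checked in a few lines. In this sketch the center frequency η is an arbitrary stand-in value, not one derived from a specific analytic wavelet:

```python
import numpy as np

# Sketch: for f(t) = sin(1/t) the phase is phi(t) = 1/t, so the instantaneous
# frequency is |phi'(t)| = t**(-2), and the ridge scale s(u) = eta / phi'(u)
# is the parabola eta * u**2. The value of eta below is hypothetical.
eta = 5.0
u = np.array([0.1, 0.2, 0.4])
ridge = eta / u**-2          # eta / phi'(u) = eta * u**2
assert np.allclose(ridge, eta * u**2)
# Near u = 0 the ridge leaves the cone |u| <= C*s (here with C = 1): u > C*s(u).
assert u[0] > 1.0 * ridge[0]
```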
6.2 WAVELET TRANSFORM MODULUS MAXIMA
Theorems 6.3 and 6.4 prove that the local Lipschitz regularity of f at v depends
on the decay at fine scales of |W f (u, s)| in the neighborhood of v. Measuring this
decay directly in the time-scale plane (u, s) is not necessary. The decay of |W f(u, s)|
can indeed be controlled from its local maxima values. Section 6.2.1 studies the
detection and characterization of singularities from wavelet local maxima. Signal
approximations are recovered in Section 6.2.2, from the scale–space support of
these local maxima at dyadic scales.
6.2.1 Detection of Singularities
Singularities are detected by finding the abscissa where the wavelet modulus maxima converge at fine scales. A wavelet modulus maximum is defined as a point
(u_0, s_0) such that |W f(u, s_0)| is locally maximum at u = u_0. This implies that

∂W f(u_0, s_0)/∂u = 0.
This local maximum should be a strict local maximum in either the right or the left
neighborhood of u0 to avoid having any local maxima when |W f (u, s0 )| is constant.
We call any connected curve s(u) in the scale–space plane (u, s) along which all
points are modulus maxima a maxima line. (See Figure 6.5b on page 218, which
shows the wavelet modulus maxima of a signal.)
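At a fixed scale s_0, this detection step reduces to a neighbor comparison on a sampled row of |W f(u, s_0)|. A minimal sketch (function and data names are illustrative) that also enforces the strict-maximum requirement on constant stretches:

```python
import numpy as np

# Sketch: detect modulus maxima of one sampled row |W f(u, s0)| along u.
# A point is kept when it is >= both neighbors and strictly greater than at
# least one of them, which discards stretches where |W f| is constant.
def modulus_maxima(w_row):
    a = np.abs(np.asarray(w_row, dtype=float))
    left, mid, right = a[:-2], a[1:-1], a[2:]
    keep = (mid >= left) & (mid >= right) & ((mid > left) | (mid > right))
    return np.where(keep)[0] + 1              # indices into w_row

row = [0.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 0.5]
print(modulus_maxima(row))   # → [2 6]
```

Chaining these indices across scales into connected curves then yields the maxima lines.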
To better understand the properties of these maxima, the wavelet transform is
written as a multiscale differential operator. Theorem 6.2 proves that if ψ has exactly
n vanishing moments and a compact support, then there exists θ of compact support
such that ψ = (−1)^n θ^{(n)} with ∫_{−∞}^{+∞} θ(t) dt ≠ 0. The wavelet transform is rewritten in
(6.11) as a multiscale differential operator

W f(u, s) = s^n (d^n/du^n)( f ⋆ θ̄_s)(u).    (6.32)
(6.32)
If the wavelet has only one vanishing moment, wavelet modulus maxima are the
maxima of the first-order derivative of f smoothed by θ̄_s, as illustrated by Figure 6.4.
These multiscale modulus maxima are used to locate discontinuities and edges in
images. If the wavelet has two vanishing moments, the modulus maxima correspond
to high curvatures. Theorem 6.5 proves that if W f(u, s) has no modulus maxima at
fine scales, then f is locally regular.
FIGURE 6.4
The convolution f ⋆ θ̄_s(u) averages f over a domain proportional to s. If ψ = −θ′, then
W₁f(u, s) = s (d/du)( f ⋆ θ̄_s)(u) has modulus maxima at sharp variation points of f ⋆ θ̄_s(u).
If ψ = θ″, then the modulus maxima of W₂f(u, s) = s² (d²/du²)( f ⋆ θ̄_s)(u) correspond to
locally maximum curvatures.
Theorem 6.5: Hwang, Mallat. Suppose that ψ is C^n with a compact support, and ψ =
(−1)^n θ^{(n)} with ∫_{−∞}^{+∞} θ(t) dt ≠ 0. Let f ∈ L¹[a, b]. If there exists s_0 > 0 such that |W f(u, s)|
has no local maximum for u ∈ [a, b] and s < s_0, then f is uniformly Lipschitz n on
[a + ε, b − ε], for any ε > 0.
This theorem is proved in [364]. It implies that f can be singular (not Lipschitz 1)
at a point v only if there is a sequence of wavelet maxima points (up , sp )p∈N that
converges toward v at fine scales:
lim_{p→+∞} u_p = v  and  lim_{p→+∞} s_p = 0.
These modulus maxima points may or may not be along the same maxima line.
This result guarantees that all singularities are detected by following the wavelet
transform modulus maxima at fine scales. Figure 6.5 gives an example where all
singularities are located by following the maxima lines.
Maxima Propagation
For all ψ = (−1)^n θ^{(n)}, we are not guaranteed that a modulus maximum located at
(u_0, s_0) belongs to a maxima line that propagates toward finer scales. When s
decreases, W f(u, s) may have no more maxima in the neighborhood of u = u_0.
FIGURE 6.5
(a) Wavelet transform W f(u, s); the horizontal and vertical axes give u and log₂ s, respectively.
(b) Modulus maxima of W f(u, s). (c) The full line gives the decay of log₂ |W f(u, s)| as a
function of log₂ s along the maxima line that converges to the abscissa t = 0.05. The dashed
line gives log₂ |W f(u, s)| along the left maxima line that converges to t = 0.42.
Theorem 6.6 proves that this is never the case if θ is a Gaussian. The wavelet transform W f(u, s) can then be written as the solution of the heat-diffusion equation,
where s is proportional to the diffusion time. The maximum principle applied to the
heat-diffusion equation proves that maxima may not disappear when s decreases.
Applications of the heat-diffusion equation to the analysis of multiscale averaging
have been studied by several computer vision researchers [309, 330, 496].
Theorem 6.6: Hummel, Poggio, Yuille. Let ψ = (−1)^n θ^{(n)}, where θ is a Gaussian. For
any f ∈ L²(R), the modulus maxima of W f(u, s) belong to connected curves that are
never interrupted when the scale decreases.
Proof. To simplify the proof, we suppose that θ is a normalized Gaussian θ(t) = 2^{−1} π^{−1/2} exp(−t²/4), whose Fourier transform is θ̂(ω) = exp(−ω²). Theorem 6.2 proves that

W f(u, s) = s^n f^{(n)} ⋆ θ_s(u),    (6.33)

where the nth derivative f^{(n)} is defined in the sense of distributions. Let τ be the diffusion
time. The solution of

∂g(u, τ)/∂τ = ∂²g(u, τ)/∂u²    (6.34)

with initial condition g(u, 0) = g_0(u) is obtained by computing the Fourier transform with
respect to u of (6.34):

∂ĝ(ω, τ)/∂τ = −ω² ĝ(ω, τ).

It follows that ĝ(ω, τ) = ĝ_0(ω) exp(−ω² τ) and thus,

g(u, τ) = (1/√τ) g_0 ⋆ θ_{√τ}(u).

For τ = s², setting g_0 = f^{(n)} and inserting (6.33) yields W f(u, s) = s^{n+1} g(u, s²). Thus,
the wavelet transform is proportional to a heat diffusion with initial condition f (n) .
The maximum principle for the parabolic heat equation [35] proves that a global
maximum of |g(u, s2 )| for (u, s) ∈ [a, b] ⫻ [s0 , s1 ] is necessarily either on the boundary
u = a, b or at s = s_0. A modulus maximum of W f(u, s) at (u_1, s_1) is a local maximum of
|g(u, s²)| for a fixed s and varying u. Suppose that a line of modulus maxima is interrupted
at (u_1, s_1), with s_1 > 0. One can then verify that there exists ε > 0 such that a global
maximum of |g(u, s²)| over [u_1 − ε, u_1 + ε] × [s_1 − ε, s_1] is at (u_1, s_1). This contradicts
the maximum principle, and thus proves that all modulus maxima propagate toward
finer scales.
■
Derivatives of Gaussians are most often used to guarantee that all maxima lines
propagate up to the finest scales. Chaining together maxima into maxima lines is also
a procedure for removing spurious modulus maxima created by numerical errors
in regions where the wavelet transform is close to zero.
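The heat-diffusion argument can be illustrated numerically: smoothing the derivative of a rough signal with Gaussians of decreasing scale should never lose modulus maxima. The test signal and scales below are arbitrary choices made for the sketch (n = 1):

```python
import numpy as np

# Numerical illustration of Theorem 6.6: with a Gaussian theta, W f(u, s) is
# proportional to a heat diffusion of f^(n), so modulus maxima can only
# appear, never vanish, as s decreases. Here n = 1 and the signal is random.
def gaussian_smooth(x, sigma):
    k = np.arange(-128, 129)
    g = np.exp(-k**2 / (2.0 * sigma**2))
    return np.convolve(x, g / g.sum(), mode="same")

def n_maxima(x):
    a = np.abs(x)
    return int(np.sum((a[1:-1] > a[:-2]) & (a[1:-1] > a[2:])))

rng = np.random.default_rng(0)
d = np.diff(np.cumsum(rng.standard_normal(512)))   # f' for a rough signal f
counts = [n_maxima(gaussian_smooth(d, s)) for s in (32.0, 16.0, 8.0, 4.0, 2.0)]
# Finer scale (smaller s): at least as many modulus maxima as at coarse scale.
assert counts[-1] >= counts[0]
```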
Isolated Singularities
A wavelet transform may have a sequence of local maxima that converge to an
abscissa v even though f is perfectly regular at v. This is the case of the maxima
line of Figure 6.5 that converges to the abscissa v ⫽ 0.23. To detect singularities it
is therefore not sufficient to follow the wavelet modulus maxima across scales. The
Lipschitz regularity is calculated from the decay of the modulus maxima amplitude.
Let us suppose that for s ⬍ s0 all modulus maxima that converge to v are included
in a cone
|u ⫺ v| ⭐ Cs.
(6.35)
This means that f does not have oscillations that accelerate in the neighborhood of
v. The potential singularity at v is necessarily isolated. Indeed, we can derive from
Theorem 6.5 that the absence of maxima outside the cone of influence implies that f
is uniformly Lipschitz n in the neighborhood of any t ⫽ v with t ∈ (v ⫺ Cs0 , v ⫹ Cs0 ).
The decay of |W f (u, s)| in the neighborhood of v is controlled by the decay of the
modulus maxima included in the cone |u ⫺ v| ⭐ Cs. Theorem 6.3 implies that f is
uniformly Lipschitz ␣ in the neighborhood of v if and only if there exists A ⬎ 0 such
that each modulus maximum (u, s) in the cone (6.35) satisfies
|W f(u, s)| ≤ A s^{α+1/2},    (6.36)

which is equivalent to

log₂ |W f(u, s)| ≤ log₂ A + (α + 1/2) log₂ s.    (6.37)

Thus, the Lipschitz regularity at v is the maximum slope of log₂ |W f(u, s)| as a
function of log₂ s along the maxima lines converging to v.
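This slope measurement is a one-line least-squares fit. A sketch on synthetic maxima amplitudes that decay exactly like s^{α+1/2} (function and data names are illustrative):

```python
import numpy as np

# Sketch of (6.37): the Lipschitz exponent alpha is the slope of
# log2|W f| against log2 s along a maxima line, minus 1/2.
def estimate_alpha(scales, maxima_amplitudes):
    slope, _ = np.polyfit(np.log2(scales), np.log2(maxima_amplitudes), 1)
    return slope - 0.5

s = 2.0 ** np.arange(-6, -1)       # dyadic scales 2^-6 ... 2^-2
w = 3.0 * s ** (0.5 + 0.5)         # maxima decaying like s^(alpha+1/2), alpha = 0.5
assert abs(estimate_alpha(s, w) - 0.5) < 1e-6
```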
In numerical calculations, the finest scale of the wavelet transform is limited by
the resolution of the discrete data. From a sampling at intervals N^{−1}, Section 4.3.3
computes the discrete wavelet transform at scales s ≥ λ N^{−1}, where λ is large enough
to avoid sampling the wavelets too coarsely at the finest scale. The Lipschitz regularity
α of a singularity is then estimated by measuring the decay slope of log₂ |W f(u, s)|
as a function of log₂ s for 2^j ≥ s ≥ λ N^{−1}. The largest scale 2^j should be smaller than
the distance between two consecutive singularities to avoid having other singularities
influence the value of W f(u, s). The sampling interval N^{−1} must be small
enough to measure α accurately. The signal in Figure 6.5(a) is defined by N = 256
samples. Figure 6.5(c) shows the decay of log2 |W f (u, s)| along the maxima line
converging to t ⫽ 0.05. It has slope ␣ ⫹ 1/2 ≈ 1/2 for 2⫺4 ⭓ s ⭓ 2⫺6 . As expected,
␣ ⫽ 0 because the signal is discontinuous at t ⫽ 0.05. Along the second maxima line
converging to t ⫽ 0.42 the slope is ␣ ⫹ 1/2 ≈ 1,which indicates that the singularity is
Lipschitz 1/2.
When f is a function with singularities that are not isolated, finite resolution
measurements are not sufficient to distinguish individual singularities. Section 6.4
describes a global approach that computes the singularity spectrum of multifractals
by taking advantage of their self-similarity.
Smoothed Singularities
The signal may have important variations that are infinitely continuously differentiable. For example, at the border of a shadow the gray level of an image varies
quickly but is not discontinuous because of the diffraction effect. The smoothness
of these transitions is modeled as a diffusion with a Gaussian kernel that has a
variance that is measured from the decay of wavelet modulus maxima.
In the neighborhood of a sharp transition at v, we suppose that
f(t) = f_0 ⋆ g_σ(t),    (6.38)

where g_σ is a Gaussian of variance σ²:

g_σ(t) = (1/(σ√(2π))) exp(−t²/(2σ²)).    (6.39)
If f_0 has a Lipschitz α singularity at v that is isolated and nonoscillating, it is uniformly
Lipschitz α in the neighborhood of v. For wavelets that are derivatives of Gaussians,
Theorem 6.7 [367] relates the decay of the wavelet transform to σ and α.

Theorem 6.7. Let ψ = (−1)^n θ^{(n)} with θ(t) = λ exp(−t²/(2β²)). If f = f_0 ⋆ g_σ and f_0 is
uniformly Lipschitz α on [v − h, v + h], then there exists A such that

∀(u, s) ∈ [v − h, v + h] × R⁺,  |W f(u, s)| ≤ A s^{α+1/2} (1 + σ²/(β² s²))^{−(n−α)/2}.    (6.40)
Proof. The wavelet transform can be written as

W f(u, s) = s^n (d^n/du^n)( f ⋆ θ̄_s)(u) = s^n (d^n/du^n)( f_0 ⋆ g_σ ⋆ θ̄_s)(u).    (6.41)

Since θ is a Gaussian, one can verify with a Fourier transform calculation that

θ̄_s ⋆ g_σ(t) = √(s/s_0) θ̄_{s_0}(t)  with  s_0 = √(s² + σ²/β²).    (6.42)

Inserting this result in (6.41) yields

W f(u, s) = s^n √(s/s_0) (d^n/du^n)( f_0 ⋆ θ̄_{s_0})(u) = (s/s_0)^{n+1/2} W f_0(u, s_0).    (6.43)

Since f_0 is uniformly Lipschitz α on [v − h, v + h], Theorem 6.3 proves that there exists
A > 0 such that

∀(u, s) ∈ [v − h, v + h] × R⁺,  |W f_0(u, s)| ≤ A s^{α+1/2}.    (6.44)

Inserting this in (6.43) gives

|W f(u, s)| ≤ A (s/s_0)^{n+1/2} s_0^{α+1/2},    (6.45)

from which we derive (6.40) by inserting the expression (6.42) of s_0. ■
This theorem explains how the wavelet transform decay relates to the amount of
diffusion of a singularity. At large scales s ≫ σ/β, the Gaussian averaging is not "felt"
by the wavelet transform, which decays like s^{α+1/2}. For s ≤ σ/β, the variation of f at
v is not sharp relative to s because of the Gaussian averaging. At these fine scales,
the wavelet transform decays like s^{n+1/2} because f is C^∞.

The parameters K, α, and σ are numerically estimated from the decay of the
modulus maxima along the maxima curves that converge toward v. The variance
β² depends on the choice of wavelet and is known in advance. A regression is
performed to approximate

log₂ |W f(u, s)| ≈ log₂ K + (α + 1/2) log₂ s − ((n − α)/2) log₂(1 + σ²/(β² s²)).
Figure 6.6 gives the wavelet modulus maxima computed with a wavelet that is a second derivative of a Gaussian. The decay of log₂ |W f(u, s)| as a function of log₂ s is
given along several maxima lines corresponding to smoothed and nonsmoothed
singularities. The wavelet is normalized so that β = 1, and the diffusion scale is
σ = 2^{−5}.
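One hedged sketch of this regression: since the model is linear in log₂ K and α once σ is fixed, σ can be found by a grid search with a linear least-squares fit inside each candidate. The values of n, β, and the grid are illustrative choices, not the book's numerical procedure:

```python
import numpy as np

# Sketch of the regression above: fit alpha and sigma in
#   log2|W f| ~ log2 K + (alpha + 1/2) log2 s - ((n - alpha)/2) log2(1 + sigma^2/(beta^2 s^2)).
def fit_smoothed_singularity(s, w, n=2, beta=1.0):
    log_s, log_w = np.log2(s), np.log2(w)
    best = None
    for sigma in np.linspace(0.0, 0.1, 401):
        c = np.log2(1.0 + sigma**2 / (beta**2 * s**2))
        # Rearranged model, linear in (log2 K, alpha):
        #   log_w - log_s/2 + (n/2) c = log2 K + alpha * (log_s + c/2)
        A = np.column_stack([np.ones_like(log_s), log_s + c / 2])
        y = log_w - log_s / 2 + (n / 2) * c
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.sum((A @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, coef[1], sigma)
    return best[1], best[2]   # estimated (alpha, sigma)

# Synthetic maxima following (6.40) with alpha = 0, sigma = 2**-5, n = 2, beta = 1
s = 2.0 ** np.linspace(-7, -2, 30)
w = 0.8 * np.sqrt(s) * (1.0 + (2.0**-5) ** 2 / s**2) ** -1.0
alpha, sigma = fit_smoothed_singularity(s, w)
assert abs(alpha) < 1e-3 and abs(sigma - 2.0**-5) < 1e-3
```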
6.2.2 Dyadic Maxima Representation
Wavelet transform maxima carry the properties of sharp signal transitions and singularities. By recovering a signal approximation from these maxima, signal singularities
can be modified or removed by processing the wavelet modulus maxima.
For fast numerical computations, the detection of wavelet transform maxima is
limited to dyadic scales {2^j}_{j∈Z}. Suppose that ψ is a dyadic wavelet, which means
that there exist A > 0 and B such that

∀ω ∈ R − {0},  A ≤ Σ_{j=−∞}^{+∞} |ψ̂(2^j ω)|² ≤ B.    (6.46)
As a consequence of Theorem 5.11 on translation-invariant frames, it is proved
in Section 5.2 that the resulting translation-invariant dyadic wavelet transform
{W f (u, 2 j )}j∈Z is complete and stable. This dyadic wavelet transform has the
same properties as a continuous wavelet transform W f (u, s). All theorems of
Sections 6.1.3 and 6.2 remain valid if we restrict s to the dyadic scales {2 j }j∈Z .
Singularities create sequences of maxima that converge toward the corresponding
location at fine scales, and the Lipschitz regularity is calculated from the decay of
the maxima amplitude.
Scale–Space Maxima Support
Mallat and Zhong [367] introduced a dyadic wavelet maxima representation with
a scale–space approximation support Λ of modulus maxima (u, 2^j) of W f.
Wavelet maxima can be interpreted as points of 0 or π phase for an appropriate complex wavelet transform. Let ψ′ be the derivative of ψ and ψ′_{u,2^j}(t) =
2^{−j/2} ψ′(2^{−j}(t − u)). If W f has a local extremum at u_0, then

∂W f(u_0, 2^j)/∂u = −2^{−j} ⟨ f, ψ′_{2^j,u_0} ⟩ = 0.
FIGURE 6.6
(a) Wavelet transform W f(u, s). (b) Modulus maxima of a wavelet transform computed with
ψ = θ″, where θ is a Gaussian with β = 1. (c) Decay of log₂ |W f(u, s)| along maxima curves.
The solid and dotted lines (left) correspond to the maxima curves converging to t = 0.81 and
t = 0.12, respectively. They correspond to the curves (right) converging to t = 0.38 and
t = 0.55, respectively. The diffusion at t = 0.12 and t = 0.55 modifies the decay for s ≤ σ = 2^{−5}.
Let us introduce a complex wavelet ψ^c(t) = ψ(t) + i ψ′(t). If (u, 2^j) ∈ Λ, then the
resulting complex wavelet transform value is

W^c f(u, 2^j) = ⟨ f, ψ^c_{2^j,u} ⟩ = ⟨ f, ψ_{2^j,u} ⟩ + i ⟨ f, ψ′_{2^j,u} ⟩ = W f(u, 2^j),    (6.47)

because ⟨ f, ψ′_{2^j,u} ⟩ = 0. The complex wavelet value W^c f(u, 2^j) has a phase equal to
0 or π depending on the sign of W f(u, 2^j), and a modulus |W^c f(u, 2^j)| = |W f(u, 2^j)|.
Figure 6.7(c) gives an example computed with the quadratic spline dyadic
wavelet in Figure 5.3. This adaptive sampling of u produces a translation-invariant
representation, which is important for pattern recognition. When f is translated by τ,
each W f(u, 2^j) is translated by τ, so the maxima support is translated by τ, as
illustrated by Figure 6.8. This is not the case for wavelet frame coefficients, where
the translation parameter u is sampled with an interval proportional to the scale a^j,
as explained in Section 5.3.
Approximations from Wavelet Maxima
Mallat and Zhong [367] recover signal approximations from their wavelet maxima
with an alternate projection algorithm, but several other algorithms have been proposed [150, 190, 286]. In the following we concentrate on orthogonal projection
approximations on the space generated by wavelets in the scale–space maxima
support. Numerical experiments show that dyadic wavelets of compact support
recover signal approximations with a relative mean-square error that is typically of
the order of 10⫺2 .
For general dyadic wavelets, Meyer [45] and Berman and Baras [107] proved that
exact reconstruction is not possible. They found families of continuous or discrete
signals having the same dyadic wavelet transforms and modulus maxima. However,
signals with the same wavelet maxima differ from each other by small amplitude
errors introducing no oscillation, which explains the success of numerical reconstructions [367]. If the signal has a band-limited Fourier transform and if ψ̂ has
a compact support, then Kicey and Lennard [328] proved that wavelet modulus
maxima define a complete and stable signal representation.
As a result of (6.47), the wavelet modulus maxima specify the complex
wavelet inner products { ⟨ f, ψ^c_{u,2^j} ⟩ }_{(u,2^j)∈Λ}. Thus, a modulus maxima approximation can be computed as an orthogonal projection of f on the space generated
by the complex wavelets {ψ^c_{u,2^j}}_{(u,2^j)∈Λ}. To reduce computations, the explicit
extrema condition ⟨ f̃, ψ′_{u,2^j} ⟩ = 0 is often removed, because it is indirectly almost
obtained by calculating the orthogonal projection over the space V_Λ generated
by the real maxima wavelets {ψ_{u,2^j}}_{(u,2^j)∈Λ}. Section 5.1.3 shows that this orthogonal projection is obtained from the dual frame {ψ̃_{u,2^j}}_{(u,2^j)∈Λ} of {ψ_{u,2^j}}_{(u,2^j)∈Λ}
in V_Λ:

f_Λ = Σ_{(u,2^j)∈Λ} ⟨ f, ψ_{u,2^j} ⟩ ψ̃_{u,2^j}.    (6.48)
FIGURE 6.7
(a) Intensity variation along one row of the Lena image. (b) Dyadic wavelet transform computed
at all scales 2 N^{−1} ≤ 2^j ≤ 1, with the quadratic spline wavelet ψ = −θ′ shown in Figure 5.3.
(c) Modulus maxima of the dyadic wavelet transform.
FIGURE 6.8
If f_τ(t) = f(t − τ), then W f_τ(u, a^j) = W f(u − τ, a^j). Uniformly sampling W f(u, a^j) and
W f_τ(u, a^j) at u = n a^j u_0 may yield very different values if τ ≠ k u_0 a^j.
The dual-synthesis algorithm from Section 5.1.3 computes this orthogonal projection by inverting a symmetric operator L in V_Λ:

L y = Σ_{(u,2^j)∈Λ} ⟨ y, ψ_{u,2^j} ⟩ ψ_{u,2^j},    (6.49)

with a conjugate-gradient algorithm. Indeed, f_Λ = L^{−1}(L f).
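A small sketch of this dual synthesis: the operator L y = Σ_k ⟨y, ψ_k⟩ ψ_k is inverted on the span of the family with conjugate gradients, so that f_Λ = L^{−1}(L f). The random rows of Psi below are stand-ins for the maxima wavelets {ψ_{u,2^j}}, not actual wavelets:

```python
import numpy as np

# Sketch: orthogonal projection on the span of a finite family of vectors by
# conjugate-gradient inversion of the (singular, but PSD) operator
# L y = Psi^T (Psi y), started from zero so the iterates stay in the span.
def project_on_span(Psi, f, n_iter=100):
    def L(y):
        return Psi.T @ (Psi @ y)          # operator (6.49) restricted to V_Lambda
    x = np.zeros_like(f)
    r = L(f) - L(x)                       # residual of L x = L f
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Lp = L(p)
        pLp = p @ Lp
        if pLp <= 1e-30:
            break
        a = rs / pLp
        x += a * p
        r -= a * Lp
        rs_new = r @ r
        if rs_new < 1e-24:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
Psi = rng.standard_normal((3, 8))         # 3 stand-in "maxima wavelets" in R^8
f = rng.standard_normal(8)
f_lam = project_on_span(Psi, f)
# f_lam is the orthogonal projection: the residual f - f_lam is orthogonal to
# every row of Psi, i.e., the inner products on the support are reproduced.
assert np.allclose(Psi @ (f - f_lam), 0.0, atol=1e-8)
```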
EXAMPLE 6.1
Figure 6.9(b) shows the approximation f_Λ, recovered with 10 conjugate-gradient iterations,
from the wavelet maxima in Figure 6.7(c). This reconstruction is calculated with real instead
of complex wavelets. After 20 iterations, the relative error is ‖f − f_Λ‖/‖f‖ = 2.5 · 10^{−2}.
Figure 6.9(c) shows the signal reconstructed from 50% of the wavelet maxima that have the
largest amplitude. Sharp signal transitions corresponding to large wavelet maxima have not
been affected, but small texture variations disappear because the corresponding maxima are
removed. The resulting signal is piecewise regular.
Fast Discrete Calculations
The conjugate-gradient inversion of the operator (6.49) iterates on this operator
many times. If there are many local maxima, it is more efficient to compute
W y(u, 2^j) = ⟨ y, ψ_{u,2^j} ⟩ for all (u, 2^j) with the "algorithme à trous" (see
Section 5.2.2). For a signal of size N, it cascades convolutions with two filters
h[n] and g[n], up to a maximum scale J = log₂ N, with O(N log₂ N) operations. All
nonmaxima coefficients for (u, 2^j) ∉ Λ are then set to zero. The reconstruction of
L y is computed by modifying the filter-bank reconstruction given by Theorem 5.14,
which also requires O(N log₂ N) operations. The decomposition and reconstruction
wavelets are the same in (6.49), so the reconstruction filters are h̃[n] = h[n] and
g̃[n] = g[n]. The factor 1/2 in (5.72) is also removed because the reconstruction
FIGURE 6.9
(a) Original signal f. (b) Signal approximation f_Λ recovered from the dyadic wavelet maxima
shown in Figure 6.7(c). (c) Approximation recovered from the 50% largest maxima.
wavelets in (6.49) are not attenuated by 2^{−j} as in a nonsampled wavelet reconstruction (5.50). For J = log₂ N, we initialize ã_J[n] = C/√N, where C is the average signal
value, and for log₂ N > j ≥ 0 we compute

ã_j[n] = ã_{j+1} ⋆ h_j[n] + d̃_{j+1} ⋆ g_j[n].    (6.50)
One can verify that L y[n] = ã_0[n] with the same derivations as in the proof of
Theorem 5.14.

The signal approximations shown in Figure 6.9 are computed with the filters of
Table 5.1. About 10 iterations of conjugate gradient are usually sufficient to recover
an approximation with ‖f_Λ − f‖/‖f‖ of the order of 10^{−2}, if all wavelet maxima
are kept.
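The à trous cascade itself is short to sketch: at scale 2^j the filters are dilated by inserting 2^j − 1 zeros ("holes") between coefficients, and each level is one circular convolution. The Haar-like pair below is only a stand-in for the quadratic-spline filters of Table 5.1; for this particular pair h + g is a unit impulse, so approximation plus details telescopes back to the signal:

```python
import numpy as np

# Sketch of the "algorithme a trous": non-subsampled dyadic decomposition with
# filters dilated by zero insertion; convolutions are circular (done by FFT).
def a_trous(f, h, g, J):
    a = np.asarray(f, dtype=float)
    n = len(a)
    details = []
    for j in range(J):
        hj = np.zeros(len(h) * 2**j); hj[:: 2**j] = h   # filter with "holes"
        gj = np.zeros(len(g) * 2**j); gj[:: 2**j] = g
        d = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(gj, n)))
        a = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(hj, n)))
        details.append(d)
    return details, a

h = np.array([0.5, 0.5])                 # Haar-like stand-in filters
g = np.array([0.5, -0.5])
f = np.sin(2 * np.pi * np.arange(64) / 64) + (np.arange(64) > 40)
details, approx = a_trous(f, h, g, 3)
# Since h + g is a unit impulse at every level, a_{j+1} + d_{j+1} = a_j and
# the decomposition telescopes back to f exactly.
assert np.allclose(approx + sum(details), f)
```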
6.3 MULTISCALE EDGE DETECTION
Image edges are often important for pattern recognition. This is clearly illustrated
by our visual ability to recognize an object from a drawing that gives a rough outline
of contours. But, what is an edge? It could be defined as points where the image
intensity has sharp transitions. A closer look shows that this definition is often
not satisfactory. Image textures do have sharp intensity variations that are often
not considered as edges. When looking at a brick wall, we may decide that the
edges are the contours of the wall whereas the bricks define a texture. Alternatively,
we may include the contours of each brick in the set of edges and consider the
irregular surface of each brick as a texture. The discrimination of edges versus
textures depends on the scale of analysis.
This has motivated computer vision researchers to detect sharp image variations
at different scales [42, 416]. Section 6.3.1 describes the multiscale Canny edge detector [146]. It is equivalent to detecting modulus maxima in a two-dimensional dyadic
wavelet transform [367]. Thus, the scale–space support of these modulus maxima
corresponds to multiscale edges. The Lipschitz regularity of edge points is derived
from the decay of wavelet modulus maxima across scales. Image approximations
are recovered with an orthogonal projection on the wavelets of the modulus maxima support with no visual degradation. Thus, image-processing algorithms can be
implemented on multiscale edges.
6.3.1 Wavelet Maxima for Images
Canny Edge Detection
The Canny algorithm detects points of sharp variation in an image f(x_1, x_2) by
calculating the modulus of its gradient vector

∇f = ( ∂f/∂x_1, ∂f/∂x_2 ).    (6.51)

The partial derivative of f in the direction of a unit vector n⃗ = (cos α, sin α) in the
x = (x_1, x_2) plane is calculated as an inner product with the gradient vector

∂f/∂n⃗ = ⟨ ∇f, n⃗ ⟩ = (∂f/∂x_1) cos α + (∂f/∂x_2) sin α.

The absolute value of this partial derivative is maximum if n⃗ is colinear to ∇f.
This shows that ∇f(x) is parallel to the direction of maximum change of the surface
f(x). A point y ∈ R² is defined as an edge if |∇f(x)| is locally maximum at x = y when
x = y + λ ∇f(y) and |λ| is small enough. This means that the partial derivatives of f
reach a local maximum at x = y, when x varies in a one-dimensional neighborhood of
y along the direction of maximum change of f at y. These edge points are inflection
points of f.
Multiscale Edge Detection
A multiscale version of this edge detector is implemented by smoothing the surface
with a convolution kernel θ(x) that is dilated. This is computed with two wavelets
that are the partial derivatives of θ:

ψ¹ = −∂θ/∂x_1  and  ψ² = −∂θ/∂x_2.    (6.52)

The scale varies along the dyadic sequence {2^j}_{j∈Z} to limit computations and storage.
For 1 ≤ k ≤ 2, we denote for x = (x_1, x_2),

ψ^k_{2^j}(x_1, x_2) = (1/2^j) ψ^k(x_1/2^j, x_2/2^j)  and  ψ̄^k_{2^j}(x) = ψ^k_{2^j}(−x).

In the two directions indexed by 1 ≤ k ≤ 2, the dyadic wavelet transform of f ∈
L²(R²) at u = (u_1, u_2) is

W^k f(u, 2^j) = ⟨ f(x), ψ^k_{2^j}(x − u) ⟩ = f ⋆ ψ̄^k_{2^j}(u).    (6.53)
(6.53)
Section 5.5 gives necessary and sufficient conditions for obtaining a complete and
stable representation.
Let us denote θ_{2^j}(x) = 2^{−j} θ(2^{−j} x) and θ̄_{2^j}(x) = θ_{2^j}(−x). The two scaled wavelets can be rewritten as

ψ̄¹_{2^j} = 2^j ∂θ̄_{2^j}/∂x_1  and  ψ̄²_{2^j} = 2^j ∂θ̄_{2^j}/∂x_2.

Thus, we derive from (6.53) that the wavelet transform components are proportional to the coordinates of the gradient vector of f smoothed by θ̄_{2^j}:

( W¹f(u, 2^j), W²f(u, 2^j) ) = 2^j ( ∂( f ⋆ θ̄_{2^j})/∂u_1(u), ∂( f ⋆ θ̄_{2^j})/∂u_2(u) ) = 2^j ∇( f ⋆ θ̄_{2^j})(u).    (6.54)

The modulus of this gradient vector is proportional to the wavelet transform modulus

M f(u, 2^j) = √( |W¹f(u, 2^j)|² + |W²f(u, 2^j)|² ).    (6.55)
(6.55)
Let A f(u, 2^j) be the angle of the wavelet transform vector (6.54) in the plane
(x_1, x_2):

A f(u, 2^j) = α(u) if W¹f(u, 2^j) ≥ 0,  and  A f(u, 2^j) = π + α(u) if W¹f(u, 2^j) < 0,    (6.56)

with

α(u) = tan^{−1}( W²f(u, 2^j) / W¹f(u, 2^j) ).
The unit vector n⃗_j(u) = (cos A f(u, 2^j), sin A f(u, 2^j)) is colinear to ∇( f ⋆ θ̄_{2^j})(u).
An edge point at the scale 2^j is a point v such that M f(u, 2^j) is locally maximum
at u = v when u = v + λ n⃗_j(v) and |λ| is small enough. These points are also called
wavelet transform modulus maxima. The smoothed image f ⋆ θ̄_{2^j} has an inflection
point at a modulus maximum location. Figure 6.10 gives an example where the
wavelet modulus maxima are located along the contour of a circle.
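The gradient formulas (6.54)–(6.55) can be sketched directly: smooth an image with a Gaussian θ (applied in Fourier), take finite-difference partials, and form the modulus and angle. On a disk image the modulus peaks on the circle, as in Figure 6.10; sizes and σ below are arbitrary illustration values:

```python
import numpy as np

# Sketch of (6.54)-(6.55): Gaussian smoothing in Fourier, then discrete
# partial derivatives give (W1, W2), modulus M f, and angle A f.
def multiscale_gradient(img, sigma):
    n = img.shape[0]
    k = np.fft.fftfreq(n)
    gx = np.exp(-2.0 * (np.pi * k * sigma) ** 2)   # 1D Gaussian transfer function
    smooth = np.real(np.fft.ifft2(np.fft.fft2(img) * np.outer(gx, gx)))
    w1 = np.gradient(smooth, axis=1)               # ~ d/dx1 (columns)
    w2 = np.gradient(smooth, axis=0)               # ~ d/dx2 (rows)
    return np.sqrt(w1**2 + w2**2), np.arctan2(w2, w1)

n = 64
y, x = np.mgrid[0:n, 0:n]
disk = ((x - n / 2) ** 2 + (y - n / 2) ** 2 < (n / 4) ** 2).astype(float)
M, A = multiscale_gradient(disk, sigma=2.0)
# The largest modulus lies on the circle of radius n/4, as in Figure 6.10.
i, j = np.unravel_index(np.argmax(M), M.shape)
assert abs(np.hypot(i - n / 2, j - n / 2) - n / 4) < 2.0
```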
Maxima Curves
Edge points are distributed along curves that often correspond to the boundary
of important structures. Individual wavelet modulus maxima are chained together
to form a maxima curve that follows an edge. At any location, the tangent of the
edge curve is approximated by computing the tangent of a level set. This tangent
direction is used to chain wavelet maxima that are along the same edge curve.
The level sets of g(x) are the curves x(s) in the (x_1, x_2) plane where g(x(s)) is
constant. The parameter s is the arc-length of the level set. Let τ⃗ = (τ_1, τ_2) be the
direction of the tangent of x(s). Since g(x(s)) is constant when s varies,

∂g(x(s))/∂s = τ_1 ∂g/∂x_1 + τ_2 ∂g/∂x_2 = ⟨ ∇g, τ⃗ ⟩ = 0.

So, ∇g(x) is perpendicular to the direction of the tangent of the level set that goes
through x.
This level-set property applied to g = f ⋆ θ̄_{2^j} proves that at a maximum point
v the vector n⃗_j(v) of angle A f(v, 2^j) is perpendicular to the level set of f ⋆ θ̄_{2^j}
going through v. If the intensity profile remains constant along an edge, then the
inflection points (maxima points) are along a level set. The tangent of the maxima
curve is therefore perpendicular to n⃗_j(v). The intensity profile of an edge may
not be constant, but its variations are often negligible over a neighborhood of size
2^j for a sufficiently small scale 2^j, unless we are near a corner. The tangent of
the maxima curve is then nearly perpendicular to n⃗_j(v). In discrete calculations,
maxima curves are recovered by chaining together any two wavelet maxima at v
and v + n⃗ that are neighbors over the image sampling grid and such that n⃗ is
nearly perpendicular to n⃗_j(v).
EXAMPLE 6.2
The dyadic wavelet transform of the image in Figure 6.10 yields modulus images M f(v, 2^j)
with maxima along the boundary of a disk. This circular edge is also a level set of the image.
Thus, the vector n⃗_j(v) of angle A f(v, 2^j) is perpendicular to the edge at the maxima locations.
FIGURE 6.10
The very top image has N = 128² pixels. (a) Wavelet transform in the horizontal direction with
a scale 2^j that increases from top to bottom: {W¹f(u, 2^j)}_{−6≤j≤0}; black, gray, and white
pixels correspond to negative, zero, and positive values, respectively. (b) Vertical direction:
{W²f(u, 2^j)}_{−6≤j≤0}. (c) Wavelet transform modulus {M f(u, 2^j)}_{−6≤j≤0}; white and black
pixels correspond to zero and large-amplitude coefficients, respectively. (d) Angles
{A f(u, 2^j)}_{−6≤j≤0} at points where the modulus is nonzero. (e) The wavelet modulus maxima
support is shown in black.
FIGURE 6.11
Multiscale edges of the Lena image shown in Figure 6.12. (a) {W¹f(u, 2^j)}_{−7≤j≤−3}.
(b) {W²f(u, 2^j)}_{−7≤j≤−3}. (c) {M f(u, 2^j)}_{−7≤j≤−3}. (d) {A f(u, 2^j)}_{−7≤j≤−3}. (e) Modulus
maxima support. (f) Support of maxima with modulus values above a threshold.
EXAMPLE 6.3
In the Lena image shown in Figure 6.11, some edges disappear when the scale increases.
These correspond to fine-scale intensity variations that are removed by the averaging with
θ̄_{2^j} when 2^j is large. This averaging also modifies the position of the remaining edges.
Figure 6.11(f) displays the wavelet maxima such that M f(v, 2^j) ≥ T for a given threshold T.
They indicate the location of edges where the image has large-amplitude variations.
Lipschitz Regularity
The decay of the two-dimensional wavelet transform depends on the regularity of
f . We restrict the analysis to Lipschitz exponents 0 ⭐ ␣ ⭐ 1. A function f is said to
be Lipschitz ␣ at v ⫽ (v1 , v2 ) if there exists K ⬎ 0 such that for all (x1 , x2 ) ∈ R2 ,
| f (x1 , x2 ) ⫺ f (v1 , v2 )| ⭐ K (|x1 ⫺ v1 |2 ⫹ |x2 ⫺ v2 |2 )␣/2 .
(6.57)
If there exists K > 0 such that (6.57) is satisfied for any v ∈ Ω, then f is uniformly
Lipschitz α over Ω. As in one dimension, the Lipschitz regularity of a function f is
related to the asymptotic decay of |W¹f(u, 2^j)| and |W²f(u, 2^j)| in the corresponding
neighborhood. This decay is controlled by M f(u, 2^j). As in Theorem 6.3, one can
prove that f is uniformly Lipschitz α inside a bounded domain of R² if and only if
there exists A > 0 such that for all u inside this domain and all scales 2^j,

M f(u, 2^j) ≤ A 2^{j(α+1)}.    (6.58)
Suppose that the image has an isolated edge curve along which f has Lipschitz regularity α. The value of |Mf(u, 2^j)| in a two-dimensional neighborhood of the edge curve can be bounded by the wavelet modulus values along the edge curve. The Lipschitz regularity α of the edge is estimated with (6.58) by measuring the slope of log₂ |Mf(u, 2^j)| as a function of j. If f is not singular but has a smooth transition along the edge, the smoothness can be quantified by the variance σ² of a two-dimensional Gaussian blur. The value of σ² is estimated by generalizing Theorem 6.7.
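The slope measurement described above can be sketched numerically. The snippet below is a minimal illustration, assuming maxima moduli are already available along one maxima line across dyadic scales; the decay model log₂ M(2^j) ≈ (α + 1) j + const follows (6.58), and the data are synthetic.

```python
import math

def estimate_lipschitz(moduli, scales):
    """Estimate a Lipschitz exponent alpha from the decay of maxima
    moduli across scales, using log2 M(2^j) ~ (alpha + 1) * j + const
    as in (6.58).  Least-squares slope computed by hand (stdlib only)."""
    js = [math.log2(s) for s in scales]
    ys = [math.log2(m) for m in moduli]
    n = len(js)
    jm, ym = sum(js) / n, sum(ys) / n
    slope = sum((j - jm) * (y - ym) for j, y in zip(js, ys)) / \
            sum((j - jm) ** 2 for j in js)
    return slope - 1.0  # slope = alpha + 1

# Synthetic check: a step edge (alpha = 0) gives M(2^j) ~ A * 2^j.
scales = [2.0 ** j for j in range(-6, 0)]
moduli = [3.0 * s for s in scales]          # alpha = 0, amplitude A = 3
alpha = estimate_lipschitz(moduli, scales)
```

On real images the moduli would be read along a chained maxima line; the regression is the same.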
Reconstruction from Edges
In his book about vision, Marr [42] conjectured that images can be reconstructed
from multiscale edges. For a Canny edge detector, this is equivalent to recovering
images from wavelet modulus maxima. Whether dyadic wavelet maxima define a
complete and stable representation in two dimensions is still an open mathematical
problem. However, the algorithm of Mallat and Zhong [367] recovers an image
approximation that is visually identical to the original one. In the following, image
approximations are computed by projecting the image on the space generated by
wavelets on the modulus maxima support.
Let Λ be the set of all modulus maxima points (u, 2^j). Let n⃗ be the unit vector in the direction of Af(u, 2^j) and

ψ³_{u,2^j}(x) = 2^j ∂²θ_{2^j}(x − u) / ∂n⃗².

Since the wavelet transform modulus Mf(u, 2^j) has a local extremum at u in the direction of n⃗, it results that

⟨f, ψ³_{u,2^j}⟩ = 0.   (6.59)

A modulus maxima representation provides the set of inner products {⟨f, ψ^k_{u,2^j}⟩}_{(u,2^j)∈Λ, 1≤k≤3}. A modulus maxima approximation f_Λ can be computed as an orthogonal projection of f on the space generated by the family of maxima wavelets {ψ^k_{u,2^j}}_{(u,2^j)∈Λ, 1≤k≤3}.
To reduce computations, the condition ⟨f, ψ³_{u,2^j}⟩ = 0 on the third wavelets is removed, because it is indirectly almost imposed by the orthogonal projection over the space V_Λ generated by the other two wavelets for k = 1, 2. The dual-synthesis algorithm from Section 5.1.3 computes this orthogonal projection f_Λ by inverting a symmetric operator L in V_Λ:

L y = Σ_{(u,2^j)∈Λ} Σ_{k=1}^{2} ⟨y, ψ^k_{u,2^j}⟩ ψ^k_{u,2^j},   (6.60)

with a conjugate-gradient algorithm. Indeed, f_Λ = L^{−1}(L f). When keeping all modulus maxima, the resulting modulus maxima approximation f_Λ satisfies ‖f_Λ − f‖/‖f‖ ≤ 10^{−2}. Singularities and edges are nearly perfectly recovered and no spurious oscillations are introduced. The images differ slightly in smooth regions, but visually this is not noticeable.
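The conjugate-gradient inversion of the symmetric operator L can be sketched generically. The following is a minimal sketch, not the book's implementation: a small symmetric positive-definite matrix stands in for L restricted to V_Λ, and the right-hand side is built so the exact solution is known.

```python
def conjugate_gradient(apply_L, b, n_iter=50, tol=1e-12):
    """Solve L x = b for a symmetric positive-definite operator L,
    given only the routine apply_L(x)."""
    x = [0.0] * len(b)
    r = b[:]                       # residual b - L x, with x = 0
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(n_iter):
        Lp = apply_L(p)
        a = rs / sum(pi * qi for pi, qi in zip(p, Lp))
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * qi for ri, qi in zip(r, Lp)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy symmetric positive-definite operator (illustrative stand-in for L).
M = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
apply_L = lambda v: [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
b = apply_L([1.0, -2.0, 0.5])      # so the exact solution is known
x = conjugate_gradient(apply_L, b)
```

In the maxima-reconstruction setting, apply_L would be the fast filter bank evaluation of (6.60) described in Section 6.3.2.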
EXAMPLE 6.4
The image reconstructed in Figure 6.12(b) is visually identical to the original image. It is recovered with 10 conjugate-gradient iterations. After 20 iterations, the relative mean-square reconstruction error is ‖f̃ − f‖/‖f‖ = 4 · 10^{−3}. The thresholding of maxima accounts for the disappearance of image structures from the reconstruction shown in Figure 6.12(c). Sharp image variations are recovered.
Denoising by Multiscale Edge Thresholding
Multiscale edge representations can be used to reduce additive noise. Denoising algorithms that threshold wavelet coefficients are presented in Section 11.3.1. Block thresholding (see Section 11.4.2) regularizes this coefficient selection by regrouping coefficients in square blocks. Similarly, a noise-removal algorithm can be implemented by thresholding multiscale wavelet maxima, while taking into account their geometric properties.

A simple approach implemented by Hwang and Mallat [364] chains the maxima into curves that are thresholded as a block. In Figure 6.13, noisy modulus maxima are shown on the second row, and the third row displays the thresholded modulus maxima chains. At the finest scale, shown on the left, the noise masks the image structures. Maxima chains are selected by using the position of the selected maxima at the previous scale. An image approximation is recovered from the selected wavelet maxima. Edges are well recovered visually, but textures and fine structures are removed, which produces a cartoon-like image.
Illusory Contours
A multiscale wavelet edge detector defines edges as points where the image intensity
varies sharply. However,this definition is too restrictive when edges are used to find
the contours of objects. For image segmentation, edges must define closed curves
that outline the boundaries of each region. Because of noise or light variations, local edge detectors produce contours with holes. Filling these holes requires some prior knowledge about the behavior of edges in the image. The illusion of the Kanizsa triangle [37] shows that such an edge filling is performed by the human visual system.

FIGURE 6.12
(a) Original Lena image. (b) Image reconstructed from the wavelet maxima displayed in Figure 6.11(e) and larger-scale maxima. (c) Image reconstructed from the thresholded wavelet maxima displayed in Figure 6.11(f) and larger-scale maxima.
In Figure 6.14, one can "see" the edges of a straight and a curved triangle although the image gray level remains uniformly white between the black discs. Closing edge curves and understanding illusory contours requires computational models that are
FIGURE 6.13
(a) Noisy peppers image. (b) Peppers image restored from the thresholded maxima chains shown in (d). The images in row (c) show the wavelet maxima support of the noisy image; the scale increases from left to right, from 2^{−7} to 2^{−5}. The images in row (d) give the maxima support computed with a thresholding selection of the maxima chains.
FIGURE 6.14
The illusory edges of a (a) straight and (b) curved triangle are perceived in domains where the images are uniformly white.
not as local as multiscale differential operators. Such contours can be obtained as
the solution of a global optimization that incorporates constraints on the regularity
of contours and takes into account the existence of occlusions [269].
6.3.2 Fast Multiscale Edge Computations
The dyadic wavelet transform of an image of N pixels is computed with a separable
extension of the filter bank algorithm described in Section 5.2.2. A fast multiscale
edge detection is derived [367].
Wavelet Design
Edge-detection wavelets (6.52) are designed as separable products of the one-dimensional dyadic wavelets constructed in Section 5.2.1. Their Fourier transforms are

ψ̂¹(ω₁, ω₂) = ĝ(ω₁/2) φ̂(ω₁/2) φ̂(ω₂/2)   (6.61)

and

ψ̂²(ω₁, ω₂) = ĝ(ω₂/2) φ̂(ω₁/2) φ̂(ω₂/2),   (6.62)

where φ̂(ω) is a scaling function whose energy is concentrated at low frequencies, and

ĝ(ω) = −i √2 sin(ω/2) exp(−iω/2).   (6.63)

This transfer function is the Fourier transform of a finite difference filter, which is a discrete approximation of a derivative:

g[p]/√2 = −0.5 if p = 0,  0.5 if p = 1,  0 otherwise.   (6.64)

The resulting wavelets ψ¹ and ψ² are finite difference approximations of partial derivatives along x₁ and x₂ of θ(x₁, x₂) = 4 φ(2x₁) φ(2x₂).
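The derivative behavior of the filter in (6.64) is easy to check numerically. This is a hedged sketch: the sampled function f(x) = x² and the grid step are illustrative choices, not from the text.

```python
# Finite difference filter of (6.64): g[p]/sqrt(2) = -0.5 at p = 0, 0.5 at p = 1.
g = {0: -0.5, 1: 0.5}

def apply_g(x):
    """Inner product with g at each n:
    d[n] = sum_p g[p] x[n+p] = 0.5 * (x[n+1] - x[n]),
    a half-difference approximating (dx/2) f'(x)."""
    return [sum(c * x[n + p] for p, c in g.items()) for n in range(len(x) - 1)]

dx = 0.01
samples = [(n * dx) ** 2 for n in range(200)]   # f(x) = x^2, so f'(x) = 2x
d = apply_g(samples)
# (2/dx) * d[n] approximates f' at the midpoint (n + 0.5) * dx; for a
# quadratic the midpoint difference is exact.
approx = [2.0 / dx * v for v in d]
```

The same taps, time-reversed and dilated, reappear as ḡ_j in the à trous algorithm below.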
To implement the dyadic wavelet transform with a filter bank algorithm, the scaling function φ̂ is calculated, as in (5.60), with an infinite product:

φ̂(ω) = ∏_{p=1}^{+∞} ĥ(2^{−p}ω)/√2 = (1/√2) ĥ(ω/2) φ̂(ω/2).   (6.65)

The 2π-periodic function ĥ is the transfer function of a finite impulse-response low-pass filter h[p]. We showed in (5.61) that the Fourier transform of a box spline of degree m,

φ̂(ω) = (sin(ω/2) / (ω/2))^{m+1} exp(−iεω/2)  with ε = 1 if m is even, 0 if m is odd,

is obtained with

ĥ(ω) = √2 φ̂(2ω)/φ̂(ω) = √2 (cos(ω/2))^{m+1} exp(−iεω/2).

Table 5.1 gives h[p] for m = 2.
Algorithme à Trous
The one-dimensional algorithme à trous (see Section 5.2.2) is extended in two
dimensions with convolutions along the image rows and columns.
Each sample a₀[n] of the normalized discrete image is considered to be an average of the input analog image f calculated with the kernel φ(x₁) φ(x₂) translated at n = (n₁, n₂):

a₀[n₁, n₂] = ⟨f(x₁, x₂), φ(x₁ − n₁) φ(x₂ − n₂)⟩.

This is further justified in Section 7.7.3. For any j ≥ 0, we denote

a_j[n₁, n₂] = ⟨f(x₁, x₂), φ_{2^j}(x₁ − n₁) φ_{2^j}(x₂ − n₂)⟩.

The discrete wavelet coefficients at n = (n₁, n₂) are

d¹_j[n] = W^1 f(n, 2^j)  and  d²_j[n] = W^2 f(n, 2^j).

They are calculated with separable convolutions. For any j ≥ 0, the filter h[p] "dilated" by 2^j is defined by

h̄_j[p] = h[−p/2^j] if p/2^j ∈ Z, 0 otherwise;   (6.66)

and for j > 0, a centered finite difference filter is defined by

ḡ_j[p]/√2 = 0.5 if p = −2^{j−1}, −0.5 if p = 2^{j−1}, 0 otherwise.   (6.67)
For j = 0, we define ḡ₀[0]/√2 = −0.5, ḡ₀[−1]/√2 = 0.5, and ḡ₀[p] = 0 for p ≠ 0, −1. A separable two-dimensional filter is written as

αβ[n₁, n₂] = α[n₁] β[n₂],

and δ[n] is a discrete Dirac. Similar to Theorem 5.14, one can prove that for any j ≥ 0 and any n = (n₁, n₂),

a_{j+1}[n] = a_j ⋆ h̄_j h̄_j [n],   (6.68)

d¹_{j+1}[n] = a_j ⋆ ḡ_j δ [n],   (6.69)

d²_{j+1}[n] = a_j ⋆ δ ḡ_j [n].   (6.70)
Dyadic wavelet coefficients up to the scale 2^J are therefore calculated by cascading the convolutions (6.68–6.70) for 0 ≤ j < J. To take into account border problems, all convolutions are replaced by circular convolutions, which means that the input image a₀[n] is considered to be periodic along its rows and columns. For an image of N pixels, this algorithm requires O(N log₂ N) operations. For a square image with a maximum scale J = log₂ N^{1/2}, one can verify that the larger-scale approximation is a constant proportional to the gray-level average C:

a_J[n₁, n₂] = N^{−1/2} Σ_{n₁,n₂=0}^{N^{1/2}−1} a₀[n₁, n₂] = N^{1/2} C.
The wavelet transform modulus is Mf(n, 2^j) = (|d¹_j[n]|² + |d²_j[n]|²)^{1/2}, whereas Af(n, 2^j) is the angle of the vector (d¹_j[n], d²_j[n]). The support Λ of wavelet modulus maxima (u, 2^j) is the set of points where Mf(u, 2^j) is larger than its two neighbors Mf(u ± ε, 2^j), where ε = (ε₁, ε₂) is the vector whose coordinates ε₁ and ε₂ are either 0, 1, or −1, and whose angle is the closest to Af(u, 2^j).
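The à trous dilation and one cascade step can be sketched in one dimension (the 2D transform applies the same filters separably along rows and columns). This is a minimal sketch: the low-pass taps below correspond to a degree-0 box spline, chosen for brevity rather than taken from Table 5.1.

```python
def dilate(filt, j):
    """'Dilate' a filter given as {position: tap} by 2^j, i.e. insert
    2^j - 1 zeros between taps, as in (6.66)-(6.67) (a trous)."""
    return {p * (2 ** j): v for p, v in filt.items()}

def circular_apply(x, filt):
    """d[n] = sum_p filt[p] x[(n + p) mod N]  (circular filtering)."""
    N = len(x)
    return [sum(v * x[(n + p) % N] for p, v in filt.items()) for n in range(N)]

# Illustrative low-pass taps (degree-0 box spline) and the difference
# filter g of (6.64), both normalized so a constant image is preserved
# up to the sqrt(2) factor.
r = 2 ** -0.5
h = {0: r, 1: r}
g = {0: -0.5 * 2 ** 0.5, 1: 0.5 * 2 ** 0.5}

a0 = [float(n % 8) for n in range(16)]   # toy periodic signal
a1 = circular_apply(a0, h)               # a_{j+1} = a_j * h_j   (to scale 2^1)
d1 = circular_apply(a0, g)               # d_{j+1} = a_j * g_j
a2 = circular_apply(a1, dilate(h, 1))    # next scale uses the dilated filter
```

Because every scale keeps N samples, cascading J = log₂ N^{1/2} scales costs O(N log₂ N), as stated above.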
Reconstruction from Maxima
The orthogonal projection from wavelet maxima is computed with the dual-synthesis algorithm from Section 5.1.3, which inverts the symmetric operator (6.60) with conjugate-gradient iterations. This requires computing L y efficiently for any image y[n]. For this purpose, the wavelet coefficients of y are first calculated with the algorithme à trous, and all coefficients for (u, 2^j) ∉ Λ are set to 0. The signal L y[n] is recovered from these nonzero wavelet coefficients. Let h_j[n] = h̄_j[−n] and g_j[n] = ḡ_j[−n] be the two filters defined with (6.66) and (6.67). The calculation is initialized for J = log₂ N^{1/2} by setting ã_J[n] = C N^{−1/2}, where C is the average image intensity. For J > j ≥ 0, we compute

ã_j[n] = ã_{j+1} ⋆ h_j h_j [n] + d¹_{j+1} ⋆ g_j δ [n] + d²_{j+1} ⋆ δ g_j [n],

and one can verify that L y[n] = ã₀[n] is recovered with O(N log₂ N) operations.
The reconstructed images that were shown in Figure 6.12 are obtained with 10
conjugate-gradient iterations implemented with this filter bank algorithm.
6.4 MULTIFRACTALS
Signals that are singular at almost every point were originally studied as pathological objects of pure mathematical interest. Mandelbrot [41] was the first to recognize that such phenomena are encountered everywhere. Among the many examples [25] are economic records such as the Dow Jones Industrial Average, physiological data including heart records, electromagnetic fluctuations in galactic radiation noise, textures in images of natural terrains, variations of traffic flow, and so on.

The singularities of multifractals often vary from point to point, and knowing the distribution of these singularities is important in analyzing their properties. Pointwise measurements of Lipschitz exponents are not possible because of the finite numerical resolution. After discretization, each sample corresponds to a time interval where the signal has an infinite number of singularities that may all be different. The singularity distribution must therefore be estimated from global measurements that take advantage of multifractal self-similarities. Section 6.4.2 computes the fractal dimension of sets of points having the same Lipschitz regularity, with a global partition function calculated from wavelet transform modulus maxima. Applications to fractal noises, such as fractional Brownian motions and hydrodynamic turbulence, are studied in Section 6.4.3.
6.4.1 Fractal Sets and Self-Similar Functions
A set S ⊂ Rn is said to be self-similar if it is the union of disjoint subsets S1 , . . . , Sk that
can be obtained from S with a scaling, translation, and rotation. This self-similarity
often implies an infinite multiplication of details, which creates irregular structures.
The triadic Cantor set and the Von Koch curve are simple examples.
EXAMPLE 6.5
The Von Koch curve is a fractal set obtained by recursively dividing each segment of length l into four segments of length l/3, as illustrated in Figure 6.15. Each subdivision multiplies the length by 4/3; therefore, the limit of these subdivisions is a curve of infinite length.
EXAMPLE 6.6
The triadic Cantor set is constructed by recursively dividing intervals of size l into two subintervals of size l/3 separated by a central hole, as illustrated in Figure 6.16. The iteration begins from [0, 1]. The Cantor set obtained as a limit of these subdivisions is a dust of points in [0, 1].
Fractal Dimension
The Von Koch curve has infinite length in a finite square of R2 ; therefore, the usual
length measurement is not well adapted to characterize the topological properties of
such fractal curves. This motivated Hausdorff in 1919 to introduce a new definition
of dimension—the capacity dimension—based on the size variations of sets when measured at different scales.

FIGURE 6.15
Three iterations of the Von Koch subdivision. The Von Koch curve is the fractal obtained as a limit of an infinite number of subdivisions.

FIGURE 6.16
Three iterations of the Cantor subdivision of [0, 1]. The limit of an infinite number of subdivisions is a closed set in [0, 1].
The capacity dimension is a simplification of the Hausdorff dimension that is easier to compute numerically. Let S be a bounded set in Rⁿ. We count the minimum number N(s) of balls of radius s needed to cover S. If S is a set of dimension D with a finite length (D = 1), surface (D = 2), or volume (D = 3), then

N(s) ∼ s^{−D},

so

D = −lim_{s→0} log N(s) / log s.   (6.71)

The capacity dimension D of S generalizes this result and is defined by

D = −lim inf_{s→0} log N(s) / log s.   (6.72)
The measure of S is then

M = lim sup_{s→0} N(s) s^D,

which may be finite or infinite.
The Hausdorff dimension is a refined fractal measure that considers all covers
of S with balls of radius smaller than s. It is most often, but not always, equal to
the capacity dimension. In the following examples, the capacity dimension is called
fractal dimension.
EXAMPLE 6.7
The Von Koch curve has infinite length because its fractal dimension is D > 1. We need N(s) = 4ⁿ balls of size s = 3⁻ⁿ to cover the whole curve; thus,

N(3⁻ⁿ) = (3⁻ⁿ)^{−log 4/log 3}.

One can verify that at any other scale s, the minimum number of balls N(s) to cover this curve satisfies

D = −lim inf_{s→0} log N(s) / log s = log 4 / log 3.

As expected, it has a fractal dimension between 1 and 2.
EXAMPLE 6.8
The triadic Cantor set is covered by N(s) = 2ⁿ intervals of size s = 3⁻ⁿ, so

N(3⁻ⁿ) = (3⁻ⁿ)^{−log 2/log 3}.

One can also verify that

D = −lim inf_{s→0} log N(s) / log s = log 2 / log 3.
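The box count of Example 6.8 can be reproduced directly: at scale s = 3⁻ⁿ the construction itself supplies the minimal cover, so the dimension estimate is exact. A minimal sketch, using only the standard library:

```python
import math

def cantor_intervals(depth):
    """Intervals remaining after `depth` steps of the triadic Cantor
    construction: each interval keeps its left and right thirds."""
    intervals = [(0.0, 1.0)]
    for _ in range(depth):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3.0
            nxt.append((a, a + third))
            nxt.append((b - third, b))
        intervals = nxt
    return intervals

# Box counting at scale s = 3^-n: N(s) = 2^n intervals cover the set,
# so D = log N(s) / (-log s), as in (6.72).
n = 8
intervals = cantor_intervals(n)
s = 3.0 ** -n
D = math.log(len(intervals)) / -math.log(s)
```

For sets that are not exactly self-similar, the same ratio would be estimated by a regression of log N(s) against −log s over several scales.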
Self-Similar Functions
Let f be a continuous function with a compact support S. We say that f is self-similar if there exist disjoint subsets S₁, …, S_k such that the graph of f restricted to each S_i is an affine transformation of f. This means that there exist a scale l_i > 1, a translation r_i, a weight p_i, and a constant c_i such that

∀t ∈ S_i,  f(t) = c_i + p_i f(l_i(t − r_i)).   (6.73)

Outside these subsets, we suppose that f is constant. Generalizations of this definition can also be used [128].

If a function is self-similar, then its wavelet transform is also self-similar. Let g be an affine transformation of f:

g(t) = p f(l(t − r)) + c.   (6.74)

Its wavelet transform is

Wg(u, s) = ∫_{−∞}^{+∞} g(t) (1/√s) ψ((t − u)/s) dt.

With the change of variable t′ = l(t − r), since ψ has a zero average, the affine relation (6.74) implies

Wg(u, s) = (p/√l) Wf(l(u − r), sl).

Suppose that ψ has a compact support included in [−K, K]. The affine invariance (6.73) of f over S_i = [a_i, b_i] produces an affine invariance for all wavelets having a support included in S_i. For any s < (b_i − a_i)/K and any u ∈ [a_i + Ks, b_i − Ks],

Wf(u, s) = (p_i/√l_i) Wf(l_i(u − r_i), sl_i).
The wavelet transform's self-similarity implies that the positions and values of its modulus maxima are also self-similar. This can be used to recover unknown affine-invariance properties with a voting procedure based on wavelet modulus maxima [310].
EXAMPLE 6.9
A Cantor measure is constructed over a Cantor set. Let dμ₀(x) = dx be the uniform Lebesgue measure on [0, 1]. As in the Cantor set construction, this measure is subdivided into three uniform measures over [0, 1/3], [1/3, 2/3], and [2/3, 1], with integrals equal to p₁, 0, and p₂, respectively. We impose p₁ + p₂ = 1 to obtain a total measure dμ₁ on [0, 1] with an integral equal to 1. This operation is iteratively repeated by dividing each uniform measure of integral p over [a, a + l] into three equal parts whose integrals are p₁p, 0, and p₂p, respectively, over [a, a + l/3], [a + l/3, a + 2l/3], and [a + 2l/3, a + l]. This is illustrated in Figure 6.17. After each subdivision, the resulting measure dμₙ has a unit integral. In the limit, we obtain a Cantor measure dμ_∞ of unit integral with a support that is the triadic Cantor set.
FIGURE 6.17
Two subdivisions of the uniform measure on [0, 1] with left and right weights p₁ and p₂. The Cantor measure dμ_∞ is the limit of an infinite number of these subdivisions.
EXAMPLE 6.10
A devil's staircase is the integral of a Cantor measure:

f(t) = ∫₀ᵗ dμ_∞(x).   (6.75)

It is a continuous function that increases from 0 to 1 on [0, 1]. The recursive construction of the Cantor measure implies that f is self-similar:

f(t) = p₁ f(3t) if t ∈ [0, 1/3],
f(t) = p₁ if t ∈ [1/3, 2/3],
f(t) = p₁ + p₂ f(3t − 2) if t ∈ [2/3, 1].

Figure 6.18 displays the devil's staircase obtained with p₁ = p₂ = 0.5. The wavelet transform in (b) is calculated with a wavelet ψ that is the first derivative of a Gaussian. The self-similarity of f yields a wavelet transform and modulus maxima that are self-similar. The subdivision of each interval into three parts appears through the multiplication by 2 of the maxima lines when the scale is divided by 3. This Cantor construction is generalized with different interval subdivisions and weight allocations, beginning from the same Lebesgue measure dμ₀ on [0, 1] [5].
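The self-similarity relations above give a direct way to evaluate the devil's staircase pointwise. A minimal sketch, assuming a finite recursion depth as a numerical cutoff (the base case is a crude linear approximation that the recursion refines):

```python
def devil(t, p1=0.5, p2=0.5, depth=40):
    """Devil's staircase f(t), the integral of the Cantor measure on
    [0, t], evaluated through the self-similarity relations above."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    if depth == 0:
        return t            # crude base case, refined by the recursion
    if t <= 1.0 / 3.0:
        return p1 * devil(3.0 * t, p1, p2, depth - 1)
    if t <= 2.0 / 3.0:
        return p1
    return p1 + p2 * devil(3.0 * t - 2.0, p1, p2, depth - 1)
```

With p1 = p2 = 0.5 this is the classical Cantor function; other weights with p1 + p2 = 1 give the staircases whose singularity spectra are computed in Example 6.11.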
6.4.2 Singularity Spectrum
Finding the distribution of singularities in a multifractal signal f is particularly important for analyzing its properties. The spectrum of singularity measures the global repartition of singularities having different Lipschitz regularity. The pointwise Lipschitz regularity of f is given by Definition 6.1.

Definition 6.1: Spectrum. Let S_α be the set of all points t ∈ R where the pointwise Lipschitz regularity of f is equal to α. The spectrum of singularity D(α) of f is the fractal dimension of S_α. The support of D(α) is the set of α such that S_α is not empty.
This spectrum was originally introduced by Frisch and Parisi [264] to analyze the
homogeneity of multifractal measures that model the energy dissipation of turbulent
fluids. It was then extended by Arneodo, Bacry, and Muzy [381] to multifractal
signals. The fractal dimension definition (6.72) shows that if we make a disjoint cover of the support of f with intervals of size s, then the number of intervals that intersect S_α is

N_α(s) ∼ s^{−D(α)}.   (6.76)

The singularity spectrum gives the proportion of Lipschitz α singularities that appear at any scale s. A multifractal f is said to be homogeneous if all singularities have the same Lipschitz exponent α₀, which means the support of D(α) is restricted to {α₀}. Fractional Brownian motions are examples of homogeneous multifractals.
FIGURE 6.18
Devil's staircase calculated from a Cantor measure with equal weights p₁ = p₂ = 0.5. (a) Wavelet transform Wf(u, s) computed with ψ = −θ′, where θ is a Gaussian. (b) Wavelet transform modulus maxima.
Partition Function
One cannot compute the pointwise Lipschitz regularity of a multifractal because its singularities are not isolated, and the finite numerical resolution is not sufficient to discriminate them. It is possible, however, to measure the singularity spectrum of multifractals from the wavelet transform local maxima, using a global partition function introduced by Arneodo, Bacry, and Muzy [381].

Let ψ be a wavelet with n vanishing moments. Theorem 6.5 proves that if f has pointwise Lipschitz regularity α₀ < n at v, then the wavelet transform Wf(u, s) has a sequence of modulus maxima that converges toward v at fine scales. Thus, the set of maxima at the scale s can be interpreted as a covering of the singular support of f with wavelets of scale s. At these maxima locations,

|Wf(u, s)| ∼ s^{α₀+1/2}.

Let {u_p(s)}_{p∈Z} be the positions of all local maxima of |Wf(u, s)| at a fixed scale s. The partition function Z measures the sum at a power q of all these wavelet modulus maxima:

Z(q, s) = Σ_p |Wf(u_p, s)|^q.   (6.77)

At each scale s, any two consecutive maxima u_p and u_{p+1} are supposed to have a distance |u_{p+1} − u_p| > εs, for some ε > 0. If not, over intervals of size εs, the sum (6.77) includes only the maxima of largest amplitude. This protects the partition function from the multiplication of very close maxima created by fast oscillations.

For each q ∈ R, the scaling exponent τ(q) measures the asymptotic decay of Z(q, s) at fine scales s:

τ(q) = lim inf_{s→0} log Z(q, s) / log s.

This typically means that

Z(q, s) ∼ s^{τ(q)}.
Legendre Transform
Theorem 6.8 relates τ(q) to the Legendre transform of D(α) for self-similar signals. This result was established in [91] for a particular class of fractal signals and generalized by Jaffard [313].

Theorem 6.8: Arneodo, Bacry, Jaffard, Muzy. Let Λ = [α_min, α_max] be the support of D(α). Let ψ be a wavelet with n > α_max vanishing moments. If f is a self-similar signal, then

τ(q) = min_{α∈Λ} ( q(α + 1/2) − D(α) ).   (6.78)

Proof. The detailed proof is long; we only give an intuitive justification. The sum (6.77) over all maxima positions is replaced by an integral over the Lipschitz parameter. At the scale s, (6.76) indicates that the density of modulus maxima that cover a singularity with Lipschitz exponent α is proportional to s^{−D(α)}. At locations where f has Lipschitz regularity α, the wavelet transform decay is approximated by

|Wf(u, s)| ∼ s^{α+1/2}.

It follows that

Z(q, s) ∼ ∫_Λ s^{q(α+1/2)} s^{−D(α)} dα.

When s goes to 0, we derive that Z(q, s) ∼ s^{τ(q)} for τ(q) = min_{α∈Λ} (q(α + 1/2) − D(α)). ■
This theorem proves that the scaling exponent τ(q) is the Legendre transform of D(α). It is necessary to use a wavelet with enough vanishing moments to measure all Lipschitz exponents up to α_max. In numerical calculations, τ(q) is computed by evaluating the sum Z(q, s). Thus, we need to invert the Legendre transform (6.78) to recover the spectrum of singularity D(α).
Theorem 6.9.
■ The scaling exponent τ(q) is a concave and increasing function of q.
■ The Legendre transform (6.78) is invertible if and only if D(α) is concave, in which case

D(α) = min_{q∈R} ( q(α + 1/2) − τ(q) ).   (6.79)

■ The spectrum D(α) of self-similar signals is concave.
Proof. The proof that D(α) is concave for self-similar signals can be found in [313]. We concentrate on the properties of the Legendre transform that are important in numerical calculations. To simplify the proof, let us suppose that D(α) is twice differentiable. The minimum of the Legendre transform (6.78) is reached at a critical point q(α). Computing the derivative of q(α + 1/2) − D(α) with respect to α gives

q(α) = dD/dα,   (6.80)

with

τ(q) = q(α + 1/2) − D(α).   (6.81)

Since it is a minimum, the second derivative of τ(q(α)) with respect to α is positive, from which we derive that

d²D(α(q))/dα² ≤ 0.

This proves that τ(q) depends only on the values of α where D(α) has a negative second derivative. Thus, we can recover D(α) from τ(q) only if it is concave.

The derivative of τ(q) is

dτ(q)/dq = α + 1/2 + q dα/dq − (dα/dq)(dD(α)/dα) = α + 1/2 ≥ 0.   (6.82)

Therefore, it is increasing. Its second derivative is

d²τ(q)/dq² = dα/dq.

Taking the derivative of (6.80) with respect to q proves that

(dα/dq)(d²D(α)/dα²) = 1.

Since d²D(α)/dα² ≤ 0, we derive that d²τ(q)/dq² ≤ 0. Thus, τ(q) is concave. By using (6.81), (6.82), and the fact that τ(q) is concave, we verify that

D(α) = min_{q∈R} ( q(α + 1/2) − τ(q) ). ■
The spectrum D(α) of self-similar signals is concave and therefore can be calculated from τ(q) with the inverse Legendre formula (6.79). This formula is also valid for a much larger class of multifractals; for example, it is verified for statistically self-similar signals such as realizations of fractional Brownian motions. Multifractals having some stochastic self-similarity have a spectrum that can often be calculated as an inverse Legendre transform (6.79). However, let us emphasize that this formula is not exact for every function f, because its spectrum of singularity D(α) is not necessarily concave. In general, Jaffard proved [313] that the Legendre transform (6.79) gives only an upper bound of D(α). These singularity spectrum properties are studied in detail in [46].
Figure 6.19 illustrates the properties of a concave spectrum D(α). The Legendre transform (6.78) proves that its maximum is reached at

D(α₀) = max_{α∈Λ} D(α) = −τ(0).

It is the fractal dimension of the Lipschitz exponent α₀ most frequently encountered in f. Since all other Lipschitz α singularities appear over sets of lower dimension, if α₀ < 1, then D(α₀) is also the fractal dimension of the singular support of f. The spectrum D(α) for α < α₀ depends on τ(q) for q > 0, and for α > α₀ it depends on τ(q) for q < 0.
Numerical Calculations
To compute D(α), we assume that the Legendre transform formula (6.79) is valid. We first calculate Z(q, s) = Σ_p |Wf(u_p, s)|^q, then derive the decay scaling exponent τ(q), and finally compute D(α) with a Legendre transform. If q < 0, then the value of Z(q, s) depends mostly on the small-amplitude maxima |Wf(u_p, s)|. Numerical calculations may then become unstable. To avoid introducing spurious modulus maxima created by numerical errors in regions where f is nearly constant, wavelet maxima are chained to produce maxima curves across scales. If ψ = (−1)^p θ^(p), where θ is a Gaussian, Theorem 6.6 proves that all maxima lines u_p(s) define curves that propagate up to the limit s = 0. Thus, all maxima lines that do not propagate up to the finest scale are removed in the calculation of Z(q, s).

FIGURE 6.19
Concave spectrum D(α).

The calculation of the spectrum D(α) proceeds as follows:
1. Maxima. Compute Wf(u, s) and the modulus maxima at each scale s. Chain the wavelet maxima across scales.

2. Partition function. Compute

Z(q, s) = Σ_p |Wf(u_p, s)|^q.

3. Scaling. Compute τ(q) with a linear regression of log₂ Z(q, s) as a function of log₂ s:

log₂ Z(q, s) ≈ τ(q) log₂ s + C(q).

4. Spectrum. Compute

D(α) = min_{q∈R} ( q(α + 1/2) − τ(q) ).
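Steps 2–4 can be sketched on synthetic data. This is a minimal sketch, not a full maxima-chaining implementation: the maxima of a homogeneous multifractal with a single exponent α₀ and singular-support dimension D₀ are modeled directly (the values α₀ = D₀ = 0.63 are illustrative, close to log 2/log 3), so the regression and the discrete Legendre transform can be checked against the known answer.

```python
import math

# Step 2 (synthetic): for a homogeneous multifractal, at scale s there
# are ~ s^(-D0) maxima, each of amplitude ~ s^(alpha0 + 1/2), so
# Z(q, s) = s^(-D0) * s^(q * (alpha0 + 1/2)).
alpha0, D0 = 0.63, 0.63
scales = [3.0 ** -n for n in range(4, 12)]

def Z(q, s):
    return (s ** -D0) * (s ** (alpha0 + 0.5)) ** q

# Step 3: tau(q) from a linear regression of log2 Z against log2 s.
def tau(q):
    xs = [math.log2(s) for s in scales]
    ys = [math.log2(Z(q, s)) for s in scales]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / \
           sum((x - xm) ** 2 for x in xs)

# Step 4: D(alpha) by a discrete Legendre transform (6.79) over a grid of q.
def D(alpha, qs=[i / 10.0 for i in range(-50, 51)]):
    return min(q * (alpha + 0.5) - tau(q) for q in qs)
```

Here τ(q) = q(α₀ + 1/2) − D₀ exactly, and the Legendre transform returns D(α₀) = D₀; with real maxima the regression is restricted to the scale range where the log-log plot is linear.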
EXAMPLE 6.11
The spectrum of singularity D(α) of the devil's staircase (6.75) is a concave function that can be calculated analytically [292]. Suppose that p₁ < p₂. The support of D(α) is [α_min, α_max] with

α_min = −log p₂ / log 3  and  α_max = −log p₁ / log 3.
FIGURE 6.20
(a) Devil's staircase with p₁ = 0.4 and p₂ = 0.6. (b) Partition function Z(q, s) for several values of q. (c) Scaling exponent τ(q). (d) The theoretical spectrum D(α) is shown with a solid line. The spectrum values are calculated numerically with a Legendre transform of τ(q).
If p₁ = p₂ = 1/2, then the support of D(α) is reduced to a point, which means that all the singularities of f have the same Lipschitz regularity log 2/log 3. The value D(log 2/log 3) is then the fractal dimension of the triadic Cantor set and is equal to log 2/log 3.

Figure 6.20(a) shows a devil's staircase calculated with p₁ = 0.4 and p₂ = 0.6. Its wavelet transform is computed with ψ = −θ′, where θ is a Gaussian. The decay of log₂ Z(q, s) as a function of log₂ s is shown in Figure 6.20(b) for several values of q. The resulting τ(q) and D(α) are given by Figures 6.20(c, d). There is no numerical instability for q < 0, because there is no modulus maximum with an amplitude close to zero. This would not be the case if the wavelet transform were calculated with a wavelet having more vanishing moments.
Smooth Perturbations
Let f be a multifractal with a spectrum of singularity D(α) calculated from τ(q). If a C^∞ signal g is added to f, then the singularities are not modified and the singularity spectrum of f̃ = f + g remains D(α). We study the effect of this smooth perturbation on the spectrum calculation.

The wavelet transform of f̃ is

W f̃(u, s) = Wf(u, s) + Wg(u, s).

Let τ(q) and τ̃(q) be the scaling exponents of the partition functions Z(q, s) and Z̃(q, s) calculated from the modulus maxima of Wf(u, s) and W f̃(u, s), respectively. We denote by D(α) and D̃(α) the Legendre transforms of τ(q) and τ̃(q), respectively. Theorem 6.10 relates τ(q) and τ̃(q).
Theorem 6.10: Arneodo, Bacry, Muzy. Let ψ be a wavelet with exactly n vanishing moments. Suppose that f is a self-similar function.
■ If g is a polynomial of degree p < n, then τ(q) = τ̃(q) for all q ∈ R.
■ If g^(n) is almost everywhere nonzero, then

τ̃(q) = { τ(q) if q > q_c;  (n + 1/2) q if q ≤ q_c },   (6.83)

where q_c is defined by τ(q_c) = (n + 1/2) q_c.

Proof. If g is a polynomial of degree p < n, then Wg(u, s) = 0. The addition of g does not modify the calculation of the singularity spectrum based on wavelet maxima, so τ(q) = τ̃(q) for all q ∈ R.

If g is a C^∞ function that is not a polynomial, then its wavelet transform is generally nonzero. We justify (6.83) with an intuitive argument that is not a proof. A rigorous proof can be found in [91]. Since ψ has exactly n vanishing moments, (6.15) proves that

|Wg(u, s)| ∼ K s^{n+1/2} |g^(n)(u)|.

We suppose that g^(n)(u) ≠ 0. For τ(q) ≤ (n + 1/2)q, since |Wg(u, s)|^q ∼ s^{q(n+1/2)} has a faster asymptotic decay than s^{τ(q)} when s goes to zero, one can verify that Z̃(q, s) and Z(q, s) have the same scaling exponent, τ̃(q) = τ(q). If τ(q) > (n + 1/2)q, which means that q ≤ q_c, then the decay of |W f̃(u, s)|^q is controlled by the decay of |Wg(u, s)|^q, so τ̃(q) = (n + 1/2)q. ■
This theorem proves that the addition of a nonpolynomial smooth function introduces a bias in the calculation of the singularity spectrum. Let α_c be the critical Lipschitz exponent corresponding to q_c:

D(α_c) = q_c(α_c + 1/2) − τ(q_c).

The Legendre transform of τ̃(q) in (6.83) yields

D̃(α) = { D(α) if α ≤ α_c;  0 if α = n;  −∞ if α > α_c and α ≠ n }.   (6.84)

This modification is illustrated by Figure 6.21.
FIGURE 6.21
If ψ has n vanishing moments, in the presence of a C∞ perturbation the computed spectrum D̃(α) is identical to the true spectrum D(α) for α ≤ α_c. Its support is reduced to {n} for α > α_c.
The bias introduced by the addition of smooth components can be detected experimentally by modifying the number n of vanishing moments of ψ. Indeed, the value of q_c depends on n. If the computed singularity spectrum varies when changing the number of vanishing moments of the wavelet, then it indicates the presence of a bias.
6.4.3 Fractal Noises

Fractional Brownian motions are statistically self-similar Gaussian processes that provide interesting models for a wide class of natural phenomena [371]. Despite their nonstationarity, one can define a power spectrum that has a power-law decay. Realizations of fractional Brownian motions are almost everywhere singular, with the same Lipschitz regularity at all points.

We often encounter fractal noise processes that are not Gaussian although their power spectrum has a power-law decay. Realizations of these processes may include singularities of various types. The spectrum of singularity is then important in analyzing their properties. This is illustrated by an application to hydrodynamic turbulence.
Definition 6.2: Fractional Brownian Motion. A fractional Brownian motion of Hurst exponent 0 < H < 1 is a zero-mean Gaussian process B_H such that B_H(0) = 0 and

    E{|B_H(t) − B_H(t − Δ)|²} = σ² |Δ|^{2H}.     (6.85)
Property (6.85) imposes that the deviation of |B_H(t) − B_H(t − Δ)| be proportional to |Δ|^H. As a consequence, one can prove that any realization f of B_H is almost everywhere singular, with a pointwise Lipschitz regularity α = H. The smaller H is, the more singular f is. Figure 6.22(a) shows the graph of one realization for H = 0.7.

FIGURE 6.22
(a) One realization of a fractional Brownian motion for a Hurst exponent H = 0.7. (b) Wavelet transform. (c) Modulus maxima of its wavelet transform. (d) Scaling exponent τ(q). (e) Resulting D(α) over its support.
Setting Δ = t in (6.85) yields

    E{|B_H(t)|²} = σ² |t|^{2H}.

Developing (6.85) for Δ = t − u also gives

    E{B_H(t) B_H(u)} = (σ²/2) ( |t|^{2H} + |u|^{2H} − |t − u|^{2H} ).     (6.86)

The covariance does not depend only on t − u, which proves that a fractional Brownian motion is nonstationary.

The statistical self-similarity appears when scaling this process. One can derive from (6.86) that for any s > 0,

    E{B_H(st) B_H(su)} = E{s^H B_H(t) s^H B_H(u)}.

Since B_H(st) and s^H B_H(t) are two Gaussian processes with the same mean and covariance, they have the same probability distribution,

    B_H(st) ≡ s^H B_H(t),

where ≡ denotes an equality of finite-dimensional distributions.
Power Spectrum

Although B_H is not stationary, one can define a generalized power spectrum. This power spectrum is introduced by proving that the increments of a fractional Brownian motion are stationary and by computing their power spectrum [73].
Theorem 6.11. Let g_Δ(t) = δ(t) − δ(t − Δ). The increment

    I_{H,Δ}(t) = B_H ⋆ g_Δ(t) = B_H(t) − B_H(t − Δ)     (6.87)

is a stationary process with power spectrum

    R̂_{I_{H,Δ}}(ω) = (σ_H² / |ω|^{2H+1}) |ĝ_Δ(ω)|².     (6.88)

Proof. The covariance of I_{H,Δ} is computed with (6.86):

    E{I_{H,Δ}(t) I_{H,Δ}(t − τ)} = (σ²/2) ( |τ − Δ|^{2H} + |τ + Δ|^{2H} − 2|τ|^{2H} ) = R_{I_{H,Δ}}(τ).     (6.89)

The power spectrum R̂_{I_{H,Δ}}(ω) is the Fourier transform of R_{I_{H,Δ}}(τ). One can verify that the Fourier transform of the distribution f(τ) = |τ|^{2H} is f̂(ω) = −λ_H |ω|^{−(2H+1)}, with λ_H > 0. We thus derive that the Fourier transform of (6.89) can be written as

    R̂_{I_{H,Δ}}(ω) = 2 σ² λ_H |ω|^{−(2H+1)} sin²(Δω/2),

which proves (6.88) for σ_H² = σ² λ_H / 2.  ■
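The first step of this proof — that expanding E{I_{H,Δ}(t) I_{H,Δ}(t − τ)} into four covariance terms given by (6.86) yields (6.89), independently of t — can be reproduced numerically. A small Python check (the parameter values are arbitrary):

```python
import numpy as np

H, sigma, Delta = 0.35, 1.3, 0.2

def K(t, u):
    # fBm covariance (6.86)
    return 0.5 * sigma**2 * (abs(t)**(2*H) + abs(u)**(2*H) - abs(t - u)**(2*H))

def cov_increment(t, tau):
    # E{I(t) I(t - tau)} with I(t) = B_H(t) - B_H(t - Delta), expanded bilinearly
    return (K(t, t - tau) - K(t, t - tau - Delta)
            - K(t - Delta, t - tau) + K(t - Delta, t - tau - Delta))

def R_I(tau):
    # closed form (6.89)
    return 0.5 * sigma**2 * (abs(tau - Delta)**(2*H) + abs(tau + Delta)**(2*H)
                             - 2 * abs(tau)**(2*H))

for t in [0.5, 1.0, 7.3]:
    for tau in [0.0, 0.1, 0.45]:
        assert abs(cov_increment(t, tau) - R_I(tau)) < 1e-12
print("increment covariance matches (6.89) for every t")
```

The agreement for several values of t confirms the stationarity of the increments.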
If X(t) is a stationary process, then we know that Y(t) = X ⋆ g(t) is also stationary, and the power spectra of the two processes are related by

    R̂_X(ω) = R̂_Y(ω) / |ĝ(ω)|².     (6.90)

Although B_H(t) is not stationary, Theorem 6.11 proves that I_{H,Δ}(t) = B_H ⋆ g_Δ(t) is stationary. As in (6.90), it is tempting to define a "generalized" power spectrum calculated with (6.88):

    R̂_{B_H}(ω) = R̂_{I_{H,Δ}}(ω) / |ĝ_Δ(ω)|² = σ_H² / |ω|^{2H+1}.     (6.91)

The nonstationarity of B_H(t) appears in the energy blow-up at low frequencies. The increments I_{H,Δ}(t) are stationary because the multiplication by |ĝ_Δ(ω)|² = O(ω²) removes the explosion of the low-frequency energy. One can generalize this result and verify that if g is an arbitrary stable filter with a transfer function that satisfies |ĝ(ω)| = O(ω), then Y(t) = B_H ⋆ g(t) is a stationary Gaussian process with a power spectrum that is

    R̂_Y(ω) = (σ_H² / |ω|^{2H+1}) |ĝ(ω)|².     (6.92)
Wavelet Transform
The wavelet transform of a fractional Brownian motion is

    W B_H(u, s) = B_H ⋆ ψ̄_s(u).     (6.93)

Since ψ has at least one vanishing moment, necessarily |ψ̂(ω)| = O(ω) in the neighborhood of ω = 0. The wavelet filter g = ψ̄_s has a Fourier transform ĝ(ω) = √s ψ̂*(sω) = O(ω) near ω = 0. This proves that for a fixed s the process Y_s(u) = W B_H(u, s) is a stationary Gaussian process [258], with a power spectrum calculated with (6.92):

    R̂_{Y_s}(ω) = s |ψ̂(sω)|² σ_H² / |ω|^{2H+1} = s^{2H+2} R̂_{Y_1}(sω).     (6.94)
The self-similarity of the power spectrum and the fact that B_H is Gaussian are sufficient to prove that W B_H(u, s) is self-similar across scales:

    W B_H(u, s) ≡ s^{H+1/2} W B_H(u/s, 1),

where the equivalence means that they have the same finite-dimensional distributions. Interesting characterizations of fractional Brownian motion properties are also obtained by decomposing these processes in wavelet bases [46, 73, 490].
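This self-similarity underlies a standard estimator of the Hurst exponent: by (6.94), the variance of wavelet coefficients grows as s^{2H+1} across scales, so for a discrete Haar transform Var d_j grows roughly like 2^{j(2H+1)}. The Python sketch below is a rough illustration (sizes, seed, tolerance, and the choice of a Haar filter bank are arbitrary; the finest level is discarded because discretization biases the smallest scales): it synthesizes exact fBm samples from the covariance (6.86) and fits the log₂-variance slope.

```python
import numpy as np

rng = np.random.default_rng(1)
H, N, n_real = 0.7, 512, 300
t = np.arange(1, N + 1) / N
K = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
           - np.abs(t[:, None] - t[None, :])**(2 * H))      # covariance (6.86)
L = np.linalg.cholesky(K + 1e-12 * np.eye(N))
X = L @ rng.standard_normal((N, n_real))                    # exact fBm samples

log2_var = []
a = X
for j in range(1, 6):                                       # Haar levels 1..5
    d = (a[1::2] - a[0::2]) / np.sqrt(2)                    # detail coefficients
    a = (a[1::2] + a[0::2]) / np.sqrt(2)                    # averages for next level
    log2_var.append(np.log2(d.var()))

slope = np.polyfit(np.arange(2, 6), log2_var[1:], 1)[0]     # skip biased level 1
print(slope)    # roughly 2H + 1 = 2.4
```

The fitted slope gives an estimate of 2H + 1, from which H can be read off.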
EXAMPLE 6.12

Figure 6.22(a) displays one realization of a fractional Brownian motion with H = 0.7. The wavelet transform and its modulus maxima are shown in Figures 6.22(b) and 6.22(c). The partition function (6.77) is computed from the wavelet modulus maxima. Figure 6.22(d) gives the scaling exponent τ(q), which is nearly a straight line.

Fractional Brownian motions are homogeneous fractals with Lipschitz exponents equal to H. In this example, the theoretical spectrum D(α) has a support reduced to {0.7}, with D(0.7) = 1. The estimated spectrum in Figure 6.22(e) is calculated with a Legendre transform of τ(q). Its support is [0.65, 0.75]. There is an estimation error because the calculations are performed on a signal of finite size.
Fractal Noises

Some physical phenomena produce more general fractal noises X(t), which are not Gaussian processes, but they do have stationary increments. As for fractional Brownian motions, one can define a generalized power spectrum with a power-law decay:

    R̂_X(ω) = σ_H² / |ω|^{2H+1}.
These processes are transformed into wide-sense stationary processes by a convolution with a stable filter g that removes the lowest frequencies, |ĝ(ω)| = O(ω). Thus, one can determine that the wavelet transform Y_s(u) = W X(u, s) is a stationary process at any fixed scale s. Its spectrum is the same as the spectrum (6.94) of fractional Brownian motions. If H < 1, the asymptotic decay of R̂_X(ω) indicates that realizations of X(t) are singular functions; however, it gives no information about the distribution of these singularities.

As opposed to fractional Brownian motions, general fractal noises have realizations that may include singularities of various types. Such multifractals are differentiated from realizations of fractional Brownian motions by computing their singularity spectrum D(α). For example, the velocity fields of fully developed turbulent flows have been modeled by fractal noises, but the calculation of the singularity spectrum clearly shows that these flows differ in important ways from fractional Brownian motions.
Hydrodynamic Turbulence

Fully developed turbulence appears in incompressible flows at high Reynolds numbers. Understanding the properties of hydrodynamic turbulence is a major problem of modern physics, which remains mostly open despite an intense research effort since the first theory of Kolmogorov in 1941 [331]. The number of degrees of freedom of three-dimensional turbulence is considerable, which produces extremely complex spatio-temporal behavior. No formalism is yet able to build a statistical physics framework based on the Navier-Stokes equations that would enable us to understand the global behavior of turbulent flows as is done in thermodynamics.

In 1941, Kolmogorov [331] formulated a statistical theory of turbulence. The velocity field is modeled as a process V(x) that has increments with variance

    E{|V(x + Δ) − V(x)|²} ∼ ε^{2/3} Δ^{2/3}.
The constant ε is a rate of dissipation of energy per unit of mass and time, which is supposed to be independent of the location. This indicates that the velocity field is statistically homogeneous with Lipschitz regularity α = H = 1/3. The theory predicts that a one-dimensional trace of a three-dimensional velocity field is a fractal noise process with stationary increments, and that its spectrum decays with a power exponent 2H + 1 = 5/3:

    R̂_V(ω) = σ_H² / |ω|^{5/3}.

The success of this theory comes from numerous experimental verifications of this power spectrum decay. However, the theory does not take into account the existence of coherent structures such as vortices. These phenomena contradict the hypothesis of homogeneity, which is at the root of Kolmogorov's 1941 theory.
Kolmogorov [332] modified the homogeneity assumption in 1962 by introducing an energy dissipation rate ε(x) that varies with the spatial location x. This opens the door to "local stochastic self-similar" multifractal models, first developed by Mandelbrot [370] to explain energy exchanges between fine-scale structures and large-scale structures. The spectrum of singularity D(α) plays an important role in testing these models [264]. Calculations with wavelet maxima on turbulent velocity fields [5] show that D(α) is maximum at 1/3, as predicted by the Kolmogorov theory. However, D(α) does not have a support reduced to {1/3}, which verifies that a turbulent velocity field is not a homogeneous process. Models based on the wavelet transform have been introduced to explain the distribution of vortices in turbulent fluids [13, 251, 252].
6.5 EXERCISES

6.1 ² Lipschitz regularity:
(a) Prove that if f is uniformly Lipschitz α on [a, b], then it is pointwise Lipschitz α at all t₀ ∈ [a, b].
(b) Show that f(t) = t sin(t⁻¹) is Lipschitz 1 at all t₀ ∈ [−1, 1] and verify that it is uniformly Lipschitz α over [−1, 1] only for α ≤ 1/2. Hint: Consider the points t_n = ((n + 1/2) π)⁻¹.
6.2 ² Regularity of derivatives:
(a) Prove that f is uniformly Lipschitz α > 1 over [a, b] if and only if f′ is uniformly Lipschitz α − 1 over [a, b].
(b) Show that f may be pointwise Lipschitz α > 1 at t₀ while f′ is not pointwise Lipschitz α − 1 at t₀. Consider f(t) = t² cos(t⁻¹) at t = 0.
6.3 ² Find f(t) that is uniformly Lipschitz 1 but does not satisfy the sufficient Fourier condition (6.1).
6.4 ¹ Let f(t) = cos(ω₀ t) and ψ(t) be a wavelet that is symmetric about 0.
(a) Verify that

    W f(u, s) = √s ψ̂(s ω₀) cos(ω₀ u).

(b) Find the equations of the curves of wavelet modulus maxima in the time-scale plane (u, s). Relate the decay of |W f(u, s)| along these curves to the number n of vanishing moments of ψ.
6.5 ¹ Let f(t) = |t|^α. Show that W f(u, s) = s^{α+1/2} W f(u/s, 1). Prove that it is not sufficient to measure the decay of |W f(u, s)| when s goes to zero at u = 0 in order to compute the Lipschitz regularity of f at t = 0.
6.6 ³ Let f(t) = |t|^α sin |t|^{−β} with α > 0 and β > 0. What is the pointwise Lipschitz regularity of f and f′ at t = 0? Find the equation of the ridge curve in the (u, s) plane along which the high-amplitude wavelet coefficients |W f(u, s)| converge to t = 0 when s goes to zero. Compute the maximum values of α and α′ such that W f(u, s) satisfies (6.21).
6.7 ² For a complex wavelet, we call lines of constant phase the curves in the (u, s) plane along which the complex phase of W f(u, s) remains constant when s varies.
(a) If f(t) = |t|^α, prove that the lines of constant phase converge toward the singularity at t = 0 when s goes to zero. Verify this numerically.
(b) Let ψ be a real wavelet and W f(u, s) be the real wavelet transform of f. Show that the modulus maxima of W f(u, s) correspond to lines of constant phase of an analytic wavelet transform, which is calculated with a particular analytic wavelet ψ_a that you will specify.
6.8 ³ Prove that if f = 1_{[0,+∞)}, then the number of modulus maxima of W f(u, s) at each scale s is larger than or equal to the number of vanishing moments of ψ.
6.9 ² The spectrum of singularity of the Riemann function

    f(t) = Σ_{n=−∞}^{+∞} (1/n²) sin(n² π t)

is defined on its support by D(α) = 4α − 2 if α ∈ [1/2, 3/4] and D(3/2) = 0 [304, 313]. Verify this result numerically by computing this spectrum from the partition function of a wavelet transform modulus maxima.
6.10 ³ Let ψ = −θ′, where θ is a positive window of compact support. If f is a Cantor devil's staircase, prove that there exist lines of modulus maxima that converge toward each singularity.
6.11 ³ Implement an algorithm that detects oscillating singularities by following the ridges of an analytic wavelet transform when the scale s decreases. Test your algorithm on f(t) = sin(t⁻¹).
6.12 ² Implement an algorithm that reconstructs a signal from the local maxima of its dyadic wavelet transform with a dual synthesis (6.48), using a conjugate-gradient algorithm.
6.13 ³ Let X[n] = f[n] + W[n] be a signal of size N, where W is a Gaussian white noise of variance σ². Implement in WAVELAB an estimator of f that thresholds at T the maxima of a dyadic wavelet transform of X. The estimation of f is reconstructed from the thresholded maxima representation with the dual synthesis (6.48), implemented with a conjugate-gradient algorithm. Compare numerically the risk of this estimator with the risk of a thresholding estimator over the translation-invariant dyadic wavelet transform of X.
6.14 ² Let θ(t) be a Gaussian of variance 1.
(a) Prove that the Laplacian of a two-dimensional Gaussian

    ψ(x₁, x₂) = (∂²θ(x₁)/∂x₁²) θ(x₂) + θ(x₁) (∂²θ(x₂)/∂x₂²)

satisfies the dyadic wavelet condition (5.101) (there is only one wavelet).
(b) Explain why the zero-crossings of this dyadic wavelet transform provide the locations of multiscale edges in images. Compare the position of these zero-crossings with the wavelet modulus maxima obtained with ψ¹(x₁, x₂) = −θ′(x₁) θ(x₂) and ψ²(x₁, x₂) = −θ(x₁) θ′(x₂).
6.15 ² The covariance of a fractional Brownian motion B_H(t) is given by (6.86). Show that the wavelet transform at a scale s is stationary by verifying that

    E{W B_H(u₁, s) W B_H(u₂, s)} = −(σ² s^{2H+1} / 2) ∫_{−∞}^{+∞} |t|^{2H} Ψ((u₁ − u₂)/s − t) dt,

with Ψ(t) = ψ ⋆ ψ̄(t) and ψ̄(t) = ψ(−t).

6.16 ² Let X(t) be a stationary Gaussian process with a covariance R_X(τ) = E{X(t) X(t − τ)} that is twice differentiable. One can prove that the average number of zero-crossings over an interval of size 1 is π⁻¹ (−R_X″(0) / R_X(0))^{1/2} [53]. Let B_H(t) be a fractional Brownian motion and ψ a wavelet that is C². Prove that the average numbers, respectively, of zero-crossings and of modulus maxima of W B_H(u, s) for u ∈ [0, 1] are proportional to s⁻¹. Verify this result numerically.
6.17 ² Implement an algorithm that estimates the Lipschitz regularity α and the smoothing scale σ of sharp variation points in one-dimensional signals by applying the result of Theorem 6.7 on the dyadic wavelet transform maxima.
CHAPTER 7
Wavelet Bases
One can construct wavelets ψ such that the dilated and translated family

    { ψ_{j,n}(t) = (1/√(2^j)) ψ((t − 2^j n) / 2^j) }_{(j,n) ∈ Z²}

is an orthonormal basis of L²(R). Behind this simple statement lie very different points of view that open a fruitful exchange between harmonic analysis and discrete signal processing.
Orthogonal wavelets dilated by 2^j carry signal variations at the resolution 2^{−j}. The construction of these bases can be related to multiresolution signal approximations. Following this link leads us to an unexpected equivalence between wavelet bases and conjugate mirror filters used in discrete multirate filter banks. These filter banks implement a fast orthogonal wavelet transform that requires only O(N) operations for signals of size N. The design of conjugate mirror filters also gives new classes of wavelet orthogonal bases, including regular wavelets of compact support. In several dimensions, wavelet bases of L²(R^d) are constructed with separable products of functions of one variable. Wavelet bases are also adapted to bounded domains and surfaces with lifting algorithms.
7.1 ORTHOGONAL WAVELET BASES

Our search for orthogonal wavelets begins with multiresolution approximations. For f ∈ L²(R), the partial sum of wavelet coefficients Σ_{n=−∞}^{+∞} ⟨f, ψ_{j,n}⟩ ψ_{j,n} can indeed be interpreted as the difference between two approximations of f at the resolutions 2^{−j+1} and 2^{−j}. Multiresolution approximations compute the approximation of signals at various resolutions with orthogonal projections on different spaces {V_j}_{j∈Z}. Section 7.1.3 proves that multiresolution approximations are entirely characterized by a particular discrete filter that governs the loss of information across resolutions. These discrete filters provide a simple procedure for designing and synthesizing orthogonal wavelet bases.
7.1.1 Multiresolution Approximations
Adapting the signal resolution allows one to process only the relevant details
for a particular task. In computer vision, Burt and Adelson [126] introduced a
multiresolution pyramid that can be used to process a low-resolution image first
and then selectively increase the resolution when necessary. This section formalizes multiresolution approximations, which set the ground for the construction of
orthogonal wavelets.
The approximation of a function f at a resolution 2^{−j} is specified by a discrete grid of samples that provides local averages of f over neighborhoods of size proportional to 2^j. Thus, a multiresolution approximation is composed of embedded grids of approximation. More formally, the approximation of a function at a resolution 2^{−j} is defined as an orthogonal projection on a space V_j ⊂ L²(R). The space V_j regroups all possible approximations at the resolution 2^{−j}. The orthogonal projection of f is the function f_j ∈ V_j that minimizes ‖f − f_j‖. The following definition, introduced by Mallat [362] and Meyer [44], specifies the mathematical properties of multiresolution spaces. To avoid confusion, let us emphasize that a scale parameter 2^j is the inverse of the resolution 2^{−j}.
Definition 7.1: Multiresolutions. A sequence {V_j}_{j∈Z} of closed subspaces of L²(R) is a multiresolution approximation if the following six properties are satisfied:

    ∀(j, k) ∈ Z²,   f(t) ∈ V_j ⇔ f(t − 2^j k) ∈ V_j,               (7.1)

    ∀j ∈ Z,   V_{j+1} ⊂ V_j,                                        (7.2)

    ∀j ∈ Z,   f(t) ∈ V_j ⇔ f(t/2) ∈ V_{j+1},                        (7.3)

    lim_{j→+∞} V_j = ∩_{j=−∞}^{+∞} V_j = {0},                       (7.4)

    lim_{j→−∞} V_j = Closure( ∪_{j=−∞}^{+∞} V_j ) = L²(R),          (7.5)

and there exists θ such that {θ(t − n)}_{n∈Z} is a Riesz basis of V₀.
Let us give an intuitive explanation of these mathematical properties. Property (7.1) means that V_j is invariant by any translation proportional to the scale 2^j. As we shall see later, this space can be assimilated to a uniform grid with intervals 2^j, which characterizes the signal approximation at the resolution 2^{−j}. The inclusion (7.2) is a causality property that proves that an approximation at a resolution 2^{−j} contains all the necessary information to compute an approximation at a coarser resolution 2^{−j−1}. Dilating functions in V_j by 2 enlarges the details by 2, and (7.3) guarantees that it defines an approximation at a coarser resolution 2^{−j−1}. When the resolution 2^{−j} goes to 0, (7.4) implies that we lose all the details of f and

    lim_{j→+∞} ‖P_{V_j} f‖ = 0.     (7.6)

On the other hand, when the resolution 2^{−j} goes to +∞, property (7.5) imposes that the signal approximation converges to the original signal:

    lim_{j→−∞} ‖f − P_{V_j} f‖ = 0.     (7.7)

When the resolution 2^{−j} increases, the decay rate of the approximation error ‖f − P_{V_j} f‖ depends on the regularity of f. In Section 9.1.3 we relate this error to the uniform Lipschitz regularity of f.
The existence of a Riesz basis {θ(t − n)}_{n∈Z} of V₀ provides a discretization theorem, as explained in Section 3.1.3. The function θ can be interpreted as a unit resolution cell; Section 5.1.1 gives the definition of a Riesz basis. It is a family of linearly independent functions such that there exist B ≥ A > 0, which satisfy

    ∀f ∈ V₀,   A ‖f‖² ≤ Σ_{n=−∞}^{+∞} |⟨f(t), θ(t − n)⟩|² ≤ B ‖f‖².     (7.8)

This energy equivalence guarantees that signal expansions over {θ(t − n)}_{n∈Z} are numerically stable. One may verify that the family {2^{−j/2} θ(2^{−j} t − n)}_{n∈Z} is a Riesz basis of V_j with the same Riesz bounds A and B at all scales 2^j. Theorem 3.4 proves that {θ(t − n)}_{n∈Z} is a Riesz basis if and only if

    ∀ω ∈ [−π, π],   A ≤ Σ_{k=−∞}^{+∞} |θ̂(ω + 2kπ)|² ≤ B.     (7.9)
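For the box window θ = 1_{[0,1)}, the translates are orthonormal, so the series in (7.9) equals 1 with A = B = 1. This can be verified numerically (the truncation length and test frequencies below are arbitrary choices):

```python
import numpy as np

# |theta_hat(w)|^2 = (sin(w/2)/(w/2))^2 for the box window on [0, 1).
# Its periodization (7.9) should be identically 1.
def periodization(w, n_terms=100000):
    k = np.arange(-n_terms, n_terms + 1)
    x = w / 2 + k * np.pi                 # (w + 2k pi)/2
    return np.sum((np.sin(x) / x) ** 2)

for w in [0.1, 1.0, 2.5, 3.0]:
    assert abs(periodization(w) - 1.0) < 1e-3
print("box window satisfies (7.9) with A = B = 1")
```

The slow 1/K convergence of the truncated series explains the loose tolerance.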
EXAMPLE 7.1: Piecewise Constant Approximations

A simple multiresolution approximation is composed of piecewise constant functions. Space V_j is the set of all g ∈ L²(R) such that g(t) is constant for t ∈ [n2^j, (n + 1)2^j) and n ∈ Z. The approximation at a resolution 2^{−j} of f is the closest piecewise constant function on intervals of size 2^j. The resolution cell can be chosen to be the box window θ = 1_{[0,1)}. Clearly V_j ⊂ V_{j−1}, since functions constant on intervals of size 2^j are also constant on intervals of size 2^{j−1}. The verification of the other multiresolution properties is left to the reader. It is often desirable to construct approximations that are smooth functions, in which case piecewise constant functions are not appropriate.
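The projection on these piecewise constant spaces is simple block averaging, which makes the causality property (7.2) easy to observe numerically. A short Python illustration (the grid size and test signal are arbitrary choices):

```python
import numpy as np

# Orthogonal projection on V_j for the piecewise constant multiresolution:
# replace f by its average on each interval [n 2^j, (n+1) 2^j).
def project(samples, block):                 # block = 2^j in sample units
    means = samples.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

t = np.arange(1024) / 1024
f = np.sin(2 * np.pi * t)
f_fine = project(f, 4)                       # finer approximation
f_coarse = project(f, 16)                    # coarser approximation

# Causality (7.2): the coarse approximation is recovered from the fine one.
assert np.allclose(project(f_fine, 16), f_coarse)
# The approximation error decreases when the resolution increases.
assert np.linalg.norm(f - f_fine) < np.linalg.norm(f - f_coarse)
print("V_{j+1} is included in V_j on the sampled grid")
```

Averaging a fine approximation on coarser blocks returns exactly the coarse approximation, the discrete counterpart of V_{j+1} ⊂ V_j.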
EXAMPLE 7.2: Shannon Approximations

Frequency band-limited functions also yield multiresolution approximations. Space V_j is defined as the set of functions with a Fourier transform support included in [−2^{−j}π, 2^{−j}π]. Theorem 3.5 provides an orthonormal basis {θ(t − n)}_{n∈Z} of V₀ defined by

    θ(t) = sin(πt) / (πt).     (7.10)

All other properties of multiresolution approximation are easily verified.

The approximation at the resolution 2^{−j} of f ∈ L²(R) is the function P_{V_j} f ∈ V_j that minimizes ‖P_{V_j} f − f‖. It is proved in (3.12) that its Fourier transform is obtained with a frequency filtering:

    P̂_{V_j} f(ω) = f̂(ω) 1_{[−2^{−j}π, 2^{−j}π]}(ω).

This Fourier transform is generally discontinuous at ±2^{−j}π, in which case |P_{V_j} f(t)| decays like |t|^{−1} for large |t|, even though f might have a compact support.
EXAMPLE 7.3: Spline Approximations

Polynomial spline approximations construct smooth approximations with fast asymptotic decay. The space V_j of splines of degree m ≥ 0 is the set of functions that are m − 1 times continuously differentiable and equal to a polynomial of degree m on any interval [n2^j, (n + 1)2^j] for n ∈ Z. When m = 0, it is a piecewise constant multiresolution approximation. When m = 1, functions in V_j are piecewise linear and continuous.

A Riesz basis of polynomial splines is constructed with box splines. A box spline θ of degree m is computed by convolving the box window 1_{[0,1]} with itself m + 1 times and centering at 0 or 1/2. Its Fourier transform is

    θ̂(ω) = ( sin(ω/2) / (ω/2) )^{m+1} exp(−iεω/2).     (7.11)

If m is even, then ε = 1 and θ has a support centered at t = 1/2. If m is odd, then ε = 0 and θ(t) is symmetric about t = 0. Figure 7.1 displays a cubic box spline m = 3 and its Fourier transform. For all m ≥ 0, one can prove that {θ(t − n)}_{n∈Z} is a Riesz basis of V₀ by verifying the condition (7.9). This is done with a closed-form expression for the series (7.19).
FIGURE 7.1
Cubic box spline θ (a) and its Fourier transform θ̂ (b).
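The closed form (7.11) can be checked by carrying out the m + 1 convolutions numerically. A Python sketch (the step size and test frequencies are arbitrary choices, and only magnitudes are compared, which sidesteps the centering phase):

```python
import numpy as np

# Build the box spline of degree m by convolving the box window 1_[0,1]
# with itself m + 1 times, then compare its Fourier transform with the
# magnitude of (7.11): |theta_hat(w)| = |sin(w/2)/(w/2)|^{m+1}.
m = 3
dt = 1e-3
box = np.ones(int(1 / dt))
theta = box.copy()
for _ in range(m):
    theta = np.convolve(theta, box) * dt       # (m+1)-fold continuous convolution

t = np.arange(theta.size) * dt                 # support [0, m+1]
for w in [0.5, 1.0, 2.0, 5.0]:
    ft = np.sum(theta * np.exp(-1j * w * t)) * dt
    formula = (np.sin(w / 2) / (w / 2)) ** (m + 1)
    assert abs(abs(ft) - abs(formula)) < 1e-2
print("numerical box spline matches (7.11) in magnitude")
```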
7.1.2 Scaling Function

The approximation of f at the resolution 2^{−j} is defined as the orthogonal projection P_{V_j} f on V_j. To compute this projection, we must find an orthonormal basis of V_j. Theorem 7.1 orthogonalizes the Riesz basis {θ(t − n)}_{n∈Z} and constructs an orthogonal basis of each space V_j by dilating and translating a single function φ called a scaling function. To avoid confusing the resolution 2^{−j} and the scale 2^j, in the rest of this chapter the notion of resolution is dropped and P_{V_j} f is called an approximation at the scale 2^j.

Theorem 7.1. Let {V_j}_{j∈Z} be a multiresolution approximation and φ be the scaling function having a Fourier transform

    φ̂(ω) = θ̂(ω) / ( Σ_{k=−∞}^{+∞} |θ̂(ω + 2kπ)|² )^{1/2}.     (7.12)

Let us denote

    φ_{j,n}(t) = (1/√(2^j)) φ((t − 2^j n) / 2^j).

The family {φ_{j,n}}_{n∈Z} is an orthonormal basis of V_j for all j ∈ Z.
Proof. To construct an orthonormal basis, we look for a function φ ∈ V₀. Thus, it can be expanded in the basis {θ(t − n)}_{n∈Z}:

    φ(t) = Σ_{n=−∞}^{+∞} a[n] θ(t − n),

which implies that

    φ̂(ω) = â(ω) θ̂(ω),

where â is a 2π-periodic Fourier series of finite energy. To compute â we express the orthogonality of {φ(t − n)}_{n∈Z} in the Fourier domain. Let φ̄(t) = φ*(−t). For any (n, p) ∈ Z²,

    ⟨φ(t − n), φ(t − p)⟩ = ∫_{−∞}^{+∞} φ(t − n) φ*(t − p) dt = φ ⋆ φ̄(p − n).     (7.13)

Thus, {φ(t − n)}_{n∈Z} is orthonormal if and only if φ ⋆ φ̄(n) = δ[n]. Computing the Fourier transform of this equality yields

    Σ_{k=−∞}^{+∞} |φ̂(ω + 2kπ)|² = 1.     (7.14)

Indeed, the Fourier transform of φ ⋆ φ̄(t) is |φ̂(ω)|², and we proved in (3.3) that sampling a function periodizes its Fourier transform. The property (7.14) is verified if we choose

    â(ω) = ( Σ_{k=−∞}^{+∞} |θ̂(ω + 2kπ)|² )^{−1/2}.

We saw in (7.9) that the denominator has a strictly positive lower bound, so â is a 2π-periodic function of finite energy.  ■
Approximation

The orthogonal projection of f over V_j is obtained with an expansion in the scaling orthogonal basis

    P_{V_j} f = Σ_{n=−∞}^{+∞} ⟨f, φ_{j,n}⟩ φ_{j,n}.     (7.15)

The inner products

    a_j[n] = ⟨f, φ_{j,n}⟩     (7.16)

provide a discrete approximation at the scale 2^j. We can rewrite them as a convolution product:

    a_j[n] = ∫_{−∞}^{+∞} f(t) (1/√(2^j)) φ((t − 2^j n)/2^j) dt = f ⋆ φ̄_j(2^j n),     (7.17)

with φ̄_j(t) = √(2^{−j}) φ*(−2^{−j} t). The energy of the Fourier transform φ̂ is typically concentrated in [−π, π], as illustrated by Figure 7.2. As a consequence, the Fourier transform √(2^j) φ̂*(2^j ω) of φ̄_j(t) is mostly nonnegligible in [−2^{−j}π, 2^{−j}π]. The discrete approximation a_j[n] is therefore a low-pass filtering of f sampled at intervals 2^j. Figure 7.3 gives a discrete multiresolution approximation at scales 2^{−9} ≤ 2^j ≤ 2^{−4}.
EXAMPLE 7.4

For piecewise constant approximations and Shannon multiresolution approximations, we have constructed Riesz bases {θ(t − n)}_{n∈Z} that are orthonormal bases, thus φ = θ.
FIGURE 7.2
Cubic spline scaling function φ (a) and its Fourier transform φ̂ computed with (7.18) (b).
FIGURE 7.3
Discrete multiresolution approximations a_j[n] at scales 2^j, computed with cubic splines.
EXAMPLE 7.5

Spline multiresolution approximations admit a Riesz basis constructed with a box spline θ of degree m, having a Fourier transform given by (7.11). Inserting this expression in (7.12) yields

    φ̂(ω) = exp(−iεω/2) / ( ω^{m+1} √(S_{2m+2}(ω)) ),     (7.18)

with

    S_n(ω) = Σ_{k=−∞}^{+∞} 1 / (ω + 2kπ)^n,     (7.19)

and ε = 1 if m is even or ε = 0 if m is odd. A closed-form expression of S_{2m+2}(ω) is obtained by computing the derivative of order 2m of the identity

    S_2(2ω) = Σ_{k=−∞}^{+∞} 1 / (2ω + 2kπ)² = 1 / (4 sin² ω).

For linear splines, m = 1 and

    S_4(2ω) = (1 + 2 cos² ω) / (48 sin⁴ ω),     (7.20)

which yields

    φ̂(ω) = 4√3 sin²(ω/2) / ( ω² √(1 + 2 cos²(ω/2)) ).     (7.21)

The cubic spline scaling function corresponds to m = 3, and φ̂(ω) is calculated with (7.18) by inserting

    S_8(2ω) = (5 + 30 cos² ω + 30 sin² ω cos² ω) / (105 · 2⁸ sin⁸ ω)
            + (70 cos⁴ ω + 2 sin⁴ ω cos² ω + (2/3) sin⁶ ω) / (105 · 2⁸ sin⁸ ω).     (7.22)

This cubic spline scaling function and its Fourier transform are displayed in Figure 7.2. It has an infinite support but decays exponentially.
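The closed forms above are easy to validate numerically: the series (7.19) can be truncated and compared with (7.20), and the expressions for φ̂ obtained from (7.18) and from (7.21) must agree. A Python check (truncation length and sample frequencies are arbitrary choices):

```python
import numpy as np

# Truncated version of the series S_n in (7.19).
def S(w, n, K=20000):
    k = np.arange(-K, K + 1)
    return np.sum(1.0 / (w + 2 * np.pi * k) ** n)

# Closed form (7.20): S_4(2w) = (1 + 2 cos^2 w)/(48 sin^4 w).
for w in [0.3, 0.9, 1.4]:
    closed = (1 + 2 * np.cos(w) ** 2) / (48 * np.sin(w) ** 4)
    assert abs(S(2 * w, 4) - closed) / closed < 1e-10

# The two expressions for phi_hat with m = 1 coincide.
for w in [0.5, 1.0, 2.0]:
    phi_718 = 1 / (w ** 2 * np.sqrt(S(w, 4)))                    # from (7.18)
    phi_721 = 4 * np.sqrt(3) * np.sin(w / 2) ** 2 \
              / (w ** 2 * np.sqrt(1 + 2 * np.cos(w / 2) ** 2))   # from (7.21)
    assert abs(phi_718 - phi_721) / phi_721 < 1e-10
print("closed forms (7.20) and (7.21) verified")
```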
7.1.3 Conjugate Mirror Filters

A multiresolution approximation is entirely characterized by the scaling function φ that generates an orthogonal basis of each space V_j. We study the properties of φ, which guarantee that the spaces V_j satisfy all conditions of a multiresolution approximation. It is proved that any scaling function is specified by a discrete filter called a conjugate mirror filter.

Scaling Equation

The multiresolution causality property (7.2) imposes that V_j ⊂ V_{j−1}; in particular, 2^{−1/2} φ(t/2) ∈ V₁ ⊂ V₀. Since {φ(t − n)}_{n∈Z} is an orthonormal basis of V₀, we can decompose

    (1/√2) φ(t/2) = Σ_{n=−∞}^{+∞} h[n] φ(t − n),     (7.23)

with

    h[n] = ⟨(1/√2) φ(t/2), φ(t − n)⟩.     (7.24)

This scaling equation relates a dilation of φ by 2 to its integer translations. The sequence h[n] will be interpreted as a discrete filter.
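For the piecewise constant multiresolution, φ = 1_{[0,1)} and the scaling equation (7.23) holds with the Haar filter h[0] = h[1] = 1/√2, which can be verified pointwise on a grid (the grid is an arbitrary choice):

```python
import numpy as np

# Haar scaling function and its scaling equation (7.23):
# phi(t/2)/sqrt(2) = h[0] phi(t) + h[1] phi(t - 1), h[0] = h[1] = 1/sqrt(2).
phi = lambda x: ((0 <= x) & (x < 1)).astype(float)
t = np.linspace(0, 2, 2001, endpoint=False)

left = phi(t / 2) / np.sqrt(2)
h = 1 / np.sqrt(2)
right = h * phi(t) + h * phi(t - 1)
assert np.allclose(left, right)
# The filter sums to sqrt(2), as (7.30) will require.
assert abs(2 * h - np.sqrt(2)) < 1e-12
print("Haar scaling equation verified")
```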
The Fourier transform of both sides of (7.23) yields

    φ̂(2ω) = (1/√2) ĥ(ω) φ̂(ω)     (7.25)

for ĥ(ω) = Σ_{n=−∞}^{+∞} h[n] e^{−inω}. Thus, it is tempting to express φ̂(ω) directly as a product of dilations of ĥ(ω). For any p ≥ 0, (7.25) implies

    φ̂(2^{−p+1} ω) = (1/√2) ĥ(2^{−p} ω) φ̂(2^{−p} ω).     (7.26)

By substitution, we obtain

    φ̂(ω) = ( Π_{p=1}^{P} ĥ(2^{−p} ω) / √2 ) φ̂(2^{−P} ω).     (7.27)

If φ̂(ω) is continuous at ω = 0, then lim_{P→+∞} φ̂(2^{−P} ω) = φ̂(0), so

    φ̂(ω) = ( Π_{p=1}^{+∞} ĥ(2^{−p} ω) / √2 ) φ̂(0).     (7.28)
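For the Haar filter, the infinite product (7.28) can be evaluated by truncation and compared with the known Fourier transform of the box scaling function (a minimal sketch; the truncation depth and test frequencies are arbitrary choices):

```python
import numpy as np

# Haar filter: h_hat(w) = (1 + e^{-iw})/sqrt(2), so h_hat(w)/sqrt(2)
# = exp(-iw/2) cos(w/2). With phi_hat(0) = 1, the product (7.28) should
# converge to exp(-iw/2) sin(w/2)/(w/2), the Fourier transform of 1_[0,1).
def h_hat(w):
    return (1 + np.exp(-1j * w)) / np.sqrt(2)

def phi_hat(w, P=40):
    prod = 1.0 + 0j
    for p in range(1, P + 1):
        prod *= h_hat(w / 2**p) / np.sqrt(2)
    return prod

for w in [0.5, 1.0, 3.0, 6.0]:
    target = np.exp(-1j * w / 2) * np.sin(w / 2) / (w / 2)
    assert abs(phi_hat(w) - target) < 1e-10
print("infinite product (7.28) recovers the box window spectrum")
```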
Theorem 7.2 [44, 362] gives necessary and sufficient conditions on ĥ(ω) to guarantee that this infinite product is the Fourier transform of a scaling function.

Theorem 7.2: Mallat, Meyer. Let φ ∈ L²(R) be an integrable scaling function. The Fourier series of h[n] = ⟨2^{−1/2} φ(t/2), φ(t − n)⟩ satisfies

    ∀ω ∈ R,   |ĥ(ω)|² + |ĥ(ω + π)|² = 2,     (7.29)

and

    ĥ(0) = √2.     (7.30)

Conversely, if ĥ(ω) is 2π-periodic and continuously differentiable in a neighborhood of ω = 0, if it satisfies (7.29) and (7.30), and if

    inf_{ω ∈ [−π/2, π/2]} |ĥ(ω)| > 0,     (7.31)

then

    φ̂(ω) = Π_{p=1}^{+∞} ĥ(2^{−p} ω) / √2     (7.32)

is the Fourier transform of a scaling function φ ∈ L²(R).
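Conditions (7.29) and (7.30) can be checked numerically for any candidate filter. The sketch below uses the standard length-4 Daubechies filter coefficients (the frequency grid is an arbitrary choice):

```python
import numpy as np

# Length-4 Daubechies low-pass filter.
s3 = np.sqrt(3)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

def h_hat(w):
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * n * w))

w = np.linspace(-np.pi, np.pi, 257)
vals = np.array([abs(h_hat(x))**2 + abs(h_hat(x + np.pi))**2 for x in w])
assert np.allclose(vals, 2.0)                   # conjugate mirror condition (7.29)
assert abs(h_hat(0.0) - np.sqrt(2)) < 1e-12     # low-pass condition (7.30)
print("Daubechies length-4 filter is a conjugate mirror filter")
```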
Proof. This theorem is a central result and its proof is long and technical. It is divided in several parts.

Proof of the necessary condition (7.29). The necessary condition is proved to be a consequence of the fact that {φ(t − n)}_{n∈Z} is orthonormal. In the Fourier domain, (7.14) gives an equivalent condition:

    ∀ω ∈ R,   Σ_{k=−∞}^{+∞} |φ̂(ω + 2kπ)|² = 1.     (7.33)

Inserting φ̂(ω) = 2^{−1/2} ĥ(ω/2) φ̂(ω/2) yields

    Σ_{k=−∞}^{+∞} |ĥ(ω/2 + kπ)|² |φ̂(ω/2 + kπ)|² = 2.

Since ĥ(ω) is 2π periodic, separating the even and odd integer terms gives

    |ĥ(ω/2)|² Σ_{p=−∞}^{+∞} |φ̂(ω/2 + 2pπ)|² + |ĥ(ω/2 + π)|² Σ_{p=−∞}^{+∞} |φ̂(ω/2 + π + 2pπ)|² = 2.

Inserting (7.33) for ω′ = ω/2 and ω′ = ω/2 + π proves that

    |ĥ(ω′)|² + |ĥ(ω′ + π)|² = 2.

Proof of the necessary condition (7.30). We prove that ĥ(0) = √2 by showing that φ̂(0) ≠ 0. Indeed, we know that φ̂(0) = 2^{−1/2} ĥ(0) φ̂(0). More precisely, we verify that |φ̂(0)| = 1 is a consequence of the completeness property (7.5) of multiresolution approximations.

The orthogonal projection of f ∈ L²(R) on V_j is

    P_{V_j} f = Σ_{n=−∞}^{+∞} ⟨f, φ_{j,n}⟩ φ_{j,n}.     (7.34)

Property (7.5) expressed in the time and Fourier domains with the Plancherel formula implies that

    lim_{j→−∞} ‖f − P_{V_j} f‖² = lim_{j→−∞} (1/2π) ‖f̂ − P̂_{V_j} f‖² = 0.     (7.35)

To compute the Fourier transform P̂_{V_j} f(ω), we denote φ_j(t) = √(2^{−j}) φ(2^{−j} t). Inserting the convolution expression (7.17) in (7.34) yields

    P_{V_j} f(t) = Σ_{n=−∞}^{+∞} f ⋆ φ̄_j(2^j n) φ_j(t − 2^j n) = φ_j ⋆ ( Σ_{n=−∞}^{+∞} f ⋆ φ̄_j(2^j n) δ(t − 2^j n) ).

The Fourier transform of f ⋆ φ̄_j(t) is √(2^j) f̂(ω) φ̂*(2^j ω). A uniform sampling has a periodized Fourier transform calculated in (3.3), and thus

    P̂_{V_j} f(ω) = φ̂(2^j ω) Σ_{k=−∞}^{+∞} f̂(ω − 2kπ/2^j) φ̂*(2^j ω − 2kπ).     (7.36)

Let us choose f̂ = 1_{[−π,π]}. For j < 0 and ω ∈ [−π, π], (7.36) gives P̂_{V_j} f(ω) = |φ̂(2^j ω)|². The mean-square convergence (7.35) implies that

    lim_{j→−∞} ∫_{−π}^{π} | 1 − |φ̂(2^j ω)|² |² dω = 0.

Since φ is integrable, φ̂(ω) is continuous and thus lim_{j→−∞} |φ̂(2^j ω)| = |φ̂(0)| = 1.
We now prove that the function φ, having a Fourier transform given by (7.32), is a scaling function. This is divided in two intermediate results.

Proof that {φ(t − n)}_{n∈Z} is orthonormal. Observe first that the infinite product (7.32) converges and that |φ̂(ω)| ≤ 1 because (7.29) implies that |ĥ(ω)| ≤ √2. The Parseval formula gives

    ⟨φ(t), φ(t − n)⟩ = ∫_{−∞}^{+∞} φ(t) φ*(t − n) dt = (1/2π) ∫_{−∞}^{+∞} |φ̂(ω)|² e^{inω} dω.

Verifying that {φ(t − n)}_{n∈Z} is orthonormal is equivalent to showing that

    ∫_{−∞}^{+∞} |φ̂(ω)|² e^{inω} dω = 2π δ[n].

This result is obtained by considering the functions

    φ̂_k(ω) = ( Π_{p=1}^{k} ĥ(2^{−p} ω) / √2 ) 1_{[−2^k π, 2^k π]}(ω),

and computing the limit, as k increases to +∞, of the integrals

    I_k[n] = ∫_{−∞}^{+∞} |φ̂_k(ω)|² e^{inω} dω = ∫_{−2^k π}^{2^k π} ( Π_{p=1}^{k} |ĥ(2^{−p} ω)|² / 2 ) e^{inω} dω.

First, let us show that I_k[n] = 2π δ[n] for all k ≥ 1. To do this, we divide I_k[n] into two integrals:

    I_k[n] = ∫_{−2^k π}^{0} ( Π_{p=1}^{k} |ĥ(2^{−p} ω)|² / 2 ) e^{inω} dω + ∫_{0}^{2^k π} ( Π_{p=1}^{k} |ĥ(2^{−p} ω)|² / 2 ) e^{inω} dω.

Let us make the change of variable ω′ = ω + 2^k π in the first integral. Since ĥ(ω) is 2π periodic, when p < k, then |ĥ(2^{−p} [ω′ − 2^k π])|² = |ĥ(2^{−p} ω′)|². When k = p the hypothesis (7.29) implies that

    |ĥ(2^{−k} [ω′ − 2^k π])|² + |ĥ(2^{−k} ω′)|² = 2.

For k > 1, the two integrals of I_k[n] become

    I_k[n] = ∫_{0}^{2^k π} ( Π_{p=1}^{k−1} |ĥ(2^{−p} ω)|² / 2 ) e^{inω} dω.     (7.37)

Since Π_{p=1}^{k−1} |ĥ(2^{−p} ω)|² e^{inω} is 2^k π periodic, we obtain I_k[n] = I_{k−1}[n], and by induction I_k[n] = I_1[n]. Writing (7.37) for k = 1 gives

    I_1[n] = ∫_{0}^{2π} e^{inω} dω = 2π δ[n],

which verifies that I_k[n] = 2π δ[n], for all k ≥ 1.
We shall now prove that $\hat\phi \in L^2(\mathbb{R})$. For all $\omega \in \mathbb{R}$,

$$\lim_{k\to\infty} |\hat\phi_k(\omega)|^2 = \prod_{p=1}^{\infty} \frac{|\hat h(2^{-p}\omega)|^2}{2} = |\hat\phi(\omega)|^2.$$

The Fatou lemma (A.1) on positive functions proves that

$$\int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\,d\omega \le \lim_{k\to\infty} \int_{-\infty}^{+\infty} |\hat\phi_k(\omega)|^2\,d\omega = 2\pi, \qquad (7.38)$$

because $I_k[0] = 2\pi$ for all $k \ge 1$. Since

$$|\hat\phi(\omega)|^2\, e^{in\omega} = \lim_{k\to\infty} |\hat\phi_k(\omega)|^2\, e^{in\omega},$$

we finally verify that

$$\int_{-\infty}^{+\infty} |\hat\phi(\omega)|^2\, e^{in\omega}\,d\omega = \lim_{k\to\infty} \int_{-\infty}^{+\infty} |\hat\phi_k(\omega)|^2\, e^{in\omega}\,d\omega = 2\pi\,\delta[n] \qquad (7.39)$$

by applying the dominated convergence theorem (A.1). This requires verifying the upper-bound condition (A.1). This is done in our case by proving the existence of a constant $C$ such that

$$\left| |\hat\phi_k(\omega)|^2\, e^{in\omega} \right| = |\hat\phi_k(\omega)|^2 \le C\, |\hat\phi(\omega)|^2. \qquad (7.40)$$

Indeed, we showed in (7.38) that $|\hat\phi(\omega)|^2$ is an integrable function.

The existence of $C > 0$ satisfying (7.40) is trivial for $|\omega| > 2^k\pi$ since $\hat\phi_k(\omega) = 0$. For $|\omega| \le 2^k\pi$, since $\hat\phi(\omega) = 2^{-1/2}\,\hat h(\omega/2)\,\hat\phi(\omega/2)$, it follows that

$$|\hat\phi(\omega)|^2 = |\hat\phi_k(\omega)|^2\, |\hat\phi(2^{-k}\omega)|^2.$$

Therefore, to prove (7.40) for $|\omega| \le 2^k\pi$, it is sufficient to show that $|\hat\phi(\omega)|^2 \ge 1/C$ for $\omega \in [-\pi, \pi]$.

Let us first study the neighborhood of $\omega = 0$. Since $\hat h(\omega)$ is continuously differentiable in this neighborhood and since $|\hat h(\omega)|^2 \le 2 = |\hat h(0)|^2$, the functions $|\hat h(\omega)|^2$ and $\log_e |\hat h(\omega)|^2$ have derivatives that vanish at $\omega = 0$. It follows that there exists $\varepsilon > 0$ such that

$$\forall |\omega| \le \varepsilon, \quad 0 \ge \log_e \frac{|\hat h(\omega)|^2}{2} \ge -|\omega|.$$

Thus, for $|\omega| \le \varepsilon$,

$$|\hat\phi(\omega)|^2 = \exp\!\left[\sum_{p=1}^{+\infty} \log_e \frac{|\hat h(2^{-p}\omega)|^2}{2}\right] \ge e^{-|\omega|} \ge e^{-\varepsilon}. \qquad (7.41)$$

Now let us analyze the domain $|\omega| > \varepsilon$. To do this we take an integer $l$ such that $2^{-l}\pi < \varepsilon$. Condition (7.31) proves that $K = \inf_{\omega\in[-\pi/2,\pi/2]} |\hat h(\omega)| > 0$, so if $|\omega| \le \pi$,

$$|\hat\phi(\omega)|^2 = \prod_{p=1}^{l} \frac{|\hat h(2^{-p}\omega)|^2}{2}\; |\hat\phi(2^{-l}\omega)|^2 \ge \frac{K^{2l}}{2^l}\, e^{-\varepsilon} = \frac{1}{C}.$$
This last result finishes the proof of inequality (7.40). Applying the dominated convergence theorem (A.1) proves (7.39) and that $\{\phi(t-n)\}_{n\in\mathbb{Z}}$ is orthonormal. A simple change of variable shows that $\{\phi_{j,n}\}_{n\in\mathbb{Z}}$ is orthonormal for all $j \in \mathbb{Z}$.
Proof that $\{V_j\}_{j\in\mathbb{Z}}$ is a multiresolution approximation. To verify that $\phi$ is a scaling function, we must show that the spaces $V_j$ generated by $\{\phi_{j,n}\}_{n\in\mathbb{Z}}$ define a multiresolution approximation. The multiresolution properties (7.1) and (7.3) are clearly true. The causality $V_{j+1} \subset V_j$ is verified by showing that for any $p \in \mathbb{Z}$,

$$\phi_{j+1,p} = \sum_{n=-\infty}^{+\infty} h[n-2p]\, \phi_{j,n}.$$

This equality is proved later in (7.107). Since all vectors of a basis of $V_{j+1}$ can be decomposed in a basis of $V_j$, it follows that $V_{j+1} \subset V_j$.
To prove the multiresolution property (7.4) we must show that any $f \in L^2(\mathbb{R})$ satisfies

$$\lim_{j\to+\infty} \|P_{V_j} f\| = 0. \qquad (7.42)$$

Since $\{\phi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $V_j$,

$$\|P_{V_j} f\|^2 = \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n}\rangle|^2.$$

Suppose first that $f$ is bounded by $A$ and has a compact support included in $[-2^J, 2^J]$. The constants $A$ and $J$ may be arbitrarily large. It follows that

$$\sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n}\rangle|^2 \le 2^{-j} \sum_{n=-\infty}^{+\infty} \left[\int_{-2^J}^{2^J} |f(t)|\,|\phi(2^{-j}t-n)|\,dt\right]^2 \le 2^{-j} A^2 \sum_{n=-\infty}^{+\infty} \left[\int_{-2^J}^{2^J} |\phi(2^{-j}t-n)|\,dt\right]^2.$$

Applying the Cauchy-Schwarz inequality to $1 \times |\phi(2^{-j}t-n)|$ yields

$$\sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n}\rangle|^2 \le A^2\, 2^{J+1} \sum_{n=-\infty}^{+\infty} \int_{-2^J}^{2^J} |\phi(2^{-j}t-n)|^2\, 2^{-j}\,dt \le A^2\, 2^{J+1} \int_{S_j} |\phi(t)|^2\,dt = A^2\, 2^{J+1} \int_{-\infty}^{+\infty} |\phi(t)|^2\, \mathbf{1}_{S_j}(t)\,dt,$$

with $S_j = \cup_{n\in\mathbb{Z}}\, [n - 2^{J-j},\, n + 2^{J-j}]$ for $j > J$. For $t \notin \mathbb{Z}$, we obviously have $\mathbf{1}_{S_j}(t) \to 0$ for $j \to +\infty$. The dominated convergence theorem (A.1) applied to $|\phi(t)|^2\,\mathbf{1}_{S_j}(t)$ proves that the integral converges to 0 and thus,

$$\lim_{j\to+\infty} \sum_{n=-\infty}^{+\infty} |\langle f, \phi_{j,n}\rangle|^2 = 0.$$

Property (7.42) is extended to any $f \in L^2(\mathbb{R})$ by using the density in $L^2(\mathbb{R})$ of bounded functions with a compact support, and Theorem A.5.
To prove the last multiresolution property (7.5) we must show that for any $f \in L^2(\mathbb{R})$,

$$\lim_{j\to-\infty} \|f - P_{V_j} f\|^2 = \lim_{j\to-\infty} \left(\|f\|^2 - \|P_{V_j} f\|^2\right) = 0. \qquad (7.43)$$

We consider functions $f$ that have a Fourier transform $\hat f$ with a compact support included in $[-2^J\pi,\, 2^J\pi]$ for $J$ large enough. We proved in (7.36) that the Fourier transform of $P_{V_j} f$ is

$$\widehat{P_{V_j} f}(\omega) = \hat\phi(2^j\omega) \sum_{k=-\infty}^{+\infty} \hat f\!\left(\omega - 2^{-j}2k\pi\right) \hat\phi^*\!\left(2^j\omega - 2k\pi\right).$$

If $j < -J$, then the supports of $\hat f(\omega - 2^{-j}2k\pi)$ are disjoint for different $k$, so

$$\|P_{V_j} f\|^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, |\hat\phi(2^j\omega)|^4\, d\omega + \frac{1}{2\pi} \int_{-\infty}^{+\infty} \sum_{\substack{k=-\infty \\ k\neq 0}}^{+\infty} \left|\hat f\!\left(\omega - 2^{-j}2k\pi\right)\right|^2 |\hat\phi(2^j\omega)|^2\, \left|\hat\phi\!\left(2^j\omega - 2k\pi\right)\right|^2 d\omega. \qquad (7.44)$$

We have already observed that $|\hat\phi(\omega)| \le 1$, and (7.41) proves that if $\omega$ is sufficiently small then $|\hat\phi(\omega)| \ge e^{-|\omega|}$, so

$$\lim_{\omega\to 0} |\hat\phi(\omega)| = 1.$$

Since $|\hat f(\omega)|^2\,|\hat\phi(2^j\omega)|^4 \le |\hat f(\omega)|^2$ and $\lim_{j\to-\infty} |\hat\phi(2^j\omega)|^4\,|\hat f(\omega)|^2 = |\hat f(\omega)|^2$, one can apply the dominated convergence theorem (A.1) to prove that

$$\lim_{j\to-\infty} \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, |\hat\phi(2^j\omega)|^4\, d\omega = \int_{-\infty}^{+\infty} |\hat f(\omega)|^2\, d\omega = 2\pi\, \|f\|^2. \qquad (7.45)$$

The operator $P_{V_j}$ is an orthogonal projector, so $\|P_{V_j} f\| \le \|f\|$. With (7.44) and (7.45), this implies that $\lim_{j\to-\infty} (\|f\|^2 - \|P_{V_j} f\|^2) = 0$ and thus verifies (7.43). This property is extended to any $f \in L^2(\mathbb{R})$ by using the density in $L^2(\mathbb{R})$ of functions having compactly supported Fourier transforms and the result of Theorem A.5. ■
Discrete filters whose transfer functions satisfy (7.29) are called conjugate mirror filters. As we shall see in Section 7.3, they play an important role in discrete signal processing; they make it possible to decompose discrete signals in separate frequency bands with filter banks. One difficulty of the proof is showing that the infinite cascade of convolutions that is represented in the Fourier domain by the product (7.32) does converge to a decent function in $L^2(\mathbb{R})$. The sufficient condition (7.31) is not necessary to construct a scaling function, but it is always satisfied in practical designs of conjugate mirror filters. It cannot simply be removed, as shown by the example $\hat h(\omega) = \sqrt{2}\,\cos(3\omega/2)$, which satisfies all other conditions. In this case, a simple calculation shows that $\phi = \frac{1}{3}\,\mathbf{1}_{[-3/2,3/2]}$. Clearly $\{\phi(t-n)\}_{n\in\mathbb{Z}}$ is not orthogonal, so $\phi$ is not a scaling function. However, the condition (7.31) may be replaced by a weaker but more technical necessary and sufficient condition proved by Cohen [16, 167].
EXAMPLE 7.6

For a Shannon multiresolution approximation, $\hat\phi = \mathbf{1}_{[-\pi,\pi]}$. Thus, we derive from (7.32) that

$$\forall\omega \in [-\pi,\pi], \quad \hat h(\omega) = \sqrt{2}\; \mathbf{1}_{[-\pi/2,\pi/2]}(\omega).$$

EXAMPLE 7.7

For piecewise constant approximations, $\phi = \mathbf{1}_{[0,1]}$. Since $h[n] = \langle 2^{-1/2}\phi(t/2),\, \phi(t-n)\rangle$, it follows that

$$h[n] = \begin{cases} 2^{-1/2} & \text{if } n = 0, 1 \\ 0 & \text{otherwise.} \end{cases} \qquad (7.46)$$
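The two-tap filter of (7.46) is the simplest conjugate mirror filter, and the conditions (7.29) and (7.30) it must satisfy can be checked numerically. The following sketch (an illustration, not part of the text) evaluates $\hat h(\omega) = \sum_n h[n]\, e^{-in\omega}$ on a grid and verifies that $|\hat h(\omega)|^2 + |\hat h(\omega+\pi)|^2 = 2$ and $\hat h(0) = \sqrt{2}$:

```python
import cmath
import math

# Haar (piecewise constant) conjugate mirror filter from (7.46)
h = {0: 2 ** -0.5, 1: 2 ** -0.5}

def h_hat(w):
    """Transfer function h^(w) = sum_n h[n] exp(-i n w)."""
    return sum(c * cmath.exp(-1j * n * w) for n, c in h.items())

# Condition (7.30): h^(0) = sqrt(2)
assert abs(h_hat(0.0) - math.sqrt(2)) < 1e-12

# Condition (7.29): |h^(w)|^2 + |h^(w + pi)|^2 = 2 for all w
for k in range(200):
    w = -math.pi + k * math.pi / 100
    s = abs(h_hat(w)) ** 2 + abs(h_hat(w + math.pi)) ** 2
    assert abs(s - 2.0) < 1e-12
print("conjugate mirror conditions verified")
```

For this filter the identity even holds exactly: $|\hat h(\omega)|^2 = 1 + \cos\omega$, so the two terms sum to 2 for every $\omega$.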
EXAMPLE 7.8

Polynomial splines of degree $m$ correspond to a conjugate mirror filter $\hat h(\omega)$ that is calculated from $\hat\phi(\omega)$ with (7.25):

$$\hat h(\omega) = \sqrt{2}\, \frac{\hat\phi(2\omega)}{\hat\phi(\omega)}. \qquad (7.47)$$

Inserting (7.18) yields

$$\hat h(\omega) = \exp\!\left(\frac{-i\varepsilon\omega}{2}\right) \sqrt{\frac{S_{2m+2}(\omega)}{2^{2m+1}\, S_{2m+2}(2\omega)}}, \qquad (7.48)$$

where $\varepsilon = 0$ if $m$ is odd and $\varepsilon = 1$ if $m$ is even. For linear splines $m = 1$, so (7.20) implies that

$$\hat h(\omega) = \sqrt{2} \left[\frac{1 + 2\cos^2(\omega/2)}{1 + 2\cos^2\omega}\right]^{1/2} \cos^2\!\left(\frac{\omega}{2}\right). \qquad (7.49)$$

For cubic splines, the conjugate mirror filter is calculated by inserting (7.22) in (7.48). Figure 7.4 gives the graph of $|\hat h(\omega)|^2$. The impulse responses $h[n]$ of these filters have an infinite support but an exponential decay. For $m$ odd, $h[n]$ is symmetric about $n = 0$. Table 7.1 gives the coefficients $h[n]$ above $10^{-4}$ for $m = 1, 3$.
[Figure 7.4: The solid line gives $|\hat h(\omega)|^2$ on $[-\pi,\pi]$ for a cubic spline multiresolution. The dotted line corresponds to $|\hat g(\omega)|^2$.]
Table 7.1 Conjugate Mirror Filters h[n] for Linear Splines m = 1 and Cubic Splines m = 3

  m = 1:
    n = 0:         0.817645956
    n = 1, -1:     0.397296430
    n = 2, -2:    -0.069101020
    n = 3, -3:    -0.051945337
    n = 4, -4:     0.016974805
    n = 5, -5:     0.009990599
    n = 6, -6:    -0.003883261
    n = 7, -7:    -0.002201945
    n = 8, -8:     0.000923371
    n = 9, -9:     0.000511636
    n = 10, -10:  -0.000224296
    n = 11, -11:  -0.000122686

  m = 3:
    n = 0:         0.766130398
    n = 1, -1:     0.433923147
    n = 2, -2:    -0.050201753
    n = 3, -3:    -0.110036987
    n = 4, -4:     0.032080869
    n = 5, -5:     0.042068328
    n = 6, -6:    -0.017176331
    n = 7, -7:    -0.017982291
    n = 8, -8:     0.008685294
    n = 9, -9:     0.008201477
    n = 10, -10:  -0.004353840
    n = 11, -11:  -0.003882426
    n = 12, -12:   0.002186714
    n = 13, -13:   0.001882120
    n = 14, -14:  -0.001103748
    n = 15, -15:  -0.000927187
    n = 16, -16:   0.000559952
    n = 17, -17:   0.000462093
    n = 18, -18:  -0.000285414
    n = 19, -19:  -0.000232304
    n = 20, -20:   0.000146098

Note: The coefficients below 10⁻⁴ are not given.
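The $m = 1$ entries of Table 7.1 can be reproduced from the closed form (7.49): sampling $\hat h(\omega)$ on a fine grid and inverting the Fourier series recovers the impulse response, up to a periodization error that is negligible because $h[n]$ decays exponentially. A minimal sketch (illustrative, standard library only):

```python
import math

def h_hat_linear_spline(w):
    """Linear spline (m = 1) conjugate mirror filter, equation (7.49)."""
    c = math.cos(w / 2) ** 2
    return math.sqrt(2) * math.sqrt((1 + 2 * c) / (1 + 2 * math.cos(w) ** 2)) * c

N = 4096  # grid size; h^ is real and even, so h[n] is real and symmetric

def h_coeff(n):
    """Inverse Fourier series h[n] = (1/2pi) * integral of h^(w) exp(inw) dw,
    approximated by a Riemann sum over one period."""
    return sum(h_hat_linear_spline(2 * math.pi * k / N) *
               math.cos(2 * math.pi * k * n / N) for k in range(N)) / N

# Compare with the first entries of Table 7.1 for m = 1
assert abs(h_coeff(0) - 0.817645956) < 1e-6
assert abs(h_coeff(1) - 0.397296430) < 1e-6
assert abs(h_coeff(2) - (-0.069101020)) < 1e-6
print("Table 7.1 (m = 1) reproduced from (7.49)")
```

A sanity check on the table itself: summing $h[0] + 2\sum_{n\ge 1} h[n]$ over the listed $m=1$ coefficients gives about 1.41408, which matches $\hat h(0) = \sqrt{2}$ up to the truncation at $10^{-4}$.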
7.1.4 In Which Orthogonal Wavelets Finally Arrive
Orthonormal wavelets carry the details necessary to increase the resolution of a signal approximation. The approximations of $f$ at the scales $2^j$ and $2^{j-1}$ are, respectively, equal to their orthogonal projections on $V_j$ and $V_{j-1}$. We know that $V_j$ is included in $V_{j-1}$. Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j-1}$:

$$V_{j-1} = V_j \oplus W_j. \qquad (7.50)$$

The orthogonal projection of $f$ on $V_{j-1}$ can be decomposed as the sum of orthogonal projections on $V_j$ and $W_j$:

$$P_{V_{j-1}} f = P_{V_j} f + P_{W_j} f. \qquad (7.51)$$

The complement $P_{W_j} f$ provides the "details" of $f$ that appear at the scale $2^{j-1}$ but that disappear at the coarser scale $2^j$. Theorem 7.3 [44, 362] proves that one can construct an orthonormal basis of $W_j$ by scaling and translating a wavelet $\psi$.
Theorem 7.3: Mallat, Meyer. Let $\phi$ be a scaling function and $h$ the corresponding conjugate mirror filter. Let $\psi$ be the function having a Fourier transform

$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\, \hat g\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right), \qquad (7.52)$$

with

$$\hat g(\omega) = e^{-i\omega}\, \hat h^*(\omega + \pi). \qquad (7.53)$$

Let us denote

$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right).$$

For any scale $2^j$, $\{\psi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $W_j$. For all scales, $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$.
Proof. Let us prove first that $\hat\psi$ can be written as the product (7.52). Necessarily, $\psi(t/2) \in W_1 \subset V_0$. Thus, it can be decomposed in $\{\phi(t-n)\}_{n\in\mathbb{Z}}$, which is an orthonormal basis of $V_0$:

$$\frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t-n), \qquad (7.54)$$

with

$$g[n] = \left\langle \frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right),\, \phi(t-n) \right\rangle. \qquad (7.55)$$

The Fourier transform of (7.54) yields

$$\hat\psi(2\omega) = \frac{1}{\sqrt{2}}\, \hat g(\omega)\, \hat\phi(\omega). \qquad (7.56)$$

Lemma 7.1 gives necessary and sufficient conditions on $\hat g$ for designing an orthogonal wavelet.

Lemma 7.1. The family $\{\psi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $W_j$ if and only if

$$|\hat g(\omega)|^2 + |\hat g(\omega + \pi)|^2 = 2 \qquad (7.57)$$

and

$$\hat g(\omega)\, \hat h^*(\omega) + \hat g(\omega + \pi)\, \hat h^*(\omega + \pi) = 0. \qquad (7.58)$$

The lemma is proved for $j = 0$, from which it is easily extended to $j \neq 0$ with an appropriate scaling. As in (7.14), one can verify that $\{\psi(t-n)\}_{n\in\mathbb{Z}}$ is orthonormal if and only if

$$\forall\omega \in \mathbb{R}, \quad I(\omega) = \sum_{k=-\infty}^{+\infty} |\hat\psi(\omega + 2k\pi)|^2 = 1. \qquad (7.59)$$

Since $\hat\psi(\omega) = 2^{-1/2}\,\hat g(\omega/2)\,\hat\phi(\omega/2)$ and $\hat g(\omega)$ is $2\pi$ periodic,

$$I(\omega) = \frac{1}{2}\sum_{k=-\infty}^{+\infty} \left|\hat g\!\left(\frac{\omega}{2}+k\pi\right)\right|^2 \left|\hat\phi\!\left(\frac{\omega}{2}+k\pi\right)\right|^2 = \frac{1}{2}\left|\hat g\!\left(\frac{\omega}{2}\right)\right|^2 \sum_{p=-\infty}^{+\infty} \left|\hat\phi\!\left(\frac{\omega}{2}+2p\pi\right)\right|^2 + \frac{1}{2}\left|\hat g\!\left(\frac{\omega}{2}+\pi\right)\right|^2 \sum_{p=-\infty}^{+\infty} \left|\hat\phi\!\left(\frac{\omega}{2}+\pi+2p\pi\right)\right|^2.$$

We know that $\sum_{p=-\infty}^{+\infty} |\hat\phi(\omega + 2p\pi)|^2 = 1$, so (7.59) is equivalent to (7.57).
The space $W_0$ is orthogonal to $V_0$ if and only if $\{\phi(t-n)\}_{n\in\mathbb{Z}}$ and $\{\psi(t-n)\}_{n\in\mathbb{Z}}$ are orthogonal families of vectors. This means that for any $n \in \mathbb{Z}$,

$$\langle \psi(t),\, \phi(t-n)\rangle = \psi \star \bar\phi(n) = 0.$$

The Fourier transform of $\psi\star\bar\phi(t)$ is $\hat\psi(\omega)\,\hat\phi^*(\omega)$. The sampled sequence $\psi\star\bar\phi(n)$ is zero if its Fourier series computed with (3.3) satisfies

$$\forall\omega\in\mathbb{R}, \quad \sum_{k=-\infty}^{+\infty} \hat\psi(\omega + 2k\pi)\, \hat\phi^*(\omega + 2k\pi) = 0. \qquad (7.60)$$

By inserting $\hat\psi(\omega) = 2^{-1/2}\,\hat g(\omega/2)\,\hat\phi(\omega/2)$ and $\hat\phi(\omega) = 2^{-1/2}\,\hat h(\omega/2)\,\hat\phi(\omega/2)$ in this equation, since $\sum_{k=-\infty}^{+\infty} |\hat\phi(\omega + 2k\pi)|^2 = 1$, we prove as before that (7.60) is equivalent to (7.58).

We must finally verify that $V_{-1} = V_0 \oplus W_0$. Knowing that $\{\sqrt{2}\,\phi(2t-n)\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $V_{-1}$, it is equivalent to show that for any $a[n] \in \ell^2(\mathbb{Z})$ there exist $b[n] \in \ell^2(\mathbb{Z})$ and $c[n] \in \ell^2(\mathbb{Z})$ such that

$$\sum_{n=-\infty}^{+\infty} a[n]\, \sqrt{2}\, \phi(2[t - 2^{-1}n]) = \sum_{n=-\infty}^{+\infty} b[n]\, \phi(t-n) + \sum_{n=-\infty}^{+\infty} c[n]\, \psi(t-n). \qquad (7.61)$$

This is done by relating $\hat b(\omega)$ and $\hat c(\omega)$ to $\hat a(\omega)$. The Fourier transform of (7.61) yields

$$\frac{1}{\sqrt{2}}\, \hat a\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right) = \hat b(\omega)\, \hat\phi(\omega) + \hat c(\omega)\, \hat\psi(\omega).$$

Inserting $\hat\psi(\omega) = 2^{-1/2}\,\hat g(\omega/2)\,\hat\phi(\omega/2)$ and $\hat\phi(\omega) = 2^{-1/2}\,\hat h(\omega/2)\,\hat\phi(\omega/2)$ in this equation shows that it is necessarily satisfied if

$$\hat a\!\left(\frac{\omega}{2}\right) = \hat b(\omega)\, \hat h\!\left(\frac{\omega}{2}\right) + \hat c(\omega)\, \hat g\!\left(\frac{\omega}{2}\right). \qquad (7.62)$$

Let us define

$$\hat b(2\omega) = \frac{1}{2}\left[\hat a(\omega)\,\hat h^*(\omega) + \hat a(\omega+\pi)\,\hat h^*(\omega+\pi)\right]$$

and

$$\hat c(2\omega) = \frac{1}{2}\left[\hat a(\omega)\,\hat g^*(\omega) + \hat a(\omega+\pi)\,\hat g^*(\omega+\pi)\right].$$

When calculating the right side of (7.62), we verify that it is equal to the left side by inserting (7.57), (7.58), and using

$$|\hat h(\omega)|^2 + |\hat h(\omega+\pi)|^2 = 2. \qquad (7.63)$$

Since $\hat b(\omega)$ and $\hat c(\omega)$ are $2\pi$ periodic, they are the Fourier series of two sequences $b[n]$ and $c[n]$ that satisfy (7.61). This finishes the proof of the lemma.
The formula (7.53)

$$\hat g(\omega) = e^{-i\omega}\, \hat h^*(\omega+\pi)$$

satisfies (7.57) and (7.58) because of (7.63). Thus, we derive from Lemma 7.1 that $\{\psi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $W_j$.

We complete the proof of the theorem by verifying that $\{\psi_{j,n}\}_{(j,n)\in\mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$. Observe first that the detail spaces $\{W_j\}_{j\in\mathbb{Z}}$ are orthogonal. Indeed, $W_j$ is orthogonal to $V_j$ and $W_l \subset V_{l-1} \subset V_j$ for $j < l$. Thus, $W_j$ and $W_l$ are orthogonal. We can also decompose

$$L^2(\mathbb{R}) = \oplus_{j=-\infty}^{+\infty} W_j. \qquad (7.64)$$

Indeed, $V_{j-1} = W_j \oplus V_j$, and we verify by substitution that for any $L > J$,

$$V_L = \oplus_{j=L+1}^{J} W_j \,\oplus\, V_J. \qquad (7.65)$$

Since $\{V_j\}_{j\in\mathbb{Z}}$ is a multiresolution approximation, $V_L$ and $V_J$ tend, respectively, to $L^2(\mathbb{R})$ and $\{0\}$ when $L$ and $J$ go, respectively, to $-\infty$ and $+\infty$, which implies (7.64). Therefore, a union of orthonormal bases of all $W_j$ is an orthonormal basis of $L^2(\mathbb{R})$. ■
The proof of the theorem shows that $\hat g$ is the Fourier series of

$$g[n] = \left\langle \frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right),\, \phi(t-n) \right\rangle, \qquad (7.66)$$

which are the decomposition coefficients of

$$\frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t-n). \qquad (7.67)$$

Calculating the inverse Fourier transform of (7.53) yields

$$g[n] = (-1)^{1-n}\, h[1-n]. \qquad (7.68)$$

This mirror filter plays an important role in the fast wavelet transform algorithm.
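Relation (7.68) is easy to check numerically: build $g[n] = (-1)^{1-n}\, h[1-n]$ from a conjugate mirror filter and compare its transfer function with $e^{-i\omega}\, \hat h^*(\omega+\pi)$ from (7.53), then verify the lemma conditions (7.57) and (7.58). A small sketch using the Haar filter of (7.46) (the choice of filter is only for illustration):

```python
import cmath

h = {0: 2 ** -0.5, 1: 2 ** -0.5}                  # Haar filter (7.46)
# (7.68): g[n] = (-1)^(1-n) h[1-n]; here n = 1 - m runs over the support of h
g = {1 - m: (-1) ** m * h[m] for m in h}

def fourier(c, w):
    """Transfer function sum_n c[n] exp(-i n w)."""
    return sum(v * cmath.exp(-1j * n * w) for n, v in c.items())

pi = cmath.pi
for k in range(1, 50):
    w = k * 2 * pi / 50 - pi
    gh, hh = fourier(g, w), fourier(h, w)
    # (7.53): g^(w) = exp(-iw) h^*(w + pi)
    assert abs(gh - cmath.exp(-1j * w) * fourier(h, w + pi).conjugate()) < 1e-12
    # (7.57): |g^(w)|^2 + |g^(w + pi)|^2 = 2
    assert abs(abs(gh) ** 2 + abs(fourier(g, w + pi)) ** 2 - 2) < 1e-12
    # (7.58): g^(w) h^*(w) + g^(w + pi) h^*(w + pi) = 0
    assert abs(gh * hh.conjugate() +
               fourier(g, w + pi) * fourier(h, w + pi).conjugate()) < 1e-12
print("relations (7.53), (7.57), (7.58) hold")
```

For Haar this gives $g[0] = -2^{-1/2}$ and $g[1] = 2^{-1/2}$, the coefficients that produce the Haar wavelet in (7.90) below; the equivalence of (7.68) and (7.53) relies on $h$ being real.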
EXAMPLE 7.9

Figure 7.5 displays the cubic spline wavelet $\psi$ and its Fourier transform $\hat\psi$ calculated by inserting in (7.52) the expressions (7.18) and (7.48) of $\hat\phi(\omega)$ and $\hat h(\omega)$.

[Figure 7.5: Battle-Lemarié cubic spline wavelet $\psi$ (a) and its Fourier transform modulus $|\hat\psi(\omega)|$ (b).]

The properties of
[Figure 7.6: Graph of $|\hat\psi(2^j\omega)|^2$ for the cubic spline Battle-Lemarié wavelet, with $1 \le j \le 5$ and $\omega \in [-\pi,\pi]$.]
this Battle-Lemarié spline wavelet are further studied in Section 7.2.2. Like most orthogonal wavelets, the energy of $\hat\psi$ is essentially concentrated in $[-2\pi,-\pi] \cup [\pi,2\pi]$. For any $\psi$ that generates an orthogonal basis of $L^2(\mathbb{R})$, one can verify that

$$\forall\omega \in \mathbb{R} - \{0\}, \quad \sum_{j=-\infty}^{+\infty} |\hat\psi(2^j\omega)|^2 = 1.$$

This is illustrated in Figure 7.6.
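For the ideal band-limiting case $|\hat\psi| = 1$ on $[-2\pi,-\pi] \cup [\pi,2\pi]$ (the Shannon wavelet of Section 7.2.2), this identity can be checked directly: each $\omega \neq 0$ falls in exactly one dyadic band. A quick numerical sketch (illustrative only):

```python
import math

def psi_hat_sq(w):
    """|psi^(w)|^2 for the ideal band-limited (Shannon) wavelet:
    modulus 1 on [-2pi, -pi) U [pi, 2pi), 0 elsewhere."""
    return 1.0 if math.pi <= abs(w) < 2 * math.pi else 0.0

# each w != 0 lies in exactly one dyadic band 2^-j [pi, 2pi)
for k in range(1, 1000):
    for w in (0.001 * k, -0.13 * k):
        s = sum(psi_hat_sq(2.0 ** j * w) for j in range(-40, 41))
        assert s == 1.0
print("sum over j of |psi^(2^j w)|^2 = 1 for all sampled w != 0")
```

The half-open band $[\pi, 2\pi)$ makes the dyadic intervals a partition of $(0, +\infty)$, so the sum is exactly 1; for a smooth wavelet like the spline wavelet the bands overlap and the terms only sum to 1.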
The orthogonal projection of a signal $f$ in a "detail" space $W_j$ is obtained with a partial expansion in its wavelet basis:

$$P_{W_j} f = \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n}\rangle\, \psi_{j,n}.$$

Thus, a signal expansion in a wavelet orthogonal basis can be viewed as an aggregation of details at all scales $2^j$ that go from 0 to $+\infty$:

$$f = \sum_{j=-\infty}^{+\infty} P_{W_j} f = \sum_{j=-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n}\rangle\, \psi_{j,n}.$$

Figure 7.7 gives the coefficients of a signal decomposed in the cubic spline wavelet orthogonal basis. The calculations are performed with the fast wavelet transform algorithm of Section 7.3. The up or down Diracs give the amplitudes of positive or negative wavelet coefficients at a distance $2^j n$ at each scale $2^j$. Coefficients are nearly zero at fine scales where the signal is locally regular.

Wavelet Design

Theorem 7.3 constructs a wavelet orthonormal basis from any conjugate mirror filter $\hat h(\omega)$. This gives a simple procedure for designing and building wavelet orthogonal
[Figure 7.7: Wavelet coefficients $d_j[n] = \langle f, \psi_{j,n}\rangle$ calculated at scales $2^{-9} \le 2^j \le 2^{-5}$ with the cubic spline wavelet, for a signal $f(t)$ supported on $[0,1]$. Each up or down Dirac gives the amplitude of a positive or negative wavelet coefficient. At the top is the remaining coarse-signal approximation $a_J[n] = \langle f, \phi_{J,n}\rangle$ for $J = -5$.]
bases. Conversely, we may wonder whether all wavelet orthonormal bases are associated to a multiresolution approximation and a conjugate mirror filter. If we impose that $\psi$ has a compact support, then Lemarié [52] proved that $\psi$ necessarily corresponds to a multiresolution approximation. It is possible, however, to construct pathological wavelets that decay like $|t|^{-1}$ at infinity, and that cannot be derived from any multiresolution approximation. Section 7.2 describes important classes of wavelet bases and explains how to design $\hat h$ to specify the support, the number of vanishing moments, and the regularity of $\psi$.
7.2 CLASSES OF WAVELET BASES
7.2.1 Choosing a Wavelet
Most applications of wavelet bases exploit their ability to efficiently approximate particular classes of functions with few nonzero wavelet coefficients. This is true not only for data compression but also for noise removal and fast calculations. Therefore, the design of $\psi$ must be optimized to produce a maximum number of wavelet coefficients $\langle f, \psi_{j,n}\rangle$ that are close to zero. A function $f$ has few nonnegligible wavelet coefficients if most of the fine-scale (high-resolution) wavelet coefficients are small. This depends mostly on the regularity of $f$, the number of vanishing moments of $\psi$, and the size of its support. To construct an appropriate wavelet from a conjugate mirror filter $h[n]$, we relate these properties to conditions on $\hat h(\omega)$.
Vanishing Moments

Let us recall that $\psi$ has $p$ vanishing moments if

$$\int_{-\infty}^{+\infty} t^k\, \psi(t)\, dt = 0 \quad \text{for } 0 \le k < p. \qquad (7.69)$$

This means that $\psi$ is orthogonal to any polynomial of degree $p-1$. Section 6.1.3 proves that if $f$ is regular and $\psi$ has enough vanishing moments, then the wavelet coefficients $|\langle f, \psi_{j,n}\rangle|$ are small at fine scales $2^j$. Indeed, if $f$ is locally $\mathbf{C}^k$, then over a small interval it is well approximated by a Taylor polynomial of degree $k$. If $k < p$, then wavelets are orthogonal to this Taylor polynomial, and thus produce small-amplitude coefficients at fine scales. Theorem 7.4 relates the number of vanishing moments of $\psi$ to the vanishing derivatives of $\hat\psi(\omega)$ at $\omega = 0$ and to the number of zeroes of $\hat h(\omega)$ at $\omega = \pi$. It also proves that polynomials of degree $p-1$ are then reproduced by the scaling functions.
Theorem 7.4: Vanishing Moments. Let $\psi$ and $\phi$ be a wavelet and a scaling function that generate an orthogonal basis. Suppose that $|\psi(t)| = O((1+t^2)^{-p/2-1})$ and $|\phi(t)| = O((1+t^2)^{-p/2-1})$. The four following statements are equivalent:

1. The wavelet $\psi$ has $p$ vanishing moments.
2. $\hat\psi(\omega)$ and its first $p-1$ derivatives are zero at $\omega = 0$.
3. $\hat h(\omega)$ and its first $p-1$ derivatives are zero at $\omega = \pi$.
4. For any $0 \le k < p$,

$$q_k(t) = \sum_{n=-\infty}^{+\infty} n^k\, \phi(t-n) \quad \text{is a polynomial of degree } k. \qquad (7.70)$$
Proof. The decay of $|\phi(t)|$ and $|\psi(t)|$ implies that $\hat\phi(\omega)$ and $\hat\psi(\omega)$ are $p$ times continuously differentiable. The $k$th-order derivative $\hat\psi^{(k)}(\omega)$ is the Fourier transform of $(-it)^k\,\psi(t)$. Thus,

$$\hat\psi^{(k)}(0) = \int_{-\infty}^{+\infty} (-it)^k\, \psi(t)\, dt.$$

We derive that (1) is equivalent to (2).
Theorem 7.3 proves that

$$\sqrt{2}\, \hat\psi(2\omega) = e^{-i\omega}\, \hat h^*(\omega + \pi)\, \hat\phi(\omega).$$

Since $\hat\phi(0) \neq 0$, by differentiating this expression we prove that (2) is equivalent to (3).

Let us now prove that (4) implies (1). Since $\psi$ is orthogonal to $\{\phi(t-n)\}_{n\in\mathbb{Z}}$, it is also orthogonal to the polynomials $q_k$ for $0 \le k < p$. This family of polynomials is a basis of the space of polynomials of degree at most $p-1$. Thus, $\psi$ is orthogonal to any polynomial of degree $p-1$ and in particular to $t^k$ for $0 \le k < p$. This means that $\psi$ has $p$ vanishing moments.

To verify that (1) implies (4) we suppose that $\psi$ has $p$ vanishing moments, and for $k < p$, we evaluate $q_k(t)$ defined in (7.70). This is done by computing its Fourier transform:

$$\hat q_k(\omega) = \hat\phi(\omega) \sum_{n=-\infty}^{+\infty} n^k \exp(-in\omega) = (i)^k\, \hat\phi(\omega)\, \frac{d^k}{d\omega^k} \sum_{n=-\infty}^{+\infty} \exp(-in\omega).$$

Let $\delta^{(k)}$ be the distribution that is the $k$th-order derivative of a Dirac, defined in Section A.7 in the Appendix. The Poisson formula (2.4) proves that

$$\hat q_k(\omega) = (i)^k\, 2\pi\, \hat\phi(\omega) \sum_{l=-\infty}^{+\infty} \delta^{(k)}(\omega - 2l\pi). \qquad (7.71)$$

With several integrations by parts, we verify the distribution equality

$$\hat\phi(\omega)\, \delta^{(k)}(\omega - 2l\pi) = \hat\phi(2l\pi)\, \delta^{(k)}(\omega - 2l\pi) + \sum_{m=0}^{k-1} a_{m,l}^k\, \delta^{(m)}(\omega - 2l\pi), \qquad (7.72)$$

where $a_{m,l}^k$ is a linear combination of the derivatives $\{\hat\phi^{(m)}(2l\pi)\}_{0\le m\le k}$.

For $l \neq 0$, let us prove that $a_{m,l}^k = 0$ by showing that $\hat\phi^{(m)}(2l\pi) = 0$ if $0 \le m < p$. For any $P > 0$, (7.27) implies

$$\hat\phi(\omega) = \hat\phi(2^{-P}\omega) \prod_{p=1}^{P} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}. \qquad (7.73)$$

Since $\psi$ has $p$ vanishing moments, we showed in (3) that $\hat h(\omega)$ has a zero of order $p$ at $\omega = \pm\pi$. But $\hat h(\omega)$ is also $2\pi$ periodic, so (7.73) implies that $\hat\phi(\omega) = O(|\omega - 2l\pi|^p)$ in the neighborhood of $\omega = 2l\pi$, for any $l \neq 0$. Thus, $\hat\phi^{(m)}(2l\pi) = 0$ if $m < p$.

Since $a_{m,l}^k = 0$ and $\hat\phi(2l\pi) = 0$ when $l \neq 0$, it follows from (7.72) that

$$\hat\phi(\omega)\, \delta^{(k)}(\omega - 2l\pi) = 0 \quad \text{for } l \neq 0.$$

The only term that remains in the summation (7.71) is $l = 0$, and inserting (7.72) yields

$$\hat q_k(\omega) = (i)^k\, 2\pi \left[\hat\phi(0)\, \delta^{(k)}(\omega) + \sum_{m=0}^{k-1} a_{m,0}^k\, \delta^{(m)}(\omega)\right].$$

The inverse Fourier transform of $\delta^{(m)}(\omega)$ is $(2\pi)^{-1}(-it)^m$, and Theorem 7.2 proves that $\hat\phi(0) \neq 0$. Thus, the inverse Fourier transform $q_k$ of $\hat q_k$ is a polynomial of degree $k$. ■
The hypothesis (4) is called the Fix-Strang condition [446]. The polynomials $\{q_k\}_{0\le k<p}$ define a basis of the space of polynomials of degree $p-1$. The Fix-Strang condition proves that $\psi$ has $p$ vanishing moments if and only if any polynomial of degree $p-1$ can be written as a linear expansion of $\{\phi(t-n)\}_{n\in\mathbb{Z}}$. The decomposition coefficients of the polynomials $q_k$ do not have a finite energy because polynomials do not have a finite energy.
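The $k = 0$ case of the Fix-Strang condition says that the integer translates of $\phi$ sum to a constant (a partition of unity). This is easy to check numerically for the box function $\phi = \mathbf{1}_{[0,1)}$ of the Haar multiresolution; the hat function below is the linear B-spline, which shares the partition-of-unity property even though it is not the orthogonalized Battle-Lemarié scaling function. A toy illustration, not part of the text:

```python
def box(t):
    """Box scaling function phi = 1_[0,1) (Haar multiresolution)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def hat(t):
    """Linear B-spline (hat) function supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

for i in range(200):
    t = -3.0 + i * 0.031            # arbitrary sample points in [-3, 3.2)
    s_box = sum(box(t - n) for n in range(-10, 10))
    s_hat = sum(hat(t - n) for n in range(-10, 10))
    assert abs(s_box - 1.0) < 1e-12  # q_0(t) = 1: the degree-0 polynomial
    assert abs(s_hat - 1.0) < 1e-12
print("translates of phi reproduce the constant polynomial")
```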
Size of Support

If $f$ has an isolated singularity at $t_0$ and if $t_0$ is inside the support of $\psi_{j,n}(t) = 2^{-j/2}\,\psi(2^{-j}t-n)$, then $\langle f, \psi_{j,n}\rangle$ may have a large amplitude. If $\psi$ has a compact support of size $K$, at each scale $2^j$ there are $K$ wavelets $\psi_{j,n}$ with a support including $t_0$. To minimize the number of high-amplitude coefficients we must reduce the support size of $\psi$. Theorem 7.5 relates the support size of $h$ to the support of $\phi$ and $\psi$.

Theorem 7.5: Compact Support. The scaling function $\phi$ has a compact support if and only if $h$ has a compact support and their supports are equal. If the support of $h$ and $\phi$ is $[N_1, N_2]$, then the support of $\psi$ is $[(N_1-N_2+1)/2,\ (N_2-N_1+1)/2]$.

Proof. If $\phi$ has a compact support, since

$$h[n] = \left\langle \frac{1}{\sqrt{2}}\, \phi\!\left(\frac{t}{2}\right),\, \phi(t-n) \right\rangle,$$

we derive that $h$ also has a compact support. Conversely, the scaling function satisfies

$$\frac{1}{\sqrt{2}}\, \phi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} h[n]\, \phi(t-n). \qquad (7.74)$$

If $h$ has a compact support then one can prove [194] that $\phi$ has a compact support. The proof is not reproduced here.

To relate the support of $\phi$ and $h$, we suppose that $h[n]$ is nonzero for $N_1 \le n \le N_2$ and that $\phi$ has a compact support $[K_1, K_2]$. The support of $\phi(t/2)$ is $[2K_1, 2K_2]$. The sum at the right of (7.74) is a function with a support of $[N_1+K_1,\ N_2+K_2]$. The equality proves that the support of $\phi$ is $[K_1, K_2] = [N_1, N_2]$.

Let us recall from (7.68) and (7.67) that

$$\frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} g[n]\, \phi(t-n) = \sum_{n=-\infty}^{+\infty} (-1)^{1-n}\, h[1-n]\, \phi(t-n).$$

If the supports of $\phi$ and $h$ are equal to $[N_1, N_2]$, the sum on the right side has a support equal to $[N_1-N_2+1,\ N_2-N_1+1]$. Thus, $\psi$ has a support equal to $[(N_1-N_2+1)/2,\ (N_2-N_1+1)/2]$. ■
If $h$ has a finite impulse response in $[N_1, N_2]$, Theorem 7.5 proves that $\psi$ has a support of size $N_2-N_1$ centered at $1/2$. To minimize the size of the support, we must synthesize conjugate mirror filters with as few nonzero coefficients as possible.
Support versus Moments

The support size of a function and the number of vanishing moments are a priori independent. However, we shall see in Theorem 7.7 that the constraints imposed on orthogonal wavelets imply that if $\psi$ has $p$ vanishing moments, then its support is at least of size $2p-1$. Daubechies wavelets are optimal in the sense that they have a minimum size support for a given number of vanishing moments. When choosing a particular wavelet, we face a trade-off between the number of vanishing moments and the support size. If $f$ has few isolated singularities and is very regular between singularities, we must choose a wavelet with many vanishing moments to produce a large number of small wavelet coefficients $\langle f, \psi_{j,n}\rangle$. If the density of singularities increases, it might be better to decrease the size of its support at the cost of reducing the number of vanishing moments. Indeed, wavelets that overlap the singularities create high-amplitude coefficients.

The multiwavelet construction of Geronimo, Hardin, and Massopust [271] offers more design flexibility by introducing several scaling functions and wavelets. Exercise 7.16 gives an example. Better trade-offs can be obtained between the multiwavelet supports and their vanishing moments [447]. However, multiwavelet decompositions are implemented with a slightly more complicated filter bank algorithm than a standard orthogonal wavelet transform.
Regularity

The regularity of $\psi$ has mostly a cosmetic influence on the error introduced by thresholding or quantizing the wavelet coefficients. When reconstructing a signal from its wavelet coefficients

$$f = \sum_{j=-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n}\rangle\, \psi_{j,n},$$

an error $\epsilon$ added to a coefficient $\langle f, \psi_{j,n}\rangle$ will add the wavelet component $\epsilon\,\psi_{j,n}$ to the reconstructed signal. If $\psi$ is smooth, then $\epsilon\,\psi_{j,n}$ is a smooth error. For image-coding applications, a smooth error is often less visible than an irregular error, even though they have the same energy. Better-quality images are obtained with wavelets that are continuously differentiable than with the discontinuous Haar wavelet. Theorem 7.6, due to Tchamitchian [454], relates the uniform Lipschitz regularity of $\phi$ and $\psi$ to the number of zeroes of $\hat h(\omega)$ at $\omega = \pi$.
Theorem 7.6: Tchamitchian. Let $\hat h(\omega)$ be a conjugate mirror filter with $p$ zeroes at $\pi$ that satisfies the sufficient conditions of Theorem 7.2. Let us perform the factorization

$$\hat h(\omega) = \sqrt{2} \left(\frac{1+e^{i\omega}}{2}\right)^{p} \hat l(\omega).$$

If $\sup_{\omega\in\mathbb{R}} |\hat l(\omega)| = B$, then $\phi$ and $\psi$ are uniformly Lipschitz $\alpha$ for

$$\alpha < \alpha_0 = p - \log_2 B - 1. \qquad (7.75)$$
Proof. This result is proved by showing that there exist $C_1 > 0$ and $C_2 > 0$ such that for all $\omega \in \mathbb{R}$

$$|\hat\phi(\omega)| \le C_1\, (1+|\omega|)^{-p+\log_2 B} \qquad (7.76)$$

$$|\hat\psi(\omega)| \le C_2\, (1+|\omega|)^{-p+\log_2 B}. \qquad (7.77)$$

The Lipschitz regularity of $\phi$ and $\psi$ is then derived from Theorem 6.1, which shows that if $\int_{-\infty}^{+\infty} (1+|\omega|^\alpha)\, |\hat f(\omega)|\, d\omega < +\infty$, then $f$ is uniformly Lipschitz $\alpha$.

We proved in (7.32) that $\hat\phi(\omega) = \prod_{j=1}^{+\infty} 2^{-1/2}\, \hat h(2^{-j}\omega)$. One can verify that

$$\prod_{j=1}^{+\infty} \frac{1 + \exp(i2^{-j}\omega)}{2} = \frac{\exp(i\omega) - 1}{i\omega},$$

thus,

$$|\hat\phi(\omega)| = \frac{|1 - \exp(i\omega)|^p}{|\omega|^p} \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)|. \qquad (7.78)$$

Let us now compute an upper bound for $\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)|$. At $\omega = 0$, we have $\hat h(0) = \sqrt{2}$ so $\hat l(0) = 1$. Since $\hat h(\omega)$ is continuously differentiable at $\omega = 0$, $\hat l(\omega)$ is also continuously differentiable at $\omega = 0$. Thus, we derive that there exists $\varepsilon > 0$ such that if $|\omega| < \varepsilon$, then $|\hat l(\omega)| \le 1 + K|\omega|$. Consequently,

$$\sup_{|\omega|\le\varepsilon} \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le \sup_{|\omega|\le\varepsilon} \prod_{j=1}^{+\infty} \left(1 + K|2^{-j}\omega|\right) \le e^{K\varepsilon}. \qquad (7.79)$$

If $|\omega| > \varepsilon$, there exists $J \ge 1$ such that $2^{J-1}\varepsilon \le |\omega| \le 2^J\varepsilon$, and we decompose

$$\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| = \prod_{j=1}^{J} |\hat l(2^{-j}\omega)| \prod_{j=1}^{+\infty} |\hat l(2^{-j-J}\omega)|. \qquad (7.80)$$

Since $\sup_{\omega\in\mathbb{R}} |\hat l(\omega)| = B$, inserting (7.79) yields for $|\omega| > \varepsilon$

$$\prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le B^J\, e^{K\varepsilon} = e^{K\varepsilon}\, 2^{J\log_2 B}. \qquad (7.81)$$

Since $2^J \le 2\varepsilon^{-1}|\omega|$, this proves that

$$\forall\omega\in\mathbb{R}, \quad \prod_{j=1}^{+\infty} |\hat l(2^{-j}\omega)| \le e^{K\varepsilon} \left(1 + 2\varepsilon^{-1}|\omega|\right)^{\log_2 B}.$$

Equation (7.76) is derived from (7.78) and this last inequality. Since $|\hat\psi(2\omega)| = 2^{-1/2}\, |\hat h(\omega+\pi)|\, |\hat\phi(\omega)|$, (7.77) is obtained from (7.76). ■
This theorem proves that if $B < 2^{p-1}$, then $\alpha_0 > 0$. It means that $\phi$ and $\psi$ are uniformly continuous. For any $m > 0$, if $B < 2^{p-1-m}$, then $\alpha_0 > m$, so $\phi$ and $\psi$ are $m$ times continuously differentiable. Theorem 7.4 shows that the number $p$ of zeros of $\hat h(\omega)$ at $\pi$ is equal to the number of vanishing moments of $\psi$. A priori, we are not guaranteed that increasing $p$ will improve the wavelet regularity, since $B$ might increase as well. However, for important families of conjugate mirror filters, such as splines or Daubechies filters, $B$ increases more slowly than $p$, which implies that wavelet regularity increases with the number of vanishing moments. Let us emphasize that the number of vanishing moments and the regularity of orthogonal wavelets are related, but it is the number of vanishing moments and not the regularity that affects the amplitude of the wavelet coefficients at fine scales.
7.2.2 Shannon, Meyer, Haar, and Battle-Lemarié Wavelets
We study important classes of wavelets with Fourier transforms that are derived from the general formula proved in Theorem 7.3,

$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\, \hat g\!\left(\frac{\omega}{2}\right) \hat\phi\!\left(\frac{\omega}{2}\right) = \frac{1}{\sqrt{2}} \exp\!\left(\frac{-i\omega}{2}\right) \hat h^*\!\left(\frac{\omega}{2}+\pi\right) \hat\phi\!\left(\frac{\omega}{2}\right). \qquad (7.82)$$

Shannon Wavelet

The Shannon wavelet is constructed from the Shannon multiresolution approximation, which approximates functions by their restriction to low-frequency intervals. It corresponds to $\hat\phi = \mathbf{1}_{[-\pi,\pi]}$ and $\hat h(\omega) = \sqrt{2}\,\mathbf{1}_{[-\pi/2,\pi/2]}(\omega)$ for $\omega \in [-\pi,\pi]$. We derive from (7.82) that

$$\hat\psi(\omega) = \begin{cases} \exp(-i\omega/2) & \text{if } \omega \in [-2\pi,-\pi] \cup [\pi,2\pi] \\ 0 & \text{otherwise,} \end{cases} \qquad (7.83)$$

and thus,

$$\psi(t) = \frac{\sin 2\pi(t-1/2)}{\pi(t-1/2)} - \frac{\sin \pi(t-1/2)}{\pi(t-1/2)}.$$

This wavelet is $\mathbf{C}^\infty$ but has a slow asymptotic time decay. Since $\hat\psi(\omega)$ is zero in the neighborhood of $\omega = 0$, all its derivatives are zero at $\omega = 0$. Thus, Theorem 7.4 implies that $\psi$ has an infinite number of vanishing moments.

Since $\hat\psi(\omega)$ has a compact support we know that $\psi(t)$ is $\mathbf{C}^\infty$. However, $|\psi(t)|$ decays only like $|t|^{-1}$ at infinity because $\hat\psi(\omega)$ is discontinuous at $\pm\pi$ and $\pm2\pi$.
Meyer Wavelets

A Meyer wavelet [375] is a frequency band-limited function that has a Fourier transform that is smooth, unlike the Fourier transform of the Shannon wavelet. This smoothness provides a much faster asymptotic decay in time. These wavelets are constructed with conjugate mirror filters $\hat h(\omega)$ that are $\mathbf{C}^n$ and satisfy

$$\hat h(\omega) = \begin{cases} \sqrt{2} & \text{if } \omega \in [-\pi/3,\ \pi/3] \\ 0 & \text{if } \omega \in [-\pi,-2\pi/3] \cup [2\pi/3,\ \pi]. \end{cases} \qquad (7.84)$$

The only degree of freedom is the behavior of $\hat h(\omega)$ in the transition bands $[-2\pi/3,-\pi/3] \cup [\pi/3,\ 2\pi/3]$. It must satisfy the quadrature condition

$$|\hat h(\omega)|^2 + |\hat h(\omega+\pi)|^2 = 2, \qquad (7.85)$$
and to obtain $\mathbf{C}^n$ junctions at $|\omega| = \pi/3$ and $|\omega| = 2\pi/3$, the $n$ first derivatives must vanish at these abscissa. One can construct such functions that are $\mathbf{C}^\infty$.

The scaling function $\hat\phi(\omega) = \prod_{p=1}^{+\infty} 2^{-1/2}\, \hat h(2^{-p}\omega)$ has a compact support and one can verify that

$$\hat\phi(\omega) = \begin{cases} 2^{-1/2}\, \hat h(\omega/2) & \text{if } |\omega| \le 4\pi/3 \\ 0 & \text{if } |\omega| > 4\pi/3. \end{cases} \qquad (7.86)$$

The resulting wavelet (7.82) is

$$\hat\psi(\omega) = \begin{cases} 0 & \text{if } |\omega| \le 2\pi/3 \\ 2^{-1/2}\, \hat g(\omega/2) & \text{if } 2\pi/3 \le |\omega| \le 4\pi/3 \\ 2^{-1/2}\, \exp(-i\omega/2)\, \hat h(\omega/4) & \text{if } 4\pi/3 \le |\omega| \le 8\pi/3 \\ 0 & \text{if } |\omega| > 8\pi/3. \end{cases} \qquad (7.87)$$

The functions $\phi$ and $\psi$ are $\mathbf{C}^\infty$ because their Fourier transforms have a compact support. Since $\hat\psi(\omega) = 0$ in the neighborhood of $\omega = 0$, all its derivatives are zero at $\omega = 0$, which proves that $\psi$ has an infinite number of vanishing moments.

If $\hat h$ is $\mathbf{C}^n$, then $\hat\phi$ and $\hat\psi$ are also $\mathbf{C}^n$. The discontinuities of the $(n+1)$th derivative of $\hat h$ are generally at the junction of the transition band $|\omega| = \pi/3,\ 2\pi/3$, in which case one can show that there exists $A$ such that

$$|\phi(t)| \le A\,(1+|t|)^{-n-1} \quad \text{and} \quad |\psi(t)| \le A\,(1+|t|)^{-n-1}.$$

Although the asymptotic decay of $\psi$ is fast when $n$ is large, its effective numerical decay may be relatively slow, which is reflected by the fact that $A$ is quite large. As a consequence, a Meyer wavelet transform is generally implemented in the Fourier domain. Section 8.4.2 relates these wavelet bases to lapped orthogonal transforms applied in the Fourier domain. One can prove [19] that there exists no orthogonal wavelet that is $\mathbf{C}^\infty$ and has an exponential decay.
EXAMPLE 7.10

To satisfy the quadrature condition (7.85), one can verify that $\hat h$ in (7.84) may be defined on the transition bands by

$$\hat h(\omega) = \sqrt{2}\, \cos\!\left[\frac{\pi}{2}\, \beta\!\left(\frac{3|\omega|}{\pi} - 1\right)\right] \quad \text{for } |\omega| \in [\pi/3,\ 2\pi/3],$$

where $\beta(x)$ is a function that goes from 0 to 1 on the interval $[0,1]$ and satisfies

$$\forall x \in [0,1], \quad \beta(x) + \beta(1-x) = 1. \qquad (7.88)$$

An example due to Daubechies [19] is

$$\beta(x) = x^4\, (35 - 84x + 70x^2 - 20x^3). \qquad (7.89)$$

The resulting $\hat h(\omega)$ has $n = 3$ vanishing derivatives at $|\omega| = \pi/3,\ 2\pi/3$. Figure 7.8 displays the corresponding wavelet $\psi$.
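The profile (7.89) can be verified numerically: it satisfies the symmetry (7.88), which in turn gives the quadrature condition (7.85) on the transition band. A quick check (illustrative):

```python
import math

def beta(x):
    """Daubechies' transition profile (7.89)."""
    return x ** 4 * (35 - 84 * x + 70 * x ** 2 - 20 * x ** 3)

# endpoints and symmetry (7.88)
assert beta(0.0) == 0.0 and beta(1.0) == 1.0
for k in range(101):
    x = k / 100
    assert abs(beta(x) + beta(1 - x) - 1.0) < 1e-12

# quadrature condition (7.85) on the transition band
def h_hat(w):
    """h^ on the transition bands, |w| in [pi/3, 2pi/3]."""
    x = 3 * abs(w) / math.pi - 1
    return math.sqrt(2) * math.cos(math.pi / 2 * beta(x))

for k in range(1, 100):
    w = -2 * math.pi / 3 + k * (math.pi / 3) / 100   # w in (-2pi/3, -pi/3)
    s = h_hat(w) ** 2 + h_hat(w + math.pi) ** 2
    assert abs(s - 2.0) < 1e-12
print("transition profile satisfies (7.88) and (7.85)")
```

The symmetry works because $\beta'(x) = 140\,x^3(1-x)^3$ is symmetric about $x = 1/2$, which also makes the first three derivatives of $\hat h$ vanish at the junctions.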
FIGURE 7.8
Meyer wavelet ψ (a) and its Fourier transform modulus |ψ̂(ω)| computed with (7.89) (b).
Haar Wavelets
The Haar basis is obtained with a multiresolution of piecewise constant functions.
The scaling function is φ = 1_{[0,1]}. The filter h[n] given in (7.46) has two nonzero
coefficients equal to 2^{−1/2} at n = 0 and n = 1. Thus,

    (1/√2) ψ(t/2) = Σ_{n=−∞}^{+∞} (−1)^{1−n} h[1 − n] φ(t − n) = (1/√2) ( φ(t − 1) − φ(t) ),

so

    ψ(t) = { −1  if 0 ≤ t < 1/2
           {  1  if 1/2 ≤ t < 1      (7.90)
           {  0  otherwise.
The Haar wavelet has the shortest support among all orthogonal wavelets. It is not
well adapted to approximating smooth functions because it has only one vanishing
moment.
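The single vanishing moment can be checked numerically: the zeroth moment of the Haar wavelet (7.90) vanishes, but the first moment does not. A quick sketch (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Moments of the Haar wavelet (7.90) computed on a fine grid of [0, 1).
t = np.linspace(0, 1, 100000, endpoint=False)
psi = np.where(t < 0.5, -1.0, 1.0)
dt = t[1] - t[0]
m0 = np.sum(psi) * dt          # integral of psi: vanishes
m1 = np.sum(t * psi) * dt      # integral of t * psi: equals 1/4
assert abs(m0) < 1e-10
assert abs(m1 - 0.25) < 1e-3
```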
Battle-Lemarié Wavelets
Polynomial spline wavelets introduced by Battle [99] and Lemarié [345] are computed
from spline multiresolution approximations. The expressions of φ̂(ω) and
ĥ(ω) are given, respectively, by (7.18) and (7.48). For splines of degree m, ĥ(ω)
and its first m derivatives are zero at ω = π. Theorem 7.4 derives that ψ has m + 1
vanishing moments. It follows from (7.82) that

    ψ̂(ω) = ( exp(−iω/2) / ω^{m+1} ) √( S_{2m+2}(ω/2 + π) / ( S_{2m+2}(ω) S_{2m+2}(ω/2) ) ).
This wavelet has an exponential decay. Since it is a polynomial spline of degree
m, it is m − 1 times continuously differentiable. Polynomial spline wavelets are less
CHAPTER 7 Wavelet Bases
FIGURE 7.9
Linear spline Battle-Lemarié scaling function φ (a) and wavelet ψ (b).
regular than Meyer wavelets but have faster time asymptotic decay. For m odd, ψ is
symmetric about 1/2. For m even, it is antisymmetric about 1/2. Figure 7.5 gives the
graph of the cubic spline wavelet ψ corresponding to m = 3. For m = 1, Figure 7.9
displays the linear spline φ and ψ. The properties of these wavelets are further studied
in [15, 106, 164].
7.2.3 Daubechies Compactly Supported Wavelets
Daubechies wavelets have a support of minimum size for any given number p of
vanishing moments. Theorem 7.5 proves that wavelets of compact support are computed
with finite impulse-response conjugate mirror filters h. We consider real causal
filters h[n], which implies that ĥ is a trigonometric polynomial:

    ĥ(ω) = Σ_{n=0}^{N−1} h[n] e^{−inω}.

To ensure that ψ has p vanishing moments, Theorem 7.4 shows that ĥ must have a
zero of order p at ω = π. To construct a trigonometric polynomial of minimal size,
we factor (1 + e^{−iω})^p, which is a minimum-size polynomial having p zeros at ω = π:

    ĥ(ω) = √2 ( (1 + e^{−iω}) / 2 )^p R(e^{−iω}).      (7.91)

The difficulty is to design a polynomial R(e^{−iω}) of minimum degree m such that ĥ
satisfies

    |ĥ(ω)|² + |ĥ(ω + π)|² = 2.      (7.92)

As a result, h has N = m + p + 1 nonzero coefficients. Theorem 7.7 by Daubechies
[194] proves that the minimum degree of R is m = p − 1.
Theorem 7.7: Daubechies. A real conjugate mirror filter h, such that ĥ(ω) has p zeroes
at ω = π, has at least 2p nonzero coefficients. Daubechies filters have 2p nonzero
coefficients.
Proof. The proof is constructive and computes Daubechies filters. Since h[n] is real, |ĥ(ω)|²
is an even function and can be written as a polynomial in cos ω. Thus, |R(e^{−iω})|² defined
in (7.91) is a polynomial in cos ω that we can also write as a polynomial P(sin²(ω/2)),

    |ĥ(ω)|² = 2 cos^{2p}(ω/2) P( sin²(ω/2) ).      (7.93)

The quadrature condition (7.92) is equivalent to

    (1 − y)^p P(y) + y^p P(1 − y) = 1,      (7.94)

for any y = sin²(ω/2) ∈ [0, 1]. To minimize the number of nonzero terms of the finite
Fourier series ĥ(ω), we must find the solution P(y) ≥ 0 of minimum degree, which is
obtained with the Bezout theorem on polynomials.  ■
Theorem 7.8: Bezout. Let Q₁(y) and Q₂(y) be two polynomials of degrees n₁ and n₂
with no common zeroes. There exist two unique polynomials P₁(y) and P₂(y) of degrees
n₂ − 1 and n₁ − 1 such that

    P₁(y) Q₁(y) + P₂(y) Q₂(y) = 1.      (7.95)
The proof of this classical result is in [19]. Since Q₁(y) = (1 − y)^p and Q₂(y) = y^p
are two polynomials of degree p with no common zeros, the Bezout theorem proves
that there exist two unique polynomials P₁(y) and P₂(y) such that

    (1 − y)^p P₁(y) + y^p P₂(y) = 1.

The reader can verify that P₂(y) = P₁(1 − y) = P(1 − y) with

    P(y) = Σ_{k=0}^{p−1} (p−1+k choose k) y^k.      (7.96)

Clearly, P(y) ≥ 0 for y ∈ [0, 1]. Thus, P(y) is the polynomial of minimum degree
satisfying (7.94) with P(y) ≥ 0.
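Formula (7.96) is easy to verify numerically. The sketch below (the function name is ours) checks the Bezout identity (7.94) on a grid, together with the value P(1/2) = 2^{p−1} used later in the proof:

```python
import numpy as np
from math import comb

# Minimum-degree solution (7.96) of the Bezout equation (7.94).
def P(y, p):
    return sum(comb(p - 1 + k, k) * y**k for k in range(p))

y = np.linspace(0, 1, 201)
for p in (2, 3, 4, 7):
    lhs = (1 - y)**p * P(y, p) + y**p * P(1 - y, p)
    assert np.allclose(lhs, 1)                 # quadrature condition (7.94)
    assert np.isclose(P(0.5, p), 2**(p - 1))   # value used for r0^2 in the proof
```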
Now we need to construct a minimum-degree polynomial

    R(e^{−iω}) = Σ_{k=0}^{m} r_k e^{−ikω} = r₀ ∏_{k=1}^{m} (1 − a_k e^{−iω})

such that |R(e^{−iω})|² = P(sin²(ω/2)). Since its coefficients are real, R*(e^{−iω}) = R(e^{iω}),
and thus,

    |R(e^{−iω})|² = R(e^{−iω}) R(e^{iω}) = P( (2 − e^{iω} − e^{−iω}) / 4 ) = Q(e^{−iω}).      (7.97)

This factorization is solved by extending it to the whole complex plane with the
variable z = e^{−iω}:

    R(z) R(z^{−1}) = r₀² ∏_{k=1}^{m} (1 − a_k z)(1 − a_k z^{−1}) = Q(z) = P( (2 − z − z^{−1}) / 4 ).      (7.98)
Let us compute the roots of Q(z). Since Q(z) has real coefficients, if c_k is a root,
then c_k* is also a root; and since it is a function of z + z^{−1}, if c_k is a root, then
1/c_k and thus 1/c_k* are also roots. To design R(z) that satisfies (7.98), we choose
each root a_k of R(z) among a pair (c_k, 1/c_k) and include a_k* as a root to obtain
real coefficients. This procedure yields a polynomial of minimum degree m = p − 1,
with r₀² = Q(0) = P(1/2) = 2^{p−1}. The resulting filter h of minimum size has N =
p + m + 1 = 2p nonzero coefficients.

Among all possible factorizations, the minimum-phase solution R(e^{−iω}) is obtained
by choosing a_k among (c_k, 1/c_k) to be inside the unit circle |a_k| ≤ 1 [51]. The
resulting causal filter h has an energy maximally concentrated at small abscissa
n ≥ 0. It is a Daubechies filter of order p.
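The constructive proof can be turned into a short numerical routine. The sketch below (illustrative only, not a production implementation; the function name is ours) builds the polynomial z^{p−1}Q(z), keeps the roots inside the unit circle, and recovers the minimum-phase filter; for p = 2 it reproduces the first row of Table 7.2:

```python
import numpy as np
from math import comb

# Spectral factorization sketch of Theorem 7.7's constructive proof.
def daubechies_filter(p):
    Pc = [comb(p - 1 + k, k) for k in range(p)]   # P(y), formula (7.96)
    u = np.array([-0.25, 0.5, -0.25])             # z*y(z) = (-z^2 + 2z - 1)/4
    Qz = np.zeros(2 * p - 1)                      # z^{p-1} Q(z), degree 2p - 2
    for k, c in enumerate(Pc):
        term = np.array([float(c)])
        for _ in range(k):
            term = np.convolve(term, u)           # c * u(z)^k
        term = np.concatenate([term, np.zeros(p - 1 - k)])  # times z^{p-1-k}
        Qz[len(Qz) - len(term):] += term
    roots = np.roots(Qz)
    a = roots[np.abs(roots) < 1]                  # one root per pair (c_k, 1/c_k)
    R = np.array([1.0 + 0j])
    for ak in a:
        R = np.convolve(R, [1.0, -ak])            # R(z) = prod (1 - a_k z)
    binom = np.array([comb(p, i) for i in range(p + 1)], dtype=float)  # (1+z)^p
    h = np.convolve(binom, np.real(R))
    return h * np.sqrt(2) / h.sum()               # normalize: h-hat(0) = sqrt(2)

print(daubechies_filter(2))   # matches the p = 2 row of Table 7.2
```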
The constructive proof of this theorem synthesizes causal conjugate mirror filters
of size 2p. Table 7.2 gives the coefficients of these Daubechies filters for 2 ≤ p ≤
10. Theorem 7.9 derives that Daubechies wavelets calculated with these conjugate
mirror filters have a support of minimum size.
Theorem 7.9: Daubechies. If ψ is a wavelet with p vanishing moments that generates an
orthonormal basis of L²(R), then it has a support of size larger than or equal to 2p − 1.
A Daubechies wavelet has a minimum-size support equal to [−p + 1, p]. The support of
the corresponding scaling function φ is [0, 2p − 1].
This theorem is a direct consequence of Theorem 7.7. The support of the
wavelet, and that of the scaling function, are calculated with Theorem 7.5. When
p = 1 we get the Haar wavelet. Figure 7.10 displays the graphs of φ and ψ for
p = 2, 3, 4.
The regularity of φ and ψ is the same since ψ(t) is a finite linear combination of the φ(2t − n). However, this regularity is difficult to estimate precisely. Let B =
sup_{ω∈R} |R(e^{−iω})|, where R(e^{−iω}) is the trigonometric polynomial defined in (7.91).
Theorem 7.6 proves that ψ is at least uniformly Lipschitz α for α < p − log₂ B − 1. For
Daubechies wavelets, B increases more slowly than p, and Figure 7.10 shows indeed
that the regularity of these wavelets increases with p. Daubechies and Lagarias [198]
have established a more precise technique that computes the exact Lipschitz regularity of ψ. For p = 2 the wavelet is only Lipschitz 0.55, but for p = 3 it is Lipschitz
1.08, which means that it is already continuously differentiable. For p large, φ and ψ
are uniformly Lipschitz α, for α of the order of 0.2p [168].
Symmlets
Daubechies wavelets are very asymmetric because they are constructed by selecting
the minimum-phase square root of Q(e^{−iω}) in (7.97). One can show [51] that filters
corresponding to a minimum-phase square root have their energy optimally concentrated near the starting point of their support. Thus, they are highly nonsymmetric,
which yields very asymmetric wavelets.

To obtain a symmetric or antisymmetric wavelet, the filter h must be symmetric
or antisymmetric with respect to the center of its support, which means that ĥ(ω)
has a linear complex phase. Daubechies proved [194] that the Haar filter is the
Table 7.2
Daubechies Filters for Wavelets with p Vanishing Moments

p = 2
  n    hp[n]
  0    0.482962913145
  1    0.836516303738
  2    0.224143868042
  3   −0.129409522551

p = 3
  n    hp[n]
  0    0.332670552950
  1    0.806891509311
  2    0.459877502118
  3   −0.135011020010
  4   −0.085441273882
  5    0.035226291882

p = 4
  n    hp[n]
  0    0.230377813309
  1    0.714846570553
  2    0.630880767930
  3   −0.027983769417
  4   −0.187034811719
  5    0.030841381836
  6    0.032883011667
  7   −0.010597401785

p = 5
  n    hp[n]
  0    0.160102397974
  1    0.603829269797
  2    0.724308528438
  3    0.138428145901
  4   −0.242294887066
  5   −0.032244869585
  6    0.077571493840
  7   −0.006241490213
  8   −0.012580751999
  9    0.003335725285

p = 6
  n    hp[n]
  0    0.111540743350
  1    0.494623890398
  2    0.751133908021
  3    0.315250351709
  4   −0.226264693965
  5   −0.129766867567
  6    0.097501605587
  7    0.027522865530
  8   −0.031582039317
  9    0.000553842201
  10   0.004777257511
  11  −0.001077301085

p = 7
  n    hp[n]
  0    0.077852054085
  1    0.396539319482
  2    0.729132090846
  3    0.469782287405
  4   −0.143906003929
  5   −0.224036184994
  6    0.071309219267
  7    0.080612609151
  8   −0.038029936935
  9   −0.016574541631
  10   0.012550998556
  11   0.000429577973
  12  −0.001801640704
  13   0.000353713800

p = 8
  n    hp[n]
  0    0.054415842243
  1    0.312871590914
  2    0.675630736297
  3    0.585354683654
  4   −0.015829105256
  5   −0.284015542962
  6    0.000472484574
  7    0.128747426620
  8   −0.017369301002
  9   −0.04408825393
  10   0.013981027917
  11   0.008746094047
  12  −0.004870352993
  13  −0.000391740373
  14   0.000675449406
  15  −0.000117476784

p = 9
  n    hp[n]
  0    0.038077947364
  1    0.243834674613
  2    0.604823123690
  3    0.657288078051
  4    0.133197385825
  5   −0.293273783279
  6   −0.096840783223
  7    0.148540749338
  8    0.030725681479
  9   −0.067632829061
  10   0.000250947115
  11   0.022361662124
  12  −0.004723204758
  13  −0.004281503682
  14   0.001847646883
  15   0.000230385764
  16  −0.000251963189
  17   0.000039347320

p = 10
  n    hp[n]
  0    0.026670057901
  1    0.188176800078
  2    0.527201188932
  3    0.688459039454
  4    0.281172343661
  5   −0.249846424327
  6   −0.195946274377
  7    0.127369340336
  8    0.093057364604
  9   −0.071394147166
  10  −0.029457536822
  11   0.033212674059
  12   0.003606553567
  13  −0.010733175483
  14   0.001395351747
  15   0.001992405295
  16  −0.000685856695
  17  −0.000116466855
  18   0.000093588670
  19  −0.000013264203
FIGURE 7.10
Daubechies scaling functions φ and wavelets ψ with p vanishing moments, for p = 2, 3, 4.
only real compactly supported conjugate mirror filter that has a linear phase. The
Daubechies symmlet filters are obtained by optimizing the choice of the square
root R(e^{−iω}) of Q(e^{−iω}) to obtain an almost linear phase. The resulting wavelets still
have a minimum support [⫺p ⫹ 1, p] with p vanishing moments, but they are more
symmetric, as illustrated by Figure 7.11 for p ⫽ 8. The coefficients of the symmlet
filters are in WAVELAB. Complex conjugate mirror filters with a compact support
and a linear phase can be constructed [352], but they produce complex wavelet
coefficients that have real and imaginary parts that are redundant when the signal
is real.
Coiflets
For an application in numerical analysis, Coifman asked Daubechies [194] to construct a family of wavelets ψ that have p vanishing moments and a minimum-size
support, with scaling functions φ that also satisfy

    ∫_{−∞}^{+∞} φ(t) dt = 1   and   ∫_{−∞}^{+∞} t^k φ(t) dt = 0   for 1 ≤ k < p.      (7.99)

Such scaling functions are useful in establishing precise quadrature formulas. If f is
C^k in the neighborhood of 2^J n with k < p, then a Taylor expansion of f up to order
k shows that

    2^{−J/2} ⟨f, φ_{J,n}⟩ ≈ f(2^J n) + O(2^{(k+1)J}).      (7.100)
FIGURE 7.11
Daubechies (a) and symmlet (b) scaling functions φ and wavelets ψ with p = 8 vanishing
moments.
Thus, at a fine scale 2^J, the scaling coefficients are closely approximated by the signal
samples. The order of approximation increases with p. The supplementary condition
(7.99) requires increasing the support of ψ; the resulting coiflet has a support of
size 3p − 1 instead of 2p − 1 for a Daubechies wavelet. The corresponding conjugate
mirror filters are tabulated in WAVELAB.
Audio Filters
The first conjugate mirror filters with finite impulse response were constructed
in 1986 by Smith and Barnwell [443] in the context of perfect filter bank reconstruction, explained in Section 7.3.2. These filters satisfy the quadrature condition
|ĥ(ω)|² + |ĥ(ω + π)|² = 2, which is necessary and sufficient for filter bank reconstruction. However, ĥ(0) ≠ √2, so the infinite product of such filters does not yield
a wavelet basis of L²(R). Instead of imposing any vanishing moments, Smith and
Barnwell [443], and later Vaidyanathan and Hoang [470], designed their filters to
reduce the size of the transition band, where |ĥ(ω)| decays from nearly √2 to nearly
0 in the neighborhood of ±π/2. This constraint is important in optimizing the
transform code of audio signals (see Section 10.3.3). However, many cascades of
these filters exhibit wild behavior. The Vaidyanathan-Hoang filters are tabulated in
WAVELAB. Many other classes of conjugate mirror filters with finite impulse response
have been constructed [69, 79]. Recursive conjugate mirror filters may also be
designed [300] to minimize the size of the transition band for a given number of
zeroes at ω = π. These filters have a fast but noncausal recursive implementation for
signals of finite size.
7.3 WAVELETS AND FILTER BANKS
Decomposition coefficients in a wavelet orthogonal basis are computed with a fast
algorithm that cascades discrete convolutions with h and g, and subsamples the output. Section 7.3.1 derives this result from the embedded structure of multiresolution
approximations. A direct filter bank analysis is performed in Section 7.3.2, which
gives more general perfect reconstruction conditions on the filters. Section 7.3.3
shows that perfect reconstruction filter banks decompose signals in a basis of ℓ²(Z).
This basis is orthogonal for conjugate mirror filters.
7.3.1 Fast Orthogonal Wavelet Transform
We describe a fast filter bank algorithm that computes the orthogonal wavelet
coefficients of a signal measured at a finite resolution. A fast wavelet transform
decomposes successively each approximation P_{V_j} f into a coarser approximation
P_{V_{j+1}} f, plus the wavelet coefficients carried by P_{W_{j+1}} f. In the other direction,
the reconstruction from wavelet coefficients recovers each P_{V_j} f from P_{V_{j+1}} f and
P_{W_{j+1}} f.
Since {φ_{j,n}}_{n∈Z} and {ψ_{j,n}}_{n∈Z} are orthonormal bases of V_j and W_j, the projection
in these spaces is characterized by

    a_j[n] = ⟨f, φ_{j,n}⟩   and   d_j[n] = ⟨f, ψ_{j,n}⟩.

Theorem 7.10 [360, 361] shows that these coefficients are calculated with a cascade
of discrete convolutions and subsamplings. We denote x̄[n] = x[−n] and

    x̌[n] = { x[p]  if n = 2p
           { 0     if n = 2p + 1.      (7.101)
Theorem 7.10: Mallat. At the decomposition,

    a_{j+1}[p] = Σ_{n=−∞}^{+∞} h[n − 2p] a_j[n] = a_j ⋆ h̄[2p],      (7.102)

    d_{j+1}[p] = Σ_{n=−∞}^{+∞} g[n − 2p] a_j[n] = a_j ⋆ ḡ[2p].      (7.103)

At the reconstruction,

    a_j[p] = Σ_{n=−∞}^{+∞} h[p − 2n] a_{j+1}[n] + Σ_{n=−∞}^{+∞} g[p − 2n] d_{j+1}[n]
           = ǎ_{j+1} ⋆ h[p] + ď_{j+1} ⋆ g[p].      (7.104)
Proof of (7.102). Any φ_{j+1,p} ∈ V_{j+1} ⊂ V_j can be decomposed in the orthonormal basis
{φ_{j,n}}_{n∈Z} of V_j:

    φ_{j+1,p} = Σ_{n=−∞}^{+∞} ⟨φ_{j+1,p}, φ_{j,n}⟩ φ_{j,n}.      (7.105)

With the change of variable t′ = 2^{−j}t − 2p, we obtain

    ⟨φ_{j+1,p}, φ_{j,n}⟩ = ∫_{−∞}^{+∞} (1/√(2^{j+1})) φ( (t − 2^{j+1}p) / 2^{j+1} ) (1/√(2^j)) φ*( (t − 2^j n) / 2^j ) dt
                        = (1/√2) ∫_{−∞}^{+∞} φ(t/2) φ*(t − n + 2p) dt
                        = ⟨ (1/√2) φ(t/2), φ(t − n + 2p) ⟩ = h[n − 2p].      (7.106)

Thus, (7.105) implies that

    φ_{j+1,p} = Σ_{n=−∞}^{+∞} h[n − 2p] φ_{j,n}.      (7.107)

Computing the inner product of f with the vectors on each side of this equality yields
(7.102).

Proof of (7.103). Since ψ_{j+1,p} ∈ W_{j+1} ⊂ V_j, it can be decomposed as

    ψ_{j+1,p} = Σ_{n=−∞}^{+∞} ⟨ψ_{j+1,p}, φ_{j,n}⟩ φ_{j,n}.

As in (7.106), the change of variable t′ = 2^{−j}t − 2p proves that

    ⟨ψ_{j+1,p}, φ_{j,n}⟩ = ⟨ (1/√2) ψ(t/2), φ(t − n + 2p) ⟩ = g[n − 2p],      (7.108)

and thus,

    ψ_{j+1,p} = Σ_{n=−∞}^{+∞} g[n − 2p] φ_{j,n}.      (7.109)

Taking the inner product with f on each side gives (7.103).
Proof of (7.104). Since W_{j+1} is the orthogonal complement of V_{j+1} in V_j, the union of the
two bases {φ_{j+1,n}}_{n∈Z} and {ψ_{j+1,n}}_{n∈Z} is an orthonormal basis of V_j. Thus, any φ_{j,p} can
be decomposed in this basis:

    φ_{j,p} = Σ_{n=−∞}^{+∞} ⟨φ_{j,p}, φ_{j+1,n}⟩ φ_{j+1,n} + Σ_{n=−∞}^{+∞} ⟨φ_{j,p}, ψ_{j+1,n}⟩ ψ_{j+1,n}.

Inserting (7.106) and (7.108) yields

    φ_{j,p} = Σ_{n=−∞}^{+∞} h[p − 2n] φ_{j+1,n} + Σ_{n=−∞}^{+∞} g[p − 2n] ψ_{j+1,n}.

Taking the inner product with f on both sides of this equality gives (7.104).  ■
Theorem 7.10 proves that aj⫹1 and dj⫹1 are computed by taking every other sample of the convolution of aj with h̄ and ḡ, respectively, as illustrated by Figure 7.12.
The filter h̄ removes the higher frequencies of the inner product sequence aj ,
whereas ḡ is a high-pass filter that collects the remaining highest frequencies. The
reconstruction (7.104) is an interpolation that inserts zeroes to expand aj⫹1 and
dj⫹1 and filters these signals, as shown in Figure 7.12.
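Theorem 7.10 translates directly into code. A minimal sketch of one decomposition and reconstruction level, written as the plain sums (7.102)-(7.104) with the two-tap Haar filter (whose short support avoids boundary issues; the function names are ours):

```python
import numpy as np

# One level of the fast wavelet transform and its inverse (Theorem 7.10).
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([-1.0, 1.0]) / np.sqrt(2)        # g[n] = (-1)^{1-n} h[1-n]

def decompose(a):
    N = len(a)
    a1, d1 = np.zeros(N // 2), np.zeros(N // 2)
    for p in range(N // 2):
        for k in range(len(h)):               # n = 2p + k
            a1[p] += h[k] * a[2 * p + k]      # (7.102)
            d1[p] += g[k] * a[2 * p + k]      # (7.103)
    return a1, d1

def reconstruct(a1, d1):
    a = np.zeros(2 * len(a1))
    for n in range(len(a1)):
        for k in range(len(h)):               # p = 2n + k
            a[2 * n + k] += h[k] * a1[n] + g[k] * d1[n]   # (7.104)
    return a

x = np.array([4.0, 2.0, 5.0, 5.0])
a1, d1 = decompose(x)
assert np.allclose(reconstruct(a1, d1), x)    # perfect reconstruction
```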
An orthogonal wavelet representation of a_L[n] = ⟨f, φ_{L,n}⟩ is composed of wavelet
coefficients of f at scales 2^L < 2^j ≤ 2^J, plus the remaining approximation at the
largest scale 2^J:

    [ {d_j}_{L<j≤J} , a_J ].      (7.110)
FIGURE 7.12
(a) A fast wavelet transform is computed with a cascade of filterings with h̄ and ḡ followed by
a factor 2 subsampling. (b) A fast inverse wavelet transform reconstructs progressively each
a_j by inserting zeroes between samples of a_{j+1} and d_{j+1}, filtering, and adding the output.
It is computed from a_L by iterating (7.102) and (7.103) for L ≤ j < J. Figure 7.7
gives a numerical example computed with the cubic spline filter of Table 7.1. The
original signal a_L is recovered from this wavelet representation by iterating the
reconstruction (7.104) for J > j ≥ L.
Initialization
Most often the discrete input signal b[n] is obtained by a finite-resolution device that
averages and samples an analog input signal. For example, a CCD camera filters the
light intensity by the optics and each photoreceptor averages the input light over its
support.Thus,a pixel value measures average light intensity. If the sampling distance
is N^{−1}, to define and compute the wavelet coefficients, we need to associate to
b[n] a function f(t) ∈ V_L approximated at the scale 2^L = N^{−1}, and compute a_L[n] =
⟨f, φ_{L,n}⟩. Exercise 7.6 explains how to compute a_L[n] = ⟨f, φ_{L,n}⟩ so that b[n] =
f(N^{−1}n).
A simpler and faster approach considers

    f(t) = Σ_{n=−∞}^{+∞} b[n] φ( (t − 2^L n) / 2^L ) ∈ V_L.

Since {φ_{L,n}(t) = 2^{−L/2} φ(2^{−L}t − n)}_{n∈Z} is orthonormal and 2^L = N^{−1},

    b[n] = N^{1/2} ⟨f, φ_{L,n}⟩ = N^{1/2} a_L[n].

But φ̂(0) = ∫_{−∞}^{+∞} φ(t) dt = 1, so

    N^{1/2} a_L[n] = ∫_{−∞}^{+∞} f(t) (1/N^{−1}) φ( (t − N^{−1}n) / N^{−1} ) dt

is a weighted average of f in the neighborhood of N^{−1}n over a domain proportional
to N^{−1}. Thus, if f is regular,

    b[n] = N^{1/2} a_L[n] ≈ f(N^{−1}n).      (7.111)
If φ is a coiflet and f(t) is regular in the neighborhood of N^{−1}n, then (7.100) shows
that N^{1/2} a_L[n] is a high-order approximation of f(N^{−1}n).
Finite Signals
Let us consider a signal f with a support in [0, 1] that is approximated with
a uniform sampling at intervals N^{−1}. The resulting approximation a_L has N = 2^{−L}
samples. This is the case in Figure 7.7 with N = 1024. Computing the convolutions
with h̄ and ḡ at abscissa close to 0 or close to N requires knowing the values of
a_L[n] beyond the boundaries n = 0 and n = N − 1. These boundary problems may
be solved with one of the three approaches described in Section 7.5.
Section 7.5.1 explains the simplest algorithm, which periodizes a_L. The convolutions in Theorem 7.10 are replaced by circular convolutions. This is equivalent
to decomposing f in a periodic wavelet basis of L²[0, 1]. This algorithm has the
disadvantage of creating large wavelet coefficients at the borders.
If ψ is symmetric or antisymmetric, we can use a folding procedure described
in Section 7.5.2, which creates smaller wavelet coefficients at the border. It decomposes f in a folded wavelet basis of L²[0, 1]. However, we mentioned in Section 7.2.3
that Haar is the only symmetric wavelet with a compact support. Higher-order spline
wavelets have a symmetry, but h must be truncated in numerical calculations.
The most efficient boundary treatment is described in Section 7.5.3, but the
implementation is more complicated. Boundary wavelets that keep their vanishing
moments are designed to avoid creating large-amplitude coefficients when f is
regular.The fast algorithm is implemented with special boundary filters and requires
the same number of calculations as the two other methods.
Complexity
Suppose that h and g have K nonzero coefficients. Let a_L be a signal of size N = 2^{−L}.
With appropriate boundary calculations, each a_j and d_j has 2^{−j} samples. Equations
(7.102) and (7.103) compute a_{j+1} and d_{j+1} from a_j with 2^{−j}K additions and multiplications. Therefore, the wavelet representation (7.110) is calculated with at most
2KN additions and multiplications. The reconstruction (7.104) of a_j from a_{j+1} and
d_{j+1} is also obtained with 2^{−j}K additions and multiplications. The original signal a_L
is thus recovered from the wavelet representation with at most 2KN additions and
multiplications.
Wavelet Graphs
The graphs of φ and ψ are computed numerically with the inverse wavelet transform. If f = φ, then a_0[n] = δ[n] and d_j[n] = 0 for all L < j ≤ 0. The inverse wavelet
transform computes a_L, and (7.111) shows that

    N^{1/2} a_L[n] ≈ φ(N^{−1}n).

If φ is regular and N is large enough, we recover a precise approximation of the
graph of φ from a_L.

Similarly, if f = ψ, then a_0[n] = 0, d_0[n] = δ[n], and d_j[n] = 0 for L < j < 0. Then
a_L[n] is calculated with the inverse wavelet transform and N^{1/2} a_L[n] ≈ ψ(N^{−1}n).
The Daubechies wavelets and scaling functions in Figure 7.10 are calculated with
this procedure.
7.3.2 Perfect Reconstruction Filter Banks
The fast discrete wavelet transform decomposes signals into low-pass and high-pass
components subsampled by 2; the inverse transform performs the reconstruction.
The study of such classical multirate filter banks became a major signal-processing
topic in 1976, when Croisier, Esteban, and Galand [189] discovered that it is possible to perform such decompositions and reconstructions with quadrature mirror
filters (Exercise 7.7). However, besides the simple Haar filter, a quadrature mirror
filter cannot have a finite impulse response. In 1984, Smith and Barnwell [444]
and Mintzer [376] found necessary and sufficient conditions for obtaining perfect
reconstruction orthogonal filters with a finite impulse response that they called
conjugate mirror filters. The theory was completed by the biorthogonal equations
of Vetterli [471, 472] and the general paraunitary matrix theory of Vaidyanathan
[469]. We follow this digital signal-processing approach, which gives a simple understanding of conjugate mirror filter conditions. More complete presentations of filter
bank properties can be found in [1, 2, 63, 68, 69].
Filter Bank
A two-channel multirate filter bank convolves a signal a_0 with a low-pass filter h̄[n] =
h[−n] and a high-pass filter ḡ[n] = g[−n], and subsamples by 2 the output:

    a_1[n] = a_0 ⋆ h̄[2n]   and   d_1[n] = a_0 ⋆ ḡ[2n].      (7.112)

A reconstructed signal ã_0 is obtained by filtering the zero-expanded signals with a
dual low-pass filter h̃ and a dual high-pass filter g̃, as shown in Figure 7.13. With the
zero-insertion notation (7.101) it yields

    ã_0[n] = ǎ_1 ⋆ h̃[n] + ď_1 ⋆ g̃[n].      (7.113)

We study necessary and sufficient conditions on h, g, h̃, and g̃ to guarantee a perfect
reconstruction ã_0 = a_0.
FIGURE 7.13
The input signal is filtered by a low-pass and a high-pass filter and subsampled. The
reconstruction is performed by inserting zeroes and filtering with dual filters h̃ and g̃.
Subsampling and Zero Interpolation
Subsamplings and expansions with zero insertions have simple expressions in
the Fourier domain. Since x̂(ω) = Σ_{n=−∞}^{+∞} x[n] e^{−inω}, the Fourier series of the
subsampled signal y[n] = x[2n] can be written as

    ŷ(2ω) = Σ_{n=−∞}^{+∞} x[2n] e^{−i2nω} = (1/2) [ x̂(ω) + x̂(ω + π) ].      (7.114)

The component x̂(ω + π) creates a frequency folding. This aliasing must be canceled at the reconstruction.

The insertion of zeros defines

    y[n] = x̌[n] = { x[p]  if n = 2p
                  { 0     if n = 2p + 1,
that has a Fourier transform

    ŷ(ω) = Σ_{n=−∞}^{+∞} x[n] e^{−i2nω} = x̂(2ω).      (7.115)
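Both identities can be verified with the DFT, which samples these Fourier series at ω_k = 2πk/N. A minimal sketch (the test signal is arbitrary):

```python
import numpy as np

# DFT check of (7.114) and (7.115): subsampling folds the spectrum,
# zero insertion dilates it.
rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
X = np.fft.fft(x)                            # x-hat at w_k = 2*pi*k/N

y = x[::2]                                   # y[n] = x[2n]
Y = np.fft.fft(y)                            # y-hat at 2*w_k
assert np.allclose(Y, (X[:N // 2] + X[N // 2:]) / 2)   # (7.114)

z = np.zeros(2 * N)
z[::2] = x                                   # zero insertion
Z = np.fft.fft(z)
assert np.allclose(Z, np.tile(X, 2))         # (7.115): z-hat(w) = x-hat(2w)
```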
Theorem 7.11 gives Vetterli’s [471] biorthogonal conditions, which guarantee that
ã_0 = a_0.

Theorem 7.11: Vetterli. The filter bank performs an exact reconstruction for any input
signal if and only if

    ĥ*(ω + π) h̃̂(ω) + ĝ*(ω + π) g̃̂(ω) = 0,      (7.116)

and

    ĥ*(ω) h̃̂(ω) + ĝ*(ω) g̃̂(ω) = 2.      (7.117)
Proof. We first relate the Fourier transforms of a_1 and d_1 to the Fourier transform of a_0. Since
h and g are real, the transfer functions of h̄ and ḡ are, respectively, ĥ(−ω) = ĥ*(ω) and
ĝ(−ω) = ĝ*(ω). By using (7.114), we derive from the definition (7.112) of a_1 and d_1 that

    â_1(2ω) = (1/2) [ â_0(ω) ĥ*(ω) + â_0(ω + π) ĥ*(ω + π) ],      (7.118)
    d̂_1(2ω) = (1/2) [ â_0(ω) ĝ*(ω) + â_0(ω + π) ĝ*(ω + π) ].      (7.119)

The expression (7.113) of ã_0 and the zero-insertion property (7.115) also imply

    ã̂_0(ω) = â_1(2ω) h̃̂(ω) + d̂_1(2ω) g̃̂(ω).      (7.120)

Thus,

    ã̂_0(ω) = (1/2) [ ĥ*(ω) h̃̂(ω) + ĝ*(ω) g̃̂(ω) ] â_0(ω)
            + (1/2) [ ĥ*(ω + π) h̃̂(ω) + ĝ*(ω + π) g̃̂(ω) ] â_0(ω + π).

To obtain a_0 = ã_0 for all a_0, the filters must cancel the aliasing term â_0(ω + π) and
guarantee a unit gain for â_0(ω), which proves equations (7.116) and (7.117).  ■
Theorem 7.11 proves that the reconstruction filters h̃ and g̃ are entirely specified
by the decomposition filters h and g. In matrix form, it can be rewritten

    [ ĥ*(ω)       ĝ*(ω)     ] [ h̃̂(ω) ]   [ 2 ]
    [ ĥ*(ω + π)   ĝ*(ω + π) ] [ g̃̂(ω) ] = [ 0 ].      (7.121)

The inversion of this 2 × 2 matrix yields

    [ h̃̂*(ω) ]     2    [  ĝ(ω + π) ]
    [ g̃̂*(ω) ] = ――――  [ −ĥ(ω + π) ],      (7.122)
                 Δ(ω)

where Δ(ω) is the determinant

    Δ(ω) = ĥ(ω) ĝ(ω + π) − ĥ(ω + π) ĝ(ω).      (7.123)

The reconstruction filters are stable only if the determinant does not vanish for all
ω ∈ [−π, π]. Vaidyanathan [469] has extended this result to multirate filter banks
with an arbitrary number M of channels by showing that the resulting matrices of
filters satisfy paraunitary properties [68].
Finite Impulse Response
When all filters have a finite impulse response, the determinant Δ(ω) can be evaluated. This yields simpler relations between the decomposition and reconstruction
filters.
Theorem 7.12. Perfect reconstruction filters satisfy

    ĥ*(ω) h̃̂(ω) + ĥ*(ω + π) h̃̂(ω + π) = 2.      (7.124)

For finite impulse-response filters, there exist a ∈ R and l ∈ Z such that

    ĝ(ω) = a e^{−i(2l+1)ω} h̃̂*(ω + π)   and   g̃̂(ω) = a^{−1} e^{−i(2l+1)ω} ĥ*(ω + π).      (7.125)
Proof. Equation (7.122) proves that

    h̃̂*(ω) = (2/Δ(ω)) ĝ(ω + π)   and   g̃̂*(ω) = (−2/Δ(ω)) ĥ(ω + π).      (7.126)

Thus,

    ĝ(ω) g̃̂*(ω) = − ( Δ(ω + π) / Δ(ω) ) h̃̂*(ω + π) ĥ(ω + π).      (7.127)

The definition (7.123) implies that Δ(ω + π) = −Δ(ω). Inserting (7.127) in (7.117) yields
(7.124).

The Fourier transform of finite impulse-response filters is a finite series in exp(±inω).
Therefore, the determinant Δ(ω) defined by (7.123) is a finite series. Moreover, (7.126)
proves that Δ^{−1}(ω) must also be a finite series. A finite series in exp(±inω) that has an
inverse that is also a finite series must have a single term. Since Δ(ω) = −Δ(ω + π), the
exponent n must be odd. This proves that there exist l ∈ Z and a ∈ R such that

    Δ(ω) = −2a exp[i(2l + 1)ω].      (7.128)

Inserting this expression in (7.126) yields (7.125).  ■
The factor a is a gain that is inverse for the decomposition and reconstruction
filters, and l is a reverse shift. We generally set a = 1 and l = 0. In the time domain
(7.125) can then be rewritten as

    g[n] = (−1)^{1−n} h̃[1 − n]   and   g̃[n] = (−1)^{1−n} h[1 − n].      (7.129)
The two pairs of filters (h, g) and (h̃, g̃) play a symmetric role and can be inverted.
Conjugate Mirror Filters
If we impose that the decomposition filter h is equal to the reconstruction filter h̃,
then (7.124) is the condition of Smith and Barnwell [444] and Mintzer [376] that
defines conjugate mirror filters:

    |ĥ(ω)|² + |ĥ(ω + π)|² = 2.      (7.130)
It is identical to the filter condition (7.29) that is required in order to synthesize orthogonal wavelets. Section 7.3.3 proves that it is also equivalent to discrete
orthogonality properties.
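Condition (7.130) holds, up to numerical precision, for the Daubechies filters of Table 7.2. A quick check for p = 2:

```python
import numpy as np

# Conjugate mirror condition (7.130) for the p = 2 Daubechies filter.
h = np.array([0.482962913145, 0.836516303738, 0.224143868042, -0.129409522551])
w = np.linspace(-np.pi, np.pi, 257)
n = np.arange(len(h))
h_hat = np.exp(-1j * np.outer(w, n)) @ h             # h-hat(w)
h_hat_pi = np.exp(-1j * np.outer(w + np.pi, n)) @ h  # h-hat(w + pi)
assert np.allclose(np.abs(h_hat)**2 + np.abs(h_hat_pi)**2, 2.0)
```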
7.3.3 Biorthogonal Bases of ℓ²(Z)
The decomposition of a discrete signal in a multirate filter bank is interpreted as an
expansion in a basis of ℓ²(Z). Observe first that the low-pass and high-pass signals
of a filter bank computed with (7.112) can be rewritten as inner products in ℓ²(Z):

    a_1[l] = Σ_{n=−∞}^{+∞} a_0[n] h[n − 2l] = ⟨a_0[n], h[n − 2l]⟩,      (7.131)

    d_1[l] = Σ_{n=−∞}^{+∞} a_0[n] g[n − 2l] = ⟨a_0[n], g[n − 2l]⟩.      (7.132)
The signal recovered by the reconstructing filters is

    a_0[n] = Σ_{l=−∞}^{+∞} a_1[l] h̃[n − 2l] + Σ_{l=−∞}^{+∞} d_1[l] g̃[n − 2l].      (7.133)

Inserting (7.131) and (7.132) yields

    a_0[n] = Σ_{l=−∞}^{+∞} ⟨a_0[k], h[k − 2l]⟩ h̃[n − 2l] + Σ_{l=−∞}^{+∞} ⟨a_0[k], g[k − 2l]⟩ g̃[n − 2l].      (7.134)

We recognize the decomposition of a_0 over dual families of vectors {h̃[n − 2l],
g̃[n − 2l]}_{l∈Z} and {h[n − 2l], g[n − 2l]}_{l∈Z}. Theorem 7.13 proves that these two
families are biorthogonal.
Theorem 7.13. If h, g, h̃, and g̃ are perfect reconstruction filters, and their Fourier
transforms are bounded, then {h̃[n − 2l], g̃[n − 2l]}_{l∈Z} and {h[n − 2l], g[n − 2l]}_{l∈Z} are
biorthogonal Riesz bases of ℓ²(Z).
Proof. To prove that these families are biorthogonal we must show that for all l ∈ Z

    ⟨h̃[n], h[n − 2l]⟩ = δ[l],      (7.135)

    ⟨g̃[n], g[n − 2l]⟩ = δ[l],      (7.136)

and

    ⟨h̃[n], g[n − 2l]⟩ = ⟨g̃[n], h[n − 2l]⟩ = 0.      (7.137)

For perfect reconstruction filters, (7.124) proves that

    (1/2) [ ĥ*(ω) h̃̂(ω) + ĥ*(ω + π) h̃̂(ω + π) ] = 1.

In the time domain, this equation becomes

    h̃ ⋆ h̄[2l] = Σ_{n=−∞}^{+∞} h̃[n] h[n − 2l] = δ[l],      (7.138)

which verifies (7.135). The same proof as for (7.124) shows that

    (1/2) [ ĝ*(ω) g̃̂(ω) + ĝ*(ω + π) g̃̂(ω + π) ] = 1.

In the time domain, this equation yields (7.136). It also follows from (7.122) that

    (1/2) [ ĝ*(ω) h̃̂(ω) + ĝ*(ω + π) h̃̂(ω + π) ] = 0,

and

    (1/2) [ ĥ*(ω) g̃̂(ω) + ĥ*(ω + π) g̃̂(ω + π) ] = 0.

The inverse Fourier transforms of these two equations yield (7.137).

To finish the proof, one must show the existence of Riesz bounds. The reader can
verify that this is a consequence of the fact that the Fourier transform of each filter is
bounded.  ■
Orthogonal Bases
A Riesz basis is orthonormal if the dual basis is the same as the original basis. For
filter banks this means that h = h̃ and g = g̃. The filter h is then a conjugate mirror
filter:

    |ĥ(ω)|² + |ĥ(ω + π)|² = 2.      (7.139)

The resulting family {h[n − 2l], g[n − 2l]}_{l∈Z} is an orthogonal basis of ℓ²(Z).
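In this orthogonal case, the biorthogonality relations (7.135)-(7.137) become orthogonality relations, which can be checked numerically for the p = 2 Daubechies filter (a sketch; the zero-padded embedding and the helper `ip` are ours):

```python
import numpy as np

# Orthogonality relations (7.135)-(7.137) for h = h-tilde, g = g-tilde.
h = np.array([0.482962913145, 0.836516303738, 0.224143868042, -0.129409522551])
N = 16
hn = np.zeros(N); hn[:4] = h                  # h[n], support 0..3
gn = np.zeros(N)
for n in range(-2, 2):
    gn[n % N] = (-1)**(1 - n) * h[1 - n]      # g[n] = (-1)^{1-n} h[1-n]

def ip(x, y, l):                              # <x[n], y[n - 2l]>, circular
    return np.sum(x * np.roll(y, 2 * l))

for l in range(-2, 3):
    assert np.isclose(ip(hn, hn, l), float(l == 0))    # (7.135)
    assert np.isclose(ip(gn, gn, l), float(l == 0))    # (7.136)
    assert np.isclose(ip(hn, gn, l), 0.0, atol=1e-10)  # (7.137)
```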
Discrete Wavelet Bases
The construction of conjugate mirror filters is simpler than the construction of
orthogonal wavelet bases of L²(R). Why then should we bother with continuous-time
models of wavelets, since in any case all computations are discrete and rely on
conjugate mirror filters? The reason is that conjugate mirror filters are most often
used in filter banks that cascade several levels of filterings and subsamplings. Thus,
it is necessary to understand the behavior of such a cascade [407]. In a wavelet
filter bank tree, the output of the low-pass filter h̄ is subdecomposed, whereas the
output of the high-pass filter ḡ is not; this is illustrated in Figure 7.12. Suppose that
the sampling distance of the original discrete signal is N^{−1}. We denote a_L[n] for
this discrete signal, with 2^L = N^{−1}. At the depth j − L ≥ 0 of this filter bank tree, the
low-pass signal a_j and high-pass signal d_j can be written as

    a_j[l] = a_L ⋆ φ̄_j[2^{j−L} l] = ⟨a_L[n], φ_j[n − 2^{j−L} l]⟩
and

    d_j[l] = a_L ⋆ ψ̄_j[2^{j−L} l] = ⟨a_L[n], ψ_j[n − 2^{j−L} l]⟩.

The Fourier transforms of these equivalent filters are

    φ̂_j(ω) = ∏_{p=0}^{j−L−1} ĥ(2^p ω)   and   ψ̂_j(ω) = ĝ(2^{j−L−1} ω) ∏_{p=0}^{j−L−2} ĥ(2^p ω).      (7.140)

A filter bank tree of depth J − L ≥ 0 decomposes a_L over the family of vectors

    { φ_J[n − 2^{J−L} l] }_{l∈Z} ∪ { ψ_j[n − 2^{j−L} l] }_{L<j≤J, l∈Z}.      (7.141)
For conjugate mirror filters, one can verify that this family is an orthonormal basis of ℓ²(Z). These discrete vectors are close to a uniform sampling of the continuous-time scaling functions φ_j(t) = 2^{−j/2} φ(2^{−j} t) and wavelets ψ_j(t) = 2^{−j/2} ψ(2^{−j} t). When the number j − L of successive convolutions increases, one can verify that φ_j[n] and ψ_j[n] converge, respectively, to N^{−1/2} φ_j(N^{−1} n) and N^{−1/2} ψ_j(N^{−1} n). The factor N^{−1/2} normalizes the ℓ²(Z) norm of these sampled functions. If j − L = 4, then φ_j[n] and ψ_j[n] are already very close to these limit values. Thus, the impulse responses φ_j[n] and ψ_j[n] of the filter bank are much closer to continuous-time scaling functions and wavelets than they are to the original conjugate mirror filters h and g. This explains why wavelets provide appropriate models for understanding the applications of these filter banks. Chapter 8 relates more general filter banks to wavelet packet bases.
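The equivalent filters of (7.140) can be computed in the time domain by convolving progressively upsampled copies of h, since the product of the transforms ĥ(2^p ω) corresponds to a convolution of upsampled impulse responses. A minimal sketch (helper names `upsample` and `equivalent_lowpass` are ours; the Daubechies-2 taps are the standard published values):

```python
import numpy as np

def upsample(x, m):
    """Insert m-1 zeros between the samples of x."""
    y = np.zeros(m * (len(x) - 1) + 1)
    y[::m] = x
    return y

def equivalent_lowpass(h, depth):
    """Impulse response of `depth` cascaded filter-and-subsample stages:
    its Fourier transform is prod_{p=0}^{depth-1} h^(2^p w), as in (7.140)."""
    f = np.array(h, float)
    for p in range(1, depth):
        f = np.convolve(f, upsample(h, 2**p))
    return f

# Daubechies-2 conjugate mirror filter (4 taps, coefficients sum to sqrt(2))
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

# After 4 cascade levels the (normalized) response is already close to
# samples of the continuous-time scaling function phi.
phi4 = equivalent_lowpass(h, 4)
print(len(phi4), phi4.sum())   # sum is (sqrt(2))^4 = 4, up to rounding
```

Plotting `phi4` against a fine sampling of φ illustrates the convergence claimed above.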
If the decomposition and reconstruction filters of the filter bank are different, the resulting basis (7.141) is nonorthogonal. The stability of this discrete wavelet basis does not degrade when the depth J − L of the filter bank increases. The next section shows that the corresponding continuous-time wavelet ψ(t) generates a Riesz basis of L²(R).
7.4 BIORTHOGONAL WAVELET BASES
The stability and completeness properties of biorthogonal wavelet bases are
described for perfect reconstruction filters h and h̃ having a finite impulse
response. The design of linear phase wavelets with compact support is explained
in Section 7.4.2.
7.4.1 Construction of Biorthogonal Wavelet Bases
An infinite cascade of perfect reconstruction filters (h, g) and (h̃, g̃) yields two scaling functions and two wavelets whose Fourier transforms satisfy

\hat\phi(2\omega) = \frac{1}{\sqrt 2}\, \hat h(\omega)\, \hat\phi(\omega), \qquad \hat{\tilde\phi}(2\omega) = \frac{1}{\sqrt 2}\, \hat{\tilde h}(\omega)\, \hat{\tilde\phi}(\omega),     (7.142)

\hat\psi(2\omega) = \frac{1}{\sqrt 2}\, \hat g(\omega)\, \hat\phi(\omega), \qquad \hat{\tilde\psi}(2\omega) = \frac{1}{\sqrt 2}\, \hat{\tilde g}(\omega)\, \hat{\tilde\phi}(\omega).     (7.143)
In the time domain, these relations become

\phi(t) = \sqrt 2 \sum_{n=-\infty}^{+\infty} h[n]\, \phi(2t - n), \qquad \tilde\phi(t) = \sqrt 2 \sum_{n=-\infty}^{+\infty} \tilde h[n]\, \tilde\phi(2t - n),     (7.144)

\psi(t) = \sqrt 2 \sum_{n=-\infty}^{+\infty} g[n]\, \phi(2t - n), \qquad \tilde\psi(t) = \sqrt 2 \sum_{n=-\infty}^{+\infty} \tilde g[n]\, \tilde\phi(2t - n).     (7.145)
The perfect reconstruction conditions are given by Theorem 7.12. If we normalize the gain and shift to a = 1 and l = 0, the filters must satisfy

\hat h^*(\omega)\, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi)\, \hat{\tilde h}(\omega + \pi) = 2,     (7.146)

and

\hat g(\omega) = e^{-i\omega}\, \hat{\tilde h}^*(\omega + \pi), \qquad \hat{\tilde g}(\omega) = e^{-i\omega}\, \hat h^*(\omega + \pi).     (7.147)

Wavelets should have a zero average, which means that \hat\psi(0) = \hat{\tilde\psi}(0) = 0. This is obtained by setting \hat g(0) = \hat{\tilde g}(0) = 0, and thus \hat h(\pi) = \hat{\tilde h}(\pi) = 0. The perfect reconstruction condition (7.146) implies that \hat h^*(0)\, \hat{\tilde h}(0) = 2. Since both filters are defined up to multiplicative constants equal to λ and λ⁻¹, respectively, we adjust λ so that

\hat h(0) = \hat{\tilde h}(0) = \sqrt 2.
In the following, we also suppose that h and h̃ are finite impulse-response filters. One can then prove [19] that

\hat\phi(\omega) = \prod_{p=1}^{+\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt 2} \quad \text{and} \quad \hat{\tilde\phi}(\omega) = \prod_{p=1}^{+\infty} \frac{\hat{\tilde h}(2^{-p}\omega)}{\sqrt 2}     (7.148)

are the Fourier transforms of distributions of compact support. However, these distributions may exhibit wild behavior and have infinite energy. Some further conditions must be imposed to guarantee that \hat\phi and \hat{\tilde\phi} are the Fourier transforms of finite-energy functions. Theorem 7.14 gives sufficient conditions on the perfect reconstruction filters for synthesizing biorthogonal wavelet bases of L²(R).
Theorem 7.14: Cohen, Daubechies, Feauveau. Suppose that there exist strictly positive trigonometric polynomials P(e^{iω}) and P̃(e^{iω}) such that

\left|\hat h\!\left(\tfrac\omega2\right)\right|^2 P(e^{i\omega/2}) + \left|\hat h\!\left(\tfrac\omega2 + \pi\right)\right|^2 P(e^{i(\omega/2+\pi)}) = 2\, P(e^{i\omega}),     (7.149)

\left|\hat{\tilde h}\!\left(\tfrac\omega2\right)\right|^2 \tilde P(e^{i\omega/2}) + \left|\hat{\tilde h}\!\left(\tfrac\omega2 + \pi\right)\right|^2 \tilde P(e^{i(\omega/2+\pi)}) = 2\, \tilde P(e^{i\omega}),     (7.150)

and that P and P̃ are unique (up to normalization). Suppose that

\inf_{\omega\in[-\pi/2,\pi/2]} |\hat h(\omega)| > 0, \qquad \inf_{\omega\in[-\pi/2,\pi/2]} |\hat{\tilde h}(\omega)| > 0.     (7.151)

Then, the functions \hat\phi and \hat{\tilde\phi} defined in (7.148) belong to L²(R), and φ, φ̃ satisfy biorthogonal relations

\langle \phi(t), \tilde\phi(t - n) \rangle = \delta[n].     (7.152)
The two wavelet families {ψ_{j,n}}_{(j,n)∈Z²} and {ψ̃_{j,n}}_{(j,n)∈Z²} are biorthogonal Riesz bases of L²(R).

The proof of this theorem is in [172] and [19]. The hypothesis (7.151) is also imposed by Theorem 7.2, which constructs orthogonal bases of scaling functions. The conditions (7.149) and (7.150) do not appear in the construction of orthogonal wavelet bases because they are always satisfied with P(e^{iω}) = P̃(e^{iω}) = 1, and one can prove that constants are the only invariant trigonometric polynomials [341].
Biorthogonality means that for any (j, j′, n, n′) ∈ Z⁴,

\langle \psi_{j,n}, \tilde\psi_{j',n'} \rangle = \delta[n - n']\, \delta[j - j'].     (7.153)

Any f ∈ L²(R) has two possible decompositions in these bases:

f = \sum_{n,j=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \tilde\psi_{j,n} = \sum_{n,j=-\infty}^{+\infty} \langle f, \tilde\psi_{j,n} \rangle\, \psi_{j,n}.     (7.154)

The Riesz stability implies that there exist A > 0 and B > 0 such that

A\, \|f\|^2 \le \sum_{n,j=-\infty}^{+\infty} |\langle f, \psi_{j,n} \rangle|^2 \le B\, \|f\|^2,     (7.155)

\frac{1}{B}\, \|f\|^2 \le \sum_{n,j=-\infty}^{+\infty} |\langle f, \tilde\psi_{j,n} \rangle|^2 \le \frac{1}{A}\, \|f\|^2.     (7.156)
Multiresolutions
Biorthogonal wavelet bases are related to multiresolution approximations. The family {φ(t − n)}_{n∈Z} is a Riesz basis of the space V₀ it generates, whereas {φ̃(t − n)}_{n∈Z} is a Riesz basis of another space Ṽ₀. Let V_j and Ṽ_j be the spaces defined by

f(t) ∈ V_j ⇔ f(2^j t) ∈ V₀,     f(t) ∈ Ṽ_j ⇔ f(2^j t) ∈ Ṽ₀.

One can verify that {V_j}_{j∈Z} and {Ṽ_j}_{j∈Z} are two multiresolution approximations of L²(R). For any j ∈ Z, {φ_{j,n}}_{n∈Z} and {φ̃_{j,n}}_{n∈Z} are Riesz bases of V_j and Ṽ_j. The dilated wavelets {ψ_{j,n}}_{n∈Z} and {ψ̃_{j,n}}_{n∈Z} are bases of two detail spaces W_j and W̃_j such that

V_j ⊕ W_j = V_{j−1}     and     Ṽ_j ⊕ W̃_j = Ṽ_{j−1}.

The biorthogonality of the decomposition and reconstruction wavelets implies that W_j is not orthogonal to V_j but is to Ṽ_j, whereas W̃_j is not orthogonal to Ṽ_j but is to V_j.
Fast Biorthogonal Wavelet Transform
The perfect reconstruction filter bank discussed in Section 7.3.2 implements a fast biorthogonal wavelet transform. For any discrete input signal b[n] sampled at intervals N⁻¹ = 2^L, there exists f ∈ V_L such that a_L[n] = ⟨f, φ_{L,n}⟩ = N^{−1/2} b[n]. The wavelet coefficients are computed by successive convolutions with h̄ and ḡ. Let a_j[n] = ⟨f, φ_{j,n}⟩ and d_j[n] = ⟨f, ψ_{j,n}⟩. As in Theorem 7.10, one can prove that

a_{j+1}[n] = a_j ⋆ h̄[2n],     d_{j+1}[n] = a_j ⋆ ḡ[2n].     (7.157)

The reconstruction is performed with the dual filters h̃ and g̃:

a_j[n] = ǎ_{j+1} ⋆ h̃[n] + ď_{j+1} ⋆ g̃[n].     (7.158)

If a_L includes N nonzero samples, the biorthogonal wavelet representation [{d_j}_{L<j≤J}, a_J] is calculated with O(N) operations by iterating (7.157) for L ≤ j < J. The reconstruction of a_L by applying (7.158) for J > j ≥ L requires the same number of operations.
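The analysis step (7.157) and the dual-filter synthesis (7.158) can be checked numerically. The sketch below uses conventions of our own choosing (filters stored as offset-to-value dicts, periodic boundaries) together with the 5/3 spline filter pair of Table 7.4 (p = p̃ = 2); the high-pass filters are derived from (7.147):

```python
import numpy as np

r2 = np.sqrt(2.0)
# CDF 5/3 (LeGall) pair, normalized so that h^(0) = h~^(0) = sqrt(2)
h  = {-2: -r2/8, -1: r2/4, 0: 3*r2/4, 1: r2/4, 2: -r2/8}  # decomposition low-pass
ht = {-1: r2/4, 0: r2/2, 1: r2/4}                          # reconstruction low-pass
# (7.147) in the time domain: g[n] = (-1)^(1-n) h~[1-n], g~[n] = (-1)^(1-n) h[1-n]
g  = {1 - k: (-1)**k * v for k, v in ht.items()}
gt = {1 - k: (-1)**k * v for k, v in h.items()}

def analyze(a, lo, hi):
    """One level of (7.157) with periodic boundaries: a1[l] = sum_n a[n] lo[n-2l]."""
    N = len(a)
    approx = [sum(v * a[(2*l + k) % N] for k, v in lo.items()) for l in range(N // 2)]
    detail = [sum(v * a[(2*l + k) % N] for k, v in hi.items()) for l in range(N // 2)]
    return np.array(approx), np.array(detail)

def synthesize(approx, detail, lo, hi):
    """One level of (7.158): zero-upsample by 2, then filter with the dual pair."""
    N = 2 * len(approx)
    a = np.zeros(N)
    for l in range(N // 2):
        for k, v in lo.items():
            a[(2*l + k) % N] += v * approx[l]
        for k, v in hi.items():
            a[(2*l + k) % N] += v * detail[l]
    return a

rng = np.random.default_rng(0)
a0 = rng.standard_normal(16)
a1, d1 = analyze(a0, h, g)
rec = synthesize(a1, d1, ht, gt)
assert np.allclose(rec, a0)   # perfect reconstruction
```

Iterating `analyze` on the approximation signal gives the full O(N) decomposition [{d_j}, a_J].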
7.4.2 Biorthogonal Wavelet Design
The support size, the number of vanishing moments, the regularity, the ordering, and the symmetry of biorthogonal wavelets are controlled with an appropriate design of h and h̃.
Support
If the perfect reconstruction filters h and h̃ have a finite impulse response, then the corresponding scaling functions and wavelets also have a compact support. As in Section 7.2.1, one can show that if h[n] and h̃[n] are nonzero, respectively, for N₁ ≤ n ≤ N₂ and Ñ₁ ≤ n ≤ Ñ₂, then φ and φ̃ have a support equal to [N₁, N₂] and [Ñ₁, Ñ₂], respectively. Since, by (7.147),

g[n] = (−1)^{1−n} h̃[1 − n]     and     g̃[n] = (−1)^{1−n} h[1 − n],

the supports of ψ and ψ̃ defined in (7.145) are, respectively,

\left[\frac{N_1 - \tilde N_2 + 1}{2},\ \frac{N_2 - \tilde N_1 + 1}{2}\right] \quad \text{and} \quad \left[\frac{\tilde N_1 - N_2 + 1}{2},\ \frac{\tilde N_2 - N_1 + 1}{2}\right].     (7.159)

Thus, both wavelets have a support of the same size, equal to

l = \frac{N_2 - N_1 + \tilde N_2 - \tilde N_1}{2}.     (7.160)

Vanishing Moments
The number of vanishing moments of ψ and ψ̃ depends on the number of zeros at ω = π of ĥ(ω) and \hat{\tilde h}(ω). Theorem 7.4 proves that ψ has p̃ vanishing moments if the derivatives of its Fourier transform satisfy \hat\psi^{(k)}(0) = 0 for k ≤ p̃. Since \hat\phi(0) = 1, (7.143) implies that it is equivalent to impose that ĝ(ω) has a zero of order p̃ at ω = 0. Since \hat g(\omega) = e^{-i\omega}\, \hat{\tilde h}^*(\omega + \pi), this means that \hat{\tilde h}(ω) has a zero of order p̃ at ω = π. Similarly, the number of vanishing moments of ψ̃ is equal to the number p of zeros of ĥ(ω) at π.
Regularity
Although the regularity of a function is a priori independent of its number of vanishing moments, the smoothness of biorthogonal wavelets is related to their vanishing moments. The regularity of φ and ψ is the same, because (7.145) shows that ψ is a finite linear expansion of translates of φ. Tchamitchian's Theorem 7.6 gives a sufficient condition for estimating this regularity. If ĥ(ω) has a zero of order p at π, we can perform the factorization

\hat h(\omega) = \left( \frac{1 + e^{-i\omega}}{2} \right)^p \hat l(\omega).     (7.161)

Let B = \sup_{\omega\in[-\pi,\pi]} |\hat l(\omega)|. Theorem 7.6 proves that φ is uniformly Lipschitz α for

α < α₀ = p − log₂ B − 1.

Generally, log₂ B increases more slowly than p. This implies that the regularity of φ and ψ increases with p, which is equal to the number of vanishing moments of ψ̃. Similarly, one can show that the regularity of φ̃ and ψ̃ increases with p̃, which is the number of vanishing moments of ψ. If ĥ and \hat{\tilde h} have different numbers of zeros at π, the properties of ψ and ψ̃ can be very different.
Ordering of Wavelets
Since ψ and ψ̃ might not have the same regularity and number of vanishing moments, the two reconstruction formulas

f = \sum_{n,j=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \tilde\psi_{j,n},     (7.162)

f = \sum_{n,j=-\infty}^{+\infty} \langle f, \tilde\psi_{j,n} \rangle\, \psi_{j,n}     (7.163)

are not equivalent. The decomposition (7.162) is obtained with the filters (h, g), and the reconstruction with (h̃, g̃). The inverse formula (7.163) corresponds to (h̃, g̃) at the decomposition and (h, g) at the reconstruction.

To produce small wavelet coefficients in regular regions, we must compute the inner products using the wavelet with the maximum number of vanishing moments. The reconstruction is then performed with the other wavelet, which is generally the smoothest one. If errors are added to the wavelet coefficients, for example by quantization, a smooth wavelet at the reconstruction introduces a smooth error. The number of vanishing moments of ψ is equal to the number p̃ of zeros at π of \hat{\tilde h}. Increasing p̃ also increases the regularity of ψ̃. Thus, it is better to use h at the decomposition and h̃ at the reconstruction if ĥ has fewer zeros at π than \hat{\tilde h}.
Symmetry
It is possible to construct smooth biorthogonal wavelets of compact support that are either symmetric or antisymmetric. This is impossible for orthogonal wavelets, besides the particular case of the Haar basis. Symmetric or antisymmetric wavelets are synthesized with perfect reconstruction filters having a linear phase. If h and h̃ have an odd number of nonzero samples and are symmetric about n = 0, the reader can verify that φ and φ̃ are symmetric about t = 0, while ψ and ψ̃ are symmetric with respect to a shifted center. If h and h̃ have an even number of nonzero samples and are symmetric about n = 1/2, then φ(t) and φ̃(t) are symmetric about t = 1/2, while ψ and ψ̃ are antisymmetric with respect to a shifted center. When the wavelets are symmetric or antisymmetric, wavelet bases over finite intervals are constructed with the folding procedure of Section 7.5.2.
7.4.3 Compactly Supported Biorthogonal Wavelets
We study the design of biorthogonal wavelets with a minimum-size support for a
specified number of vanishing moments. Symmetric or antisymmetric compactly
supported spline biorthogonal wavelet bases are constructed with a technique
introduced in [172].
Theorem 7.15: Cohen, Daubechies, Feauveau. Biorthogonal wavelets ψ and ψ̃ with, respectively, p̃ and p vanishing moments have a support size of at least p + p̃ − 1. CDF biorthogonal wavelets have a minimum support size p + p̃ − 1.
Proof. The proof follows the same approach as the proof of Daubechies' Theorem 7.7. One can verify that p and p̃ must necessarily have the same parity. We concentrate on filters h[n] and h̃[n] that have a symmetry with respect to n = 0 or n = 1/2. The general case proceeds similarly. We can then factor

\hat h(\omega) = \sqrt 2\, \exp\!\left( \frac{-i\epsilon\omega}{2} \right) \left( \cos\frac\omega2 \right)^{p} L(\cos\omega),     (7.164)

\hat{\tilde h}(\omega) = \sqrt 2\, \exp\!\left( \frac{-i\epsilon\omega}{2} \right) \left( \cos\frac\omega2 \right)^{\tilde p} \tilde L(\cos\omega),     (7.165)

with ε = 0 for p and p̃ even, and ε = 1 for odd values. Let q = (p + p̃)/2. The perfect reconstruction condition

\hat h^*(\omega)\, \hat{\tilde h}(\omega) + \hat h^*(\omega + \pi)\, \hat{\tilde h}(\omega + \pi) = 2

is imposed by writing

L(\cos\omega)\, \tilde L(\cos\omega) = P\!\left( \sin^2 \frac\omega2 \right),     (7.166)

where the polynomial P(y) must satisfy, for all y ∈ [0, 1],

(1 - y)^q\, P(y) + y^q\, P(1 - y) = 1.     (7.167)

We saw in (7.96) that the polynomial of minimum degree satisfying this equation is

P(y) = \sum_{k=0}^{q-1} \binom{q - 1 + k}{k}\, y^k.     (7.168)

The spectral factorization (7.166) is solved with a root attribution similar to (7.98). The resulting minimum support of ψ and ψ̃ specified by (7.160) is then p + p̃ − 1.
■
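The minimum-degree polynomial (7.168) can be checked directly against the Bezout identity (7.167); a short sketch (the function name `P` is ours):

```python
from math import comb

def P(y, q):
    """Minimum-degree solution of (1-y)^q P(y) + y^q P(1-y) = 1, from (7.168)."""
    return sum(comb(q - 1 + k, k) * y**k for k in range(q))

# Check the identity (7.167) on a grid of y in [0, 1] for several q
for q in (1, 2, 3, 4, 5):
    for i in range(11):
        y = i / 10
        val = (1 - y)**q * P(y, q) + y**q * P(1 - y, q)
        assert abs(val - 1.0) < 1e-9
print("identity (7.167) verified")
```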
Spline Biorthogonal Wavelets
Let us choose

\hat h(\omega) = \sqrt 2\, \exp\!\left( \frac{-i\epsilon\omega}{2} \right) \left( \cos\frac\omega2 \right)^{p}     (7.169)

with ε = 0 for p even and ε = 1 for p odd. The scaling function computed with (7.148) is then a box spline of degree p − 1:

\hat\phi(\omega) = \exp\!\left( \frac{-i\epsilon\omega}{2} \right) \left( \frac{\sin(\omega/2)}{\omega/2} \right)^{p}.

Since ψ is a linear combination of box splines φ(2t − n), it is a compactly supported polynomial spline of the same degree.

The number of vanishing moments p̃ of ψ is a free parameter, which must have the same parity as p. Let q = (p + p̃)/2. The biorthogonal filter h̃ of minimum length is obtained by observing that L(cos ω) = 1 in (7.164). Thus, the factorizations (7.166) and (7.168) imply that

\hat{\tilde h}(\omega) = \sqrt 2\, \exp\!\left( \frac{-i\epsilon\omega}{2} \right) \left( \cos\frac\omega2 \right)^{\tilde p} \sum_{k=0}^{q-1} \binom{q - 1 + k}{k} \left( \sin\frac\omega2 \right)^{2k}.     (7.170)

These filters satisfy the conditions of Theorem 7.14 and therefore generate biorthogonal wavelet bases. Table 7.3 gives the filter coefficients for (p = 2, p̃ = 4) and (p = 3, p̃ = 7); see Figure 7.14 for the resulting dual wavelets and scaling functions.
Table 7.3 Perfect Reconstruction Filters h and h̃ for Compactly Supported Spline Wavelets

p = 2, p̃ = 4:

    n        h[n]                 h̃[n]
    0        0.70710678118655     0.99436891104358
    1, −1    0.35355339059327     0.41984465132951
    2, −2                         −0.17677669529664
    3, −3                         −0.06629126073624
    4, −4                         0.03314563036812

p = 3, p̃ = 7:

    n        h[n]                 h̃[n]
    0, 1     0.53033008588991     0.95164212189718
    −1, 2    0.17677669529664     −0.02649924094535
    −2, 3                         −0.30115912592284
    −3, 4                         0.03133297870736
    −4, 5                         0.07466398507402
    −5, 6                         −0.01683176542131
    −6, 7                         −0.00906325830378
    −7, 8                         0.00302108610126

Note: \hat{\tilde h} and ĥ have, respectively, p̃ and p zeros at ω = π.
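Because ĥ in (7.169) and the h̃ of (7.170) are trigonometric polynomials, their coefficients can be recovered exactly by an inverse DFT on a sufficiently fine frequency grid. This sketch (the helper name `spline_filters` is ours; it assumes p and p̃ even, so ε = 0 and the filters are real and symmetric) reproduces the p = 2, p̃ = 4 column of Table 7.3:

```python
import numpy as np
from math import comb

def spline_filters(p, pt):
    """Spline biorthogonal filters from (7.169)-(7.170), for even p and pt.
    Returns arrays h, ht with h[n] at index n (negative n wraps to the end)."""
    q = (p + pt) // 2
    N = 256
    w = 2 * np.pi * np.arange(N) / N
    H  = np.sqrt(2) * np.cos(w / 2)**p
    Ht = (np.sqrt(2) * np.cos(w / 2)**pt
          * sum(comb(q - 1 + k, k) * np.sin(w / 2)**(2 * k) for k in range(q)))
    # Both transforms are short trigonometric polynomials, so the inverse DFT
    # on N >> filter length points recovers the coefficients exactly.
    return np.real(np.fft.ifft(H)), np.real(np.fft.ifft(Ht))

h, ht = spline_filters(2, 4)
print(h[0], h[1])                       # matches the h[n] column of Table 7.3
print([ht[n] for n in range(5)])        # matches the h~[n] column of Table 7.3
```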
FIGURE 7.14
Spline biorthogonal wavelets and scaling functions of compact support corresponding to Table 7.3 filters.
Closer Filter Length
Biorthogonal filters h and h̃ of more similar length are obtained by factoring the polynomial P(sin²(ω/2)) in (7.166) into two polynomials L(cos ω) and L̃(cos ω) of similar degree. There is a limited number of possible factorizations. For q = (p + p̃)/2 < 4, the only solution is L(cos ω) = 1. For q = 4 there is one nontrivial factorization, and for q = 5 there are two. Table 7.4 gives the resulting coefficients of the filters h and h̃ of most similar length, computed by Cohen, Daubechies, and Feauveau [172]. These filters also satisfy the conditions of Theorem 7.14 and therefore define biorthogonal wavelet bases.

Figure 7.15 gives the scaling functions and wavelets for p = p̃ = 2 and p = p̃ = 4, which correspond to filter sizes 5/3 and 9/7, respectively. For p = p̃ = 4, φ and ψ are similar to φ̃ and ψ̃, which indicates that this basis is nearly orthogonal. This particular set of filters is often used in image compression and recommended for JPEG-2000. The quasi-orthogonality guarantees a good numerical stability, and the symmetry allows one to use the folding procedure of Section 7.5.2 at the boundaries. There are also enough vanishing moments to create small wavelet coefficients in regular image domains. Section 7.8.5 describes their lifting implementation, which is simple and efficient. Filter sizes 5/3 are also recommended for lossless compression with JPEG-2000, because they use integer operations with a lifting algorithm. The design of other compactly supported biorthogonal filters is discussed extensively in [172, 473].
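As a check on Table 7.4, the 9/7 pair (p = p̃ = 4) should satisfy the perfect reconstruction condition (7.146); since both filters are real and symmetric about n = 0, the condition reduces to ĥ(ω)\hat{\tilde h}(ω) + ĥ(ω + π)\hat{\tilde h}(ω + π) = 2. A sketch (the helper name `dtft_symmetric` is ours):

```python
import numpy as np

def dtft_symmetric(c, w):
    """Real DTFT of a filter symmetric about n = 0, given by c = [f[0], ..., f[K]]."""
    return c[0] + 2 * sum(c[k] * np.cos(k * w) for k in range(1, len(c)))

# CDF 9/7 filters (Table 7.4, p = p~ = 4); h[n] = h[-n], h~[n] = h~[-n]
h  = [0.85269867900889, 0.37740285561283, -0.11062440441844,
      -0.02384946501956, 0.03782845554969]
ht = [0.78848561640637, 0.41809227322204, -0.04068941760920, -0.06453888262876]

for w in np.linspace(0, np.pi, 101):
    s = dtft_symmetric(h, w) * dtft_symmetric(ht, w) \
        + dtft_symmetric(h, w + np.pi) * dtft_symmetric(ht, w + np.pi)
    assert abs(s - 2.0) < 1e-9   # condition (7.146)
print("(7.146) holds for the 9/7 pair")
```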
Table 7.4 Perfect Reconstruction Filters of Most Similar Length

p = 2, p̃ = 2:

    n        h[n]                  h̃[n]
    0        1.06066017177982      0.70710678118655
    −1, 1    0.35355339059327      0.35355339059327
    −2, 2    −0.17677669529664     0

p = 4, p̃ = 4:

    n        h[n]                  h̃[n]
    0        0.85269867900889      0.78848561640637
    −1, 1    0.37740285561283      0.41809227322204
    −2, 2    −0.11062440441844     −0.04068941760920
    −3, 3    −0.02384946501956     −0.06453888262876
    −4, 4    0.03782845554969      0

p = 5, p̃ = 5:

    n        h[n]                  h̃[n]
    0        0.89950610974865      0.73666018142821
    −1, 1    0.47680326579848      0.34560528195603
    −2, 2    −0.09350469740094     −0.05446378846824
    −3, 3    −0.13670658466433     0.00794810863724
    −4, 4    −0.00269496688011     0.03968708834741
    −5, 5    0.01345670945912      0

p = 5, p̃ = 5:

    n        h[n]                  h̃[n]
    0        0.54113273169141      1.32702528570780
    −1, 1    0.34335173921766      0.47198693379091
    −2, 2    0.06115645341349      −0.36378609009851
    −3, 3    0.00027989343090      −0.11843354319764
    −4, 4    0.02183057133337      0.05382683783789
    −5, 5    0.00992177208685      0
FIGURE 7.15
Biorthogonal wavelets and scaling functions calculated with the filters of Table 7.4, with p = 2 and p̃ = 2 (top row) and p = 4 and p̃ = 4 (bottom row).
7.5 WAVELET BASES ON AN INTERVAL
To decompose signals f defined over an interval [0, 1], it is necessary to construct wavelet bases of L²[0, 1]. Such bases are synthesized by modifying the wavelets ψ_{j,n}(t) = 2^{−j/2} ψ(2^{−j} t − n) of a basis {ψ_{j,n}}_{(j,n)∈Z²} of L²(R). Inside wavelets ψ_{j,n}, which have a support included in [0, 1], are not modified. Boundary wavelets ψ_{j,n}, which have a support that overlaps t = 0 or t = 1, are transformed into functions with a support in [0, 1], designed to provide the necessary complement to generate a basis of L²[0, 1]. If ψ has a compact support, then there is a constant number of boundary wavelets at each scale.

The main difficulty is to construct boundary wavelets that keep their vanishing moments. The next three sections describe different approaches to constructing boundary wavelets. Periodic wavelets have no vanishing moments at the boundary, whereas folded wavelets have one vanishing moment. The custom-designed boundary wavelets of Section 7.5.3 have as many vanishing moments as the inside wavelets but are more complicated to construct. Scaling functions φ_{j,n} are also restricted to [0, 1] by modifying the scaling functions φ_{j,n}(t) = 2^{−j/2} φ(2^{−j} t − n) associated with the wavelets ψ_{j,n}. The resulting wavelet basis of L²[0, 1] is composed of 2^{−J} scaling functions at a coarse scale 2^J < 1, plus 2^{−j} wavelets at each scale 2^j ≤ 2^J:

\left[ \{\phi^{\rm int}_{J,n}\}_{0\le n<2^{-J}},\ \{\psi^{\rm int}_{j,n}\}_{-\infty<j\le J,\, 0\le n<2^{-j}} \right].     (7.171)

On any interval [a, b], a wavelet orthonormal basis of L²[a, b] is constructed with a dilation by b − a and a translation by a of the wavelets in (7.171).
Discrete Basis of C^N

The decomposition of a signal in a wavelet basis over an interval is computed by modifying the fast wavelet transform algorithm of Section 7.3.1. A discrete signal b[n] of N samples is associated to the approximation of a signal f ∈ L²[0, 1] at a scale N⁻¹ = 2^L with (7.111):

N^{−1/2} b[n] = a_L[n] = ⟨f, φ^int_{L,n}⟩     for 0 ≤ n < 2^{−L}.

Its wavelet coefficients can be calculated at scales 1 ≥ 2^j > 2^L. We set

a_j[n] = ⟨f, φ^int_{j,n}⟩     and     d_j[n] = ⟨f, ψ^int_{j,n}⟩     for 0 ≤ n < 2^{−j}.     (7.172)

The wavelets and scaling functions with support inside [0, 1] are identical to the wavelets and scaling functions of a basis of L²(R). Thus, the corresponding coefficients a_j[n] and d_j[n] can be calculated with the decomposition and reconstruction equations given by Theorem 7.10. However, these convolution formulas must be modified near the boundary, where the wavelets and scaling functions are modified. Boundary calculations depend on the specific design of the boundary wavelets, as explained in the next three sections. The resulting filter bank algorithm still computes the N coefficients of the wavelet representation [a_J, {d_j}_{L<j≤J}] of a_L with O(N) operations.

Wavelet coefficients can also be written as discrete inner products of a_L with discrete wavelets:

a_j[n] = ⟨a_L[m], φ^int_{j,n}[m]⟩     and     d_j[n] = ⟨a_L[m], ψ^int_{j,n}[m]⟩.     (7.173)

As in Section 7.3.3, we verify that

\left[ \{\phi^{\rm int}_{J,n}[m]\}_{0\le n<2^{-J}},\ \{\psi^{\rm int}_{j,n}[m]\}_{L<j\le J,\, 0\le n<2^{-j}} \right]

is an orthonormal basis of C^N.
7.5.1 Periodic Wavelets
A wavelet basis {ψ_{j,n}}_{(j,n)∈Z²} of L²(R) is transformed into a wavelet basis of L²[0, 1] by periodizing each ψ_{j,n}. The periodization of f ∈ L²(R) over [0, 1] is defined by

f^{\text{pér}}(t) = \sum_{k=-\infty}^{+\infty} f(t + k).     (7.174)

The resulting periodic wavelets are

\psi^{\text{pér}}_{j,n}(t) = \frac{1}{\sqrt{2^j}} \sum_{k=-\infty}^{+\infty} \psi\!\left( \frac{t - 2^j n + k}{2^j} \right).

For j ≤ 0, there are 2^{−j} different ψ^{pér}_{j,n}, indexed by 0 ≤ n < 2^{−j}. If the support of ψ_{j,n} is included in [0, 1], then ψ^{pér}_{j,n}(t) = ψ_{j,n}(t) for t ∈ [0, 1]. Thus, the restriction to [0, 1] of this periodization modifies only the boundary wavelets with a support that overlaps t = 0 or t = 1.
As indicated in Figure 7.16, such wavelets are transformed into boundary wavelets that have two disjoint components near t = 0 and t = 1. Taken separately, the components near t = 0 and t = 1 of these boundary wavelets have no vanishing moments, and thus create large signal coefficients, as we shall see later. Theorem 7.16 proves that periodic wavelets together with periodized scaling functions φ^{pér}_{j,n} generate an orthogonal basis of L²[0, 1].
FIGURE 7.16

The restriction to [0, 1] of a periodic wavelet ψ^{pér}_{j,n} has two disjoint components near t = 0 and t = 1.
Theorem 7.16. For any J ≤ 0,

\left[ \{\psi^{\text{pér}}_{j,n}\}_{-\infty<j\le J,\,0\le n<2^{-j}},\ \{\phi^{\text{pér}}_{J,n}\}_{0\le n<2^{-J}} \right]     (7.175)

is an orthogonal basis of L²[0, 1].
Proof. The orthogonality of this family is proved with Lemma 7.2.
Lemma 7.2. Let α(t), β(t) ∈ L²(R). If ⟨α(t), β(t + k)⟩ = 0 for all k ∈ Z, then

\int_0^1 \alpha^{\text{pér}}(t)\, \beta^{\text{pér}}(t)\, dt = 0.     (7.176)

To verify (7.176) we insert the definition (7.174) of periodized functions:

\int_0^1 \alpha^{\text{pér}}(t)\, \beta^{\text{pér}}(t)\, dt = \int_{-\infty}^{+\infty} \alpha(t)\, \beta^{\text{pér}}(t)\, dt = \sum_{k=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \alpha(t)\, \beta(t + k)\, dt = 0.

Since [{ψ_{j,n}}_{−∞<j≤J, n∈Z}, {φ_{J,n}}_{n∈Z}] is orthogonal in L²(R), we can verify that any two different wavelets or scaling functions α^{pér} and β^{pér} in (7.175) necessarily have nonperiodized versions that satisfy ⟨α(t), β(t + k)⟩ = 0 for all k ∈ Z. Thus, this lemma proves that (7.175) is orthogonal in L²[0, 1].

To prove that this family generates L²[0, 1], we extend f ∈ L²[0, 1] with zeros outside [0, 1] and decompose it in the wavelet basis of L²(R):

f = \sum_{j=-\infty}^{J} \sum_{n=-\infty}^{+\infty} \langle f, \psi_{j,n} \rangle\, \psi_{j,n} + \sum_{n=-\infty}^{+\infty} \langle f, \phi_{J,n} \rangle\, \phi_{J,n}.     (7.177)

This zero extension is periodized with the sum (7.174), which defines f^{pér}(t) = f(t) for t ∈ [0, 1]. Periodizing (7.177) proves that f can be decomposed over the periodized wavelet family (7.175) in L²[0, 1]. ■
Theorem 7.16 shows that periodizing a wavelet orthogonal basis of L²(R) defines a wavelet orthogonal basis of L²[0, 1]. If J = 0, then there is a single scaling function, and one can verify that φ^{pér}_{0,0}(t) = 1. The resulting scaling coefficient ⟨f, φ^{pér}_{0,0}⟩ is the average of f over [0, 1].

Periodic wavelet bases have the disadvantage of creating high-amplitude wavelet coefficients in the neighborhood of t = 0 and t = 1, because the boundary wavelets have separate components with no vanishing moments. If f(0) ≠ f(1), the wavelet coefficients behave as if the signal were discontinuous at the boundaries. This can also be verified by extending f ∈ L²[0, 1] into an infinite 1-periodic signal f^{pér} and by showing that

\int_0^1 f(t)\, \psi^{\text{pér}}_{j,n}(t)\, dt = \int_{-\infty}^{+\infty} f^{\text{pér}}(t)\, \psi_{j,n}(t)\, dt.     (7.178)

If f(0) ≠ f(1), then f^{pér}(t) is discontinuous at t = 0 and t = 1, which creates high-amplitude wavelet coefficients when ψ_{j,n} overlaps the interval boundaries.
Periodic Discrete Transform
For f ∈ L²[0, 1], let us consider

a_j[n] = ⟨f, φ^{pér}_{j,n}⟩     and     d_j[n] = ⟨f, ψ^{pér}_{j,n}⟩.

We verify as in (7.178) that these inner products are equal to the coefficients of a periodic signal decomposed in a nonperiodic wavelet basis:

a_j[n] = ⟨f^{pér}, φ_{j,n}⟩     and     d_j[n] = ⟨f^{pér}, ψ_{j,n}⟩.

Thus, the convolution formulas of Theorem 7.10 apply if we take into account the periodicity of f^{pér}. This means that a_j[n] and d_j[n] are considered as discrete signals of period 2^{−j}, and all convolutions in (7.102–7.104) must therefore be replaced by circular convolutions. Despite the poor behavior of periodic wavelets near the boundaries, they are often used because the numerical implementation is particularly simple.
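For the simplest illustration, the periodic transform with the orthonormal Haar filter reduces to circular pairwise sums and differences at each scale; energy conservation then confirms numerically that the periodized family is an orthonormal basis of C^N. The function name below is ours, not the book's:

```python
import numpy as np

def periodic_haar(a, levels):
    """Periodic orthogonal Haar transform: at each level the convolutions are
    circular and a_j, d_j have period 2^{-j} (here: length N / 2^level)."""
    coeffs = []
    a = np.asarray(a, float)
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        a, d = (even + odd) / np.sqrt(2), (odd - even) / np.sqrt(2)
        coeffs.append(d)
    coeffs.append(a)          # coarsest approximation a_J
    return coeffs

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
c = periodic_haar(x, 5)
energy = sum(np.sum(v**2) for v in c)
assert np.isclose(energy, np.sum(x**2))   # orthonormal basis of C^N
```

For longer filters the same structure applies, with the circular pairwise operations replaced by circular convolutions with h̄ and ḡ.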
7.5.2 Folded Wavelets
Decomposing f ∈ L²[0, 1] in a periodic wavelet basis was shown in (7.178) to be equivalent to a decomposition of f^{pér} in a regular basis of L²(R). Let us extend f with zeros outside [0, 1]. To avoid creating discontinuities with such a periodization, the signal is folded with respect to t = 0: f₀(t) = f(t) + f(−t). The support of f₀ is [−1, 1], and it is transformed into a 2-periodic signal, as illustrated in Figure 7.17:

f^{\text{repl}}(t) = \sum_{k=-\infty}^{+\infty} f_0(t - 2k) = \sum_{k=-\infty}^{+\infty} f(t - 2k) + \sum_{k=-\infty}^{+\infty} f(2k - t).     (7.179)

Clearly f^{repl}(t) = f(t) if t ∈ [0, 1], and it is symmetric with respect to t = 0 and t = 1. If f is continuously differentiable, then f^{repl} is continuous at t = 0 and t = 1, but its derivative is discontinuous at t = 0 and t = 1 if f′(0) ≠ 0 and f′(1) ≠ 0.
Decomposing f^{repl} in a wavelet basis {ψ_{j,n}}_{(j,n)∈Z²} is equivalent to decomposing f on a folded wavelet basis. Let ψ^{repl}_{j,n} be the folding of ψ_{j,n} with the summation (7.179). One can verify that

\int_0^1 f(t)\, \psi^{\text{repl}}_{j,n}(t)\, dt = \int_{-\infty}^{+\infty} f^{\text{repl}}(t)\, \psi_{j,n}(t)\, dt.     (7.180)

Suppose that f is regular over [0, 1]. Then f^{repl} is continuous at t = 0, 1 and produces smaller boundary wavelet coefficients than f^{pér}. However, it is not continuously differentiable at t = 0, 1, which creates bigger wavelet coefficients at the boundary than inside.
FIGURE 7.17
The folded signal f^{repl}(t) is 2-periodic, symmetric about t = 0 and t = 1, and equal to f(t) on [0, 1].
To construct a basis of L²[0, 1] with the folded wavelets ψ^{repl}_{j,n}, it is sufficient for ψ(t) to be either symmetric or antisymmetric with respect to t = 1/2. The Haar wavelet is the only real compactly supported wavelet that is symmetric or antisymmetric and that generates an orthogonal basis of L²(R). On the other hand, if we loosen the orthogonality constraint, Section 7.4 proves that there exist biorthogonal bases constructed with compactly supported wavelets that are either symmetric or antisymmetric. Let {ψ_{j,n}}_{(j,n)∈Z²} and {ψ̃_{j,n}}_{(j,n)∈Z²} be such biorthogonal wavelet bases. If we fold the wavelets as well as the scaling functions, then for J ≤ 0,

\left[ \{\psi^{\text{repl}}_{j,n}\}_{-\infty<j\le J,\,0\le n<2^{-j}},\ \{\phi^{\text{repl}}_{J,n}\}_{0\le n<2^{-J}} \right]     (7.181)

is a Riesz basis of L²[0, 1] [174]. The biorthogonal basis is obtained by folding the dual wavelets ψ̃_{j,n} and is given by

\left[ \{\tilde\psi^{\text{repl}}_{j,n}\}_{-\infty<j\le J,\,0\le n<2^{-j}},\ \{\tilde\phi^{\text{repl}}_{J,n}\}_{0\le n<2^{-J}} \right].     (7.182)

If J = 0, then φ^{repl}_{0,0} = φ̃^{repl}_{0,0} = 1.

Biorthogonal wavelets of compact support are characterized by a pair of finite perfect reconstruction filters (h, h̃). The symmetry of these wavelets depends on the symmetry and size of the filters, as explained in Section 7.4.2. A fast folded wavelet transform is implemented with a modified filter bank algorithm, where the treatment of boundaries is slightly more complicated than for periodic wavelets. The symmetric and antisymmetric cases are considered separately.
Folded Discrete Transform
For f ∈ L²[0, 1], we consider

a_j[n] = ⟨f, φ^{repl}_{j,n}⟩     and     d_j[n] = ⟨f, ψ^{repl}_{j,n}⟩.

We verify as in (7.180) that these inner products are equal to the coefficients of a folded signal decomposed in a nonfolded wavelet basis:

a_j[n] = ⟨f^{repl}, φ_{j,n}⟩     and     d_j[n] = ⟨f^{repl}, ψ_{j,n}⟩.

The convolution formulas of Theorem 7.10 apply if we take into account the symmetry and periodicity of f^{repl}. The symmetry properties of φ and ψ imply that a_j[n] and d_j[n] also have symmetry and periodicity properties, which must be taken into account in the calculations of (7.102–7.104).

Symmetric biorthogonal wavelets are constructed with perfect reconstruction filters h and h̃ of odd size that are symmetric about n = 0. Then φ is symmetric about 0, whereas ψ is symmetric about 1/2. As a result, one can verify that a_j[n] is 2^{−j+1}-periodic and symmetric about n = 0 and n = 2^{−j}. Thus, it is characterized by 2^{−j} + 1 samples for 0 ≤ n ≤ 2^{−j}. The situation is different for d_j[n], which is 2^{−j+1}-periodic but symmetric with respect to −1/2 and 2^{−j} − 1/2. It is characterized by 2^{−j} samples for 0 ≤ n < 2^{−j}.
To initialize this algorithm, the original signal a_L[n], defined for 0 ≤ n ≤ N − 1, must be extended by one sample at n = N and considered to be symmetric with respect to n = 0 and n = N. The extension is done by setting a_L[N] = a_L[N − 1]. For any J < L, the resulting discrete wavelet representation [{d_j}_{L<j≤J}, a_J] is characterized by N + 1 coefficients. To avoid adding one more coefficient, one can modify the symmetry at the right boundary of a_L by considering that it is symmetric with respect to N − 1/2 instead of N. The symmetry of the resulting a_j and d_j at the right boundary is modified accordingly by studying the properties of the convolution formula (7.157). As a result, these signals are characterized by 2^{−j} samples, and the wavelet representation has N coefficients. A simpler implementation of this folding technique is given with a lifting in Section 7.8.5. This folding approach is used in most applications because it leads to simpler data structures that keep the number of coefficients constant. However, the discrete coefficients near the right boundary cannot be written as inner products of some function f(t) with dilated boundary wavelets.
Antisymmetric biorthogonal wavelets are obtained with perfect reconstruction filters h and h̃ of even size that are symmetric about n = 1/2. In this case, φ is symmetric about 1/2 and ψ is antisymmetric about 1/2. As a result, a_j and d_j are 2^{−j+1}-periodic and, respectively, symmetric and antisymmetric about −1/2 and 2^{−j} − 1/2. They are both characterized by 2^{−j} samples for 0 ≤ n < 2^{−j}. The algorithm is initialized by considering that a_L[n] is symmetric with respect to −1/2 and N − 1/2. There is no need to add another sample. The resulting discrete wavelet representation [{d_j}_{L<j≤J}, a_J] is characterized by N coefficients.
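A symmetric extension of the input signal can be sketched as follows; the helper name `ws_extend` is ours, and it implements one common whole-sample variant (mirroring about the first and last samples), which is not necessarily the exact boundary convention chosen in the initializations above:

```python
import numpy as np

def ws_extend(a, pad):
    """Whole-sample symmetric extension: mirror `pad` samples about the first
    and last entries, e.g. [a2, a1, a0, a1, ..., a_{N-1}, a_{N-2}] for pad = 2."""
    left = a[1:pad + 1][::-1]
    right = a[-2:-pad - 2:-1]
    return np.concatenate([left, a, right])

a = np.arange(5.0)            # [0, 1, 2, 3, 4]
print(ws_extend(a, 2))        # [2. 1. 0. 1. 2. 3. 4. 3. 2.]
```

Filtering such an extension with a filter symmetric about n = 0 produces coefficients with the symmetry properties described above, which is what permits storing only N of them.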
7.5.3 Boundary Wavelets
Wavelet coefficients are small in regions where the signal is regular only if the wavelets have enough vanishing moments. The restrictions of periodic and folded "boundary" wavelets to the neighborhoods of t = 0 and t = 1 have, respectively, 0 and 1 vanishing moments. Therefore, these boundary wavelets cannot fully take advantage of the signal regularity. They produce large inner products, as if the signal were discontinuous or had a discontinuous derivative. To avoid creating large-amplitude wavelet coefficients at the boundaries, one must synthesize boundary wavelets that have as many vanishing moments as the original wavelet ψ. Initially introduced by Meyer, this approach has been refined by Cohen, Daubechies, and Vial [174]. The main results are given without proofs.
Multiresolution of L²[0, 1]

A wavelet basis of L²[0, 1] is constructed with a multiresolution approximation {V^int_j}_{−∞<j≤0}. A wavelet has p vanishing moments if it is orthogonal to all polynomials of degree p − 1 or smaller. Since wavelets at a scale 2^j are orthogonal to functions in V^int_j, to guarantee that they have p vanishing moments we make sure that polynomials of degree p − 1 are inside V^int_j.
We define an approximation space V_j^{int} ⊂ L^2[0, 1] with a compactly supported Daubechies scaling function φ, associated to a wavelet ψ with p vanishing moments. Theorem 7.7 proves that the support of φ has size 2p - 1. We translate φ so that its support is [-p+1, p]. At a scale 2^j ≤ (2p)^{-1}, there are 2^{-j} - 2p scaling functions with a support inside [0, 1]:
\[
\phi^{int}_{j,n}(t) = \phi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi\!\left(\frac{t - 2^j n}{2^j}\right) \quad \text{for } p \le n < 2^{-j} - p.
\]
To construct an approximation space V_j^{int} of dimension 2^{-j}, we add p scaling functions with a support on the left boundary near t = 0:
\[
\phi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi^{left}_{n}\!\left(\frac{t}{2^j}\right) \quad \text{for } 0 \le n < p,
\]
and p scaling functions on the right boundary near t = 1:
\[
\phi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\phi^{right}_{2^{-j}-1-n}\!\left(\frac{t-1}{2^j}\right) \quad \text{for } 2^{-j} - p \le n < 2^{-j}.
\]
Theorem 7.17 constructs appropriate boundary scaling functions {φ_n^{left}}_{0≤n<p} and {φ_n^{right}}_{0≤n<p}.

Theorem 7.17: Cohen, Daubechies, Vial. One can construct boundary scaling functions φ_n^{left} and φ_n^{right} so that if 2^{-j} ≥ 2p, then {φ^{int}_{j,n}}_{0≤n<2^{-j}} is an orthonormal basis of a space V_j^{int} satisfying
\[
V_j^{int} \subset V_{j-1}^{int},
\]
\[
\lim_{j\to-\infty} V_j^{int} = \text{Closure}\!\left( \bigcup_{j=-\infty}^{-\log_2(2p)} V_j^{int} \right) = L^2[0,1],
\]
and the restrictions to [0, 1] of polynomials of degree p - 1 are in V_j^{int}.
Proof. A sketch of the proof is given. All details are in [174]. Since the wavelet ψ corresponding to φ has p vanishing moments, the Fix-Strang condition (7.70) implies that
\[
q_k(t) = \sum_{n=-\infty}^{+\infty} n^k\, \phi(t-n) \tag{7.183}
\]
is a polynomial of degree k. At any scale 2^j, q_k(2^{-j}t) is still a polynomial of degree k, and for 0 ≤ k < p this family defines a basis of polynomials of degree p - 1. To guarantee that polynomials of degree p - 1 are in V_j^{int}, we impose that the restriction of q_k(2^{-j}t) to [0, 1] can be decomposed in the basis of V_j^{int}:
\[
q_k(2^{-j}t)\,\mathbf{1}_{[0,1]}(t) = \sum_{n=0}^{p-1} a[n]\,\phi^{left}_{n}(2^{-j}t) + \sum_{n=p}^{2^{-j}-p-1} n^k\,\phi(2^{-j}t - n) + \sum_{n=0}^{p-1} b[n]\,\phi^{right}_{n}(2^{-j}t - 2^{-j}). \tag{7.184}
\]
CHAPTER 7 Wavelet Bases
Since the support of φ is [-p+1, p], the condition (7.184) together with (7.183) can be separated into two nonoverlapping left and right conditions. With a change of variable, we verify that (7.184) is equivalent to
\[
\sum_{n=-p+1}^{p} n^k\,\phi(t-n)\,\mathbf{1}_{[0,+\infty)}(t) = \sum_{n=0}^{p-1} a[n]\,\phi^{left}_{n}(t), \tag{7.185}
\]
and
\[
\sum_{n=-p}^{p-1} n^k\,\phi(t-n)\,\mathbf{1}_{(-\infty,0]}(t) = \sum_{n=0}^{p-1} b[n]\,\phi^{right}_{n}(t). \tag{7.186}
\]
The embedding property V_j^{int} ⊂ V_{j-1}^{int} is obtained by imposing that the boundary scaling functions satisfy scaling equations. We suppose that φ_n^{left} has a support [0, p+n] and satisfies a scaling equation of the form
\[
2^{-1/2}\,\phi^{left}_{n}(2^{-1}t) = \sum_{l=0}^{p-1} H^{left}_{n,l}\,\phi^{left}_{l}(t) + \sum_{m=p}^{p+2n} h^{left}_{n,m}\,\phi(t-m), \tag{7.187}
\]
whereas φ_n^{right} has a support [-p-n, 0] and satisfies a similar scaling equation on the right. The constants H^{left}_{n,l}, h^{left}_{n,m}, H^{right}_{n,l}, and h^{right}_{n,m} are adjusted to verify the polynomial reproduction equations (7.185) and (7.186), while producing orthogonal scaling functions. The resulting family {φ^{int}_{j,n}}_{0≤n<2^{-j}} is an orthonormal basis of a space V_j^{int}. The convergence of the spaces V_j^{int} to L^2[0,1] when 2^j goes to 0 is a consequence of the fact that the multiresolution spaces V_j generated by the Daubechies scaling function {φ_{j,n}}_{n∈Z} converge to L^2(R). ■
The proof constructs the scaling functions through scaling equations specified by discrete filters. At the boundaries, the filter coefficients are adjusted to construct orthogonal scaling functions with a support in [0, 1], and to guarantee that polynomials of degree p - 1 are reproduced by these scaling functions. Table 7.5 gives the filter coefficients for p = 2.
Wavelet Basis of L^2[0, 1]
Let W_j^{int} be the orthogonal complement of V_j^{int} in V_{j-1}^{int}. The support of the Daubechies wavelet ψ with p vanishing moments is [-p+1, p]. Since ψ_{j,n} is orthogonal to any φ_{j,l}, we verify that an orthogonal basis of W_j^{int} can be constructed with the 2^{-j} - 2p inside wavelets with support in [0, 1]:
\[
\psi^{int}_{j,n}(t) = \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - 2^j n}{2^j}\right) \quad \text{for } p \le n < 2^{-j} - p,
\]
to which are added 2p left and right boundary wavelets
\[
\psi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\psi^{left}_{n}\!\left(\frac{t}{2^j}\right) \quad \text{for } 0 \le n < p,
\]
Table 7.5 Left and Right Border Coefficients for a Daubechies Wavelet with p = 2 Vanishing Moments

Left boundary filters:

  k   l   H^left_{k,l}      G^left_{k,l}
  0   0    0.6033325119     -0.7965436169
  0   1    0.690895531       0.5463927140
  1   0    0.03751746045     0.01003722456
  1   1    0.4573276599      0.1223510431

  k   m   h^left_{k,m}      g^left_{k,m}
  0   2   -0.398312997      -0.2587922483
  1   2    0.8500881025      0.227428117
  1   3    0.2238203570     -0.8366028212
  1   4   -0.1292227434      0.4830129218

Right boundary filters:

  k   l   H^right_{k,l}     G^right_{k,l}
 -2  -2    0.1901514184     -0.3639069596
 -2  -1   -0.1942334074      0.3717189665
 -1  -2    0.434896998       0.8014229620
 -1  -1    0.8705087534     -0.2575129195

  k   m   h^right_{k,m}     g^right_{k,m}
 -2  -5    0.4431490496      0.235575950
 -2  -4    0.7675566693      0.4010695194
 -2  -3    0.3749553316     -0.7175799994
 -1  -3    0.2303890438     -0.5398225007

Inside filter:

  h[-1] =  0.482962913145
  h[0]  =  0.836516303738
  h[1]  =  0.224143868042
  h[2]  = -0.129409522551

Note: The inside filter coefficients are at the bottom of the table. A table of coefficients for p ≥ 2 vanishing moments can be retrieved over the Internet at the FTP site ftp://math.princeton.edu/pub/user/ingrid/interval-tables.
\[
\psi^{int}_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\psi^{right}_{2^{-j}-1-n}\!\left(\frac{t-1}{2^j}\right) \quad \text{for } 2^{-j} - p \le n < 2^{-j}.
\]
Since W_j^{int} ⊂ V_{j-1}^{int}, the left and right boundary wavelets at any scale 2^j can be expanded into scaling functions at the scale 2^{j-1}. For j = 1 we impose that the left boundary wavelets satisfy equations of the form
\[
\frac{1}{\sqrt{2}}\,\psi^{left}_{n}\!\left(\frac{t}{2}\right) = \sum_{l=0}^{p-1} G^{left}_{n,l}\,\phi^{left}_{l}(t) + \sum_{m=p}^{p+2n} g^{left}_{n,m}\,\phi(t-m). \tag{7.188}
\]
The right boundary wavelets satisfy similar equations. The coefficients G^{left}_{n,l}, g^{left}_{n,m}, G^{right}_{n,l}, and g^{right}_{n,m} are computed so that {ψ^{int}_{j,n}}_{0≤n<2^{-j}} is an orthonormal basis of W_j^{int}. Table 7.5 gives the values of these coefficients for p = 2.
For any 2^J ≤ (2p)^{-1}, the multiresolution properties prove that
\[
L^2[0,1] = V_J^{int} \oplus \Big( \bigoplus_{j=-\infty}^{J} W_j^{int} \Big),
\]
which implies that
\[
\Big[ \{\phi^{int}_{J,n}\}_{0\le n<2^{-J}},\ \{\psi^{int}_{j,n}\}_{-\infty<j\le J,\ 0\le n<2^{-j}} \Big] \tag{7.189}
\]
is an orthonormal wavelet basis of L^2[0, 1]. The boundary wavelets, like the inside wavelets, have p vanishing moments because polynomials of degree p - 1 are included in the space V_J^{int}. Figure 7.18 displays the 2p = 4 boundary scaling functions and wavelets.
FIGURE 7.18
Boundary scaling functions φ_n^{left}(t) and φ_n^{right}(t), and boundary wavelets ψ_n^{left}(t) and ψ_n^{right}(t), with p = 2 vanishing moments.
Fast Discrete Algorithm
For any f ∈ L^2[0, 1] we denote
\[
a_j[n] = \langle f, \phi^{int}_{j,n}\rangle \quad \text{and} \quad d_j[n] = \langle f, \psi^{int}_{j,n}\rangle \quad \text{for } 0 \le n < 2^{-j}.
\]
Wavelet coefficients are computed with a cascade of convolutions identical to Theorem 7.10, as long as the filters do not overlap the signal boundaries. A Daubechies filter h is considered here to have a support located at [-p+1, p]. At the boundary, the usual Daubechies filters are replaced by boundary filters that relate the boundary wavelets and scaling functions to the finer-scale scaling functions in (7.187) and (7.188).
Theorem 7.18: Cohen, Daubechies, Vial. If 0 ≤ k < p,
\[
a_j[k] = \sum_{l=0}^{p-1} H^{left}_{k,l}\, a_{j-1}[l] + \sum_{m=p}^{p+2k} h^{left}_{k,m}\, a_{j-1}[m],
\]
\[
d_j[k] = \sum_{l=0}^{p-1} G^{left}_{k,l}\, a_{j-1}[l] + \sum_{m=p}^{p+2k} g^{left}_{k,m}\, a_{j-1}[m].
\]
If p ≤ k < 2^{-j} - p,
\[
a_j[k] = \sum_{l=-\infty}^{+\infty} h[l-2k]\, a_{j-1}[l], \qquad
d_j[k] = \sum_{l=-\infty}^{+\infty} g[l-2k]\, a_{j-1}[l].
\]
If -p ≤ k < 0,
\[
a_j[2^{-j}+k] = \sum_{l=-p}^{-1} H^{right}_{k,l}\, a_{j-1}[2^{-j+1}+l] + \sum_{m=-p+2k+1}^{-p-1} h^{right}_{k,m}\, a_{j-1}[2^{-j+1}+m],
\]
\[
d_j[2^{-j}+k] = \sum_{l=-p}^{-1} G^{right}_{k,l}\, a_{j-1}[2^{-j+1}+l] + \sum_{m=-p+2k+1}^{-p-1} g^{right}_{k,m}\, a_{j-1}[2^{-j+1}+m].
\]
This cascade algorithm decomposes a_L into a discrete wavelet transform [a_J, {d_j}_{L<j≤J}] with O(N) operations. The maximum scale must satisfy 2^J ≤ (2p)^{-1}, because the number of boundary coefficients remains equal to 2p at all scales. The implementation is more complicated than the folding and periodic algorithms described in Sections 7.5.1 and 7.5.2, but does not require more computations.
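Away from the boundaries, one cascade step reduces to the interior formulas of Theorem 7.18, a_j[k] = Σ_l h[l-2k] a_{j-1}[l] and d_j[k] = Σ_l g[l-2k] a_{j-1}[l]. A minimal sketch of this interior step, ignoring the boundary filters and using the Haar filter so that the output can be checked by hand:

```python
import numpy as np

def interior_cascade_step(a_prev, h, g):
    """Interior step of the cascade: a_j[k] = sum_l h[l-2k] a_{j-1}[l] and
    d_j[k] = sum_l g[l-2k] a_{j-1}[l], valid where the filters do not
    overlap the signal boundaries."""
    half = len(a_prev) // 2
    a = np.zeros(half)
    d = np.zeros(half)
    for k in range(half):
        for l in range(len(a_prev)):
            if 0 <= l - 2 * k < len(h):
                a[k] += h[l - 2 * k] * a_prev[l]
                d[k] += g[l - 2 * k] * a_prev[l]
    return a, d

# Haar conjugate mirror filter on {0, 1}; g[n] = (-1)^(1-n) h[1-n].
s = 1 / np.sqrt(2)
h = np.array([s, s])
g = np.array([-s, s])

a1, d1 = interior_cascade_step(np.array([1.0, 1.0, 2.0, 4.0]), h, g)
```

In the full algorithm, the first and last p output coefficients would instead be computed with the H^{left}, h^{left}, H^{right}, h^{right} boundary filters of Table 7.5.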
The signal a_L is reconstructed from its wavelet coefficients by inverting the decomposition formula in Theorem 7.18.
Theorem 7.19: Cohen, Daubechies, Vial. If 0 ≤ l ≤ p - 1,
\[
a_{j-1}[l] = \sum_{k=0}^{p-1} H^{left}_{k,l}\, a_j[k] + \sum_{k=0}^{p-1} G^{left}_{k,l}\, d_j[k].
\]
If p ≤ l ≤ 3p - 2,
\[
a_{j-1}[l] = \sum_{k=(l-p)/2}^{p-1} h^{left}_{k,l}\, a_j[k] + \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[k]
+ \sum_{k=(l-p)/2}^{p-1} g^{left}_{k,l}\, d_j[k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[k].
\]
If 3p - 1 ≤ l ≤ 2^{-j+1} - 3p,
\[
a_{j-1}[l] = \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[k].
\]
If -p - 1 ≥ l ≥ -3p + 1,
\[
a_{j-1}[2^{-j+1}+l] = \sum_{k=-p}^{(l+p-1)/2} h^{right}_{k,l}\, a_j[2^{-j}+k] + \sum_{k=-\infty}^{+\infty} h[l-2k]\, a_j[2^{-j}+k]
+ \sum_{k=-p}^{(l+p-1)/2} g^{right}_{k,l}\, d_j[2^{-j}+k] + \sum_{k=-\infty}^{+\infty} g[l-2k]\, d_j[2^{-j}+k].
\]
If -1 ≥ l ≥ -p,
\[
a_{j-1}[2^{-j+1}+l] = \sum_{k=-p}^{-1} H^{right}_{k,l}\, a_j[2^{-j}+k] + \sum_{k=-p}^{-1} G^{right}_{k,l}\, d_j[2^{-j}+k].
\]
The original signal a_L is reconstructed from the orthogonal wavelet representation [a_J, {d_j}_{L<j≤J}] by iterating these equations for L < j ≤ J. This reconstruction is performed with O(N) operations.
7.6 MULTISCALE INTERPOLATIONS
Multiresolution approximations are closely connected to the generalized interpolations and sampling theorems studied in Section 3.1.3. Section 7.6.1 constructs
general classes of interpolation functions from orthogonal scaling functions and
derives new sampling theorems. Interpolation bases have the advantage of easily
computing the decomposition coefficients from the sample values of the signal.
Section 7.6.2 constructs interpolation wavelet bases.
7.6.1 Interpolation and Sampling Theorems
Section 3.1.3 explains that a sampling scheme approximates a signal by its orthogonal projection onto a space Us and samples this projection at intervals s. The space
Us is constructed so that any function in Us can be recovered by interpolating a uniform sampling at intervals s. We relate the construction of interpolation functions
to orthogonal scaling functions and compute the orthogonal projector on Us .
An interpolation function is any φ such that {φ(t - n)}_{n∈Z} is a Riesz basis of the space U_1 it generates, and that satisfies
\[
\phi(n) = \begin{cases} 1 & \text{if } n = 0,\\ 0 & \text{if } n \neq 0. \end{cases} \tag{7.190}
\]
Any f ∈ U_1 is recovered by interpolating its samples f(n):
\[
f(t) = \sum_{n=-\infty}^{+\infty} f(n)\,\phi(t-n). \tag{7.191}
\]
Indeed, we know that f is a linear combination of the basis vectors {φ(t-n)}_{n∈Z}, and the interpolation property (7.190) yields (7.191). The Whittaker sampling Theorem 3.2 is based on the interpolation function
\[
\phi(t) = \frac{\sin \pi t}{\pi t}.
\]
In this case, the space U_1 is the set of functions having a Fourier transform support included in [-π, π].
Scaling an interpolation function yields a new interpolation for a different sampling interval. Let us define φ_s(t) = φ(t/s) and
\[
U_s = \left\{ f \in L^2(\mathbb{R}) \ \text{with} \ f(st) \in U_1 \right\}.
\]
One can verify that any f ∈ U_s can be written as
\[
f(t) = \sum_{n=-\infty}^{+\infty} f(ns)\,\phi_s(t-ns). \tag{7.192}
\]
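As a numerical illustration of (7.191), a signal whose Fourier transform is supported in [-π/2, π/2] belongs to U_1 and is recovered from its integer samples with the Whittaker interpolation function. A sketch (the truncation range is an arbitrary choice; convergence is slow because φ(t) decays only like 1/t):

```python
import numpy as np

# f(t) = sin(pi t / 2) / (pi t / 2) has Fourier support [-pi/2, pi/2],
# so f belongs to U_1 and satisfies f(t) = sum_n f(n) phi(t - n).
f = lambda t: np.sinc(t / 2)     # numpy's sinc(x) = sin(pi x) / (pi x)
phi = np.sinc                    # Whittaker interpolation function

n = np.arange(-10000, 10001)
t = 0.3
f_rec = np.sum(f(n) * phi(t - n))   # truncated version of (7.191)
```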
Scaling Autocorrelation
We denote by φ_o an orthogonal scaling function, defined by the fact that {φ_o(t - n)}_{n∈Z} is an orthonormal basis of a space V_0 of a multiresolution approximation. Theorem 7.2 proves that this scaling function is characterized by a conjugate mirror filter h_o. Theorem 7.20 defines an interpolation function from the autocorrelation of φ_o [423].

Theorem 7.20. Let φ̄_o(t) = φ_o(-t) and h̄_o[n] = h_o[-n]. If |φ̂_o(ω)| = O((1+|ω|)^{-1}), then
\[
\phi(t) = \int_{-\infty}^{+\infty} \phi_o(u)\,\phi_o(u-t)\,du = \phi_o \star \bar{\phi}_o(t) \tag{7.193}
\]
is an interpolation function. Moreover,
\[
\phi\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} h[n]\,\phi(t-n) \tag{7.194}
\]
with
\[
h[n] = \sum_{m=-\infty}^{+\infty} h_o[m]\,h_o[m-n] = h_o \star \bar{h}_o[n]. \tag{7.195}
\]
Proof. Observe first that
\[
\phi(n) = \langle \phi_o(t), \phi_o(t-n) \rangle = \delta[n],
\]
which proves the interpolation property (7.190). To prove that {φ(t-n)}_{n∈Z} is a Riesz basis of the space U_1 it generates, we verify the condition (7.9). The autocorrelation φ(t) = φ_o ⋆ φ̄_o(t) has a Fourier transform φ̂(ω) = |φ̂_o(ω)|². Thus, condition (7.9) means that there exist B ≥ A > 0 such that
\[
\forall \omega \in [-\pi, \pi], \qquad A \le \sum_{k=-\infty}^{+\infty} |\hat{\phi}_o(\omega - 2k\pi)|^{4} \le B. \tag{7.196}
\]
We proved in (7.14) that the orthogonality of a family {φ_o(t-n)}_{n∈Z} is equivalent to
\[
\forall \omega \in [-\pi, \pi], \qquad \sum_{k=-\infty}^{+\infty} |\hat{\phi}_o(\omega + 2k\pi)|^{2} = 1. \tag{7.197}
\]
Therefore, the right inequality of (7.196) is valid for B = 1. Let us prove the left inequality. Since |φ̂_o(ω)| = O((1+|ω|)^{-1}), one can verify that there exists K > 0 such that for all ω ∈ [-π, π], Σ_{|k|>K} |φ̂_o(ω + 2kπ)|² < 1/2, so (7.197) implies that Σ_{k=-K}^{K} |φ̂_o(ω + 2kπ)|² ≥ 1/2. It follows that
\[
\sum_{k=-K}^{K} |\hat{\phi}_o(\omega + 2k\pi)|^{4} \ge \frac{1}{4(2K+1)},
\]
which proves (7.196) for A^{-1} = 4(2K+1).
Since φ_o is a scaling function, (7.23) proves that there exists a conjugate mirror filter h_o such that
\[
\frac{1}{\sqrt{2}}\,\phi_o\!\left(\frac{t}{2}\right) = \sum_{n=-\infty}^{+\infty} h_o[n]\,\phi_o(t-n).
\]
Computing φ(t) = φ_o ⋆ φ̄_o(t) yields (7.194) with h[n] = h_o ⋆ h̄_o[n]. ■
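Relation (7.195) can be checked numerically: autocorrelating a Daubechies conjugate mirror filter yields a filter with h[0] = 1 and h[2n] = 0 for n ≠ 0, the discrete counterpart of the interpolation property of φ. A sketch using the 4-tap Daubechies filter listed at the bottom of Table 7.5:

```python
import numpy as np

# Daubechies p = 2 conjugate mirror filter (normalized so its sum is sqrt(2)).
ho = np.array([0.482962913145, 0.836516303738,
               0.224143868042, -0.129409522551])

# Autocorrelation filter h[n] = sum_m ho[m] ho[m - n]; the 'full'
# correlation output covers the lags n = -3, ..., 3.
h = np.correlate(ho, ho, mode="full")
```

The odd lags come out as h[±1] = 9/16 and h[±3] = -1/16, which are precisely the Deslauriers-Dubuc coefficients of Example 7.12.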
Theorem 7.20 proves that the autocorrelation of an orthogonal scaling function φ_o is an interpolation function φ that also satisfies a scaling equation. One can design φ to approximate regular signals efficiently by their orthogonal projection in U_s. Definition 6.1 measures the regularity of f with a Lipschitz exponent, which depends on the difference between f and its Taylor polynomial expansion. Theorem 7.21 gives a condition for recovering polynomials by interpolating their samples with φ. It derives an upper bound for the error when approximating f by its orthogonal projection in U_s.

Theorem 7.21: Fix, Strang. Any polynomial q(t) of degree smaller than or equal to p - 1 is decomposed into
\[
q(t) = \sum_{n=-\infty}^{+\infty} q(n)\,\phi(t-n) \tag{7.198}
\]
if and only if ĥ(ω) has a zero of order p at ω = π.
Suppose that this property is satisfied. If f has a compact support and is uniformly Lipschitz α ≤ p, then there exists C > 0 such that
\[
\forall s > 0, \qquad \|f - P_{U_s} f\| \le C\, s^{\alpha}. \tag{7.199}
\]
Proof. The main steps of the proof are given without technical detail. Let us set s = 2^j. One can verify that the spaces {V_j = U_{2^j}}_{j∈Z} define a multiresolution approximation of L^2(R). The Riesz basis of V_0 required by Definition 7.1 is obtained with θ = φ. This basis is orthogonalized by Theorem 7.1 to obtain an orthogonal basis of scaling functions. Theorem 7.3 derives a wavelet orthonormal basis {ψ_{j,n}}_{(j,n)∈Z²} of L^2(R).
Using Theorem 7.4, one can verify that ψ has p vanishing moments if and only if ĥ(ω) has p zeros at π. Although φ is not the orthogonal scaling function, the Fix-Strang condition (7.70) remains valid. It is also equivalent that for k < p,
\[
q_k(t) = \sum_{n=-\infty}^{+\infty} n^k\,\phi(t-n)
\]
is a polynomial of degree k. The interpolation property (7.191) implies that q_k(n) = n^k for all n ∈ Z, so q_k(t) = t^k. Since {t^k}_{0≤k<p} is a basis for polynomials of degree p - 1, any polynomial q(t) of degree p - 1 can be decomposed over {φ(t-n)}_{n∈Z} if and only if ĥ(ω) has p zeros at π.
We indicate how to prove (7.199) for s = 2^j. The truncated family of wavelets {ψ_{l,n}}_{l≤j, n∈Z} is an orthogonal basis of the orthogonal complement of U_{2^j} = V_j in L^2(R). Thus,
\[
\|f - P_{U_{2^j}} f\|^2 = \sum_{l=-\infty}^{j} \sum_{n=-\infty}^{+\infty} |\langle f, \psi_{l,n}\rangle|^2.
\]
If f is uniformly Lipschitz α, since ψ has p vanishing moments, Theorem 6.3 proves that there exists A > 0 such that
\[
|Wf(2^l n, 2^l)| = |\langle f, \psi_{l,n}\rangle| \le A\, 2^{(\alpha+1/2)l}.
\]
To simplify the argument we suppose that ψ has a compact support, although this is not required. Since f also has a compact support, one can verify that the number of nonzero ⟨f, ψ_{l,n}⟩ is bounded by K 2^{-l} for some K > 0. Thus,
\[
\|f - P_{U_{2^j}} f\|^2 \le \sum_{l=-\infty}^{j} K\, 2^{-l} A^2\, 2^{(2\alpha+1)l} \le \frac{K A^2}{1 - 2^{-2\alpha}}\, 2^{2\alpha j},
\]
which proves (7.199) for s = 2^j. ■

As long as α ≤ p, the larger the Lipschitz exponent α, the faster the error ‖f - P_{U_s} f‖ decays to zero when the sampling interval s decreases. If a signal f is C^k with a compact support, then it is uniformly Lipschitz k, so Theorem 7.21 proves that ‖f - P_{U_s} f‖ ≤ C s^k.
EXAMPLE 7.11
A cubic spline–interpolation function is obtained from the linear spline–scaling function φ_o. The Fourier transform expression (7.5) yields
\[
\hat{\phi}(\omega) = |\hat{\phi}_o(\omega)|^2 = \frac{48 \sin^4(\omega/2)}{\omega^4\,(1 + 2\cos^2(\omega/2))}. \tag{7.200}
\]
Figure 7.19(a) gives the graph of φ, which has an infinite support but exponential decay. With Theorem 7.21, one can verify that this interpolation function recovers polynomials of degree 3 from a uniform sampling. The performance of spline interpolation functions for generalized sampling theorems is studied in [162, 468].

FIGURE 7.19
(a) Cubic spline–interpolation function. (b) Deslauriers-Dubuc interpolation function of degree 3.
EXAMPLE 7.12
Deslauriers-Dubuc [206] interpolation functions of degree 2p - 1 are compactly supported interpolation functions of minimal size that decompose polynomials of degree 2p - 1. One can verify that such an interpolation function is the autocorrelation of a scaling function φ_o. To reproduce polynomials of degree 2p - 1, Theorem 7.21 proves that ĥ(ω) must have a zero of order 2p at π. Since h[n] = h_o ⋆ h̄_o[n], it follows that ĥ(ω) = |ĥ_o(ω)|², and thus ĥ_o(ω) has a zero of order p at π. The Daubechies theorem (7.7) designs minimum-size conjugate mirror filters h_o that satisfy this condition. Daubechies filters h_o have 2p nonzero coefficients and the resulting scaling function φ_o has a support of size 2p - 1. The autocorrelation φ is the Deslauriers-Dubuc interpolation function, whose support is [-2p+1, 2p-1].
For p = 1, φ_o = 1_{[0,1]} and φ is the piecewise linear tent function whose support is [-1, 1]. For p = 2, the Deslauriers-Dubuc interpolation function φ is the autocorrelation of the Daubechies 2 scaling function, shown in Figure 7.10. The graph of this interpolation function is in Figure 7.19(b). Polynomials of degree 2p - 1 = 3 are interpolated by this function.
The scaling equation (7.194) implies that any autocorrelation filter verifies h[2n] = 0 for n ≠ 0. For any p ≥ 0, the nonzero values of the resulting filter are calculated from the coefficients of the polynomial (7.168) that is factored to synthesize Daubechies filters. The support of h is [-2p+1, 2p-1] and
\[
h[2n+1] = (-1)^{p-n}\, \frac{\prod_{k=0}^{2p-1} (k - p + 1/2)}{(n+1/2)\,(p-n-1)!\,(p+n)!} \quad \text{for } -p \le n < p. \tag{7.201}
\]
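Formula (7.201) is straightforward to evaluate numerically. A short sketch (the function name dd_filter is ours) that tabulates the odd-indexed coefficients and recovers the p = 2 values 9/16 and -1/16:

```python
from math import factorial, prod

def dd_filter(p):
    """Odd samples h[2n+1] of the Deslauriers-Dubuc autocorrelation
    filter, for -p <= n < p, computed from (7.201). The even samples
    satisfy h[0] = 1 and h[2n] = 0 for n != 0."""
    num = prod(k - p + 0.5 for k in range(2 * p))
    return {2 * n + 1: (-1) ** (p - n) * num
            / ((n + 0.5) * factorial(p - n - 1) * factorial(p + n))
            for n in range(-p, p)}

h = dd_filter(2)   # {-3: -1/16, -1: 9/16, 1: 9/16, 3: -1/16}
```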
Dual Basis
If f ∉ U_s, then it is approximated by its orthogonal projection P_{U_s} f on U_s before the samples at intervals s are recorded. This orthogonal projection is computed with a biorthogonal basis {φ̃_s(t - ns)}_{n∈Z} [82]. Theorem 3.4 proves that φ̃_s(t) = s^{-1} φ̃(s^{-1} t), where the Fourier transform of φ̃ is
\[
\hat{\tilde{\phi}}(\omega) = \frac{\hat{\phi}^*(\omega)}{\sum_{k=-\infty}^{+\infty} |\hat{\phi}(\omega + 2k\pi)|^2}. \tag{7.202}
\]
Figure 7.20 gives the graph of the dual cubic spline φ̃ associated to the cubic spline–interpolation function. The orthogonal projection of f over U_s is computed by decomposing f in the biorthogonal bases:
\[
P_{U_s} f(t) = \sum_{n=-\infty}^{+\infty} \langle f(u), \tilde{\phi}_s(u - ns)\rangle\, \phi_s(t - ns). \tag{7.203}
\]
The interpolation property (7.190) implies that
\[
P_{U_s} f(ns) = \langle f(u), \tilde{\phi}_s(u - ns)\rangle = f \star \bar{\tilde{\phi}}_s(ns), \quad \text{where } \bar{\tilde{\phi}}_s(t) = \tilde{\phi}_s(-t). \tag{7.204}
\]
FIGURE 7.20
The dual cubic spline φ̃(t) associated to the cubic spline–interpolation function φ(t) shown in Figure 7.19(a).

Therefore, this discretization of f through a projection onto U_s is obtained by filtering with φ̃_s(-t), followed by a uniform sampling at intervals s. The best linear approximation of f is recovered with the interpolation formula (7.203).
7.6.2 Interpolation Wavelet Basis
An interpolation function φ can recover a signal f from a uniform sampling {f(ns)}_{n∈Z} if f belongs to an appropriate subspace U_s of L^2(R). Donoho [213] has extended this approach by constructing interpolation wavelet bases of the whole space of uniformly continuous signals, with the supremum norm. The decomposition coefficients are calculated from sample values instead of inner product integrals.

Subdivision Scheme
Let φ be an interpolation function that is the autocorrelation of an orthogonal scaling function φ_o. Let φ_{j,n}(t) = φ(2^{-j}t - n). The constant 2^{-j/2} that normalizes the energy of φ_{j,n} is not added because we shall use a supremum norm ‖f‖_∞ = sup_{t∈R} |f(t)| instead of the L^2(R) norm, and
\[
\|\phi_{j,n}\|_\infty = \|\phi\|_\infty = |\phi(0)| = 1.
\]
We define the interpolation space V_j of functions
\[
g = \sum_{n=-\infty}^{+\infty} a[n]\,\phi_{j,n},
\]
where a[n] has at most a polynomial growth in n. Since φ is an interpolation function, a[n] = g(2^j n). This space V_j is not included in L^2(R) since a[n] may not have a finite energy. The scaling equation (7.194) implies that V_{j+1} ⊂ V_j for any j ∈ Z. If the autocorrelation filter h has a Fourier transform ĥ(ω) that has a zero of order p at ω = π, then Theorem 7.21 proves that polynomials of degree p - 1 or smaller are included in V_j.
For f ∉ V_j, we define a simple projector on V_j that interpolates the dyadic samples f(2^j n):
\[
P_{V_j} f(t) = \sum_{n=-\infty}^{+\infty} f(2^j n)\,\phi_j(t - 2^j n). \tag{7.205}
\]
This projector has no orthogonality property but satisfies P_{V_j} f(2^j n) = f(2^j n). Let C_0 be the space of functions that are uniformly continuous over R. Theorem 7.22 proves that any f ∈ C_0 can be approximated with an arbitrary precision by P_{V_j} f when 2^j goes to zero.

Theorem 7.22: Donoho. Suppose that φ has an exponential decay. If f ∈ C_0, then
\[
\lim_{j\to-\infty} \|f - P_{V_j} f\|_\infty = \lim_{j\to-\infty}\ \sup_{t\in\mathbb{R}} |f(t) - P_{V_j} f(t)| = 0. \tag{7.206}
\]
Proof. Let ω(δ, f) denote the modulus of continuity
\[
\omega(\delta, f) = \sup_{|h|\le\delta}\ \sup_{t\in\mathbb{R}} |f(t+h) - f(t)|. \tag{7.207}
\]
By definition, f ∈ C_0 if lim_{δ→0} ω(δ, f) = 0.
Any t ∈ R can be written as t = 2^j(n+h) with n ∈ Z and |h| ≤ 1. Since P_{V_j} f(2^j n) = f(2^j n),
\[
|f(2^j(n+h)) - P_{V_j} f(2^j(n+h))| \le |f(2^j(n+h)) - f(2^j n)| + |P_{V_j} f(2^j(n+h)) - P_{V_j} f(2^j n)|
\le \omega(2^j, f) + \omega(2^j, P_{V_j} f).
\]
Lemma 7.3 proves that ω(2^j, P_{V_j} f) ≤ C ω(2^j, f), where C is a constant independent of j and f. Taking a supremum over t = 2^j(n+h) implies the final result:
\[
\sup_{t\in\mathbb{R}} |f(t) - P_{V_j} f(t)| \le (1 + C)\,\omega(2^j, f) \to 0 \quad \text{when } j \to -\infty.
\]
Lemma 7.3. There exists C > 0 such that for all j ∈ Z and f ∈ C_0,
\[
\omega(2^j, P_{V_j} f) \le C\,\omega(2^j, f). \tag{7.208}
\]
Let us set j = 0. For |h| ≤ 1, a summation by parts gives
\[
P_{V_0} f(t+h) - P_{V_0} f(t) = \sum_{n=-\infty}^{+\infty} \big( f(n+1) - f(n) \big)\,\theta_h(t-n),
\]
where
\[
\theta_h(t) = \sum_{k=1}^{+\infty} \big( \phi(t+h-k) - \phi(t-k) \big).
\]
Thus,
\[
|P_{V_0} f(t+h) - P_{V_0} f(t)| \le \sup_{n\in\mathbb{Z}} |f(n+1) - f(n)|\ \sum_{n=-\infty}^{+\infty} |\theta_h(t-n)|. \tag{7.209}
\]
Since φ has an exponential decay, there exists a constant C such that if |h| ≤ 1 and t ∈ R, then Σ_{n=-∞}^{+∞} |θ_h(t-n)| ≤ C. Taking a supremum over t in (7.209) proves that
\[
\omega(1, P_{V_0} f) \le C\, \sup_{n\in\mathbb{Z}} |f(n+1) - f(n)| \le C\,\omega(1, f).
\]
Scaling this result by 2^j yields (7.208). ■
Interpolation Wavelets
The projection P_{V_j} f(t) interpolates the values f(2^j n). When reducing the scale by 2, we obtain a finer interpolation P_{V_{j-1}} f(t) that also goes through the intermediate samples f(2^j(n+1/2)). This refinement can be obtained by adding "details" that compensate for the difference between P_{V_j} f(2^j(n+1/2)) and f(2^j(n+1/2)). To do this, we create a "detail" space W_j that provides the values f(t) at intermediate dyadic points t = 2^j(n+1/2). This space is constructed from interpolation functions centered at these locations, namely φ_{j-1,2n+1}. We call interpolation wavelets
\[
\psi_{j,n} = \phi_{j-1,2n+1}.
\]
Observe that ψ_{j,n}(t) = ψ(2^{-j}t - n) with
\[
\psi(t) = \phi(2t - 1).
\]
The function ψ is not truly a wavelet since it has no vanishing moment. However, we shall see that it plays the same role as a wavelet in this decomposition. We define W_j to be the space of all sums Σ_{n=-∞}^{+∞} a[n] ψ_{j,n}. Theorem 7.23 proves that it is a (nonorthogonal) complement of V_j in V_{j-1}.
Theorem 7.23. For any j ∈ Z,
\[
V_{j-1} = V_j \oplus W_j.
\]
If f ∈ V_{j-1}, then
\[
f = \sum_{n=-\infty}^{+\infty} f(2^j n)\,\phi_{j,n} + \sum_{n=-\infty}^{+\infty} d_j[n]\,\psi_{j,n},
\]
with
\[
d_j[n] = f\big(2^j(n+1/2)\big) - P_{V_j} f\big(2^j(n+1/2)\big). \tag{7.210}
\]
Proof. Any f ∈ V_{j-1} can be written as
\[
f = \sum_{n=-\infty}^{+\infty} f(2^{j-1}n)\,\phi_{j-1,n}.
\]
The function f - P_{V_j} f belongs to V_{j-1} and vanishes at {2^j n}_{n∈Z}. Thus, it can be decomposed over the intermediate interpolation functions φ_{j-1,2n+1} = ψ_{j,n}:
\[
f(t) - P_{V_j} f(t) = \sum_{n=-\infty}^{+\infty} d_j[n]\,\psi_{j,n}(t) \in W_j.
\]
This proves that V_{j-1} ⊂ V_j ⊕ W_j. By construction we know that W_j ⊂ V_{j-1}, so V_{j-1} = V_j ⊕ W_j. Setting t = 2^{j-1}(2n+1) in this formula also verifies (7.210). ■
Theorem 7.23 refines an interpolation from a coarse grid 2^j n to a finer grid 2^{j-1}n by adding "details" with coefficients d_j[n] that are the interpolation errors f(2^j(n+1/2)) - P_{V_j} f(2^j(n+1/2)). The following Theorem 7.24 defines an interpolation wavelet basis of C_0, in the sense of uniform convergence.

Theorem 7.24. If f ∈ C_0, then
\[
\lim_{\substack{m\to+\infty\\ l\to-\infty}} \Big\| f - \sum_{n=-m}^{m} f(2^J n)\,\phi_{J,n} - \sum_{j=l}^{J} \sum_{n=-m}^{m} d_j[n]\,\psi_{j,n} \Big\|_\infty = 0. \tag{7.211}
\]
The formula (7.211) decomposes f into a coarse interpolation at intervals 2^J plus layers of details that give the interpolation errors on successively finer dyadic grids. The proof is done by choosing f to be a continuous function with a compact support, in which case (7.211) is derived from Theorem 7.23 and (7.206). The density of such functions in C_0 (for the supremum norm) allows us to extend this result to any f in C_0. We shall write
\[
f = \sum_{n=-\infty}^{+\infty} f(2^J n)\,\phi_{J,n} + \sum_{j=-\infty}^{J} \sum_{n=-\infty}^{+\infty} d_j[n]\,\psi_{j,n},
\]
which means that [{φ_{J,n}}_{n∈Z}, {ψ_{j,n}}_{n∈Z, j≤J}] is a basis of C_0. In L^2(R), "biorthogonal" scaling functions and wavelets are formally defined by
\[
f(2^J n) = \langle f, \tilde{\phi}_{J,n}\rangle = \int_{-\infty}^{+\infty} f(t)\,\tilde{\phi}_{J,n}(t)\,dt,
\]
\[
d_j[n] = \langle f, \tilde{\psi}_{j,n}\rangle = \int_{-\infty}^{+\infty} f(t)\,\tilde{\psi}_{j,n}(t)\,dt. \tag{7.212}
\]
Clearly, φ̃_{J,n}(t) = δ(t - 2^J n). Similarly, (7.210) and (7.205) imply that ψ̃_{j,n} is a finite sum of Diracs. These dual scaling functions and wavelets do not have a finite energy, which illustrates the fact that [{φ_{J,n}}_{n∈Z}, {ψ_{j,n}}_{n∈Z, j≤J}] is not a Riesz basis of L^2(R).
If ĥ(ω) has p zeros at π, then one can verify that ψ̃_{j,n} has p vanishing moments. With similar derivations as in the proof of (6.20) in Theorem 6.4, one can show that if f is uniformly Lipschitz α ≤ p, then there exists A > 0 such that
\[
|\langle f, \tilde{\psi}_{j,n}\rangle| = |d_j[n]| \le A\, 2^{\alpha j}.
\]
A regular signal yields small-amplitude wavelet coefficients at fine scales. Thus, we can neglect these coefficients and still reconstruct a precise approximation of f.
Fast Calculations
The interpolating wavelet transform of f is calculated at scales 1 ≥ 2^j > N^{-1} = 2^L from its sample values {f(N^{-1}n)}_{n∈Z}. At each scale 2^j, the values of f in between samples {2^j n}_{n∈Z} are calculated with the interpolation (7.205):
\[
P_{V_j} f\big(2^j(n+1/2)\big) = \sum_{k=-\infty}^{+\infty} f(2^j k)\,\phi(n - k + 1/2) = \sum_{k=-\infty}^{+\infty} f(2^j k)\,h_i[n-k], \tag{7.213}
\]
where the interpolation filter h_i is a subsampling of the autocorrelation filter h in (7.195):
\[
h_i[n] = \phi(n + 1/2) = h[2n+1]. \tag{7.214}
\]
The wavelet coefficients are computed with (7.210):
\[
d_j[n] = f\big(2^j(n+1/2)\big) - P_{V_j} f\big(2^j(n+1/2)\big).
\]
The reconstruction of f(N^{-1}n) from the wavelet coefficients is performed recursively by recovering the samples f(2^{j-1}n) from the coarser sampling f(2^j n) with the interpolation (7.213), to which is added d_j[n]. If h_i[n] is a finite filter of size K and if f has a support in [0, 1], then the decomposition and reconstruction algorithms require KN multiplications and additions.
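For p = 2 this decomposition predicts each midpoint with its four neighboring samples and stores the prediction error. A sketch of one level at interior points (boundary filters omitted; the details vanish on a cubic because the filter reproduces polynomials of degree 3):

```python
import numpy as np

def one_level_details(f_samples, f_mid):
    """One interior level of the interpolating wavelet transform for p = 2:
    predict f(n + 1/2) with h_i = (-1/16, 9/16, 9/16, -1/16) and return the
    prediction errors d_j[n] = f(n + 1/2) - P_Vj f(n + 1/2)."""
    f = f_samples
    pred = (-f[:-3] + 9 * f[1:-2] + 9 * f[2:-1] - f[3:]) / 16
    return f_mid[1:-2] - pred

t = np.arange(8.0)
d = one_level_details(t ** 3, (t + 0.5) ** 3)   # cubic: details are zero
```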
A Deslauriers-Dubuc interpolation function φ has the shortest support while including polynomials of degree 2p - 1 in the spaces V_j. The corresponding interpolation filter h_i[n] defined by (7.214) has 2p nonzero coefficients for -p ≤ n < p, which are calculated in (7.201). If p = 2, then h_i[1] = h_i[-2] = -1/16 and h_i[0] = h_i[-1] = 9/16. Suppose that q(t) is a polynomial of degree smaller than or equal to 2p - 1. Since q = P_{V_j} q, (7.213) implies a Lagrange interpolation formula
\[
q\big(2^j(n+1/2)\big) = \sum_{k=-\infty}^{+\infty} q(2^j k)\,h_i[n-k].
\]
The Lagrange filter h_i of size 2p is the shortest filter that recovers intermediate values of polynomials of degree 2p - 1 from a uniform sampling.
To restrict the wavelet interpolation bases to a finite interval [0, 1] while reproducing polynomials of degree 2p - 1, the filter h_i is modified at the boundaries. Suppose that f(N^{-1}n) is defined for 0 ≤ n < N. When computing the interpolation
\[
P_{V_j} f\big(2^j(n+1/2)\big) = \sum_{k=-\infty}^{+\infty} f(2^j k)\,h_i[n-k],
\]
if n is too close to 0 or to 2^{-j} - 1, then h_i must be modified to ensure that the support of h_i[n-k] remains inside [0, 2^{-j} - 1]. The interpolation P_{V_j} f(2^j(n+1/2))
is then calculated from the closest 2p samples f(2^j k) for 2^j k ∈ [0, 1]. The new interpolation coefficients are computed in order to recover exactly all polynomials of degree 2p - 1 [450]. For p = 2, the problem occurs only at n = 0, and the appropriate boundary coefficients are
\[
h_i[0] = \frac{5}{16}, \qquad h_i[-1] = \frac{15}{16}, \qquad h_i[-2] = \frac{-5}{16}, \qquad h_i[-3] = \frac{1}{16}.
\]
The symmetric boundary filter h_i[-n] is used on the other side at n = 2^{-j} - 1.
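These four coefficients are exactly the Lagrange weights that evaluate at t = 1/2 the cubic passing through f(0), f(1), f(2), f(3), so the boundary prediction remains exact on polynomials of degree 3. A quick numerical check (the polynomial is an arbitrary choice):

```python
import numpy as np

# Boundary prediction P f(1/2) = h_i[0] f(0) + h_i[-1] f(1)
#                              + h_i[-2] f(2) + h_i[-3] f(3).
w = np.array([5.0, 15.0, -5.0, 1.0]) / 16

f = lambda t: 2 * t ** 3 - t ** 2 + 3 * t - 1   # arbitrary cubic
pred = w @ f(np.arange(4.0))   # exact: equals f(0.5)
```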
7.7 SEPARABLE WAVELET BASES
To any wavelet orthonormal basis {ψ_{j,n}}_{(j,n)∈Z²} of L^2(R), one can associate a separable wavelet orthonormal basis of L^2(R²):
\[
\big\{ \psi_{j_1,n_1}(x_1)\,\psi_{j_2,n_2}(x_2) \big\}_{(j_1,j_2,n_1,n_2)\in\mathbb{Z}^4}. \tag{7.215}
\]
The functions ψ_{j_1,n_1}(x_1) ψ_{j_2,n_2}(x_2) mix information at two different scales 2^{j_1} and 2^{j_2} along x_1 and x_2, which we often want to avoid. Separable multiresolutions lead to another construction of separable wavelet bases with wavelets that are products of functions dilated at the same scale. These multiresolution approximations also have important applications in computer vision, where they are used to process images at different levels of detail. Lower-resolution images are indeed represented by fewer pixels and might still carry enough information to perform a recognition task.
Signal decompositions in separable wavelet bases are computed with a separable extension of the filter bank algorithm described in Section 7.7.3. Section 7.7.4 constructs separable wavelet bases in any dimension, and explains the corresponding fast wavelet transform algorithm. Nonseparable wavelet bases can also be constructed [85, 334], but they are used less often in image processing. Section 7.8.3 gives examples of nonseparable quincunx biorthogonal wavelet bases, which have a single quasi-isotropic wavelet at each scale.
7.7.1 Separable Multiresolutions
As in one dimension, the notion of resolution is formalized with orthogonal projections in spaces of various sizes. The approximation of an image f(x_1, x_2) at the resolution 2^{-j} is defined as the orthogonal projection of f on a space V_j² that is included in L^2(R²). The space V_j² is the set of all approximations at the resolution 2^{-j}. When the resolution decreases, the size of V_j² decreases as well. The formal definition of a multiresolution approximation {V_j²}_{j∈Z} of L^2(R²) is a straightforward extension of Definition 7.1 that specifies multiresolutions of L^2(R). The same causality, completeness, and scaling properties must be satisfied.
We consider the particular case of separable multiresolutions. Let {V_j}_{j∈Z} be a multiresolution of L^2(R). A separable two-dimensional multiresolution is composed of the tensor product spaces
\[
V_j^2 = V_j \otimes V_j. \tag{7.216}
\]
The space V_j² is the set of finite energy functions f(x_1, x_2) that are linear expansions of separable functions:
\[
f(x_1, x_2) = \sum_{m=-\infty}^{+\infty} a[m]\, f_m(x_1)\, g_m(x_2) \quad \text{with} \quad f_m \in V_j,\ g_m \in V_j.
\]
Section A.5 reviews the properties of tensor products. If {V_j}_{j∈Z} is a multiresolution approximation of L^2(R), then {V_j²}_{j∈Z} is a multiresolution approximation of L^2(R²).
Theorem 7.1 demonstrates the existence of a scaling function φ such that {φ_{j,m}}_{m∈Z} is an orthonormal basis of V_j. Since V_j² = V_j ⊗ V_j, Theorem A.6 proves that for x = (x_1, x_2) and n = (n_1, n_2), the family
\[
\phi^2_{j,n}(x) = \phi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2) = \frac{1}{2^j}\,\phi\!\left(\frac{x_1 - 2^j n_1}{2^j}\right)\phi\!\left(\frac{x_2 - 2^j n_2}{2^j}\right), \quad n \in \mathbb{Z}^2,
\]
is an orthonormal basis of V_j². It is obtained by scaling by 2^j the two-dimensional separable scaling function φ²(x) = φ(x_1) φ(x_2) and translating it on a two-dimensional square grid with intervals 2^j.
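Computationally, the separability of V_j² means that a two-dimensional approximation is obtained by filtering the rows and then the columns with the same one-dimensional scaling filter. A minimal sketch with the Haar filter of the piecewise constant multiresolution (Example 7.13):

```python
import numpy as np

def coarsen_1d(a, axis):
    """Haar approximation step a_j[k] = (a[2k] + a[2k+1]) / sqrt(2)
    applied along one axis of a 2D array."""
    a = np.moveaxis(a, axis, 0)
    out = (a[0::2] + a[1::2]) / np.sqrt(2)
    return np.moveaxis(out, 0, axis)

img = np.arange(16.0).reshape(4, 4)
# Separable 2D approximation: filter rows, then columns.
coarse = coarsen_1d(coarsen_1d(img, 0), 1)
```

Each output coefficient is the sum of a 2 × 2 block divided by 2, i.e., the inner product with the normalized indicator of that square.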
EXAMPLE 7.13: Piecewise Constant Approximation
Let V_j be the approximation space of functions that are constant on [2^j m, 2^j(m+1)] for any m ∈ Z. The tensor product defines a two-dimensional piecewise constant approximation. The space V_j² is the set of functions that are constant on any square [2^j n_1, 2^j(n_1+1)] × [2^j n_2, 2^j(n_2+1)], for (n_1, n_2) ∈ Z². The two-dimensional scaling function is
\[
\phi^2(x) = \phi(x_1)\,\phi(x_2) = \begin{cases} 1 & \text{if } 0 \le x_1 \le 1 \text{ and } 0 \le x_2 \le 1,\\ 0 & \text{otherwise.} \end{cases}
\]

EXAMPLE 7.14: Shannon Approximation
Let V_j be the space of functions with Fourier transforms that have a support included in [-2^{-j}π, 2^{-j}π]. The space V_j² is the set of functions whose two-dimensional Fourier transforms have a support included in the low-frequency square [-2^{-j}π, 2^{-j}π] × [-2^{-j}π, 2^{-j}π]. The two-dimensional scaling function is a perfect two-dimensional low-pass filter whose Fourier transform is
\[
\hat{\phi}(\omega_1)\,\hat{\phi}(\omega_2) = \begin{cases} 1 & \text{if } |\omega_1| \le 2^{-j}\pi \text{ and } |\omega_2| \le 2^{-j}\pi,\\ 0 & \text{otherwise.} \end{cases}
\]
CHAPTER 7 Wavelet Bases
EXAMPLE 7.15: Spline Approximation
Let $V_j$ be the space of polynomial spline functions of degree $p$ that are $\mathbf{C}^{p-1}$, with nodes located at $2^j m$ for $m\in\mathbb{Z}$. The space $V_j^2$ is composed of two-dimensional polynomial spline functions that are $p-1$ times continuously differentiable. The restriction of $f(x_1,x_2)\in V_j^2$ to any square $[2^j n_1, 2^j(n_1+1)] \times [2^j n_2, 2^j(n_2+1)]$ is a separable product $q_1(x_1)\,q_2(x_2)$ of two polynomials of degree at most $p$.
Multiresolution Vision
An image of 512 × 512 pixels often includes too much information for real-time vision processing. Multiresolution algorithms process less image data by selecting the relevant details that are necessary to perform a particular recognition task [58]. The human visual system uses a similar strategy. The distribution of photoreceptors on the retina is not uniform: visual acuity is greatest at the center of the retina, where the density of receptors is maximum, and when moving away from the center the resolution decreases proportionally to the distance from the retina center [428].
The high-resolution visual center is called the fovea. It is responsible for high-acuity tasks such as reading or recognition. A retina with a uniform resolution equal to the highest fovea resolution would require about 10,000 times more photoreceptors. Such a uniform-resolution retina would considerably increase the size of the optic nerve that transmits the retina information to the visual cortex, and the size of the visual cortex that processes these data.
Active vision strategies [83] compensate for the nonuniformity of visual resolution with eye saccades, which successively move the fovea over regions of a scene with a high information content. These saccades are partly guided by the lower-resolution information gathered at the periphery of the retina. This multiresolution sensor has the advantage of providing high-resolution information at selected locations and a large field of view, with relatively little data.
Multiresolution algorithms implement in software [125] the search for important high-resolution data. A uniform high-resolution image is measured by a camera, but only a small part of this information is processed. Figure 7.21 displays a pyramid of progressively lower-resolution images calculated with a filter bank presented in Section 7.7.3. Coarse-to-fine algorithms analyze the lower-resolution image first and selectively increase the resolution in regions where more details are needed. Such algorithms have been developed for object recognition and stereo calculations [284].
7.7.2 Two-Dimensional Wavelet Bases
A separable wavelet orthonormal basis of $L^2(\mathbb{R}^2)$ is constructed with separable products of a scaling function $\phi$ and a wavelet $\psi$. The scaling function $\phi$ is associated to a one-dimensional multiresolution approximation $\{V_j\}_{j\in\mathbb{Z}}$. Let $\{V_j^2\}_{j\in\mathbb{Z}}$ be the separable two-dimensional multiresolution defined by $V_j^2 = V_j \otimes V_j$.

FIGURE 7.21
Multiresolution approximations $a_j[n_1,n_2]$ of an image at scales $2^j$, for $-5 \geq j \geq -8$.

Let $W_j^2$ be the detail space equal to the orthogonal complement of the lower-resolution approximation space $V_j^2$ in $V_{j-1}^2$:
$$V_{j-1}^2 = V_j^2 \oplus W_j^2. \qquad (7.217)$$
To construct a wavelet orthonormal basis of $L^2(\mathbb{R}^2)$, Theorem 7.25 builds a wavelet basis of each detail space $W_j^2$.
Theorem 7.25. Let $\phi$ be a scaling function and $\psi$ the corresponding wavelet generating a wavelet orthonormal basis of $L^2(\mathbb{R})$. We define three wavelets:
$$\psi^1(x) = \phi(x_1)\,\psi(x_2), \quad \psi^2(x) = \psi(x_1)\,\phi(x_2), \quad \psi^3(x) = \psi(x_1)\,\psi(x_2), \qquad (7.218)$$
and denote for $1 \leq k \leq 3$,
$$\psi^k_{j,n}(x) = \frac{1}{2^j}\, \psi^k\!\left(\frac{x_1-2^j n_1}{2^j}, \frac{x_2-2^j n_2}{2^j}\right).$$
The wavelet family
$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{n\in\mathbb{Z}^2} \qquad (7.219)$$
is an orthonormal basis of $W_j^2$, and
$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \qquad (7.220)$$
is an orthonormal basis of $L^2(\mathbb{R}^2)$.
Proof. Equation (7.217) is rewritten as
$$V_{j-1} \otimes V_{j-1} = (V_j \otimes V_j) \oplus W_j^2. \qquad (7.221)$$
The one-dimensional multiresolution space $V_{j-1}$ can also be decomposed into $V_{j-1} = V_j \oplus W_j$. By inserting this in (7.221), the distributivity of $\oplus$ with respect to $\otimes$ proves that
$$W_j^2 = (V_j \otimes W_j) \oplus (W_j \otimes V_j) \oplus (W_j \otimes W_j). \qquad (7.222)$$
Since $\{\phi_{j,m}\}_{m\in\mathbb{Z}}$ and $\{\psi_{j,m}\}_{m\in\mathbb{Z}}$ are orthonormal bases of $V_j$ and $W_j$, we derive that
$$\left\{\phi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2)\right\}_{(n_1,n_2)\in\mathbb{Z}^2}$$
is an orthonormal basis of $W_j^2$. As in the one-dimensional case, the overall space $L^2(\mathbb{R}^2)$ can be decomposed as an orthogonal sum of the detail spaces at all resolutions:
$$L^2(\mathbb{R}^2) = \oplus_{j=-\infty}^{+\infty} W_j^2. \qquad (7.223)$$
Thus,
$$\left\{\phi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2),\ \psi_{j,n_1}(x_1)\,\psi_{j,n_2}(x_2)\right\}_{(j,n_1,n_2)\in\mathbb{Z}^3}$$
is an orthonormal basis of $L^2(\mathbb{R}^2)$. ■
The three wavelets extract image details at different scales and in different directions. Over positive frequencies, $\hat\phi$ and $\hat\psi$ have an energy mainly concentrated, respectively, on $[0,\pi]$ and $[\pi,2\pi]$. The separable wavelet expressions (7.218) imply that
$$\hat\psi^1(\omega_1,\omega_2) = \hat\phi(\omega_1)\,\hat\psi(\omega_2), \quad \hat\psi^2(\omega_1,\omega_2) = \hat\psi(\omega_1)\,\hat\phi(\omega_2),$$
and $\hat\psi^3(\omega_1,\omega_2) = \hat\psi(\omega_1)\,\hat\psi(\omega_2)$. Thus, $|\hat\psi^1(\omega_1,\omega_2)|$ is large at low horizontal frequencies $\omega_1$ and high vertical frequencies $\omega_2$, whereas $|\hat\psi^2(\omega_1,\omega_2)|$ is large at high horizontal frequencies and low vertical frequencies, and $|\hat\psi^3(\omega_1,\omega_2)|$ is large at high horizontal and high vertical frequencies. Figure 7.22 displays the Fourier transforms of separable wavelets and scaling functions calculated from a one-dimensional Daubechies 4 wavelet.
Suppose that $\psi(t)$ has $p$ vanishing moments and is thus orthogonal to one-dimensional polynomials of degree $p-1$. The wavelet $\psi^1$ has $p$ one-dimensional directional vanishing moments along $x_2$, in the sense that it is orthogonal to any function $f(x_1,x_2)$ that is a polynomial of degree $p-1$ along $x_2$ for $x_1$ fixed. It is a horizontal directional wavelet that yields large coefficients for horizontal edges, as explained in Section 5.5.1. Similarly, $\psi^2$ has $p$ directional vanishing moments along $x_1$ and is a vertical directional wavelet. This is illustrated by the decomposition of a square later in Figure 7.24. The wavelet $\psi^3$ has directional vanishing moments along both $x_1$ and $x_2$ and is therefore not a directional wavelet. It produces large coefficients
FIGURE 7.22
Fourier transforms of a separable scaling function and of three separable wavelets calculated from a
one-dimensional Daubechies 4 wavelet.
at corners or junctions. The three wavelets $\psi^k$ for $k = 1, 2, 3$ are orthogonal to two-dimensional polynomials of degree $p-1$.
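The directional behavior of the three wavelets can be illustrated on the simplest discrete case, Haar atoms with $\phi = (1,1)/\sqrt{2}$ and $\psi = (1,-1)/\sqrt{2}$. The sketch below is an added illustration with hypothetical names, not the book's algorithm; it shows that only the atom that is low pass in $x_1$ and high pass in $x_2$ responds to a patch varying along $x_2$ alone.

```python
import numpy as np

# Haar scaling and wavelet vectors (a minimal discrete sketch)
phi = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.array([1.0, -1.0]) / np.sqrt(2)

# The three separable wavelets of (7.218) as 2x2 atoms,
# with axis 0 playing the role of x1 and axis 1 of x2:
psi1 = np.outer(phi, psi)   # low pass in x1, high pass in x2
psi2 = np.outer(psi, phi)   # high pass in x1, low pass in x2
psi3 = np.outer(psi, psi)   # high pass in both directions

# A patch that is constant along x1 and steps along x2
patch = np.array([[0.0, 1.0],
                  [0.0, 1.0]])

resp = [abs(np.sum(w * patch)) for w in (psi1, psi2, psi3)]
print(resp)  # only psi1 responds; psi2 and psi3 give zero
```

The inner products confirm the frequency picture: an edge with variation along $x_2$ is detected only by the atom whose high-pass factor acts on $x_2$.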
EXAMPLE 7.16
For a Shannon multiresolution approximation, the resulting two-dimensional wavelet basis paves the two-dimensional Fourier plane $(\omega_1,\omega_2)$ with dilated rectangles. The Fourier transforms $\hat\phi$ and $\hat\psi$ are the indicator functions of $[-\pi,\pi]$ and $[-2\pi,-\pi]\cup[\pi,2\pi]$, respectively. The separable space $V_j^2$ contains functions whose two-dimensional Fourier transform support is included in the low-frequency square $[-2^{-j}\pi, 2^{-j}\pi] \times [-2^{-j}\pi, 2^{-j}\pi]$. This corresponds to the support of $\hat\phi^2_{j,n}$ indicated in Figure 7.23.
The detail space $W_j^2$ is the orthogonal complement of $V_j^2$ in $V_{j-1}^2$ and thus includes functions with Fourier transforms supported in the frequency annulus between the two squares
FIGURE 7.23
These dyadic rectangles indicate the regions where the energy of $\hat\psi^k_{j,n}$ is mostly concentrated, for $1 \leq k \leq 3$. Image approximations at the scale $2^j$ are restricted to the lower-frequency square.
$[-2^{-j}\pi, 2^{-j}\pi] \times [-2^{-j}\pi, 2^{-j}\pi]$ and $[-2^{-j+1}\pi, 2^{-j+1}\pi] \times [-2^{-j+1}\pi, 2^{-j+1}\pi]$. As shown in Figure 7.23, this annulus is decomposed into three separable frequency regions, which are the Fourier supports of $\hat\psi^k_{j,n}$ for $1 \leq k \leq 3$. Dilating these supports at all scales $2^j$ yields an exact cover of the frequency plane $(\omega_1,\omega_2)$.
For general separable wavelet bases, Figure 7.23 gives only an indication of the domains where the energy of the different wavelets is concentrated. When the wavelets are constructed with a one-dimensional wavelet of compact support, the resulting Fourier transforms have side lobes, which appear in Figure 7.22.
EXAMPLE 7.17
Figure 7.24 gives two examples of wavelet transforms computed using separable Daubechies wavelets with $p = 4$ vanishing moments. They are calculated with the filter bank algorithm of Section 7.7.3. Coefficients of large amplitude in $d_j^1$, $d_j^2$, and $d_j^3$ correspond, respectively, to vertical high frequencies (horizontal edges), horizontal high frequencies (vertical edges), and high frequencies in both directions (corners). Regions where the image intensity varies smoothly yield nearly zero coefficients, shown in gray in the figure. The large number of nearly zero coefficients makes this representation particularly attractive for compact image coding.
Separable Biorthogonal Bases
One-dimensional biorthogonal wavelet bases are extended to separable biorthogonal bases of $L^2(\mathbb{R}^2)$ with the same approach as in Theorem 7.25.

FIGURE 7.24
Separable wavelet transforms of the Lena image and of a white square on a black background, decomposed over 3 and 4 octaves, respectively. Black, gray, and white pixels correspond, respectively, to positive, zero, and negative wavelet coefficients. The disposition of the wavelet image coefficients $d_j^k[n_1,n_2] = \langle f, \psi^k_{j,n}\rangle$ is illustrated at the top left.

Let $\phi, \psi$ and $\tilde\phi, \tilde\psi$ be two dual pairs of scaling functions and wavelets that generate biorthogonal wavelet bases of $L^2(\mathbb{R})$. The dual wavelets of $\psi^1$, $\psi^2$, and $\psi^3$ defined by (7.218) are
$$\tilde\psi^1(x) = \tilde\phi(x_1)\,\tilde\psi(x_2), \quad \tilde\psi^2(x) = \tilde\psi(x_1)\,\tilde\phi(x_2), \quad \tilde\psi^3(x) = \tilde\psi(x_1)\,\tilde\psi(x_2). \qquad (7.224)$$
One can verify that
$$\left\{\psi^1_{j,n},\ \psi^2_{j,n},\ \psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \qquad (7.225)$$
and
$$\left\{\tilde\psi^1_{j,n},\ \tilde\psi^2_{j,n},\ \tilde\psi^3_{j,n}\right\}_{(j,n)\in\mathbb{Z}^3} \qquad (7.226)$$
are biorthogonal Riesz bases of $L^2(\mathbb{R}^2)$.
7.7.3 Fast Two-Dimensional Wavelet Transform
The fast wavelet transform algorithm presented in Section 7.3.1 is extended to two dimensions. At all scales $2^j$ and for any $n = (n_1,n_2)$, we denote
$$a_j[n] = \langle f, \phi^2_{j,n}\rangle \quad\text{and}\quad d_j^k[n] = \langle f, \psi^k_{j,n}\rangle \quad\text{for } 1 \leq k \leq 3.$$
For any pair of one-dimensional filters $y[m]$ and $z[m]$, we write the product filter $yz[n] = y[n_1]\,z[n_2]$, and $\bar y[m] = y[-m]$. Let $h[m]$ and $g[m]$ be the conjugate mirror filters associated to the wavelet $\psi$.
The wavelet coefficients at the scale $2^{j+1}$ are calculated from $a_j$ with two-dimensional separable convolutions and subsamplings. The decomposition formulas are obtained by applying the one-dimensional convolution formulas (7.102) and (7.103) of Theorem 7.10 to the separable two-dimensional wavelets and scaling functions, for $n = (n_1,n_2)$:
$$a_{j+1}[n] = a_j \star \bar h \bar h\,[2n], \qquad (7.227)$$
$$d^1_{j+1}[n] = a_j \star \bar h \bar g\,[2n], \qquad (7.228)$$
$$d^2_{j+1}[n] = a_j \star \bar g \bar h\,[2n], \qquad (7.229)$$
$$d^3_{j+1}[n] = a_j \star \bar g \bar g\,[2n]. \qquad (7.230)$$
We showed in (3.70) that a separable two-dimensional convolution can be factored into one-dimensional convolutions along the rows and columns of the image. With the factorization illustrated in Figure 7.25(a), these four convolution equations are computed with only six groups of one-dimensional convolutions. The rows of $a_j$ are first convolved with $\bar h$ and $\bar g$ and subsampled by 2. The columns of these two output images are then convolved, respectively, with $\bar h$ and $\bar g$ and subsampled, which gives the four subsampled images $a_{j+1}$, $d^1_{j+1}$, $d^2_{j+1}$, and $d^3_{j+1}$.
We denote by $\check y[n] = \check y[n_1,n_2]$ the image twice the size of $y[n]$, obtained by inserting a row of zeros and a column of zeros between pairs of consecutive rows and columns. The approximation $a_j$ is recovered from the coarser-scale approximation $a_{j+1}$ and the wavelet coefficients $d^k_{j+1}$ with two-dimensional separable convolutions derived from the one-dimensional reconstruction formula (7.104):
$$a_j[n] = \check a_{j+1} \star hh\,[n] + \check d^1_{j+1} \star hg\,[n] + \check d^2_{j+1} \star gh\,[n] + \check d^3_{j+1} \star gg\,[n]. \qquad (7.231)$$
These four separable convolutions can also be factored into six groups of one-dimensional convolutions along rows and columns, illustrated in Figure 7.25(b).
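The row/column factorization can be sketched for the Haar filters $h = (1,1)/\sqrt{2}$ and $g = (1,-1)/\sqrt{2}$, for which the convolutions (7.227)-(7.230) and the reconstruction (7.231) reduce to sums and differences of neighboring samples. The function names below are hypothetical; this is one minimal illustration added here, not the book's code.

```python
import numpy as np

def haar_rows(a):
    """Haar low/high-pass filtering with subsampling along axis 1."""
    lo = (a[:, ::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, ::2] - a[:, 1::2]) / np.sqrt(2)
    return lo, hi

def analyze2d(a):
    """One level of (7.227)-(7.230), Haar case: rows first, then columns."""
    lo, hi = haar_rows(a)
    a1, d1 = haar_rows(lo.T)      # filter the columns of the low-pass image
    d2, d3 = haar_rows(hi.T)      # filter the columns of the high-pass image
    return a1.T, d1.T, d2.T, d3.T

def synthesize2d(a1, d1, d2, d3):
    """Invert by upsampling and filtering, as in (7.231), Haar case."""
    def merge(lo, hi):            # inverse of haar_rows along axis 1
        out = np.empty((lo.shape[0], 2 * lo.shape[1]))
        out[:, ::2] = (lo + hi) / np.sqrt(2)
        out[:, 1::2] = (lo - hi) / np.sqrt(2)
        return out
    lo = merge(a1.T, d1.T).T      # undo the column filtering
    hi = merge(d2.T, d3.T).T
    return merge(lo, hi)          # then undo the row filtering

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
rec = synthesize2d(*analyze2d(img))
assert np.allclose(rec, img)      # perfect reconstruction
```

Each call to `haar_rows` is one group of one-dimensional convolutions; the decomposition uses six such groups, as in Figure 7.25(a).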
7.7 Separable Wavelet Bases
Rows
2
2
h
aj
2
g
2
Columns
2
2
h
a j11
2
g
2
d 1j11
2
h
2
d 2j 11
2
g
2
d 3j11
(a)
a j11
Columns
2
h
d 1j11
2
g
d 2j 11
2
h
d 3j11
2
g
1
Rows
h
2
1
2
1
aj
g
(b)
FIGURE 7.25
(a) Decomposition of $a_j$ with six groups of one-dimensional convolutions and subsamplings along the image rows and columns. (b) Reconstruction of $a_j$ by inserting zeros between the rows and columns of $a_{j+1}$ and $d^k_{j+1}$, and filtering the output.
Let $b[n]$ be an input image whose pixels are at a distance $2^L$. We associate to $b[n]$ a function $f(x) \in V_L^2$ approximated at the scale $2^L$. Its coefficients $a_L[n] = \langle f, \phi^2_{L,n}\rangle$ are defined, as in (7.111), by
$$b[n] = 2^{-L}\, a_L[n] \approx f(2^L n). \qquad (7.232)$$
The wavelet image representation of $a_L$ is computed by iterating (7.227)-(7.230) for $L \leq j < J$:
$$\left[ a_J,\ \{d_j^1, d_j^2, d_j^3\}_{L < j \leq J} \right]. \qquad (7.233)$$
The image $a_L$ is recovered from this wavelet representation by computing (7.231) for $J > j \geq L$.
Finite Image and Complexity
When $a_L$ is a finite image of $N = N_1 N_2$ pixels, we face boundary problems when computing the convolutions (7.227)-(7.231). Since the decomposition algorithm is separable along rows and columns, we use one of the three one-dimensional boundary techniques described in Section 7.5. The resulting values are decomposition coefficients in a wavelet basis of $L^2[0,1]^2$. Depending on the boundary treatment, this wavelet basis is a periodic basis, a folded basis, or a boundary-adapted basis.
For square images with $N_1 = N_2$, the resulting images $a_j$ and $d_j^k$ have $2^{-2j}$ samples. Thus, the images of the wavelet representation (7.233) include a total of $N$ samples. If $h$ and $g$ have size $K$, the reader can verify that $2K\,2^{-2(j-1)}$ multiplications and additions are needed to compute the four convolutions (7.227)-(7.230) with the factorization of Figure 7.25(a). Thus, the wavelet representation (7.233) is calculated with fewer than $\frac{8}{3}KN$ operations. The reconstruction of $a_L$, by factoring the reconstruction equation (7.231), requires the same number of operations.
Fast Biorthogonal Wavelet Transform
The decomposition of an image in a biorthogonal wavelet basis is performed with
the same fast wavelet transform algorithm. Let (h̃, g̃) be the perfect reconstruction
filters associated to (h, g). The inverse wavelet transform is computed by replacing
the filters (h, g) that appear in (7.231) by (h̃, g̃).
7.7.4 Wavelet Bases in Higher Dimensions
Separable wavelet orthonormal bases of $L^2(\mathbb{R}^p)$ are constructed for any $p \geq 2$ with a procedure similar to the two-dimensional extension. Let $\phi$ be a scaling function and $\psi$ a wavelet that yields an orthonormal basis of $L^2(\mathbb{R})$. We denote $\theta^0 = \phi$ and $\theta^1 = \psi$. To any integer $0 \leq \varepsilon < 2^p$ written in binary form $\varepsilon = \varepsilon_1 \ldots \varepsilon_p$, we associate the $p$-dimensional functions defined in $x = (x_1, \ldots, x_p)$ by
$$\psi^\varepsilon(x) = \theta^{\varepsilon_1}(x_1) \cdots \theta^{\varepsilon_p}(x_p).$$
For $\varepsilon = 0$, we obtain a $p$-dimensional scaling function
$$\psi^0(x) = \phi(x_1) \cdots \phi(x_p).$$
Nonzero indexes $\varepsilon$ correspond to $2^p - 1$ wavelets. At any scale $2^j$ and for $n = (n_1, \ldots, n_p)$, we denote
$$\psi^\varepsilon_{j,n}(x) = 2^{-pj/2}\, \psi^\varepsilon\!\left(\frac{x_1 - 2^j n_1}{2^j}, \ldots, \frac{x_p - 2^j n_p}{2^j}\right).$$

Theorem 7.26. The family obtained by dilating and translating the $2^p - 1$ wavelets for $\varepsilon \neq 0$,
$$\left\{\psi^\varepsilon_{j,n}\right\}_{1 \leq \varepsilon < 2^p,\ (j,n)\in\mathbb{Z}^{p+1}}, \qquad (7.234)$$
is an orthonormal basis of $L^2(\mathbb{R}^p)$.

The proof is done by induction on $p$. It follows the same steps as the proof of Theorem 7.25, which associates to a wavelet basis of $L^2(\mathbb{R})$ a separable wavelet basis of $L^2(\mathbb{R}^2)$. For $p = 2$, we verify that the basis (7.234) includes three elementary wavelets. For $p = 3$, there are seven different wavelets.
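The binary indexing of the $2^p - 1$ wavelets can be made concrete with a short enumeration. The sketch below is an added illustration with hypothetical names; it uses Haar vectors for $\theta^0$ and $\theta^1$ and builds the seven separable atoms for $p = 3$.

```python
import numpy as np
from itertools import product

# theta^0 = phi (scaling), theta^1 = psi (wavelet); Haar sketch.
theta = {0: np.array([1.0, 1.0]) / np.sqrt(2),
         1: np.array([1.0, -1.0]) / np.sqrt(2)}

p = 3
wavelets = {}
for eps in product((0, 1), repeat=p):    # eps = (eps_1, ..., eps_p)
    if eps == (0,) * p:
        continue                          # eps = 0 is the scaling function
    atom = theta[eps[0]]
    for e in eps[1:]:                     # separable product along each axis
        atom = np.multiply.outer(atom, theta[e])
    wavelets[eps] = atom

print(len(wavelets))  # 2**p - 1 = 7 wavelets, as stated after Theorem 7.26
```

Each atom is a rank-one tensor of unit norm, mirroring the continuous definition $\psi^\varepsilon(x) = \theta^{\varepsilon_1}(x_1)\cdots\theta^{\varepsilon_p}(x_p)$.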
Fast Wavelet Transform
Let $b[n]$ be an input $p$-dimensional discrete signal sampled at intervals $2^L$. We associate to $b[n]$ an approximation $f$ at the scale $2^L$ whose scaling coefficients $a_L[n] = \langle f, \psi^0_{L,n}\rangle$ satisfy
$$b[n] = 2^{-Lp/2}\, a_L[n] \approx f(2^L n).$$
The wavelet coefficients of $f$ at scales $2^j > 2^L$ are computed with separable convolutions and subsamplings along the $p$ signal dimensions. We denote
$$a_j[n] = \langle f, \psi^0_{j,n}\rangle \quad\text{and}\quad d_j^\varepsilon[n] = \langle f, \psi^\varepsilon_{j,n}\rangle \quad\text{for } 0 < \varepsilon < 2^p.$$
The fast wavelet transform is computed with filters that are separable products of the one-dimensional filters $h$ and $g$. The separable $p$-dimensional low-pass filter is
$$h^0[n] = h[n_1] \cdots h[n_p].$$
Let us denote $u^0[m] = h[m]$ and $u^1[m] = g[m]$. To any integer $\varepsilon = \varepsilon_1 \ldots \varepsilon_p$ written in binary form, we associate a separable $p$-dimensional band-pass filter
$$g^\varepsilon[n] = u^{\varepsilon_1}[n_1] \cdots u^{\varepsilon_p}[n_p].$$
Let $\bar g^\varepsilon[n] = g^\varepsilon[-n]$. One can verify that
$$a_{j+1}[n] = a_j \star \bar h^0[2n], \qquad (7.235)$$
$$d^\varepsilon_{j+1}[n] = a_j \star \bar g^\varepsilon[2n]. \qquad (7.236)$$
We denote by $\check y[n]$ the signal obtained by adding a zero between any two samples of $y[n]$ that are adjacent in the $p$-dimensional lattice $n = (n_1, \ldots, n_p)$. It doubles the size of $y[n]$ along each direction: if $y[n]$ has $M^p$ samples, then $\check y[n]$ has $(2M)^p$ samples. The reconstruction is performed with
$$a_j[n] = \check a_{j+1} \star h^0[n] + \sum_{\varepsilon=1}^{2^p-1} \check d^\varepsilon_{j+1} \star g^\varepsilon[n]. \qquad (7.237)$$
The $2^p$ separable convolutions needed to compute $a_j$ and $\{d_j^\varepsilon\}_{1 \leq \varepsilon < 2^p}$, as well as the reconstruction (7.237), can be factored into $2^{p+1} - 2$ groups of one-dimensional convolutions along the rows of $p$-dimensional signals. This is a generalization of the two-dimensional case illustrated in Figure 7.25. The wavelet representation of $a_L$ is
$$\left[ \{d_j^\varepsilon\}_{1 \leq \varepsilon < 2^p,\ L < j \leq J},\ a_J \right]. \qquad (7.238)$$
It is computed by iterating (7.235) and (7.236) for $L \leq j < J$. The reconstruction of $a_L$ is performed with the partial reconstruction (7.237) for $J > j \geq L$.
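The factored computation of (7.235) and (7.236) can be sketched by filtering and subsampling along one axis at a time; each pass doubles the number of subbands, producing the $2^p$ subbands indexed by $\varepsilon \in \{0,1\}^p$. This minimal Haar sketch is an added illustration with hypothetical names.

```python
import numpy as np

def haar_axis(a, axis):
    """Haar split along one axis: low and high pass with subsampling."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[::2] + a[1::2]) / np.sqrt(2)
    hi = (a[::2] - a[1::2]) / np.sqrt(2)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def analyze_p(a):
    """One level of (7.235)-(7.236) by factored 1D filtering along each
    of the p axes; returns subbands keyed by eps in {0,1}^p."""
    bands = {(): a}
    for axis in range(a.ndim):
        new = {}
        for eps, x in bands.items():
            lo, hi = haar_axis(x, axis)
            new[eps + (0,)] = lo
            new[eps + (1,)] = hi
        bands = new
    return bands

a = np.arange(16.0).reshape(2, 2, 2, 2)   # a p = 4 signal
bands = analyze_p(a)
print(len(bands))   # 2**4 = 16 subbands: a_{j+1} plus 15 details
```

The nested loop performs $2 + 4 + \cdots + 2^p = 2^{p+1} - 2$ groups of one-dimensional filterings, matching the operation count stated above.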
If $a_L$ is a finite signal of size $N_1 \times \cdots \times N_p$, the one-dimensional convolutions are modified with one of the three boundary techniques described in Section 7.5. The resulting algorithm computes decomposition coefficients in a separable wavelet basis of $L^2[0,1]^p$. If $N_1 = \cdots = N_p$, the signals $a_j$ and $d_j^\varepsilon$ have $2^{-pj}$ samples. Like $a_L$, the wavelet representation (7.238) is composed of $N$ samples. If the filter $h$ has $K$ nonzero samples, then the separable factorization of (7.235) and (7.236) requires $pK\,2^{-p(j-1)}$ multiplications and additions. Thus, the wavelet representation (7.238) is computed with fewer than $p\,(1-2^{-p})^{-1} KN$ multiplications and additions. The reconstruction is performed with the same number of operations.
7.8 LIFTING WAVELETS
The lifting scheme, introduced by Sweldens [451, 452], factors orthogonal and biorthogonal wavelet transforms into elementary spatial operators called liftings. It has two main applications. The first one is an acceleration of the fast wavelet transform algorithm: the filter bank convolution and subsampling operations are factored into elementary filterings of even and odd samples, which reduces the number of operations by nearly a factor of 2. Border treatments are also simplified. This is also called a paraunitary filter bank implementation. Readers mainly interested in this fast lifting implementation can skip directly to Section 7.8.5, which can be read independently.
The second application is the design of wavelets adapted to multidimensional bounded domains and surfaces, which is not possible with a Fourier transform approach. Section 7.8.1 introduces biorthogonal multiresolutions and wavelet bases over nonstationary grids for arbitrary domains. The lifting construction of biorthogonal wavelet bases is explained in Section 7.8.2, together with the resulting fast lifting wavelet transform. Section 7.8.3 describes an application to nonseparable quincunx wavelet bases for images. The construction of wavelet bases over bounded domains and surfaces is explained in Section 7.8.4, with computer graphics examples.
7.8.1 Biorthogonal Bases over Nonstationary Grids
The lifting scheme constructs wavelet bases over an arbitrary domain $\Omega$ to represent functions of finite energy defined over $\Omega$. This section defines biorthogonal filters and wavelets that may be modified both in space and in scale to be adapted to the domain geometry. Section 7.8.2 explains the calculation of these filters and wavelets with a lifting scheme.
Embedded Grids
Biorthogonal multiresolutions, defined in Section 7.4, are generalized by considering nested spaces
$$\{0\} \subset \ldots \subset V_{j+1} \subset V_j \subset V_{j-1} \subset \ldots \subset L^2(\Omega),$$
which are defined over embedded approximation grids $G_j \subset G_{j-1}$. Each index $n \in G_j$ is associated to a sampling point $x_n \in \Omega$. Since the sampling grids are embedded, this position does not change when the index is moved to finer grids $n \in G_k$ for $k \leq j$. Each space $V_j$ is equipped with a Riesz basis $\{\phi_{j,n}\}_{n\in G_j}$ parameterized by the approximation grid $G_j$.
Embedded grids are decomposed with complementary grids $C_j$:
$$G_{j-1} = G_j \cup C_j.$$
For example, over the interval $[0,1]$, the grid $G_{j-1}$ is the set of points $2^{j-1}n \in [0,1]$ for $0 \leq n < 2^{1-j}$. It is decomposed into the even grid points $2^{j-1}\,2n$ of $G_j$ and the odd grid points $2^{j-1}(2n+1)$ of $C_j$. A corresponding vector space decomposition is defined by $V_{j-1} = V_j \oplus W_j$, where the detail space $W_j$ has a Riesz basis of wavelets $\{\psi_{j,m}\}_{m\in C_j}$ indexed by the complementary grid $C_j$.
A dual family of wavelets $\{\tilde\psi_{j,m}\}_{m\in C_j}$ and a dual basis of scaling functions $\{\tilde\phi_{j,n}\}_{n\in G_j}$ satisfy the biorthogonality conditions
$$\langle \phi_{j,n}, \tilde\phi_{j,n'}\rangle = \delta[n-n'] \quad\text{and}\quad \langle \psi_{j,m}, \tilde\psi_{j',m'}\rangle = \delta[j-j']\,\delta[m-m']. \qquad (7.239)$$
The resulting wavelet families $\{\psi_{j,m}\}_{m\in C_j,\, j\in\mathbb{Z}}$ and $\{\tilde\psi_{j,m}\}_{m\in C_j,\, j\in\mathbb{Z}}$ are biorthogonal wavelet bases of $L^2(\Omega)$, which implies that for any $f \in L^2(\Omega)$,
$$f = \sum_{n\in G_J} \langle f, \phi_{J,n}\rangle\, \tilde\phi_{J,n} + \sum_{j\leq J}\ \sum_{m\in C_j} \langle f, \psi_{j,m}\rangle\, \tilde\psi_{j,m},$$
where $2^J$ is an arbitrary coarsest scale. The difference with the biorthogonal wavelets of Section 7.4 is that the scaling functions and wavelets are typically not translations and dilations of a mother scaling function and wavelet, in order to be adapted to $\Omega$. As a result, the decomposition filters are not convolution filters.
Spatially Varying Filters
The spatially varying filters associated to this biorthogonal multiresolution satisfy
$$\phi_{j,n} = \sum_{l\in G_{j-1}} h_j[n,l]\, \phi_{j-1,l} \quad\text{and}\quad \psi_{j,m} = \sum_{l\in G_{j-1}} g_j[m,l]\, \phi_{j-1,l}. \qquad (7.240)$$
Over a translation-invariant domain, $h_j[n,l] = h[n-2l]$ recovers the perfect reconstruction filters of Section 7.3.2. Dual filters $\tilde h$ and $\tilde g$ are defined similarly by
$$\tilde\phi_{j,n} = \sum_{l\in G_{j-1}} \tilde h_j[n,l]\, \tilde\phi_{j-1,l} \quad\text{and}\quad \tilde\psi_{j,m} = \sum_{l\in G_{j-1}} \tilde g_j[m,l]\, \tilde\phi_{j-1,l}. \qquad (7.241)$$
The biorthogonality relations (7.239) between wavelets and scaling functions imply equivalent biorthogonality relations on the filters, for all $n, n' \in G_j$ and $m, m' \in C_j$:
$$\sum_{l\in G_{j-1}} h_j[n,l]\, \tilde h_j[n',l] = \delta[n-n'], \qquad \sum_{l\in G_{j-1}} g_j[m,l]\, \tilde g_j[m',l] = \delta[m-m'], \qquad (7.242)$$
$$\sum_{l\in G_{j-1}} h_j[n,l]\, \tilde g_j[m,l] = 0, \qquad \sum_{l\in G_{j-1}} g_j[m,l]\, \tilde h_j[n,l] = 0. \qquad (7.243)$$
Wavelets and scaling functions can be written as infinite products of these filters. If these products converge in $L^2(\Omega)$ and the filters satisfy (7.242) and (7.243), then the resulting wavelets and scaling functions define biorthogonal bases of $L^2(\Omega)$ that satisfy (7.239) [452].
To simplify notations, the filters $h_j$ and $g_j$ are also written as matrices that transform discrete vectors:
$$\forall a \in \mathbb{C}^{|G_{j-1}|},\ \forall n \in G_j, \quad (H_j a)[n] = \sum_{l\in G_{j-1}} h_j[n,l]\, a[l],$$
$$\forall a \in \mathbb{C}^{|G_{j-1}|},\ \forall m \in C_j, \quad (G_j a)[m] = \sum_{l\in G_{j-1}} g_j[m,l]\, a[l],$$
and similarly for the dual matrices $\tilde H_j$ and $\tilde G_j$. The biorthogonality conditions (7.242) and (7.243) are rewritten as
$$\begin{pmatrix} \tilde H_j \\ \tilde G_j \end{pmatrix} \begin{pmatrix} H_j^* & G_j^* \end{pmatrix} = \begin{pmatrix} \mathrm{Id}_{G_j} & 0 \\ 0 & \mathrm{Id}_{C_j} \end{pmatrix}, \qquad (7.244)$$
where $A^*$ is the complex transpose of a matrix $A$.
Vanishing Moments
Wavelets with $p_1$ vanishing moments are orthogonal to polynomials of degree strictly smaller than $p_1$. Let $d_1$ be the dimension of the space of polynomials of degree $p_1 - 1$ in dimension $d$: if $d = 1$ then $d_1 = p_1$, and if $d = 2$ then $d_1 = p_1(p_1+1)/2$. Such wavelets are orthogonal to a basis $\{q^{(k)}\}_{0\leq k<d_1}$ of this polynomial space, defined over $\Omega \subset \mathbb{R}^d$:
$$\forall j,\ \forall m \in C_j,\ \forall k < d_1, \quad \int_\Omega \psi_{j,m}(x)\, q^{(k)}(x)\, dx = 0, \qquad (7.245)$$
and similarly for dual wavelets with $p_2$ vanishing moments,
$$\forall j,\ \forall m \in C_j,\ \forall k < d_2, \quad \int_\Omega \tilde\psi_{j,m}(x)\, q^{(k)}(x)\, dx = 0.$$
Lifting steps are used to increase the number of vanishing moments.
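A discrete counterpart of (7.245): a wavelet filter with $p$ vanishing moments annihilates sampled polynomials of degree smaller than $p$. The sketch below is an added illustration; it checks this for the Daubechies filter with $p = 2$ vanishing moments, assuming one standard sign and shift convention for the high-pass filter $g$.

```python
import numpy as np

# Daubechies filter with p = 2 vanishing moments
s = np.sqrt(3)
h = np.array([1 + s, 3 + s, 3 - s, 1 - s]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])     # g[n] = (-1)^n h[3 - n]

# The high-pass filter kills sampled constants and linear ramps:
n = np.arange(4)
for k in (0, 1):
    assert abs(np.sum(g * n**k)) < 1e-12
print("degree-0 and degree-1 moments of g vanish")
```

In the lifting construction below, the predict and update steps impose these same discrete moment conditions on the spatially varying filters instead of on a convolution filter.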
7.8.2 Lifting Scheme
The lifting scheme builds filters over arbitrary domains $\Omega$ as a succession of elementary lifting steps applied to lazy wavelets, which are Diracs. Each lifting step transforms a family of biorthogonal filters into new biorthogonal filters that define wavelets with more vanishing moments.
Lazy Wavelets
A lifting begins from lazy wavelets, which are Diracs on grid points. The lazy discrete orthogonal wavelet transform simply splits the coefficients of a grid $G_{j-1}$ into the two subgrids $G_j$ and $C_j$. It corresponds to filters that are Diracs on these grids: for all $l \in G_{j-1}$, $n \in G_j$, and $m \in C_j$,
$$h^o_j[n,l] = \tilde h^o_j[n,l] = \delta[n-l], \qquad g^o_j[m,l] = \tilde g^o_j[m,l] = \delta[m-l].$$
For $a \in \mathbb{C}^{|G_{j-1}|}$, the vector $H^o_j a \in \mathbb{C}^{|G_j|}$ is the restriction of $a$ to $G_j$, and $G^o_j a \in \mathbb{C}^{|C_j|}$ is the restriction of $a$ to $C_j$.
Since each index $n \in G_j$ is associated to a sampling point $x_n \in \Omega$ that does not depend on the scale index $j$, one can verify that $\psi^o_{j,m}(x) = \tilde\psi^o_{j,m}(x) = \delta(x - x_m)$ and $\phi^o_{j,n}(x) = \tilde\phi^o_{j,n}(x) = \delta(x - x_n)$, meaning that for any continuous function $f(x)$,
$$\int_\Omega f(x)\, \psi^o_{j,m}(x)\, dx = \int_\Omega f(x)\, \tilde\psi^o_{j,m}(x)\, dx = f(x_m).$$
This lazy wavelet basis is improved with a succession of liftings.
Increasing Vanishing Moments, Stability, and Regularity
A lifting modifies biorthogonal filters in order to increase the number of vanishing moments of the resulting biorthogonal wavelets, and hopefully their stability and regularity.
Increasing the number of vanishing moments decreases the amplitude of wavelet coefficients in regions where the signal is regular, which produces a sparser representation. However, increasing the number of vanishing moments with a lifting also increases the wavelet support, an adverse effect that increases the number of large coefficients produced by isolated singularities.
Each lifting step maintains the filter biorthogonality but provides no control over the Riesz bounds, and thus over the stability of the resulting biorthogonal wavelet basis. When a basis is orthogonal, the dual basis is equal to the original basis. Having a dual basis that is similar to the original basis is therefore an indication of stability. As a result, stability is generally improved when the dual wavelets have as many vanishing moments as the original wavelets and a support of similar size. This is why a lifting procedure also increases the number of vanishing moments of the dual wavelets. It can also improve the regularity of the dual wavelet.
A lifting design is computed by adjusting the number of vanishing moments. The stability and regularity of the resulting biorthogonal wavelets are measured a posteriori, hoping for the best. This is the main weakness of this wavelet design procedure.
Prediction
Starting from an initial set of biorthogonal filters $\{h^o_j, g^o_j, \tilde h^o_j, \tilde g^o_j\}_j$, a prediction step modifies each filter $g^o_j$ to increase the number of vanishing moments of $\psi_{j,m}$. This is done with an operator $P_j$ that predicts the values on the grid $C_j$ from samples on the grid $G_j$:
$$\forall a \in \mathbb{C}^{|G_j|},\ \forall m \in C_j, \quad (P_j a)[m] = \sum_{n\in G_j} p_j[m,n]\, a[n].$$
The number of vanishing moments of $\psi_{j,m}$ is increased by modifying the filter $g^o_j$ with this predictor:
$$h_j = h^o_j \quad\text{and}\quad g_j[m,l] = g^o_j[m,l] - \sum_{n\in G_j} p_j[m,n]\, h^o_j[n,l]. \qquad (7.246)$$
Biorthogonality is maintained by also modifying the dual filter $\tilde h^o_j$:
$$\tilde g_j = \tilde g^o_j \quad\text{and}\quad \tilde h_j[n,l] = \tilde h^o_j[n,l] + \sum_{m\in C_j} p_j[m,n]\, \tilde g^o_j[m,l].$$
The filter lifting (7.246) implies a retransformation of the scaling and wavelet coefficients computed with the original filters $h^o_j$ and $g^o_j$. The lifted scaling coefficients $\{a_j[n] = \langle f, \phi_{j,n}\rangle\}_{n\in G_j}$ and detail coefficients $\{d_j[m] = \langle f, \psi_{j,m}\rangle\}_{m\in C_j}$ are computed from the coefficients $\{d^o_j[m], a^o_j[n]\}_{n\in G_j,\, m\in C_j}$ corresponding to $h^o_j$ and $g^o_j$ by applying the predict operator
$$\forall m \in C_j, \quad d_j[m] = d^o_j[m] - \sum_{n\in G_j} p_j[m,n]\, a^o_j[n],$$
while the scaling coefficients are not modified: $a_j[n] = a^o_j[n]$. If $P_j$ is a good predictor of $d^o_j[m]$ from $a^o_j[n]$ on the coarse grid $G_j$, then the resulting coefficients $d_j[m]$ are smaller, which is an indication that the wavelet has more vanishing moments.
The prediction (7.246) of the filters is rewritten with matrix notations as
$$H_j = H^o_j, \quad G_j = G^o_j - P_j H^o_j \quad\text{and}\quad \tilde H_j = \tilde H^o_j + P_j^* \tilde G^o_j, \quad \tilde G_j = \tilde G^o_j. \qquad (7.247)$$
Since $H_j = H^o_j$ is not modified, the scaling functions $\phi_{j,n} = \phi^o_{j,n}$ are not modified. In contrast, since $\tilde H^o_j$ is modified, both the dual scaling functions and the dual wavelets are modified:
$$\phi_{j,n} = \phi^o_{j,n}, \qquad \psi_{j,m} = \psi^o_{j,m} - \sum_{n\in G_j} p_j[m,n]\, \phi^o_{j,n}, \qquad (7.248)$$
$$\tilde\phi_{j,n} = \sum_{l\in G_{j-1}} \tilde h^o_j[n,l]\, \tilde\phi_{j-1,l} + \sum_{m\in C_j} p_j[m,n]\, \tilde\psi_{j,m}, \qquad \tilde\psi_{j,m} = \sum_{l\in G_{j-1}} \tilde g^o_j[m,l]\, \tilde\phi_{j-1,l}. \qquad (7.249)$$
Theorem 7.27 proves that this lifting step maintains the biorthogonality conditions [451].
Theorem 7.27: Sweldens. The prediction (7.247) transforms the biorthogonal filters $\{H^o_j, G^o_j, \tilde H^o_j, \tilde G^o_j\}$ into a new set of biorthogonal filters $\{H_j, G_j, \tilde H_j, \tilde G_j\}$.
Proof. The lifting step (7.247) is written in matrix notation as
$$\begin{pmatrix} H_j \\ G_j \end{pmatrix} = \begin{pmatrix} \mathrm{Id}_{G_j} & 0 \\ -P_j & \mathrm{Id}_{C_j} \end{pmatrix} \begin{pmatrix} H^o_j \\ G^o_j \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \tilde H_j \\ \tilde G_j \end{pmatrix} = \begin{pmatrix} \mathrm{Id}_{G_j} & P_j^* \\ 0 & \mathrm{Id}_{C_j} \end{pmatrix} \begin{pmatrix} \tilde H^o_j \\ \tilde G^o_j \end{pmatrix}.$$
The proof of the biorthogonality relation (7.244) follows from
$$\begin{pmatrix} \mathrm{Id}_{G_j} & P_j^* \\ 0 & \mathrm{Id}_{C_j} \end{pmatrix} \begin{pmatrix} \mathrm{Id}_{G_j} & -P_j^* \\ 0 & \mathrm{Id}_{C_j} \end{pmatrix} = \begin{pmatrix} \mathrm{Id}_{G_j} & 0 \\ 0 & \mathrm{Id}_{C_j} \end{pmatrix}. \qquad ■$$
To increase the number of vanishing moments of $\psi_{j,m}$, and get $\int_\Omega \psi_{j,m}(x)\, q^{(k)}(x)\, dx = 0$ for a basis of $d_1$ polynomials of degree $p_1 - 1$, (7.248) shows that the predict coefficients must satisfy
$$\int_\Omega \psi^o_{j,m}(x)\, q^{(k)}(x)\, dx = \sum_{n\in G_j} p_j[m,n] \int_\Omega \phi^o_{j,n}(x)\, q^{(k)}(x)\, dx \quad\text{for } 0 \leq k < d_1. \qquad (7.250)$$
The predictor $\{p_j[m,n]\}_n$ can be chosen to have $d_1$ nonzero coefficients that solve this $d_1 \times d_1$ linear system for each $m$.
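For $d = 1$, lazy Diracs, and $p_1 = 2$, the $d_1 \times d_1$ system (7.250) reduces to a small Vandermonde system: the predictor weights must reproduce the moments of the Dirac at the odd point from the Diracs at its even neighbors. The sketch below, with hypothetical grid positions added here for illustration, recovers the linear-interpolation predictor $(1/2, 1/2)$.

```python
import numpy as np

# Solve the 2x2 system (7.250) for one odd point located between even
# neighbors at positions x = 0 and x = 1 (lazy Dirac case, d = 1, p1 = 2).
# The wavelet delta_{x_m} - sum_n p[m,n] delta_{x_n} must kill q(x) = 1
# and q(x) = x.
xm = 0.5                                   # odd grid point
xn = np.array([0.0, 1.0])                  # its two even neighbors
A = np.vander(xn, 2, increasing=True).T    # rows: moments [1,1] and [x0,x1]
b = np.array([1.0, xm])                    # moments of the Dirac at xm
p = np.linalg.solve(A, b)
print(p)                                   # [0.5 0.5]: linear interpolation
```

With this predictor, the detail coefficient of any linear signal vanishes, which is the discrete statement that $\psi_{j,m}$ has two vanishing moments.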
Update
The prediction (7.248) modifies $\psi_{j,m}$ but does not change $\tilde\psi_{j,m}$. The roles of $\psi_{j,m}$ and $\tilde\psi_{j,m}$ are reversed by applying a lifting step that increases the number of vanishing moments of $\tilde\psi_{j,m}$ as well. The goal is to obtain dual wavelets $\tilde\psi_{j,m}$ that are as similar as possible to $\psi_{j,m}$, in order to improve the stability of the basis. It requires the use of an update operator $U_j$ defined by
$$\forall b \in \mathbb{C}^{|C_j|},\ \forall n \in G_j, \quad (U_j b)[n] = \sum_{m\in C_j} u_j[n,m]\, b[m].$$
The update step is then
$$H_j = H^o_j + U_j G^o_j, \quad G_j = G^o_j \quad\text{and}\quad \tilde H_j = \tilde H^o_j, \quad \tilde G_j = \tilde G^o_j - U_j^* \tilde H^o_j. \qquad (7.251)$$
Since predict and update steps are equivalent operations on dual filters, Theorem 7.27 shows that this update operation defines new filters that remain biorthogonal.
biorthogonal.
Let $\{d^o_j[m], a^o_j[n]\}_{n\in G_j,\, m\in C_j}$ be the wavelet and scaling coefficients corresponding to the filters $h^o_j$ and $g^o_j$. The wavelet coefficients are not modified, $d_j[m] = d^o_j[m]$, and $\{a_j[n]\}_{n\in G_j}$ is computed by applying the update operator
$$\forall n \in G_j, \quad a_j[n] = a^o_j[n] + \sum_{m\in C_j} u_j[n,m]\, d^o_j[m].$$
The updated scaling functions and wavelets are
$$\phi_{j,n} = \sum_{l\in G_{j-1}} h^o_j[n,l]\, \phi_{j-1,l} + \sum_{m\in C_j} u_j[n,m]\, \psi_{j,m}, \qquad \psi_{j,m} = \sum_{l\in G_{j-1}} g^o_j[m,l]\, \phi_{j-1,l}, \qquad (7.252)$$
$$\tilde\phi_{j,n} = \tilde\phi^o_{j,n}, \qquad \tilde\psi_{j,m} = \tilde\psi^o_{j,m} - \sum_{n\in G_j} u_j[n,m]\, \tilde\phi^o_{j,n}. \qquad (7.253)$$
Theorem 7.28 proves that this update does not modify the number of vanishing moments of the analyzing wavelet.

Theorem 7.28. If the wavelets $\{\psi^o_{j,m}\}_{j,m}$ have $p_1$ vanishing moments, then the wavelets $\{\psi_{j,m}\}_{j,m}$ obtained with the update (7.252) have $p_1$ vanishing moments.

Proof. Equation (7.253) shows that $\tilde\phi^o_{j,n} = \tilde\phi_{j,n}$. Since the original multiresolution has $p_1$ vanishing moments, it is orthogonal to a basis of $d_1$ polynomials $\{q^{(k)}\}_{0\leq k<d_1}$ of degree $p_1 - 1$ as defined in (7.245):
$$\forall k < d_1, \quad q^{(k)} = \sum_{n\in G_j} \langle q^{(k)}, \phi^o_{j,n}\rangle\, \tilde\phi^o_{j,n} = \sum_{n\in G_j} \langle q^{(k)}, \phi^o_{j,n}\rangle\, \tilde\phi_{j,n}.$$
Taking the inner product of this relation with each $\phi_{j,n'}$ leads to
$$\forall k < d_1, \quad \langle q^{(k)}, \phi_{j,n'}\rangle = \sum_{n\in G_j} \langle q^{(k)}, \phi^o_{j,n}\rangle\, \langle \tilde\phi_{j,n}, \phi_{j,n'}\rangle = \langle q^{(k)}, \phi^o_{j,n'}\rangle. \qquad (7.254)$$
Using the refinement equation (7.252) gives
$$\forall k < d_1, \quad \langle \psi_{j,m}, q^{(k)}\rangle = \sum_{l\in G_{j-1}} g^o_j[m,l]\, \langle \phi_{j-1,l}, q^{(k)}\rangle = \sum_{l\in G_{j-1}} g^o_j[m,l]\, \langle \phi^o_{j-1,l}, q^{(k)}\rangle = \langle \psi^o_{j,m}, q^{(k)}\rangle = 0,$$
where we used $\langle \phi_{j-1,l}, q^{(k)}\rangle = \langle \phi^o_{j-1,l}, q^{(k)}\rangle$, which follows from (7.254). ■
To increase the number of vanishing moments of $\tilde\psi_{j,m}$ and to get $\int_\Omega \tilde\psi_{j,m}(x)\, q^{(k)}(x)\, dx = 0$ for a basis of $d_2$ polynomials of degree $p_2 - 1$ on $\Omega$, (7.253) shows that the update coefficients must satisfy
$$\forall k < d_2, \quad \int_\Omega \tilde\psi^o_{j,m}(x)\, q^{(k)}(x)\, dx = \sum_{n\in G_j} u_j[n,m] \int_\Omega \tilde\phi^o_{j,n}(x)\, q^{(k)}(x)\, dx. \qquad (7.255)$$
Thus, the update coefficients $\{u_j[n,m]\}_n$ can be chosen to have $d_2$ nonzero coefficients, which solve this $d_2 \times d_2$ linear system for each $m$.
Predict plus Update Design and Algorithm
Wavelets synthesized with a lifting are constructed with one predict step followed by one update step, because there is no technique that controls the wavelet stability and regularity over more lifting steps. Beginning from the lazy Dirac wavelets $\{\psi_{j,m}^o, \tilde\psi_{j,m}^o\}_{j,m}$ with no vanishing moment, a prediction (7.247) yields biorthogonal wavelets $\{\psi_{j,m}^1, \tilde\psi_{j,m}^1\}_{j,m}$ with, respectively, $p_1$ and zero vanishing moments. An update (7.251) then yields biorthogonal wavelets $\{\psi_{j,m}, \tilde\psi_{j,m}\}_{j,m}$ having, respectively, $p_1$ and $p_2$ vanishing moments.
A fast wavelet transform computes the coefficients
$$\forall\, n \in G_j,\quad a_j[n] = \langle f, \phi_{j,n} \rangle \qquad \text{and} \qquad \forall\, m \in C_j,\quad d_j[m] = \langle f, \psi_{j,m} \rangle$$

FIGURE 7.26
Predict and update decomposition of $a_{j-1}$ into $a_j$ and $d_j$, followed by the reconstruction of $a_{j-1}$ with the same update and predict operators.

by replacing the convolutions with conjugate mirror filters by a succession of predict and update lifting steps. The algorithm takes as input a discrete signal $a_L \in \mathbb{C}^{|G_L|}$ of length $N = |G_L|$, and applies the lazy decomposition, a predict, and an update operator, as illustrated in Figure 7.26. For $j = L+1, \ldots, J$, it computes:
1. Split: $\forall\, m \in C_j$, $d_j^o[m] = a_{j-1}[m]$, and $\forall\, n \in G_j$, $a_j^o[n] = a_{j-1}[n]$.
2. Forward predict: $\forall\, m \in C_j$, $d_j[m] = d_j^o[m] - \sum_{n \in G_j} p_j[m,n]\, a_j^o[n]$.
3. Forward update: $\forall\, n \in G_j$, $a_j[n] = a_j^o[n] + \sum_{m \in C_j} u_j[n,m]\, d_j[m]$.
This fast biorthogonal wavelet transform (7.157) requires $O(N)$ operations. The reconstruction of $a_L$ from $\{d_j\}_{L < j \le J}$ and $a_J$ inverts these predict and update steps. For $j = J, \ldots, L+1$, it computes:
1. Backward update: $\forall\, n \in G_j$, $a_j^o[n] = a_j[n] - \sum_{m \in C_j} u_j[n,m]\, d_j[m]$.
2. Backward predict: $\forall\, m \in C_j$, $d_j^o[m] = d_j[m] + \sum_{n \in G_j} p_j[m,n]\, a_j^o[n]$.
3. Merge: $\forall\, m \in C_j$, $a_{j-1}[m] = d_j^o[m]$, and $\forall\, n \in G_j$, $a_{j-1}[n] = a_j^o[n]$.
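For concreteness, these three forward steps and their inversion can be sketched in NumPy for a one-dimensional signal. The two-tap weights 1/2 and 1/4 (the linear-spline choice derived later in this section) and the periodic boundary handling are illustrative assumptions, not part of the general algorithm:

```python
import numpy as np

def forward_lifting(a_prev):
    """One lifting stage: split a_{j-1} into even (G_j) and odd (C_j)
    samples, predict the odd samples from their even neighbors, then
    update the even samples with the resulting details."""
    a0 = a_prev[0::2].astype(float)          # a_j^o on G_j
    d0 = a_prev[1::2].astype(float)          # d_j^o on C_j
    d = d0 - 0.5 * (a0 + np.roll(a0, -1))    # forward predict
    a = a0 + 0.25 * (d + np.roll(d, 1))      # forward update
    return a, d

def inverse_lifting(a, d):
    """Invert the stage: undo the update, undo the predict, merge."""
    a0 = a - 0.25 * (d + np.roll(d, 1))      # backward update
    d0 = d + 0.5 * (a0 + np.roll(a0, -1))    # backward predict
    out = np.empty(2 * a0.size)
    out[0::2], out[1::2] = a0, d0            # merge even/odd samples
    return out
```

Because the backward steps subtract exactly what the forward steps added, in reverse order, the reconstruction is exact whatever predict and update weights are chosen.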
Vanishing Moments
The predict and update filters are computed to create vanishing moments on the resulting wavelets. After the prediction applied to the lazy Diracs $\psi_{j,m}^0(x) = \delta(x - x_m)$, $\phi_{j,n}^0(x) = \delta(x - x_n)$, the resulting wavelets and scaling functions derived from (7.248) and (7.249) are:
$$\phi_{j,n}^1(x) = \delta(x - x_n), \qquad \psi_{j,m}^1(x) = \delta(x - x_m) - \sum_{n \in G_j} p_j[m,n]\, \delta(x - x_n), \qquad (7.256)$$
$$\tilde\phi_{j,n}^1(x) = \delta(x - x_n) + \sum_{m \in C_j} p_j[m,n]\, \delta(x - x_m), \qquad \tilde\psi_{j,m}^1(x) = \delta(x - x_m). \qquad (7.257)$$
According to (7.250), the wavelet $\psi_{j,m}^1$ has $p_1$ vanishing moments if for each $m$ the $d_1$ coefficients of $\{p_j[m,n]\}_n$ solve the $d_1 \times d_1$ linear system:
$$\forall\, m \in C_j,\ \forall\, k < d_1,\quad q^{(k)}(x_m) = \sum_{n \in G_j} p_j[m,n]\, q^{(k)}(x_n). \qquad (7.258)$$
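Numerically, (7.258) is a small polynomial-interpolation system. As a sketch (a hypothetical helper, assuming one-dimensional points and the monomial basis $q^{(k)}(x) = x^k$), the weights for one point $x_m$ can be obtained with NumPy:

```python
import numpy as np

def predict_weights(x_m, x_neighbors):
    """Solve system (7.258): find weights p[m, n] such that every
    monomial q^(k)(x) = x**k, k < d1, is predicted exactly at x_m
    from its values at the d1 neighbor points x_n.  (Hypothetical
    1-D helper; d1 is the number of neighbors supplied.)"""
    x_n = np.asarray(x_neighbors, dtype=float)
    d1 = x_n.size
    # Row k of A holds q^(k)(x_n) for all neighbors; rhs holds q^(k)(x_m).
    A = np.vander(x_n, d1, increasing=True).T
    b = np.array([x_m ** k for k in range(d1)], dtype=float)
    return np.linalg.solve(A, b)
```

For two neighbors this gives the linear-interpolation weights, and for four equispaced neighbors it recovers the Deslauriers-Dubuc weights $(-1/16,\, 9/16,\, 9/16,\, -1/16)$.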
Following (7.253), after the update, the dual wavelets and the scaling functions are
$$\tilde\phi_{j,n}^2 = \tilde\phi_{j,n}^1 = \tilde\phi_{j-1,n}^2 + \sum_{m \in C_j} p_j[m,n]\, \tilde\phi_{j-1,m}^2, \qquad (7.259)$$
$$\tilde\psi_{j,m}^2 = \tilde\psi_{j,m}^1 - \sum_{n \in G_j} u_j[n,m]\, \tilde\phi_{j,n}^1 = \tilde\phi_{j-1,m}^2 - \sum_{n \in G_j} u_j[n,m]\, \tilde\phi_{j,n}^2. \qquad (7.260)$$
Theorem 7.28 proves that the vanishing moments of $\psi_{j,m}^1$ are transferred to $\psi_{j,m}^2$ after the dual lifting step. According to (7.258), the dual wavelet $\tilde\psi_{j,m}^2$ has $p_2$ vanishing moments if for each $m$ the $d_2$ coefficients of $\{u_j[n,m]\}_n$ solve the $d_2 \times d_2$ linear system
$$\forall\, k < d_2,\quad \sum_{n \in G_j} I_j^k(n)\, u_j[n,m] = I_{j-1}^k(m) \qquad \text{with} \qquad I_j^k(n) = \langle \tilde\phi_{j,n}, q^{(k)} \rangle. \qquad (7.261)$$
The inner products $I_j^k(n)$ are computed iteratively with (7.259):
$$I_j^k(n) = I_{j-1}^k(n) + \sum_{m \in C_j} p_j[m,n]\, I_{j-1}^k(m). \qquad (7.262)$$
The recurrence is initialized at the finest scale $j = L$ by setting $I_L^k(n) = q^{(k)}(x_n)$, where $x_n \in \Omega$ is the point associated with the index $n \in G_L$. More elaborate initializations using quadrature formulas can also be used [427].
Linear Splines 5/3 Biorthogonal Wavelets
Linear spline wavelets are obtained with a two-step lifting construction beginning from lazy wavelets. The one-dimensional grid $G_{j-1}$ is a uniform sampling at intervals $2^{j-1}$, and the two subgrids $G_j$ and $C_j$ correspond to the even and odd subsamplings. Figure 7.27 illustrates these embedded one-dimensional grids.
The lazy wavelet transform splits the coefficients $a_{j-1}$ into two groups:
$$\forall\, n \in G_j,\quad a_j^o[n] = a_{j-1}[n], \qquad \text{and} \qquad \forall\, m \in C_j,\quad d_j^o[m] = a_{j-1}[m].$$
The value of $d_j^o$ in $C_j$ is predicted with a linear interpolation of the neighboring values in $G_j$:
$$\forall\, m \in C_j,\quad d_j[m] = d_j^o[m] - \frac{a_j^o[m + 2^{j-1}] + a_j^o[m - 2^{j-1}]}{2}. \qquad (7.263)$$
This lifting step creates wavelets with two vanishing moments, because linear interpolation predicts exactly the values of polynomials of degree 0 and 1.
A symmetric update step computes
$$\forall\, n \in G_j,\quad a_j[n] = a_j^o[n] + u_j[n, n - 2^{j-1}]\, d_j[n - 2^{j-1}] + u_j[n, n + 2^{j-1}]\, d_j[n + 2^{j-1}]. \qquad (7.264)$$

FIGURE 7.27
Predict and update steps for the construction of linear spline wavelets.

To obtain two vanishing moments, the inner products $I_j^k(n) = \langle \tilde\phi_{j,n}, t^k \rangle$ are computed iteratively with (7.262), using the two nonzero update coefficients $u_j[n, n - 2^{j-1}]$ and $u_j[n, n + 2^{j-1}]$ for each $n$. For $k = 0$ we get $I_j^0(n) = \langle \tilde\phi_{j,n}, 1 \rangle = 2^{j-L}$. Since $t$ is antisymmetric, if this equation is valid for $k = 0$ and if $u_j[n, n - 2^{j-1}] = u_j[n, n + 2^{j-1}]$, then it is also valid for $k = 1$. Solving (7.261) for $k = 0$ gives $u_j[n, n - 2^{j-1}] = u_j[n, n + 2^{j-1}] = 1/4$ and thus,
$$a_j[n] = a_j^o[n] + \frac{1}{4}\big( d_j[n - 2^{j-1}] + d_j[n + 2^{j-1}] \big). \qquad (7.265)$$
Figure 7.27 illustrates this succession of predict and update steps. One can verify (Exercise 7.20) that the resulting biorthogonal wavelets correspond to the spline biorthogonal wavelets computed with $p_1 = p_2 = 2$ vanishing moments (shown in Figure 7.15). The dual scaling functions and wavelets are compactly supported linear splines. Higher-order biorthogonal spline wavelets are constructed with a prediction (7.263) and an update (7.264) providing more vanishing moments.
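The two vanishing moments of the predict step can be checked numerically: applied to samples of a polynomial of degree 0 or 1, the 5/3 detail coefficients vanish, while a quadratic leaves nonzero details. A short sketch (interior samples only, integer sampling assumed for illustration):

```python
import numpy as np

def detail_53(samples):
    """Detail coefficients of one 5/3 predict step, interior samples
    only: each odd sample minus the average of its two even neighbors."""
    even, odd = samples[0::2], samples[1::2]
    return odd[:-1] - 0.5 * (even[:-1] + even[1:])

t = np.arange(16.0)
assert np.allclose(detail_53(np.ones_like(t)), 0.0)   # degree 0 predicted exactly
assert np.allclose(detail_53(3 * t - 2), 0.0)          # degree 1 predicted exactly
assert not np.allclose(detail_53(t ** 2), 0.0)         # degree 2 is not
```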
7.8.3 Quincunx Wavelet Bases
Separable two-dimensional wavelet bases are constructed in Section 7.7.2 from one-dimensional wavelet bases. They are implemented with separable filter banks that increase the scale by 2, dividing the image grid into a coarse grid that keeps one point out of four, plus three detail grids of the same size, which correspond to three different wavelets. Other regular subsamplings of the image array lead to nonseparable wavelet bases. A quincunx subsampling divides the image grid into a coarse grid that keeps one point out of two and a detail grid of the same size that corresponds to a quincunx wavelet; the scale thus increases only by a factor $2^{1/2}$. A quincunx wavelet is more isotropic than separable wavelets.
Biorthogonal or orthogonal quincunx wavelets are constructed with perfect reconstruction or conjugate mirror filters, defined with a quincunx subsampling, which yields conditions on their transfer functions [170, 253, 254, 431]. Kovačević and Sweldens [333] give a simple construction of biorthogonal quincunx wavelets from lazy wavelets, with a predict followed by an update lifting.

FIGURE 7.28
Two successive quincunx subsamplings, for $j$ and $j+1$, where $j$ is odd, with grids $G_j$, $C_j$, $G_{j+1}$, $C_{j+1}$.
We denote by $(a, b)^*$ a transposed two-dimensional column vector. Embedded quincunx grids are defined by
$$\forall\, j,\quad G_j = L^j \mathbb{Z}^2 = \big\{ L^j (n_1, n_2)^* : n \in \mathbb{Z}^2 \big\}, \qquad C_j = L^j \mathbb{Z}^2 + L^{j-1} e_1 = G_j + L^{j-1} e_1,$$
where
$$L = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad e_1 = (1, 0)^* \quad \text{and} \quad e_2 = (0, 1)^*.$$
Figure 7.28 shows two quincunx subsampled grids, depending on the parity of $j$. Each point $n \in G_j$ and $m \in C_j$ has four neighbors
$$\{ n + \epsilon \}_{\epsilon \in \Delta_j} \subset C_j \qquad \text{and} \qquad \{ m + \epsilon \}_{\epsilon \in \Delta_j} \subset G_j,$$
where the set of shifts has the following four elements:
$$\Delta_j = \{ \pm L^{j-1} e_1,\ \pm L^{j-1} e_2 \}.$$
The simplest symmetric prediction operator on these grids is the symmetric averaging over the four neighbors, which performs a linear interpolation:
$$\forall\, a \in \mathbb{C}^{|G_j|},\ \forall\, m \in C_j,\quad (P_j a)[m] = \frac{1}{4} \sum_{\epsilon \in \Delta_j} a[m + \epsilon]. \qquad (7.266)$$
This prediction operator applied to lazy wavelets yields a wavelet that is orthogonal to constant and linear polynomials in $\mathbb{R}^2$, which gives $p_1 = 2$ vanishing moments. The update operator is defined with the same symmetry over the four neighbors:
$$\forall\, b \in \mathbb{C}^{|C_j|},\ \forall\, n \in G_j,\quad (U_j b)[n] = \lambda \sum_{\epsilon \in \Delta_j} b[n + \epsilon]. \qquad (7.267)$$
Choosing $\lambda = 1/4$ satisfies the vanishing-moment conditions (7.261) for constant and linear polynomials [333].
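A minimal NumPy sketch of this quincunx predict/update pair on an image might read as follows. The first-level grids are taken as the two checkerboard parities of an even-sized pixel array, borders are treated periodically, and the helper names are illustrative:

```python
import numpy as np

def _nsum(x):
    """Sum of the four direct neighbors, with periodic borders."""
    return (np.roll(x, 1, 0) + np.roll(x, -1, 0)
            + np.roll(x, 1, 1) + np.roll(x, -1, 1))

def quincunx_forward(img):
    """One quincunx lifting stage: predict (7.266) with weight 1/4 on
    the odd-parity pixels, then update (7.267) with lambda = 1/4."""
    a = np.asarray(img, dtype=float).copy()
    ii, jj = np.indices(a.shape)
    detail = (ii + jj) % 2 == 1              # C_j: odd coordinate sum
    a[detail] -= 0.25 * _nsum(a)[detail]     # predict: d = d0 - P a
    a[~detail] += 0.25 * _nsum(a)[~detail]   # update:  a = a0 + U d
    return a, detail

def quincunx_backward(coeffs, detail):
    a = coeffs.copy()
    a[~detail] -= 0.25 * _nsum(a)[~detail]   # backward update
    a[detail] += 0.25 * _nsum(a)[detail]     # backward predict
    return a
```

The coefficients stay packed in the original image array, which is how the quincunx transform of Figure 7.30 is displayed.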
FIGURE 7.29
Quincunx dual wavelets $\tilde\psi_{j,m}$ and $\tilde\psi_{j+1,m}$ at two consecutive scales $2^j$ and $2^{j+1/2}$ (first and second rows), with progressively more vanishing moments ($p_2 = 2$, $p_2 = 4$, and $p_2 = 6$).
Figure 7.29 shows dual wavelets $\tilde\psi_{j,m}$ and $\tilde\psi_{j+1,m}$ at two consecutive scales $2^j$ and $2^{j+1/2}$, corresponding to the predict and update operators (7.266) and (7.267). They have $p_2 = 2$ vanishing moments and are nearly isotropic. The analyzing wavelets $\psi_{j,m}$ also have $p_1 = 2$ vanishing moments but are more irregular. The irregularity of the analyzing wavelets is not a problem, since the reconstruction is performed by the dual wavelets. Wavelets with more vanishing moments are obtained by replacing the four-neighbor linear interpolation (7.266) by higher-order polynomial interpolations [333]. Figure 7.29 shows the resulting dual wavelets $\tilde\psi_{j,m}$ for $p_2 = 4$ and $p_2 = 6$, which correspond to wavelets $\psi_{j,m}$ with as many vanishing moments, $p_1 = p_2$, but which are more irregular. Increasing the number of vanishing moments improves the regularity of the dual wavelets, but it reduces the stability of these biorthogonal bases, which limits their application.
Figure 7.30 shows an example of a quincunx wavelet image transform with $p_1 = 2$ vanishing moments. It is computed with the fast lifting algorithm from Section 7.8.2 with the predict and update operators (7.266) and (7.267). The quasi-isotropic quincunx wavelet detects sharp transitions in all directions.
7.8.4 Wavelets on Bounded Domains and Surfaces
Processing three-dimensional surfaces and signals defined on surfaces is important
in multimedia and computer graphics applications. Biorthogonal wavelets on triangulated meshes were introduced by Lounsbery et al. [354], and Schröder and
Sweldens [427] improved these techniques with lifting schemes. Lifted wavelets
are used to compress functions defined on a surface ⍀ ⊂ R3 , and in particular on
a sphere to process geographical data on Earth. The sphere is represented with a
recursively subdivided three-dimensional mesh, and the signal is processed using
lifted wavelets on this embedded mesh.
FIGURE 7.30
Image decomposition in a biorthogonal quincunx wavelet basis with $p_1 = 2$ vanishing moments. The top-right image shows the wavelet coefficients $d_j, d_{j-1}, d_{j-2}, d_{j-3}$ packed over the image-sampling array. These coefficients are displayed as square quincunx grids (below) with a rotation for odd scales.
Lifted wavelets are also defined on a two-dimensional parametric domain ⍀ with
an arbitrary topology, to compress a three-dimensional surface S ⊂ R3 , viewed as a
mapping from ⍀ to R3 . The surface is represented as a three-dimensional mesh,
and the lifted wavelet transform computes three coefficients for each vertex of
the mesh—one per coordinate. Denoising and compression of surfaces are then
implemented by thresholding and quantizing these wavelet coefficients. Such multiresolution processing has applications in video games, where a large amount of three-dimensional surface data must be displayed in real time. Lifted wavelets also find applications in computer-aided design, where surfaces are densely sampled to represent manufactured objects and should be compressed to reduce the storage requirements.
Semiregular Triangulation
A triangulation is a data structure frequently used to index points xn ∈ ⍀ on a surface.
Embedded indexes n ∈ Gj with a triangulation topology are defined by recursively
subdividing a coarse triangulated mesh.
For each scale $j$, a triangulation $(E_j, T_j)$ is composed of edges $E_j \subset G_j \times G_j$ that link pairs of points on the grid, and of triangles $T_j \subset G_j \times G_j \times G_j$. Each triangle of $T_j$ is
composed of three edges in Ej . These triangulations are supposed to be embedded
using the 1:4 subdivision rule of each triangle, illustrated in Figure 7.31, as follows.
■ For each edge $e \in E_j$, a midpoint $\gamma(e) \in G_{j-1}$ is added to the vertices:
$$G_{j-1} = G_j \cup \{ \gamma(e) : e \in E_j \}.$$
■ Each edge is subdivided into two finer edges:
$$\forall\, e = (n_0, n_1) \in E_j,\quad s_1(e) = (n_0, \gamma(e)) \quad \text{and} \quad s_2(e) = (n_1, \gamma(e)).$$
The subdivided set of edges is then
$$E_{j-1} = \{ s_i(e) : i = 1, 2 \ \text{and}\ e \in E_j \}.$$
■ Each triangle face $f = (n_0, n_1, n_2) \in F_j$ is subdivided into four faces:
$$s_1(f) = (n_0, \gamma(n_0, n_1), \gamma(n_0, n_2)), \qquad s_2(f) = (n_1, \gamma(n_1, n_0), \gamma(n_1, n_2)),$$
$$s_3(f) = (n_2, \gamma(n_2, n_0), \gamma(n_2, n_1)), \qquad s_4(f) = (\gamma(n_0, n_1), \gamma(n_1, n_2), \gamma(n_2, n_0)).$$
FIGURE 7.31
(a) Iterated regular 1:4 subdivision of one triangle into four equal subtriangles, for $j = 0, -1, -2, -3$. (b) Planar triangulation $(G_0, E_0, T_0)$ of a domain $\Omega$ in $\mathbb{R}^2$, successively refined with a 1:4 subdivision.
FIGURE 7.32
Examples of a semiregular mesh $\{G_j, E_j, T_j\}_j$ ($j = 0, -1, -2$) of a domain $\Omega$ that is a surface $S$ in $\mathbb{R}^3$.
The subdivided set of faces is then
$$F_{j-1} = \{ s_i(f) : i = 1, 2, 3, 4 \ \text{and}\ f \in F_j \}.$$
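The 1:4 subdivision rule can be sketched as follows (a hypothetical helper, not the book's code; new vertex positions are simply edge midpoints, and for the sphere construction described later they would additionally be projected onto the sphere):

```python
import numpy as np

def subdivide_1_4(vertices, faces):
    """One 1:4 subdivision step: insert a midpoint gamma(e) on every
    edge and replace each face (n0, n1, n2) by the four subfaces
    listed above.  vertices: list of coordinate arrays; faces: list
    of vertex-index triples."""
    vertices = [np.asarray(v, dtype=float) for v in vertices]
    midpoint = {}                          # undirected edge -> new index
    def gamma(a, b):
        e = (min(a, b), max(a, b))         # shared edges get one midpoint
        if e not in midpoint:
            midpoint[e] = len(vertices)
            vertices.append(0.5 * (vertices[a] + vertices[b]))
        return midpoint[e]
    new_faces = []
    for n0, n1, n2 in faces:
        m01, m12, m20 = gamma(n0, n1), gamma(n1, n2), gamma(n2, n0)
        new_faces += [(n0, m01, m20), (n1, m01, m12),
                      (n2, m20, m12), (m01, m12, m20)]
    return vertices, new_faces
```

Iterating this function from a coarse mesh produces the embedded grids $G_0 \subset G_{-1} \subset \cdots$ of Figures 7.31 and 7.32.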
Figure 7.31 shows an example of a coarse planar triangulation with points $x_n$ in a
domain ⍀ of R2 .
The semiregular mesh {Ej , Tj }j and the corresponding sample locations xn are
usually computed from an input surface S, represented either with a parametric
continuous function or with an unstructured set of polygons. This requires using
a hierarchical remeshing process to compute the embedded triangulation [354].
Figure 7.32 shows an example of semiregular triangulation of a three-dimensional
surface S, and in this case the points xn belong to a domain ⍀ that is the surface
S in R3 .
Wavelets to Process Functions on Surfaces
Let us consider a domain $\Omega \subset \mathbb{R}^3$ that is a three-dimensional surface; each $n \in G_j$ indexes a point $x_n \in \Omega$ on the surface. Figure 7.33 shows the local labeling convention for the neighboring vertices in $G_j$ of a given index $m \in C_j$ in a butterfly neighborhood. A predictor $P_j$ is computed in this local neighborhood:
$$\forall\, m \in C_j,\quad (P_j a)[m] = \sum_{i=1,2} \alpha_i\, a[v_m^i] + \sum_{i=1,2} \beta_i\, a[w_m^i] + \sum_{i,j=1}^{2} \gamma_{i,j}\, a[z_m^{i,j}], \qquad (7.268)$$
where the parameters $\alpha_i$, $\beta_i$, $\gamma_{i,j}$ are calculated by solving (7.258) to obtain vanishing moments for a collection of low-degree polynomials $\{q^{(k)}\}_{0 \le k < d_1}$ [427].
The update operator ensures that the dual wavelets have one vanishing moment. It is calculated on the direct neighbors in $C_j$ of each point $n \in G_j$:
$$\forall\, n \in G_j,\quad V_n = \big\{ m = \gamma(n, n') \in C_j : (n, n') \in E_j \big\}.$$
The update operator is parameterized as follows:
$$\forall\, b \in \mathbb{C}^{|C_j|},\ \forall\, n \in G_j,\quad (U_j b)[n] = \lambda_n \sum_{m \in V_n} b[m], \qquad (7.269)$$
7.8 Lifting Wavelets
1
1, 1
2, 1
wm
zm
1
vm
: grid Gj
1, 2
zm
: grid Cj
m
zm
2
vm
2
wm
2, 2
zm
FIGURE 7.33
Labeling of points in the butterfly neighborhood of a vertex m ∈ Cj .
where each $\lambda_n$ is computed by solving the system (7.261) to obtain vanishing moments. This requires the iterative computation of the moments $I_j^k(n)$, as explained in (7.262), with the initialization $I_L^k(n) = 1$ at the finest scale $L$ of the mesh. In a perfectly regular triangulation, where all the points have six neighbors, a constant weight $\lambda_n = \lambda$ can be used, and one can check that $\lambda = 1/24$ guarantees one vanishing moment.
Processing signals on a sphere is an important application of these wavelets on surfaces [427]. The triangulated mesh is obtained by a 1:4 subdivision of a regular polyhedron, for instance a tetrahedron. The positions of the points $x_n$ for $n \in G_j$ are defined recursively by projecting the midpoints of the edges onto the sphere:
$$\forall\, (n_0, n_1) \in E_j,\quad x_{\gamma(n_0, n_1)} = \frac{x_{n_0} + x_{n_1}}{\| x_{n_0} + x_{n_1} \|}.$$
The signal $f \in \mathbb{C}^{|G_L|}$ is defined on the finest mesh $G_L$.
A nonlinear approximation is obtained by setting to zero the wavelet coefficients that satisfy $|\langle f, \psi_{j,m} \rangle| \le T\, \|\tilde\psi_{j,m}\|$, where $T$ is a given threshold. The norm $\|\tilde\psi_{j,m}\|$ can be approximated from the size of the support, $\|\tilde\psi_{j,m}\| \sim |\mathrm{supp}(\tilde\psi_{j,m})|$ [427]. Figure 7.34 shows an example of such a nonlinear approximation with an image of Earth defined as a function on the sphere.
Wavelets to Process Surfaces
A three-dimensional surface $S \subset \mathbb{R}^3$ is represented as a mapping from a two-dimensional parameter domain $\Omega$ to $\mathbb{R}^3$. This surface is discretized with a semiregular mesh $\{G_j, E_j, T_j\}_{L \le j \le 0}$, and thus $\Omega$ can be chosen as the finest grid $G_L$, viewed as an abstract domain. The surface is a discrete mapping from $\Omega = G_L$ to $\mathbb{R}^3$ that assigns to each $n \in \Omega$ three values $(f_1[n], f_2[n], f_3[n]) \in S$, which give a position in three-dimensional space.
Processing the discrete surface is equivalent to processing the three signals $(f_1, f_2, f_3)$, where each $f_i \in \mathbb{C}^{|G_L|}$ is defined on the finest grid $G_L$. Since points in
FIGURE 7.34
Nonlinear approximation of a function defined on a sphere, using a decreasing number (100%, 5%, 2%) of large wavelet coefficients.
FIGURE 7.35
Example of dual wavelets $\tilde\psi_{j,k}(x)$, for $j = -2, -3, -4$, for $x$ on a subdivided equilateral triangle. The height over the triangle indicates the value of the wavelet.
the parametric domain $\Omega$ do not have positions in Euclidean space, the notion of vanishing moments is not well defined, and the predict operator is computed using weights that are calculated as if the faces of the mesh were equilateral triangles. One can verify that the resulting parameters of the predict operator (7.268) are $\alpha_i = 1/2$, $\beta_i = 1/8$, and $\gamma_{i,j} = -1/16$ [427]. Figure 7.35 shows the corresponding wavelets on a subdivided equilateral triangle.
These wavelets are used to compress each of the three signals $f_i \in \mathbb{C}^{|G_L|}$ by uniformly quantizing the normalized coefficients $\langle f_i, \psi_{j,m} \rangle / \|\tilde\psi_{j,m}\|$. The resulting set of quantized coefficients is coded into bit strings with an entropy-coding algorithm, described in Section 10.2.1. The quantization and coding of sparse signal representations are described in Section 10.4.
Figure 7.36 shows an example of three-dimensional surface approximation using
biorthogonal wavelets on triangulated meshes. Wavelet coders based on lifting offer
state-of-the-art results in surface compression [327]. There is no control on the Riesz
bounds of the lifted wavelet basis, because the lifted basis depends on the surface
properties, but good approximation results are obtained.
FIGURE 7.36
Nonlinear surface approximation using a decreasing proportion (100%, 10%, 5%, 2%) of large wavelet coefficients.
7.8.5 Faster Wavelet Transform with Lifting
Lifting is used to reduce the number of operations of a one-dimensional fast wavelet
transform by factorizing the filter bank convolutions. It also reduces the memory
requirements by implementing all operations, in place, within the input signal array.
Before the introduction of liftings for wavelet design, the factorization of filter
bank convolutions was studied as paraunitary filter banks by Vaidyanathan [68]
and other signal-processing researchers. Instead of implementing a filter bank with
convolutions and subsamplings, specific filters are designed for even and odd signal
coefficients. These filters are shown to be factorized as a succession of elementary
operators that correspond to the predict and update operators of a lifting. This
filtering architecture reduces the number of additions and multiplications by up to a factor of 2,
and simplifies folding border treatments for symmetric wavelets.
The fast wavelet transform algorithm in Sections 7.3.1 and 7.3.2 iteratively decomposes the scaling coefficients $a_j[n]$ at a scale $2^j$ into larger-scale coefficients $a_{j+1}[n]$ and detail wavelet coefficients $d_{j+1}[n]$, with convolutions and subsamplings using two filters $h$ and $g$. According to (7.157),
$$a_{j+1}[n] = a_j \star \bar h[2n], \qquad d_{j+1}[n] = a_j \star \bar g[2n], \qquad (7.270)$$
with $\bar h[n] = h[-n]$ and $\bar g[n] = g[-n]$. The reconstruction is performed with the dual filters $\tilde h$ and $\tilde g$ according to (7.158),
$$a_j[n] = \check a_{j+1} \star \tilde h[n] + \check d_{j+1} \star \tilde g[n]. \qquad (7.271)$$
Sweldens and Daubechies [199] proved that the convolutions and subsamplings
(7.270) can be implemented with a succession of lifting steps and the reconstruction
(7.271) by inverting these liftings.
Each uniform one-dimensional sampling grid $G_j$ of $a_j[n]$ is divided into even samples in $G_{j+1}$ and odd samples in $C_{j+1}$, where $a_{j+1}$ and $d_{j+1}$ are, respectively, defined. Starting from the lazy wavelet transform that splits the even and odd samples of $a_j[n]$, $K$ pairs of predict and update convolution operators $\{P^{(k)}, U^{(k)}\}_{1 \le k \le K}$ are
sequentially applied. Each predict operator P (k) corresponds to a filter p(k) [n] and
each update operator to a filter u(k) [m]. The filters have a small support of typically
two coefficients. They are computed from the biorthogonal filters (h, g, h̃, g̃) with
a Euclidean division algorithm [199]. A final scaling step renormalizes the filter
coefficients.
Let $a_L[n]$ be the input finest-scale signal for $n \in G_L$. The lifting implementation of a fast wavelet transform proceeds as follows. For $j = L+1, \ldots, J$:
1. Even-odd split: $\forall\, m \in C_j$, $d_j^{(0)}[m] = a_{j-1}[m]$, and $\forall\, n \in G_j$, $a_j^{(0)}[n] = a_{j-1}[n]$. The following steps 2 and 3 are performed for $k = 1, \ldots, K$.
2. Forward predict: $\forall\, m \in C_j$, $d_j^{(k)}[m] = d_j^{(k-1)}[m] - \sum_{n \in G_j} p^{(k)}[m - n]\, a_j^{(k-1)}[n]$.
3. Forward update: $\forall\, n \in G_j$, $a_j^{(k)}[n] = a_j^{(k-1)}[n] + \sum_{m \in C_j} u^{(k)}[n - m]\, d_j^{(k)}[m]$.
4. Scaling: The output coefficients are $d_j = d_j^{(K)} / \zeta$ and $a_j = \zeta\, a_j^{(K)}$.
The fast inverse wavelet transform recovers $a_L$ from the set of wavelet coefficients $\{d_j\}_{L < j \le J}$ and $a_J$ by inverting these predict and update operators. For $j = J, \ldots, L+1$:
1. Inverse scaling: Initialize $d_j^{(K)} = \zeta\, d_j$ and $a_j^{(K)} = a_j / \zeta$. The following steps 2 and 3 are performed for $k = K, \ldots, 1$.
2. Backward update: $\forall\, n \in G_j$, $a_j^{(k-1)}[n] = a_j^{(k)}[n] - \sum_{m \in C_j} u^{(k)}[n - m]\, d_j^{(k)}[m]$.
3. Backward predict: $\forall\, m \in C_j$, $d_j^{(k-1)}[m] = d_j^{(k)}[m] + \sum_{n \in G_j} p^{(k)}[m - n]\, a_j^{(k-1)}[n]$.
4. Merge even-odd samples: $\forall\, m \in C_j$, $a_{j-1}[m] = d_j^{(0)}[m]$, and $\forall\, n \in G_j$, $a_{j-1}[n] = a_j^{(0)}[n]$.
One can verify (Exercise 7.21) that this implementation divides the number
of operations by up to a factor of 2, compared to direct convolutions and subsamplings calculated in (7.270) and (7.271). Moreover, this algorithm proceeds “in
place,” which means that it only uses the memory of the original array GL with
coefficients that are progressively modified by the lifting operations; it then stores
the resulting wavelet coefficients. Similarly, the reconstruction operates in place by
reconstructing progressively the coefficients in the array GL .
Symmetric Lifting on the Interval
The lifting implementation is now described in more detail for symmetric wavelets on an interval, corresponding to symmetric biorthogonal filters $(h, g, \tilde h, \tilde g)$. Predict and update convolutions are modified at the boundaries to implement folding boundary conditions.
The input sampling grid $G_L$ has $N = 2^{-L}$ samples at integer positions $0 \le n < N$. The embedded subgrids at scales $2^L < 2^j \le 2^0$ are
$$G_j = \{ kN2^j : 0 \le k < 2^{-j} \} \qquad \text{and} \qquad C_j = \{ kN2^j + N2^{j-1} : 0 \le k < 2^{-j} \}. \qquad (7.272)$$
The resulting lifting operators $\{P^{(k)}, U^{(k)}\}_{1 \le k \le K}$ are two-tap symmetric convolution operators. A prediction of parameter $\lambda$ is defined by
$$\forall\, m \in C_j,\ m < N - N2^{j-1},\quad (P_\lambda a)[m] = \lambda \big( a[m + N2^{j-1}] + a[m - N2^{j-1}] \big). \qquad (7.273)$$
An update of parameter $\mu$ is defined as
$$\forall\, n \in G_j,\ n > 0,\quad (U_\mu b)[n] = \mu \big( b[n + N2^{j-1}] + b[n - N2^{j-1}] \big). \qquad (7.274)$$
At the interval boundaries, the predict and update operators must be modified for $m = N - N2^{j-1}$ in (7.273) and for $n = 0$ in (7.274), because one of the two neighbors falls outside the grids $G_j$ and $C_j$. Symmetric boundary conditions, described in Section 7.5.2, are implemented with a folding, which leads to the following definition of the boundary predict and update operators:
$$(P_\lambda a)[N - N2^{j-1}] = 2\lambda\, a[N - N2^j] \qquad \text{and} \qquad (U_\mu b)[0] = 2\mu\, b[N2^{j-1}].$$
This ensures that the lifting operators have one vanishing moment and that the resulting boundary wavelets also have one vanishing moment.
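In code, the folded boundary rules simply replace the missing neighbor by a doubled tap. A NumPy sketch of one $(P_\lambda, U_\mu)$ stage on an even-length signal, with the 5/3 values $\lambda = 1/2$, $\mu = 1/4$ as illustrative defaults:

```python
import numpy as np

def lift_forward(x, lam=0.5, mu=0.25):
    """One symmetric two-tap lifting stage on an interval, with folded
    boundaries: (P a)[last odd] = 2*lam*a[last even] and
    (U b)[0] = 2*mu*b[0].  len(x) must be even."""
    x = np.asarray(x, dtype=float)
    a, d = x[0::2].copy(), x[1::2].copy()
    d[:-1] -= lam * (a[:-1] + a[1:])     # interior predict
    d[-1] -= 2 * lam * a[-1]             # folded right boundary
    a[1:] += mu * (d[:-1] + d[1:])       # interior update
    a[0] += 2 * mu * d[0]                # folded left boundary
    return a, d

def lift_inverse(a, d, lam=0.5, mu=0.25):
    a, d = a.copy(), d.copy()
    a[0] -= 2 * mu * d[0]                # undo update
    a[1:] -= mu * (d[:-1] + d[1:])
    d[-1] += 2 * lam * a[-1]             # undo predict
    d[:-1] += lam * (a[:-1] + a[1:])
    out = np.empty(2 * a.size)
    out[0::2], out[1::2] = a, d
    return out
```

With these folded taps a constant signal produces exactly zero details, including at the boundaries, which is the one-vanishing-moment property stated above.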
Factorization of 5/3 and 9/7 Biorthogonal Wavelets
The 9/7 and 5/3 biorthogonal wavelets in Figure 7.15 are recommended for image compression in JPEG-2000 and are often used in wavelet image-processing applications. The 5/3 biorthogonal wavelet has $p_1 = p_2 = 2$ vanishing moments, while the 9/7 wavelet has $p_1 = p_2 = 4$ vanishing moments.
The 5/3 wavelet transform is implemented with $K = 1$ pair of predict and update steps, presented in Section 7.8.2. They correspond to
$$P^{(1)} = P_{1/2}, \qquad U^{(1)} = U_{1/4}, \qquad \text{and} \qquad \zeta = \sqrt{2}.$$
The 9/7 wavelet transform is implemented with $K = 2$ pairs of predict and update steps defined as
$$P^{(1)} = P_\alpha, \quad U^{(1)} = U_\beta, \qquad \text{where} \qquad \alpha = 1.5861343420599,\ \ \beta = -0.0529801185729,$$
$$P^{(2)} = P_\gamma, \quad U^{(2)} = U_\delta, \qquad \text{where} \qquad \gamma = -0.8829110755309,\ \ \delta = 0.4435068520439,$$
and the scaling constant is $\zeta = 1.1496043988602$.
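Assuming periodic borders (the folded variant of the previous section changes only the border taps), the two predict/update pairs and the final scaling can be written directly from the lifting algorithm of this section. The sketch below uses the constants above with the book's sign convention, where the predict is subtracted and the update added:

```python
import numpy as np

ALPHA, BETA = 1.5861343420599, -0.0529801185729
GAMMA, DELTA = -0.8829110755309, 0.4435068520439
ZETA = 1.1496043988602

def cdf97_forward(x):
    """One stage of the 9/7 lifting transform, periodic borders."""
    x = np.asarray(x, dtype=float)
    a, d = x[0::2].copy(), x[1::2].copy()
    for lam, mu in ((ALPHA, BETA), (GAMMA, DELTA)):
        d -= lam * (a + np.roll(a, -1))   # predict P_lam
        a += mu * (d + np.roll(d, 1))     # update  U_mu
    return ZETA * a, d / ZETA             # scaling step

def cdf97_inverse(a, d):
    a, d = a / ZETA, d * ZETA             # inverse scaling
    for lam, mu in ((GAMMA, DELTA), (ALPHA, BETA)):
        a -= mu * (d + np.roll(d, 1))     # backward update
        d += lam * (a + np.roll(a, -1))   # backward predict
    return np.ravel(np.column_stack((a, d)))
```

Each lifting step is inverted exactly by subtraction, so the reconstruction is exact up to floating-point rounding even though the listed constants are truncated.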
7.9 EXERCISES
7.1 ² Let $h$ be a conjugate mirror filter associated to a scaling function $\phi$.
(a) Prove that if $\hat h(\omega)$ is $\mathbf{C}^p$ and has a zero of order $p$ at $\pi$, then the $l$th-order derivative satisfies $\hat\phi^{(l)}(2k\pi) = 0$ for any $k \in \mathbb{Z} - \{0\}$ and $l < p$.
(b) Derive that if $q < p$, then $\sum_{n=-\infty}^{+\infty} n^q\, \phi(n) = \int_{-\infty}^{+\infty} t^q\, \phi(t)\, dt$.
7.2 ² Prove that $\sum_{n=-\infty}^{+\infty} \phi(t - n) = 1$ if $\phi$ is an orthogonal scaling function.
7.3 ² Let $\phi_m$ be the Battle-Lemarié scaling function of degree $m$ defined in (7.18). Let $\phi$ be the Shannon scaling function defined by $\hat\phi = \mathbf{1}_{[-\pi,\pi]}$. Prove that $\lim_{m \to +\infty} \|\phi_m - \phi\| = 0$.
7.4 ³ Suppose that $h[n]$ is nonzero only for $0 \le n < K$. We denote $m[n] = \sqrt{2}\, h[n]$. The scaling equation is $\phi(t) = \sum_{n=0}^{K-1} m[n]\, \phi(2t - n)$.
(a) Suppose that $K = 2$. Prove that if $t$ is a dyadic number that can be written in binary form with $i$ digits, $t = 0.\epsilon_1 \epsilon_2 \cdots \epsilon_i$ with $\epsilon_k \in \{0, 1\}$, then $\phi(t)$ is the product
$$\phi(t) = m[\epsilon_1] \times m[\epsilon_2] \times \cdots \times m[\epsilon_i] \times \phi(0).$$
(b) For $K = 2$, show that if $m[0] = 4/3$ and $m[1] = 2/3$, then $\phi(t)$ is singular at all dyadic points. Verify numerically with WAVELAB that the resulting scaling equation does not define a finite-energy function $\phi$.
(c) Show that one can find two matrices $M[0]$ and $M[1]$ such that the $K$-dimensional vector $\Phi(t) = [\phi(t), \phi(t+1), \ldots, \phi(t+K-1)]^T$ satisfies
$$\Phi(t) = M[0]\, \Phi(2t) + M[1]\, \Phi(2t - 1).$$
(d) Show that one can compute $\Phi(t)$ at any dyadic number $t = 0.\epsilon_1 \epsilon_2 \cdots \epsilon_i$ with a product of matrices:
$$\Phi(t) = M[\epsilon_1] \times M[\epsilon_2] \times \cdots \times M[\epsilon_i] \times \Phi(0).$$
7.5 ² Let us define
$$\phi_{k+1}(t) = \sqrt{2} \sum_{n=-\infty}^{+\infty} h[n]\, \phi_k(2t - n), \qquad (7.275)$$
with $\phi_0 = \mathbf{1}_{[0,1]}$, and $a_k[n] = \langle \phi_k(t), \phi_k(t - n) \rangle$.
(a) Let
$$\widehat{P f}(\omega) = \frac{1}{2} \left( \Big| \hat h\Big(\frac{\omega}{2}\Big) \Big|^2 \hat f\Big(\frac{\omega}{2}\Big) + \Big| \hat h\Big(\frac{\omega}{2} + \pi\Big) \Big|^2 \hat f\Big(\frac{\omega}{2} + \pi\Big) \right).$$
Prove that $\hat a_{k+1}(\omega) = \widehat{P a_k}(\omega)$.
(b) Prove that if there exists $\phi$ such that $\lim_{k \to +\infty} \|\phi_k - \phi\| = 0$, then $1$ is an eigenvalue of $P$ and $\hat\phi(\omega) = \prod_{p=1}^{+\infty} 2^{-1/2}\, \hat h(2^{-p}\omega)$. What is the degree of freedom on $\phi_0$ in order to still converge to the same limit $\phi$?
(c) Implement numerically the computation of $\phi_k(t)$ for the Daubechies conjugate mirror filter with $p = 6$ zeros at $\pi$. How many iterations are needed to obtain $\|\phi_k - \phi\| < 10^{-4}$? Try to improve the rate of convergence by modifying $\phi_0$.
7.6 Let $b[n] = f(N^{-1} n)$ with $2^L = N^{-1}$ and $f \in V_L$. We want to recover $a_L[n] = \langle f, \phi_{L,n} \rangle$ from $b[n]$ to compute the wavelet coefficients of $f$ with Theorem 7.10.
(a) Let $\phi_L[n] = 2^{-L/2}\, \phi(2^{-L} n)$. Prove that $b[n] = a_L \star \phi_L[n]$.
(b) Prove that if there exists $C > 0$ such that for all $\omega \in [-\pi, \pi]$,
$$\hat\phi_d(\omega) = \sum_{k=-\infty}^{+\infty} \hat\phi(\omega + 2k\pi) \ge C,$$
then $a_L$ can be calculated from $b$ with a stable filter $\phi_L^{-1}[n]$.
(c) If $\phi$ is a cubic spline-scaling function, compute $\phi_L^{-1}[n]$ numerically. For a given numerical precision, compare the number of operations needed to compute $a_L$ from $b$ with the number of operations needed to compute the fast wavelet transform of $a_L$.
(d) Show that calculating $a_L$ from $b$ is equivalent to performing a change of basis in $V_L$, from a Riesz interpolation basis to an orthonormal basis.
7.7 ² Quadrature mirror filters. We define a multirate filter bank with four filters $h$, $g$, $\tilde h$, and $\tilde g$, which decomposes a signal $a_0[n]$:
$$a_1[n] = a_0 \star h[2n], \qquad d_1[n] = a_0 \star g[2n].$$
Using the notation (7.101), we reconstruct
$$\tilde a_0[n] = \check a_1 \star \tilde h[n] + \check d_1 \star \tilde g[n].$$
(a) Prove that $\tilde a_0[n] = a_0[n - l]$ if
$$\hat g(\omega) = \hat h(\omega + \pi), \qquad \hat{\tilde h}(\omega) = \hat h(\omega), \qquad \hat{\tilde g}(\omega) = -\hat h(\omega + \pi),$$
and if $h$ satisfies the quadrature mirror condition
$$\hat h^2(\omega) - \hat h^2(\omega + \pi) = 2\, e^{-il\omega}.$$
(b) Show that $l$ is necessarily odd.
(c) Verify that the Haar filter (7.46) is a quadrature mirror filter (it is the only finite impulse response solution).
7.8 ¹ Let $f$ be a function of support $[0, 1]$ that is equal to different polynomials of degree $q$ on the intervals $\{[\tau_k, \tau_{k+1}]\}_{0 \le k < K}$, with $\tau_0 = 0$ and $\tau_K = 1$. Let $\psi$ be a Daubechies wavelet with $p$ vanishing moments. If $q < p$, compute the number of nonzero wavelet coefficients $\langle f, \psi_{j,n} \rangle$ at a fixed scale $2^j$. How should we choose $p$ to minimize this number? If $q > p$, what is the maximum number of nonzero wavelet coefficients $\langle f, \psi_{j,n} \rangle$ at a fixed scale $2^j$?
7.9 ² Let $\phi$ be a box spline of degree $m$, obtained by $m + 1$ convolutions of $\mathbf{1}_{[0,1]}$ with itself.
(a) Prove that
$$\phi(t) = \frac{1}{m!} \sum_{k=0}^{m+1} (-1)^k \binom{m+1}{k} \big( [t - k]_+ \big)^m,$$
where $[x]_+ = \max(x, 0)$. Hint: Write $\mathbf{1}_{[0,1]} = \mathbf{1}_{[0,+\infty)} - \mathbf{1}_{(1,+\infty)}$.
(b) Let $A_m$ and $B_m$ be the Riesz bounds of $\{\phi(t - n)\}_{n \in \mathbb{Z}}$. With (7.9), prove that $\lim_{m \to +\infty} B_m = +\infty$. Compute $A_m$ and $B_m$ numerically for $m \in \{0, \ldots, 5\}$ with MATLAB.
7.10 ¹ Prove that if $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$, then for all $\omega \in \mathbb{R} - \{0\}$, $\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 = 1$. Find an example showing that the converse is not true.
7.11 ³ Let us define
$$\hat\psi(\omega) = \begin{cases} 1 & \text{if } 4\pi/7 \le |\omega| \le \pi \ \text{ or } \ 4\pi \le |\omega| \le 4\pi + 4\pi/7, \\ 0 & \text{otherwise.} \end{cases}$$
Prove that $\{\psi_{j,n}\}_{(j,n) \in \mathbb{Z}^2}$ is an orthonormal basis of $L^2(\mathbb{R})$. Prove that $\psi$ is not associated to a scaling function that generates a multiresolution approximation.
7.12 ² Express the Coiflet property (7.99) as an equivalent condition on the conjugate mirror filter $\hat h(e^{i\omega})$.
7.13 ¹ Prove that $\psi(t)$ has $p$ vanishing moments if and only if, for all $j > 0$, the discrete wavelets $\psi_j[n]$ defined in (7.140) have $p$ discrete vanishing moments:
$$\sum_{n=-\infty}^{+\infty} n^k\, \psi_j[n] = 0 \qquad \text{for} \qquad 0 \le k < p.$$
7.14 ² Let $\psi(t)$ be a compactly supported wavelet calculated with Daubechies conjugate mirror filters $(h, g)$. Let $\psi'_{j,n}(t) = 2^{-j/2}\, \psi'(2^{-j} t - n)$ be the derivative wavelets.
(a) Verify that $h_1$ and $g_1$ defined by
$$\hat h_1(\omega) = 2\, \hat h(\omega)\, (e^{i\omega} + 1)^{-1}, \qquad \hat g_1(\omega) = 2\, (e^{i\omega} - 1)\, \hat g(\omega)$$
are finite impulse response filters.
(b) Prove that the Fourier transform of $\psi'(t)$ can be written as
$$\widehat{\psi'}(\omega) = \frac{\hat g_1(2^{-1}\omega)}{\sqrt{2}} \prod_{p=2}^{+\infty} \frac{\hat h_1(2^{-p}\omega)}{\sqrt{2}}.$$
(c) Describe a fast filter bank algorithm to compute the derivative wavelet coefficients $\langle f, \psi'_{j,n} \rangle$ [108].
7.15 ³ Let $\psi(t)$ be a compactly supported wavelet calculated with Daubechies conjugate mirror filters $(h, g)$. Let $\hat h_a(\omega) = |\hat h(\omega)|^2$. We verify that $\hat\psi^a(\omega) = \hat\psi(\omega)\, \hat h_a(\omega/4 - \pi/2)$ is an almost analytic wavelet.
(a) Prove that $\psi^a$ is a complex wavelet such that $\mathrm{Re}[\psi^a] = \psi$.
(b) Compute $\hat\psi^a(\omega)$ numerically for a Daubechies wavelet with four vanishing moments. Explain why $\hat\psi^a(\omega) \approx 0$ for $\omega < 0$.
(c) Let $\psi^a_{j,n}(t) = 2^{-j/2}\, \psi^a(2^{-j} t - n)$. Using the fact that
$$\hat\psi^a(\omega) = \frac{\hat g(2^{-1}\omega)}{\sqrt{2}}\, \frac{\hat h(2^{-2}\omega)\, |\hat h(2^{-2}\omega - 2^{-1}\pi)|^2}{\sqrt{2}} \prod_{k=3}^{+\infty} \frac{\hat h(2^{-k}\omega)}{\sqrt{2}},$$
show that we can modify the fast wavelet transform algorithm to compute the "analytic" wavelet coefficients $\langle f, \psi^a_{j,n} \rangle$ by inserting a new filter.
(d) Let $\phi$ be the scaling function associated to $\psi$. We define separable two-dimensional "analytic" wavelets by:
$$\psi^1(x) = \psi^a(x_1)\, \phi(x_2), \qquad \psi^2(x) = \phi(x_1)\, \psi^a(x_2),$$
$$\psi^3(x) = \psi^a(x_1)\, \psi^a(x_2), \qquad \psi^4(x) = \psi^a(x_1)\, \psi^a(-x_2).$$
Let $\psi^k_{j,n}(x) = 2^{-j}\, \psi^k(2^{-j} x - n)$ for $n \in \mathbb{Z}^2$. Modify the separable wavelet filter bank algorithm from Section 7.7.3 to compute the "analytic" wavelet coefficients $\langle f, \psi^k_{j,n} \rangle$.
(e) Prove that $\{\psi^k_{j,n}\}_{1 \le k \le 4,\, j \in \mathbb{Z},\, n \in \mathbb{Z}^2}$ is a frame of the space of real functions $f \in L^2(\mathbb{R}^2)$ [108].
7.16 ² Multiwavelets. We define the following two scaling functions:
$$\phi_1(t) = \phi_1(2t) + \phi_1(2t - 1),$$
$$\phi_2(t) = \phi_2(2t) + \phi_2(2t - 1) - \frac{1}{2}\big( \phi_1(2t) - \phi_1(2t - 1) \big).$$
(a) Compute the functions $\phi_1$ and $\phi_2$. Prove that $\{\phi_1(t - n), \phi_2(t - n)\}_{n \in \mathbb{Z}}$ is an orthonormal basis of a space $V_0$ that will be specified.
(b) Find $\psi_1$ and $\psi_2$ with a support on $[0, 1]$ that are orthogonal to each other and to $\phi_1$ and $\phi_2$. Plot these wavelets. Verify that they have two vanishing moments and that they generate an orthonormal basis of $L^2(\mathbb{R})$.
7.17 ³ Let f^repl be the folded function defined in (7.179).
(a) Let α(t), β(t) ∈ L²(R) be two functions that are either symmetric or antisymmetric about t = 0. If ⟨α(t), β(t + 2k)⟩ = 0 and ⟨α(t), β(2k − t)⟩ = 0 for all k ∈ Z, then prove that

∫₀¹ α^repl(t) β^repl(t) dt = 0.

(b) Prove that if ψ, ψ̃, φ, φ̃ are either symmetric or antisymmetric with respect to t = 1/2 or t = 0, and generate biorthogonal bases of L²(R), then the folded bases (7.181) and (7.182) are biorthogonal bases of L²[0, 1]. Hint: Use the same approach as in Theorem 7.16.
7.18 ³ A recursive filter has a Fourier transform that is a ratio of trigonometric polynomials as in (2.31).
(a) Let p[n] = h ⋆ h̄[n] with h̄[n] = h[−n]. Verify that if h is a recursive conjugate mirror filter, then p̂(ω) + p̂(ω + π) = 2 and there exists r̂(ω) = ∑_{k=0}^{K−1} r[k] e^{−ikω} such that

p̂(ω) = 2 |r̂(ω)|² / (|r̂(ω)|² + |r̂(ω + π)|²).   (7.276)

(b) Suppose that K is even and that r[K/2 − 1 − k] = r[K/2 + k]. Verify that

p̂(ω) = 2 |r̂(ω)|² / |r̂(ω) + r̂(ω + π)|².   (7.277)

(c) If r̂(ω) = (1 + e^{−iω})^{K−1} with K = 6, compute ĥ(ω) with the factorization (7.277), and verify that it is a stable filter (Exercise 3.18). Compute this filter numerically and plot the graph of the corresponding wavelet ψ(t).
7.19 ² Balancing. Suppose that h, h̃ define a pair of perfect reconstruction filters satisfying (7.124).
(a) Prove that

h^new[n] = (1/2)(h[n] + h[n − 1]),   h̃^new[n] = (1/2)(h̃[n] + h̃[n − 1])

defines a new pair of perfect reconstruction filters. Verify that ĥ^new(ω) and h̃^new(ω) have, respectively, one more and one less zero at π than ĥ(ω) and h̃(ω) [63].
(b) Relate this balancing operation to a lifting.
(c) The Deslauriers-Dubuc filters are ĥ(ω) = 1 and

h̃(ω) = (1/16)(−e^{−3iω} + 9 e^{−iω} + 16 + 9 e^{iω} − e^{3iω}).

Compute h^new and h̃^new as well as the corresponding biorthogonal wavelets ψ^new, ψ̃^new, after one balancing and again after a second balancing.
7.20 ¹ Compute numerically the wavelets and scaling functions associated to the predict and update lifting steps (7.264) and (7.265). Verify that you obtain the 5/3 wavelets displayed in Figure 7.15.
7.21 ¹ Give the reduction of the number of operations when implementing a fast wavelet transform with 5/3 and 7/9 biorthogonal wavelets with the lifting algorithm described in Section 7.8.5, compared with a direct implementation with (7.270) and (7.271), by using the coefficients in Table 7.4.
7.22 ¹ For a Deslauriers-Dubuc interpolation wavelet of degree 3, compute the dual wavelet ψ̃ in (7.212), which is a sum of Diracs. Verify that it has four vanishing moments.
7.23 ² Prove that a Deslauriers-Dubuc interpolation function of degree 2p − 1 converges to a sinc function when p goes to +∞.
7.24 ³ Let φ be an autocorrelation scaling function that reproduces polynomials of degree p − 1 as in (7.198). Prove that if f is uniformly Lipschitz α, then under the same hypotheses as in Theorem 7.22, there exists K > 0 such that

‖f − P_{V_j} f‖_∞ ≤ K 2^{αj}.
7.25 ² Let φ(t) be an interpolation function that generates an interpolation wavelet basis of C₀(R). Construct a separable interpolation wavelet basis of the space C₀(R^p) of uniformly continuous p-dimensional signals f(x₁, …, x_p). Hint: Construct 2^p − 1 interpolation wavelets by appropriately translating φ(x₁) ⋯ φ(x_p).
7.26 ³ Fractional Brownian. Let ψ(t) be a compactly supported wavelet with p vanishing moments that generates an orthonormal basis of L²(R). The covariance of a fractional Brownian motion B_H(t) is given by (6.86).
(a) Prove that E{|⟨B_H, ψ_{j,n}⟩|²} is proportional to 2^{j(2H+1)}. Hint: Use Exercise 6.15.
(b) Prove that the decorrelation between same-scale wavelet coefficients increases when the number p of vanishing moments of ψ increases:

E{⟨B_H, ψ_{j,n}⟩ ⟨B_H, ψ_{j,m}⟩} = O(2^{j(2H+1)} |n − m|^{2(H−p)}).
(c) In two dimensions, synthesize "approximate" fractional Brownian motion images B̃_H with wavelet coefficients ⟨B_H, ψ^k_{j,n}⟩ that are independent Gaussian random variables, with variances proportional to 2^{j(2H+2)}. Adjust H in order to produce textures that look like clouds in the sky.
7.27 ² Image mosaic. Let f₀[n₁, n₂] and f₁[n₁, n₂] be two square images of N pixels. We want to merge the center of f₀[n₁, n₂] for N^{1/2}/4 ≤ n₁, n₂ < 3N^{1/2}/4 in the center of f₁. Compute numerically the wavelet coefficients of f₀ and f₁. At each scale 2^j and orientation 1 ≤ k ≤ 3, replace the 2^{−2j}N/4 wavelet coefficients corresponding to the center of f₁ by the wavelet coefficients of f₀. Reconstruct an image from this manipulated wavelet representation. Explain why the image f₀ seems to be merged in f₁, without the strong boundary effects that are obtained when directly replacing the pixels of f₁ by the pixels of f₀.
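The wavelet-domain merging described in this exercise can be sketched numerically. The following is a minimal illustration, not the full exercise: it assumes tiny synthetic 8×8 images, a single-level 2D Haar transform, and swaps only the center coefficients of the three detail subbands (the exercise calls for doing this at every scale 2^j).

```python
import math

S = math.sqrt(2)

def haar_step(v):
    # One orthonormal Haar analysis step: first half averages, second half differences.
    n = len(v) // 2
    return ([(v[2 * i] + v[2 * i + 1]) / S for i in range(n)] +
            [(v[2 * i] - v[2 * i + 1]) / S for i in range(n)])

def inv_haar_step(v):
    # Inverse of haar_step.
    n = len(v) // 2
    out = []
    for i in range(n):
        out.append((v[i] + v[n + i]) / S)
        out.append((v[i] - v[n + i]) / S)
    return out

def separable2d(img, step):
    # Apply step to every row, then to every column (separable 2D transform).
    rows = [step(list(r)) for r in img]
    cols = [step(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

N = 8
f0 = [[float(i + j) for j in range(N)] for i in range(N)]  # gradient image
f1 = [[0.0] * N for _ in range(N)]                         # flat image

w0 = separable2d(f0, haar_step)
w1 = [row[:] for row in separable2d(f1, haar_step)]

# Swap the center coefficients of the three detail subbands (orientations
# k = 1, 2, 3); the approximation subband (top-left block) is left alone.
half, lo, hi = N // 2, N // 8, 3 * N // 8
for (bi, bj) in [(0, half), (half, 0), (half, half)]:
    for i in range(lo, hi):
        for j in range(lo, hi):
            w1[bi + i][bj + j] = w0[bi + i][bj + j]

mosaic = separable2d(w1, inv_haar_step)
```

Because each swapped coefficient corresponds to a smooth basis function rather than a hard pixel boundary, the reconstructed `mosaic` blends the inserted detail into `f1` without a sharp seam.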
7.28 ³ Foveal vision. A foveal image has a maximum resolution at the center, with a resolution that decreases linearly as a function of the distance to the center. Show that one can construct an approximate foveal image by keeping a constant number of nonzero wavelet coefficients at each scale 2^j. Implement this algorithm numerically.
7.29 ² High contrast. We consider a color image specified by three color channels: red r[n], green g[n], and blue b[n]. The intensity image (r + g + b)/3 averages the variations of the three color channels. To create a high-contrast image f, for each wavelet ψ^k_{j,n} we set ⟨f, ψ^k_{j,n}⟩ to be the coefficient among ⟨r, ψ^k_{j,n}⟩, ⟨g, ψ^k_{j,n}⟩, and ⟨b, ψ^k_{j,n}⟩ that has the maximum amplitude. Implement this algorithm numerically and evaluate its performance for different types of multispectral images. How does the choice of ψ affect the results?
CHAPTER 8
Wavelet Packet and Local Cosine Bases
Different types of time-frequency structures are encountered in complex signals
such as speech recordings. This motivates the design of bases with time-frequency
properties that may be adapted. Wavelet bases are one particular family of bases
that represent piecewise smooth signals effectively. Other time-frequency bases are
constructed to approximate different types of signals such as audio recordings.
Orthonormal wavelet packet bases are computed with conjugate mirror filters that divide the frequency axis into separate intervals of various sizes. Different conjugate mirror filter banks correspond to different wavelet packet bases. If the signal properties change over time, it is preferable to isolate different time intervals with translated windows. Local cosine bases are constructed by multiplying these windows with cosine functions. Wavelet packets segment the frequency axis and are uniformly translated in time, whereas local cosine bases divide the time axis and are uniformly translated in frequency. Both types of bases are extended to two dimensions for image processing.
8.1 WAVELET PACKETS
8.1.1 Wavelet Packet Tree
Wavelet packets were introduced by Coifman, Meyer, and Wickerhauser [182]
by generalizing the link between multiresolution approximations and wavelets.
A space V_j of a multiresolution approximation is decomposed into a lower-resolution space V_{j+1} plus a detail space W_{j+1}. This is done by dividing the orthogonal basis {φ_j(t − 2^j n)}_{n∈Z} of V_j into two new orthogonal bases

{φ_{j+1}(t − 2^{j+1} n)}_{n∈Z} of V_{j+1}   and   {ψ_{j+1}(t − 2^{j+1} n)}_{n∈Z} of W_{j+1}.

The decompositions (7.107) and (7.109) of φ_{j+1} and ψ_{j+1} in the basis {φ_j(t − 2^j n)}_{n∈Z} are specified by a pair of conjugate mirror filters h[n] and

g[n] = (−1)^{1−n} h[1 − n].

Theorem 8.1 generalizes this result to any space U_j that admits an orthogonal basis of functions translated by n2^j for n ∈ Z.
Theorem 8.1: Coifman, Meyer, Wickerhauser. Let {θ_j(t − 2^j n)}_{n∈Z} be an orthonormal basis of a space U_j. Let h and g be a pair of conjugate mirror filters. Define

θ⁰_{j+1}(t) = ∑_{n=−∞}^{+∞} h[n] θ_j(t − 2^j n)   and   θ¹_{j+1}(t) = ∑_{n=−∞}^{+∞} g[n] θ_j(t − 2^j n).   (8.1)

The family

{θ⁰_{j+1}(t − 2^{j+1} n), θ¹_{j+1}(t − 2^{j+1} n)}_{n∈Z}

is an orthonormal basis of U_j.
Proof. This proof is very similar to the proof of Theorem 7.3. The main steps are outlined. Theorem 3.4 shows that {θ_j(t − 2^j n)}_{n∈Z} is orthogonal if and only if

(1/2^j) ∑_{k=−∞}^{+∞} |θ̂_j(ω + 2kπ/2^j)|² = 1.   (8.2)

We derive from (8.1) that the Fourier transform of θ⁰_{j+1} is

θ̂⁰_{j+1}(ω) = θ̂_j(ω) ∑_{n=−∞}^{+∞} h[n] exp(−i2^j nω) = ĥ(2^j ω) θ̂_j(ω).   (8.3)

Similarly, the Fourier transform of θ¹_{j+1} is

θ̂¹_{j+1}(ω) = ĝ(2^j ω) θ̂_j(ω).   (8.4)

Proving that {θ⁰_{j+1}(t − 2^{j+1} n)}_{n∈Z} and {θ¹_{j+1}(t − 2^{j+1} n)}_{n∈Z} are two families of orthogonal vectors is equivalent to showing that for l = 0 or l = 1

(1/2^{j+1}) ∑_{k=−∞}^{+∞} |θ̂^l_{j+1}(ω + 2kπ/2^{j+1})|² = 1.   (8.5)

These two families of vectors yield orthogonal spaces if and only if

(1/2^{j+1}) ∑_{k=−∞}^{+∞} θ̂⁰_{j+1}(ω + 2kπ/2^{j+1}) θ̂^{1∗}_{j+1}(ω + 2kπ/2^{j+1}) = 0.   (8.6)

The relations (8.5) and (8.6) are verified by replacing θ̂⁰_{j+1} and θ̂¹_{j+1} by (8.3) and (8.4), respectively, and by using the orthogonality of the basis (8.2) and the conjugate mirror filter properties

|ĥ(ω)|² + |ĥ(ω + π)|² = 2,   |ĝ(ω)|² + |ĝ(ω + π)|² = 2,
ĝ(ω) ĥ∗(ω) + ĝ(ω + π) ĥ∗(ω + π) = 0.
To prove that the family {θ⁰_{j+1}(t − 2^{j+1} n), θ¹_{j+1}(t − 2^{j+1} n)}_{n∈Z} generates the same space as {θ_j(t − 2^j n)}_{n∈Z}, we must prove that for any a[n] ∈ ℓ²(Z) there exist b[n] ∈ ℓ²(Z) and c[n] ∈ ℓ²(Z) such that

∑_{n=−∞}^{+∞} a[n] θ_j(t − 2^j n) = ∑_{n=−∞}^{+∞} b[n] θ⁰_{j+1}(t − 2^{j+1} n) + ∑_{n=−∞}^{+∞} c[n] θ¹_{j+1}(t − 2^{j+1} n).   (8.7)

To do this, we relate b̂(ω) and ĉ(ω) to â(ω). The Fourier transform of (8.7) yields

â(2^j ω) θ̂_j(ω) = b̂(2^{j+1} ω) θ̂⁰_{j+1}(ω) + ĉ(2^{j+1} ω) θ̂¹_{j+1}(ω).   (8.8)

One can verify that

b̂(2^{j+1} ω) = (1/2) [â(2^j ω) ĥ∗(2^j ω) + â(2^j ω + π) ĥ∗(2^j ω + π)]

and

ĉ(2^{j+1} ω) = (1/2) [â(2^j ω) ĝ∗(2^j ω) + â(2^j ω + π) ĝ∗(2^j ω + π)]

satisfy (8.8). ■
Theorem 8.1 proves that conjugate mirror filters transform an orthogonal basis {θ_j(t − 2^j n)}_{n∈Z} into two orthogonal families {θ⁰_{j+1}(t − 2^{j+1} n)}_{n∈Z} and {θ¹_{j+1}(t − 2^{j+1} n)}_{n∈Z}. Let U⁰_{j+1} and U¹_{j+1} be the spaces generated by each of these families. Clearly U⁰_{j+1} and U¹_{j+1} are orthogonal and

U⁰_{j+1} ⊕ U¹_{j+1} = U_j.

Computing the Fourier transform of (8.1) relates the Fourier transforms of θ⁰_{j+1} and θ¹_{j+1} to the Fourier transform of θ_j:

θ̂⁰_{j+1}(ω) = ĥ(2^j ω) θ̂_j(ω),   θ̂¹_{j+1}(ω) = ĝ(2^j ω) θ̂_j(ω).   (8.9)

Since the transfer functions ĥ(2^j ω) and ĝ(2^j ω) have their energy concentrated in different frequency intervals, this transformation can be interpreted as a division of the frequency support of θ̂_j.
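The conjugate mirror filter properties used in this proof can be checked numerically. A small sketch with the Haar filter pair (chosen here only as the simplest concrete example) evaluates the three identities on a frequency grid:

```python
import cmath, math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]            # Haar low-pass filter
g = [(-1) ** (1 - n) * h[1 - n] for n in range(2)]  # g[n] = (-1)^(1-n) h[1-n]

def ft(filt, w):
    # Discrete-time Fourier transform of a finite filter at frequency w.
    return sum(c * cmath.exp(-1j * n * w) for n, c in enumerate(filt))

for k in range(16):
    w = 2 * math.pi * k / 16
    # |h^(w)|^2 + |h^(w+pi)|^2 = 2 and |g^(w)|^2 + |g^(w+pi)|^2 = 2
    assert abs(abs(ft(h, w)) ** 2 + abs(ft(h, w + math.pi)) ** 2 - 2) < 1e-12
    assert abs(abs(ft(g, w)) ** 2 + abs(ft(g, w + math.pi)) ** 2 - 2) < 1e-12
    # g^(w) h^*(w) + g^(w+pi) h^*(w+pi) = 0
    cross = (ft(g, w) * ft(h, w).conjugate()
             + ft(g, w + math.pi) * ft(h, w + math.pi).conjugate())
    assert abs(cross) < 1e-12
```

The same check applies to any conjugate mirror filter pair; only the coefficient lists change.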
Binary Wavelet Packet Tree
Instead of dividing only the approximation spaces V_j to construct detail spaces W_j and wavelet bases, Theorem 8.1 proves that we can set U_j = W_j and divide these detail spaces to derive new bases. The recursive splitting of vector spaces is represented in a binary tree. If the signals are approximated at the scale 2^L, to the root of the tree we associate the approximation space V_L. This space admits an orthogonal basis of scaling functions {φ_L(t − 2^L n)}_{n∈Z} with φ_L(t) = 2^{−L/2} φ(2^{−L} t).

FIGURE 8.1
Binary tree of wavelet packet spaces.

Any node of the binary tree is labeled by (j, p), where j − L ≥ 0 is the depth of the node in the tree, and p is the number of nodes that are on its left at the same depth j − L. Such a tree is illustrated in Figure 8.1. To each node (j, p) we associate a space W^p_j, which admits an orthonormal basis {ψ^p_j(t − 2^j n)}_{n∈Z} by going down the tree. At the root we have W⁰_L = V_L and ψ⁰_L = φ_L. Suppose now that we have already constructed W^p_j and its orthonormal basis B^p_j = {ψ^p_j(t − 2^j n)}_{n∈Z} at the node (j, p). The two wavelet packet orthogonal bases at the children nodes are defined by the splitting relations (8.1):
ψ^{2p}_{j+1}(t) = ∑_{n=−∞}^{+∞} h[n] ψ^p_j(t − 2^j n)   (8.10)

and

ψ^{2p+1}_{j+1}(t) = ∑_{n=−∞}^{+∞} g[n] ψ^p_j(t − 2^j n).   (8.11)
Since {ψ^p_j(t − 2^j n)}_{n∈Z} is orthonormal,

h[n] = ⟨ψ^{2p}_{j+1}(u), ψ^p_j(u − 2^j n)⟩,   g[n] = ⟨ψ^{2p+1}_{j+1}(u), ψ^p_j(u − 2^j n)⟩.   (8.12)

Theorem 8.1 proves that B^{2p}_{j+1} = {ψ^{2p}_{j+1}(t − 2^{j+1} n)}_{n∈Z} and B^{2p+1}_{j+1} = {ψ^{2p+1}_{j+1}(t − 2^{j+1} n)}_{n∈Z} are orthonormal bases of two orthogonal spaces W^{2p}_{j+1} and W^{2p+1}_{j+1} such that

W^{2p}_{j+1} ⊕ W^{2p+1}_{j+1} = W^p_j.   (8.13)

This recursive splitting defines a binary tree of wavelet packet spaces where each parent node is divided in two orthogonal subspaces. Figure 8.2 displays the eight wavelet packets ψ^p_j at the depth j − L = 3, calculated with a Daubechies 5 filter. These wavelet packets are frequency ordered from left to right, as explained in Section 8.1.2.
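The recursive splitting (8.10) and (8.11) translates directly into a discrete filter bank: each node's coefficients are filtered by h and g and subsampled by two. A minimal pure-Python sketch (Haar filters and periodic convolution are assumed here for simplicity; discrete coefficient vectors stand in for the expansions in the ψ^p_j bases) builds the full binary tree and checks that each depth conserves energy, as the orthogonality of the splitting predicts:

```python
import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # Haar low-pass
g = [-1 / math.sqrt(2), 1 / math.sqrt(2)]   # g[n] = (-1)^(1-n) h[1-n]

def split(coefs, filt):
    # Periodic correlation with filt followed by subsampling by two:
    # one splitting step of the wavelet packet tree.
    N = len(coefs)
    return [sum(filt[k] * coefs[(2 * n + k) % N] for k in range(len(filt)))
            for n in range(N // 2)]

def wavelet_packet_tree(a, depth):
    # Full binary tree: level j holds the 2^j coefficient vectors of the W_j^p.
    tree = [[a]]
    for _ in range(depth):
        nxt = []
        for node in tree[-1]:
            nxt.append(split(node, h))   # child 2p, relation (8.10)
            nxt.append(split(node, g))   # child 2p+1, relation (8.11)
        tree.append(nxt)
    return tree

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
tree = wavelet_packet_tree(signal, 3)

# Each level of the tree is an orthonormal change of basis: energy is conserved.
e0 = sum(x * x for x in signal)
for level in tree:
    e = sum(x * x for node in level for x in node)
    assert abs(e - e0) < 1e-9
```

Keeping all levels costs O(N log₂ N) operations for a signal of size N, one filtering pass per depth.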
FIGURE 8.2
Wavelet packets ψ^p_3(t), 0 ≤ p ≤ 7, computed with a Daubechies 5 filter at the depth j − L = 3 of the wavelet packet tree, with L = 0. They are ordered from low to high frequencies.
FIGURE 8.3
Example of an admissible wavelet packet binary tree.
Admissible Tree
We call any binary tree where each node has either zero or two children an admissible tree, as shown in Figure 8.3. Let {j_i, p_i}_{1≤i≤I} be the leaves of an admissible binary tree. By applying the recursive splitting (8.13) along the branches of an admissible tree, we verify that the spaces {W^{p_i}_{j_i}}_{1≤i≤I} are mutually orthogonal and add up to W⁰_L:

W⁰_L = ⊕_{i=1}^{I} W^{p_i}_{j_i}.   (8.14)

The union of the corresponding wavelet packet bases

{ψ^{p_i}_{j_i}(t − 2^{j_i} n)}_{n∈Z, 1≤i≤I}

thus defines an orthogonal basis of W⁰_L = V_L.
Number of Wavelet Packet Bases
The number of different wavelet packet orthogonal bases of V_L is equal to the number of different admissible binary trees. Theorem 8.2 proves that there are more than 2^{2^{J−1}} different wavelet packet orthonormal bases included in a full wavelet packet binary tree of depth J.

Theorem 8.2. The number B_J of wavelet packet bases in a full wavelet packet binary tree of depth J satisfies

2^{2^{J−1}} ≤ B_J ≤ 2^{(5/4) 2^{J−1}}.   (8.15)

Proof. This result is proved by induction on the depth J of the wavelet packet tree. The number B_J of different orthonormal bases is equal to the number of different admissible binary trees of depth at most J, which have nodes with either zero or two children. For J = 0, the tree is reduced to its root, so B_0 = 1.
Observe that the set of trees of depth at most J + 1 is composed of trees of depth at least 1 and at most J + 1, plus one tree of depth 0 that is reduced to the root. A tree of depth at least 1 has a left and a right subtree that are admissible trees of depth at most J. The configurations of these two subtrees are a priori independent, and there are B_J admissible trees of depth at most J, so

B_{J+1} = B_J² + 1.   (8.16)

Since B_1 = 2 and B_{J+1} ≥ B_J², we prove by induction that B_J ≥ 2^{2^{J−1}}. Moreover,

log₂ B_{J+1} = 2 log₂ B_J + log₂(1 + B_J^{−2}).

If J ≥ 1, then B_J ≥ 2, so

log₂ B_{J+1} ≤ 2 log₂ B_J + 1/4.   (8.17)

Since B_1 = 2,

log₂ B_{J+1} ≤ 2^J + (1/4) ∑_{j=0}^{J−1} 2^j ≤ 2^J + 2^J/4,

so B_J ≤ 2^{(5/4) 2^{J−1}}. ■

For discrete signals of size N, we shall see that the wavelet packet tree is at most of depth J = log₂ N. This theorem proves that the number of wavelet packet bases satisfies 2^{N/2} ≤ B_{log₂ N} ≤ 2^{5N/8}.
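The recursion (8.16) makes these bounds easy to verify by direct computation with exact integer arithmetic (the upper bound B_J ≤ 2^{(5/4)2^{J−1}} is equivalent to B_J⁴ ≤ 2^{5·2^{J−1}}, which avoids floating-point logarithms; the range of depths checked is an arbitrary choice):

```python
# Count wavelet packet bases via B_{J+1} = B_J^2 + 1 and check (8.15).
B = 1  # B_0 = 1: the tree reduced to its root
for J in range(1, 9):
    B = B * B + 1                                # recursion (8.16)
    assert 2 ** (2 ** (J - 1)) <= B              # lower bound of (8.15)
    assert B ** 4 <= 2 ** (5 * 2 ** (J - 1))     # upper bound of (8.15)
```

The first values are B_1 = 2, B_2 = 5, B_3 = 26, B_4 = 677, growing roughly by squaring at each depth.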
Wavelet Packets on Intervals
To construct wavelet packet bases of L²[0, 1], we use the border techniques developed in Section 7.5 to design wavelet bases of L²[0, 1]. The simplest approach constructs periodic bases. As in the wavelet case, the coefficients of f ∈ L²[0, 1] in a periodic wavelet packet basis are the same as the decomposition coefficients of f^per(t) = ∑_{k=−∞}^{+∞} f(t + k) in the original wavelet packet basis of L²(R). The periodization of f often creates discontinuities at the borders t = 0 and t = 1, which generate large-amplitude wavelet packet coefficients.
Section 7.5.3 describes a more sophisticated technique that modifies the filters h
and g in order to construct boundary wavelets that keep their vanishing moments.
A generalization to wavelet packets is obtained by using these modified filters in
Theorem 8.1. This avoids creating the large-amplitude coefficients at the boundary,
which is typical of the periodic case.
Biorthogonal Wavelet Packets
Nonorthogonal wavelet bases are constructed in Section 7.4 with two pairs of perfect reconstruction filters (h, g) and (h̃, g̃) instead of a single pair of conjugate mirror filters. The orthogonal splitting Theorem 8.1 is extended into a biorthogonal splitting by replacing the conjugate mirror filters with these perfect reconstruction filters. A Riesz basis {θ_j(t − 2^j n)}_{n∈Z} of U_j is transformed into two Riesz bases {θ⁰_{j+1}(t − 2^{j+1} n)}_{n∈Z} and {θ¹_{j+1}(t − 2^{j+1} n)}_{n∈Z} of two nonorthogonal spaces U⁰_{j+1} and U¹_{j+1} such that

U⁰_{j+1} ⊕ U¹_{j+1} = U_j.

A binary tree of nonorthogonal wavelet packet Riesz bases can be derived by induction using this vector space division. As in the orthogonal case, the wavelet packets at the leaves of an admissible binary tree define a basis of W⁰_L, but this basis is not orthogonal.
The lack of orthogonality is not a problem by itself as long as the basis remains stable. Cohen and Daubechies proved [171] that when the depth j − L increases, the angle between the spaces W^p_j located at the same depth can become progressively smaller. This indicates that some of the wavelet packet bases constructed from an admissible binary tree become unstable. Thus, we concentrate on orthogonal wavelet packets constructed with conjugate mirror filters.
8.1.2 Time-Frequency Localization
Time Support
If the conjugate mirror filters h and g have a finite impulse response of size K, Theorem 7.5 proves that φ has a support of size K − 1, so ψ⁰_L = φ_L has a support of size (K − 1)2^L. Since

ψ^{2p}_{j+1}(t) = ∑_{n=−∞}^{+∞} h[n] ψ^p_j(t − 2^j n),   ψ^{2p+1}_{j+1}(t) = ∑_{n=−∞}^{+∞} g[n] ψ^p_j(t − 2^j n),   (8.18)

an induction on j shows that the support size of ψ^p_j is (K − 1)2^j. Thus, the parameter j specifies the scale 2^j of the support. The wavelet packets in Figure 8.2 are constructed with a Daubechies filter of K = 10 coefficients with j = 3 and thus have a support size of 2³(10 − 1) = 72.
Frequency Localization
The frequency localization of wavelet packets is more complicated to analyze. The Fourier transform of (8.18) proves that the Fourier transforms of wavelet packet children are related to their parent by

ψ̂^{2p}_{j+1}(ω) = ĥ(2^j ω) ψ̂^p_j(ω),   ψ̂^{2p+1}_{j+1}(ω) = ĝ(2^j ω) ψ̂^p_j(ω).   (8.19)

The energy of ψ̂^p_j is mostly concentrated over a frequency band, and the two filters ĥ(2^j ω) and ĝ(2^j ω) select the lower- or higher-frequency components within this band. To relate the size and position of this frequency band to the indexes (p, j), we consider a simple example.
Shannon Wavelet Packets
Shannon wavelet packets are computed with perfect discrete low-pass and high-pass filters

|ĥ(ω)| = √2 if ω ∈ [−π/2 + 2kπ, π/2 + 2kπ] with k ∈ Z, and 0 otherwise,   (8.20)

and

|ĝ(ω)| = √2 if ω ∈ [π/2 + 2kπ, 3π/2 + 2kπ] with k ∈ Z, and 0 otherwise.   (8.21)

In this case, it is relatively simple to calculate the frequency support of the wavelet packets. The Fourier transform of the scaling function is

ψ̂⁰_L = φ̂_L = 1_{[−2^{−L}π, 2^{−L}π]}.   (8.22)

Each multiplication with ĥ(2^j ω) or ĝ(2^j ω) divides the frequency support of the wavelet packets in two. The delicate point is to realize that ĥ(2^j ω) does not always play the role of a low-pass filter because of the side lobes that are brought into the interval [−2^{−L}π, 2^{−L}π] by the dilation. At depth j − L, Theorem 8.3 proves that ψ̂^p_j is proportional to the indicator function of a pair of frequency intervals, which are labeled I^k_j. The permutation that relates p and k is characterized recursively [71].

Theorem 8.3: Coifman, Wickerhauser. For any j − L > 0 and 0 ≤ p < 2^{j−L}, there exists 0 ≤ k < 2^{j−L} such that

|ψ̂^p_j(ω)| = 2^{j/2} 1_{I^k_j}(ω),   (8.23)

where I^k_j is a symmetric pair of intervals

I^k_j = [−(k + 1)2^{−j}π, −k2^{−j}π] ∪ [k2^{−j}π, (k + 1)2^{−j}π].   (8.24)

The permutation k = G[p] satisfies, for any 0 ≤ p < 2^{j−L},

G[2p] = 2G[p] if G[p] is even, and 2G[p] + 1 if G[p] is odd;   (8.25)
G[2p + 1] = 2G[p] + 1 if G[p] is even, and 2G[p] if G[p] is odd.   (8.26)
Proof. Equations (8.23), (8.25), and (8.26) are proved by induction on the depth j − L. For j − L = 0, (8.22) shows that (8.23) is valid. Suppose that (8.23) is valid for j = l ≥ L and any 0 ≤ p < 2^{l−L}. We first prove that (8.25) and (8.26) are verified for j = l. From these two equations we then easily carry the induction hypothesis to prove that (8.23) is true for j = l + 1 and for any 0 ≤ p < 2^{l+1−L}.
Equations (8.20) and (8.21) imply that

|ĥ(2^l ω)| = √2 if ω ∈ [2^{−l−1}(4m − 1)π, 2^{−l−1}(4m + 1)π] with m ∈ Z, and 0 otherwise;   (8.27)
|ĝ(2^l ω)| = √2 if ω ∈ [2^{−l−1}(4m + 1)π, 2^{−l−1}(4m + 3)π] with m ∈ Z, and 0 otherwise.   (8.28)

Since (8.23) is valid for l, the support of ψ̂^p_l is

I^k_l = [−(2k + 2)2^{−l−1}π, −2k 2^{−l−1}π] ∪ [2k 2^{−l−1}π, (2k + 2)2^{−l−1}π].

The two children are defined by

ψ̂^{2p}_{l+1}(ω) = ĥ(2^l ω) ψ̂^p_l(ω),   ψ̂^{2p+1}_{l+1}(ω) = ĝ(2^l ω) ψ̂^p_l(ω).

Thus, we derive (8.25) and (8.26) by checking the intersection of I^k_l with the supports of ĥ(2^l ω) and ĝ(2^l ω) specified by (8.27) and (8.28). ■
For Shannon wavelet packets, Theorem 8.3 proves that ψ̂^p_j has a frequency support located over two intervals of size 2^{−j}π, centered at ±(k + 1/2)2^{−j}π. The Fourier transform expression (8.23) implies that these Shannon wavelet packets can be written as cosine-modulated windows

ψ^p_j(t) = 2^{−j/2+1} θ(2^{−j} t) cos[2^{−j}(k + 1/2)π (t − τ_{j,p})],   (8.29)

with

θ(t) = sin(πt/2) / (πt),   and thus   θ̂(ω) = 1_{[−π/2, π/2]}(ω).

The translation parameter τ_{j,p} can be calculated from the complex phase of ψ̂^p_j.
Frequency Ordering
It is often easier to label ψ^k_j a wavelet packet ψ^p_j that has a Fourier transform centered at ±(k + 1/2)2^{−j}π, with k = G[p]. This means changing its position in the wavelet packet tree from node p to node k. The resulting wavelet packet tree is frequency ordered. The left child always corresponds to a lower-frequency wavelet packet and the right child to a higher-frequency one.
The permutation k = G[p] is characterized by the recursive equations (8.25) and (8.26). The inverse permutation p = G^{−1}[k] is called a Gray code in coding theory. This permutation is implemented on binary strings by deriving the following relations from (8.25) and (8.26). If p_i is the ith binary digit of the integer p and k_i the ith digit of k = G[p], then

k_i = (∑_{l=i}^{+∞} p_l) mod 2,   (8.30)
p_i = (k_i + k_{i+1}) mod 2.   (8.31)
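Both characterizations of G are easy to implement and cross-check. A short sketch follows: the recursion mirrors (8.25) and (8.26), while relation (8.31) says that p is the standard Gray code of k, which on machine integers is p = k XOR (k >> 1), so G must invert that map:

```python
def G(p):
    # Recursive permutation defined by (8.25)-(8.26), with G[0] = 0.
    if p == 0:
        return 0
    q, r = divmod(p, 2)
    Gq = G(q)
    if r == 0:                       # node 2q, relation (8.25)
        return 2 * Gq if Gq % 2 == 0 else 2 * Gq + 1
    return 2 * Gq + 1 if Gq % 2 == 0 else 2 * Gq   # node 2q+1, (8.26)

# Check against the binary characterization: p = G^{-1}[k] is the Gray
# code of k, i.e. p = k ^ (k >> 1), so applying G must recover k.
for k in range(64):
    p = k ^ (k >> 1)
    assert G(p) == k
```

For a tree of depth 3 this gives the frequency order G = [0, 1, 3, 2, 7, 6, 4, 5], which is the left-to-right ordering seen in the depth-3 wavelet packet figures.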
Compactly Supported Wavelet Packets
Wavelet packets of compact support have a more complicated frequency behavior than Shannon wavelet packets, but the previous analysis provides important insights. If h is a finite impulse response filter, ĥ does not have a support restricted to [−π/2, π/2] over the interval [−π, π]. It is true, however, that the energy of ĥ is mostly concentrated in [−π/2, π/2]. Similarly, the energy of ĝ is mostly concentrated in [−π, −π/2] ∪ [π/2, π], for ω ∈ [−π, π]. As a consequence, the localization properties of Shannon wavelet packets remain qualitatively valid. The energy of ψ̂^p_j is mostly concentrated over

I^k_j = [−(k + 1)2^{−j}π, −k2^{−j}π] ∪ [k2^{−j}π, (k + 1)2^{−j}π],

with k = G[p]. The larger the proportion of energy of ĥ in [−π/2, π/2], the more concentrated the energy of ψ̂^p_j in I^k_j. The energy concentration of ĥ in [−π/2, π/2] is increased by having more zeroes at π, so that ĥ(ω) remains close to zero in [−π, −π/2] ∪ [π/2, π]. Theorem 7.4 proves that this is equivalent to imposing that the wavelets constructed in the wavelet packet tree have many vanishing moments.
These qualitative statements must be interpreted carefully. The side lobes of ψ̂^p_j beyond the intervals I^k_j are not completely negligible. For example, wavelet packets created with a Haar filter are discontinuous functions. Thus, |ψ̂^p_j(ω)| decays like |ω|^{−1} at high frequencies, which indicates the existence of large side lobes outside I^k_j. It is also important to note that, contrary to Shannon wavelet packets, compactly supported wavelet packets cannot be written as dilated windows modulated by cosine functions of varying frequency. When the scale increases, wavelet packets generally do not converge to cosine functions. They may have a wild behavior with localized oscillations of considerable amplitude.
Walsh Wavelet Packets
Walsh wavelet packets are generated by the Haar conjugate mirror filter

h[n] = 1/√2 if n = 0, 1, and 0 otherwise.

They have very different properties from Shannon wavelet packets since the filter h is well localized in time but not in frequency. The corresponding scaling function is φ = 1_{[0,1]}, and the approximation space V_L = W⁰_L is composed of functions that are constant over the intervals [2^L n, 2^L(n + 1)) for n ∈ Z. Since all wavelet packets created with this filter belong to V_L, they are piecewise constant functions. The support size of h is K = 2, so Walsh functions ψ^p_j have a support of size 2^j. The wavelet packet recursive relations (8.18) become

ψ^{2p}_{j+1}(t) = (1/√2) ψ^p_j(t) + (1/√2) ψ^p_j(t − 2^j)   (8.32)

and

ψ^{2p+1}_{j+1}(t) = (1/√2) ψ^p_j(t) − (1/√2) ψ^p_j(t − 2^j).   (8.33)

Since ψ^p_j has a support of size 2^j, it does not intersect the support of ψ^p_j(t − 2^j). Thus, these wavelet packets are constructed by juxtaposing ψ^p_j with its translated version, whose sign might be changed. Figure 8.4 shows the Walsh functions at depth j − L = 3 of the wavelet packet tree. Theorem 8.4 computes the number of oscillations of ψ^p_j.
FIGURE 8.4
Frequency-ordered Walsh wavelet packets ψ^p_3(t), 0 ≤ p ≤ 7, computed with a Haar filter at depth j − L = 3 of the wavelet packet tree, with L = 0.
p
Theorem 8.4. The support of a Walsh wavelet packet j is [0, 2 j ]. Over its support,
j (t) ⫽⫾ 2⫺j/2 . It changes sign k ⫽ G[ p] times, where G[ p] is the permutation defined
by (8.25) and (8.26).
p
Proof. By induction on j, we derive from (8.32) and (8.33) that the support is [0, 2 j ] and
p
p
that j (t) ⫽⫾ 2⫺j/2 over its support. Let k be the number of times that j changes sign.
2p
2p⫹1
The number of times that j⫹1 and j⫹1 change sign is either 2k or 2k ⫹ 1 depending
p
on the sign of the first and last nonzero values of j . If k is even, then the sign of the first
2p
p
2p⫹1
and last nonzero values of j are the same. Thus, the number of times j⫹1 and j⫹1
change sign is, respectively, 2k and 2k ⫹ 1. If k is odd, then the sign of the first and last
p
2p
2p⫹1
nonzero values of j are different. The number of times j⫹1 and j⫹1 change sign is
then 2k ⫹ 1 and 2k. These recursive properties are identical to (8.25) and (8.26).
■
p
Therefore,a Walsh wavelet packet j is a square wave with k ⫽ G[ p] oscillations
over a support size of 2 j .This result is similar to (8.29),which proves that a Shannon
p
wavelet packet j is a window modulated by a cosine of frequency 2⫺j k. In both
cases, the oscillation frequency of wavelet packets is proportional to 2⫺j k.
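The recursions (8.32) and (8.33) can be run directly on sampled signals, since Walsh packets are constant on dyadic intervals. A small sketch builds the depth-3 packets and verifies the sign-change count of Theorem 8.4 (pure Python; the depth 3 is an arbitrary choice matching Figure 8.4):

```python
import math

def walsh_packets(depth):
    # Build Walsh wavelet packets on 2^depth samples via (8.32)-(8.33):
    # each child juxtaposes the parent with its (possibly negated) translate.
    packets = [[1.0]]                 # phi = 1 on [0,1), one sample
    s = 1 / math.sqrt(2)
    for _ in range(depth):
        nxt = []
        for p in packets:
            nxt.append([s * x for x in p] + [s * x for x in p])    # (8.32)
            nxt.append([s * x for x in p] + [-s * x for x in p])   # (8.33)
        packets = nxt
    return packets

def sign_changes(v):
    return sum(1 for a, b in zip(v, v[1:]) if a * b < 0)

packets = walsh_packets(3)

# Theorem 8.4: packet p changes sign G[p] times. Build G as the inverse
# of the Gray code map p = k ^ (k >> 1), equivalent to (8.25)-(8.26).
G = [0] * 8
for k in range(8):
    G[k ^ (k >> 1)] = k
for p, w in enumerate(packets):
    assert sign_changes(w) == G[p]
```

The packet values are ±2^{−3/2} ≈ ±0.354, matching the amplitudes visible in Figure 8.4.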
Heisenberg Boxes
For display purposes, we associate to any wavelet packet ψ^p_j(t − 2^j n) a Heisenberg rectangle, which indicates the time and frequency domains where the energy of this wavelet packet is mostly concentrated. The time support of the rectangle is set to be the same as the time support of a Walsh wavelet packet ψ^p_j(t − 2^j n), which is equal to [2^j n, 2^j (n + 1)]. The frequency support of the rectangle is defined as the positive-frequency support [k2^{−j}π, (k + 1)2^{−j}π] of Shannon wavelet packets, with k = G[p]. The scale 2^j modifies the time and frequency elongation of this time-frequency rectangle, but its surface remains constant. The indices n and k give its localization in time and frequency. General wavelet packets (computed with Daubechies filters, for example) have a time and frequency spread that is much wider than this Heisenberg rectangle. However, this convention has the advantage of associating a wavelet packet basis to an exact paving of the time-frequency plane. Figure 8.5 shows an example of such a paving and the corresponding wavelet packet tree.
Figure 8.6 displays the decomposition of a multichirp signal having the spectrogram shown in Figure 4.3. The wavelet packet basis is computed with a Daubechies 10 filter. As expected, the large-amplitude coefficients are along the trajectories of the linear and quadratic chirps that appear in Figure 4.3. We also see the trace of the two modulated Gaussian functions located at t = 0.5 and t = 0.87.
8.1.3 Particular Wavelet Packet Bases
Among the many wavelet packet bases, we describe here the properties of M-band wavelet bases, "local cosine type" bases, and "best" bases. The wavelet packet tree is frequency ordered, which means that ψ^k_j has a Fourier transform with an energy essentially concentrated in the interval [k2^{−j}π, (k + 1)2^{−j}π] for positive frequencies.

FIGURE 8.5
The wavelet packet tree (left) divides the frequency axis in several intervals. The Heisenberg boxes (right) of the corresponding wavelet packet basis.

FIGURE 8.6
Wavelet packet decomposition of the multichirp signal whose spectrogram is shown in Figure 4.3. The darker the gray level of each Heisenberg box, the larger the amplitude |⟨f, ψ^p_j⟩| of the corresponding wavelet packet coefficient.

FIGURE 8.7
(a) Wavelet packet tree of a dyadic wavelet basis. (b) Wavelet packet tree of an M-band wavelet basis with M = 2.
M-Band Wavelets
The standard dyadic wavelet basis is an example of a wavelet packet basis of V_L, obtained by choosing the admissible binary tree shown in Figure 8.7(a). Its leaves are the nodes k = 1 at all depths j − L and thus correspond to the wavelet packet basis

{ψ¹_j(t − 2^j n)}_{n∈Z, j>L}

constructed by dilating a single wavelet ψ¹:

ψ¹_j(t) = (1/√(2^j)) ψ¹(t/2^j).

The energy of ψ̂¹ is mostly concentrated in the interval [−2π, −π] ∪ [π, 2π]. The octave bandwidth for positive frequencies is the ratio between the bandwidth of the pass band and its distance to the zero frequency. It is equal to 1 octave. This quantity remains constant by dilation and specifies the frequency resolution of the wavelet transform.
Wavelet packets include other wavelet bases constructed with several wavelets having a better frequency resolution. Let us consider the admissible binary tree of Figure 8.7(b), which has leaves indexed by k = 2 and k = 3 at all depths j − L. The resulting wavelet packet basis of V_L is

{ψ²_j(t − 2^j n), ψ³_j(t − 2^j n)}_{n∈Z, j>L+1}.

These wavelet packets can be rewritten as dilations of two elementary wavelets ψ² and ψ³:

ψ²_j(t) = (1/√(2^{j−1})) ψ²(t/2^{j−1}),   ψ³_j(t) = (1/√(2^{j−1})) ψ³(t/2^{j−1}).
FIGURE 8.8
(a) Heisenberg boxes of a two-band wavelet decomposition of the multichirp signal shown in Figure 8.6. (b) Decomposition of the same signal in a pseudo-local cosine wavelet packet basis.
Over positive frequencies, the energy of ˆ 2 and ˆ 3 is mostly concentrated, respectively, in [, 3/2] and [3/2, 2]. Thus, the octave bandwidths of ˆ 2 and ˆ 3 are
equal to 1/2 and 1/3,respectively. Wavelets 2 and 3 have a higher-frequency resolution than 1 , but their time support is twice as large. Figure 8.8(a) gives a two-band
wavelet decomposition of the multichirp signal shown in Figure 8.6,calculated with
a Daubechies 10 filter.
CHAPTER 8 Wavelet Packet and Local Cosine Bases

Higher-resolution wavelet bases can be constructed with an arbitrary number
M = 2^l of wavelets. In a frequency-ordered wavelet packet tree, we define an
admissible binary tree with leaves indexed by 2^l ≤ k < 2^{l+1} at the depth
j − L > l. The resulting
wavelet packet basis

{ψ_j^k(t − 2^j n)}_{n∈Z, M≤k<2M, j>L+l}

can be written as dilations and translations of M elementary wavelets ψ^k:

ψ_j^k(t) = (1/√(2^{j−l})) ψ^k(t / 2^{j−l}).

The support size of ψ^k is proportional to M = 2^l. For positive frequencies, the
energy of ψ̂^k is mostly concentrated in [kπ2^{−l}, (k + 1)π2^{−l}]. The octave
bandwidth is therefore 2^{−l}/(k2^{−l}) = k^{−1} for M ≤ k < 2M. The M wavelets
{ψ^k}_{M≤k<2M} have an octave bandwidth smaller than M^{−1} but a time support
M times larger than the support of ψ^1. Such wavelet bases are called M-band
wavelets. More general families of M-band wavelets can also be constructed with
other M-band filter banks studied in [68].
Pseudo-Local Cosine Bases
Pseudo-local cosine bases are constructed with an admissible binary tree that is a
full tree of depth J − L ≥ 0. The leaves are the nodes indexed by 0 ≤ k < 2^{J−L} and
the resulting wavelet packet basis is

{ψ_J^k(t − 2^J n)}_{n∈Z, 0≤k<2^{J−L}}.   (8.34)

If these wavelet packets are constructed with a conjugate mirror filter of size
K, they have a support of size (K − 1)2^J. For positive frequencies, the energy of
ψ̂_J^k is concentrated in [kπ2^{−J}, (k + 1)π2^{−J}]. Therefore, the bandwidth of all
these wavelet packets is approximately constant and equal to π2^{−J}. The Heisenberg
boxes of these wavelet packets have the same size and divide the time-frequency
plane in the rectangular grid illustrated in Figure 8.9.

Shannon wavelet packets ψ_J^k are written in (8.29) as a dilated window modulated
by cosine functions of frequency π2^{−J}(k + 1/2). In this case, the uniform
FIGURE 8.9
Admissible tree (left ) and Heisenberg boxes (right ) of a wavelet packet pseudo-local cosine
basis.
wavelet packet basis (8.34) is a local cosine basis with windows of constant size.
This result is not valid for wavelet packets constructed with different conjugate
mirror filters. Nevertheless, the time and frequency resolution of uniform wavelet
packet bases (8.34) remains constant, like that of local cosine bases constructed
with windows of constant size. Figure 8.8(b) gives the decomposition coefficients
of a signal in such a uniform wavelet packet basis.
Best Basis
Applications of orthogonal bases often rely on their ability to efficiently approximate signals with only a few nonzero vectors. Choosing a wavelet packet basis that
concentrates the signal energy over a few coefficients also reveals its time-frequency
structures. Section 12.2.2 describes a fast algorithm that searches for a “best” basis
that minimizes a Schur concave cost function among all wavelet packet bases. The
wavelet packet basis of Figure 8.6 is calculated with this best basis search.
8.1.4 Wavelet Packet Filter Banks
Wavelet packet coefficients are computed with a filter bank algorithm that
generalizes the fast discrete wavelet transform. This algorithm is a straightforward
iteration of the two-channel filter bank decomposition presented in Section 7.3.2.
It was therefore used in signal processing by Croisier, Esteban, and Galand [189]
when they introduced the first family of perfect reconstruction filters. The
algorithm is presented here from a wavelet packet point of view.
To any discrete signal input b[n] sampled at intervals N^{−1} = 2^L, as in (7.111),
we associate f ∈ V_L with decomposition coefficients a_L[n] = ⟨f, φ_{L,n}⟩ that satisfy

b[n] = N^{1/2} a_L[n] ≈ f(N^{−1}n).   (8.35)
For any node (j, p) of the wavelet packet tree, we denote the wavelet packet
coefficients

d_j^p[n] = ⟨f(t), ψ_j^p(t − 2^j n)⟩.

At the root of the tree, d_L^0[n] = a_L[n] is computed from b[n] with (8.35).
Wavelet Packet Decomposition
We denote x̄[n] = x[−n] and by x̌ the signal obtained by inserting a zero between
each sample of x. Theorem 8.5 generalizes the fast wavelet transform Theorem 7.10.
Theorem 8.5. At the decomposition,

d_{j+1}^{2p}[k] = d_j^p ⋆ h̄[2k]  and  d_{j+1}^{2p+1}[k] = d_j^p ⋆ ḡ[2k].   (8.36)

At the reconstruction,

d_j^p[k] = ď_{j+1}^{2p} ⋆ h[k] + ď_{j+1}^{2p+1} ⋆ g[k].   (8.37)

The proof of these equations is identical to the proof of Theorem 7.10. The
coefficients of the wavelet packet children d_{j+1}^{2p} and d_{j+1}^{2p+1} are obtained by subsampling
FIGURE 8.10
(a) Wavelet packet filter bank decomposition with successive filterings and subsamplings.
(b) Reconstruction by inserting zeros and filtering the outputs.
the convolutions of d_j^p with h̄ and ḡ. Iterating these equations along the branches
of a wavelet packet tree computes all wavelet packet coefficients, as illustrated by
Figure 8.10(a). From the wavelet packet coefficients at the leaves {j_i, p_i}_{1≤i≤I} of
an admissible subtree, we recover d_L^0 at the top of the tree by computing (8.37)
for each node inside the tree, as illustrated by Figure 8.10(b).
■
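As a concrete illustration of (8.36) and (8.37), here is a minimal sketch in Python/NumPy of one decomposition and one reconstruction stage, using circular convolutions (the periodization approach of Section 7.5.1) and Haar conjugate mirror filters. The function names are ours, not the book's, and the loops favor clarity over speed.

```python
import numpy as np

def wp_split(d, h, g):
    # (8.36): correlate d with h and g (circular convolution with
    # the time-reversed filters h-bar, g-bar), keep every other sample.
    N, K = len(d), len(h)
    low = np.array([sum(d[(2*k + m) % N] * h[m] for m in range(K))
                    for k in range(N // 2)])
    high = np.array([sum(d[(2*k + m) % N] * g[m] for m in range(K))
                     for k in range(N // 2)])
    return low, high

def wp_merge(low, high, h, g):
    # (8.37): insert zeros (the check accent), then circular
    # convolution with h and g, and sum the two branches.
    N, K = 2 * len(low), len(h)
    lo_up = np.zeros(N); lo_up[0::2] = low
    hi_up = np.zeros(N); hi_up[0::2] = high
    out = np.zeros(N)
    for k in range(N):
        for m in range(K):
            out[k] += lo_up[(k - m) % N] * h[m] + hi_up[(k - m) % N] * g[m]
    return out
```

With the Haar pair h = [1, 1]/√2, g = [1, −1]/√2, merging the two children recovers the parent exactly, and the split conserves energy, as orthogonality requires.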
Finite Signals
If aL is a finite signal of size 2⫺L ⫽ N ,we are facing the same border convolution problems as in a fast discrete wavelet transform. One approach explained in Section 7.5.1
is to periodize the wavelet packet basis.The convolutions (8.36) are then replaced by
circular convolutions.To avoid introducing sharp transitions with the periodization,
p
one can also use the border filters described in Section 7.5.3. In either case, dj has
2⫺j samples. At any depth j ⫺ L of the tree, the wavelet packet signals {dj }0⭐p⬍2 j⫺L
include a total of N coefficients. Since the maximum depth is log2 N , there are at
most N log2 N coefficients in a full wavelet packet tree.
In a full wavelet packet tree of depth log2 N , all coefficients are computed by
iterating (8.36) for L ⭐ j ⬍ 0. If h and g have K nonzero coefficients, this requires
KN log2 N additions and multiplications. This is quite spectacular since there are
more than 2N /2 different wavelet packet bases included in this wavelet packet tree.
The computational complexity to recover a_L = d_L^0 from the wavelet packet
coefficients of an admissible tree increases with the number of inside nodes of the
admissible tree. When the admissible tree is the full binary tree of depth log₂ N,
the number of operations is maximal and equal to KN log₂ N multiplications and
additions. If the admissible subtree is a wavelet tree, we need fewer than 2KN
multiplications and additions.
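The coefficient count can be checked numerically. The following sketch (ours, with exact Haar splits) builds a full wavelet packet tree and verifies that every depth stores exactly N coefficients and conserves the signal energy.

```python
import numpy as np

def haar_split(d):
    # One exact orthonormal split with Haar conjugate mirror filters
    even, odd = d[0::2], d[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def full_wp_tree(x, depth):
    # levels[j] lists the 2^j wavelet packet signals at depth j
    levels = [[np.asarray(x, float)]]
    for _ in range(depth):
        nxt = []
        for d in levels[-1]:
            nxt.extend(haar_split(d))
        levels.append(nxt)
    return levels

x = np.random.default_rng(1).standard_normal(32)
levels = full_wp_tree(x, 5)          # depth log2(32) = 5
for lev in levels:
    # N coefficients and the full signal energy at every depth
    assert sum(len(d) for d in lev) == 32
    assert np.isclose(sum(np.sum(d**2) for d in lev), np.sum(x**2))
```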
Discrete Wavelet Packet Bases of ℓ²(Z)
The signal decomposition in a conjugate mirror filter bank can also be interpreted
as an expansion in discrete wavelet packet bases of ℓ²(Z). This is proved with a
result similar to Theorem 8.1.
Theorem 8.6. Let {ψ_j[m − 2^{j−L} n]}_{n∈Z} be an orthonormal basis of a space U_j with
j − L ∈ N. Define

ψ_{j+1}^0[m] = Σ_{n=−∞}^{+∞} h[n] ψ_j[m − 2^{j−L} n],   ψ_{j+1}^1[m] = Σ_{n=−∞}^{+∞} g[n] ψ_j[m − 2^{j−L} n].   (8.38)

The family

{ψ_{j+1}^0[m − 2^{j+1−L} n], ψ_{j+1}^1[m − 2^{j+1−L} n]}_{n∈Z}

is an orthonormal basis of U_j.
The proof is similar to the proof of Theorem 8.1. As in the continuous-time case,
we derive from this theorem a binary tree of discrete wavelet packets. At the root
of the discrete wavelet packet tree is the space W_L^0 = ℓ²(Z) of discrete signals
obtained with a sampling interval N^{−1} = 2^L. It admits a canonical basis of Diracs
{ψ_L^0[m − n] = δ[m − n]}_{n∈Z}. The signal a_L[m] is specified by its sample values in
this basis. One can verify that the convolutions and subsamplings (8.36) compute

d_j^p[n] = ⟨a_L[m], ψ_j^p[m − 2^{j−L} n]⟩,

where {ψ_j^p[m − 2^{j−L} n]}_{n∈Z} is an orthogonal basis of a space W_j^p. These discrete
wavelet packets are recursively defined for any j ≥ L and 0 ≤ p < 2^{j−L} by

ψ_{j+1}^{2p}[m] = Σ_{n=−∞}^{+∞} h[n] ψ_j^p[m − 2^{j−L} n],   ψ_{j+1}^{2p+1}[m] = Σ_{n=−∞}^{+∞} g[n] ψ_j^p[m − 2^{j−L} n].   (8.39)
8.2 IMAGE WAVELET PACKETS
8.2.1 Wavelet Packet Quad-Tree
We construct wavelet packet bases of L²(R²) with separable products of wavelet
packets ψ_j^p(x₁ − 2^j n₁) ψ_j^q(x₂ − 2^j n₂) having the same scale along x₁ and x₂. These
separable wavelet packet bases are associated to quad-trees, and divide the two-dimensional
Fourier plane (ω₁, ω₂) into square regions of varying sizes. Separable
wavelet packet bases are extensions of separable wavelet bases.
If images are approximated at the scale 2^L, to the root of the quad-tree we
associate the approximation space V_L² = V_L ⊗ V_L ⊂ L²(R²) defined in Section 7.7.1.
In Section 8.1.1 we explain how to decompose V_L with a binary tree of wavelet
packet spaces W_j^p ⊂ V_L that admit an orthogonal basis {ψ_j^p(t − 2^j n)}_{n∈Z}. The
two-dimensional wavelet packet quad-tree is composed of separable wavelet packet
spaces. Each node of this quad-tree is labeled by a scale 2^j and two integers 0 ≤ p <
2^{j−L} and 0 ≤ q < 2^{j−L}, and corresponds to a separable space

W_j^{p,q} = W_j^p ⊗ W_j^q.   (8.40)

The resulting separable wavelet packet for x = (x₁, x₂) is

ψ_j^{p,q}(x) = ψ_j^p(x₁) ψ_j^q(x₂).

Theorem 7.25 proves that an orthogonal basis of W_j^{p,q} is obtained with a separable
product of the wavelet packet bases of W_j^p and W_j^q, which can be written as

{ψ_j^{p,q}(x − 2^j n)}_{n∈Z²}.

At the root W_L^{0,0} = V_L², the wavelet packet is a two-dimensional scaling function:

ψ_L^{0,0}(x) = φ_L²(x) = φ_L(x₁) φ_L(x₂).
One-dimensional wavelet packet spaces satisfy

W_j^p = W_{j+1}^{2p} ⊕ W_{j+1}^{2p+1}  and  W_j^q = W_{j+1}^{2q} ⊕ W_{j+1}^{2q+1}.

Inserting these equations in (8.40) proves that W_j^{p,q} is the direct sum of the four
orthogonal subspaces

W_j^{p,q} = W_{j+1}^{2p,2q} ⊕ W_{j+1}^{2p+1,2q} ⊕ W_{j+1}^{2p,2q+1} ⊕ W_{j+1}^{2p+1,2q+1}.   (8.41)
These subspaces are located at the four children nodes in the quad-tree, as shown by
Figure 8.11. We call any quad-tree that has nodes with either zero or four children an
admissible quad-tree. Let {j_i, p_i, q_i}_{1≤i≤I} be the indices of the nodes at the leaves
of an admissible quad-tree. Applying recursively the reconstruction sum (8.41) along
the branches of this quad-tree gives an orthogonal decomposition of W_L^{0,0}:

W_L^{0,0} = ⊕_{i=1}^{I} W_{j_i}^{p_i,q_i}.

The union of the corresponding wavelet packet bases

{ψ_{j_i}^{p_i,q_i}(x − 2^{j_i} n)}_{(n₁,n₂)∈Z², 1≤i≤I}

is therefore an orthonormal basis of V_L² = W_L^{0,0}.
FIGURE 8.11
A wavelet packet quad-tree for images is constructed recursively by decomposing each
separable space W_j^{p,q} in four subspaces.
Number of Wavelet Packet Bases
The number of different bases in a full wavelet packet quad-tree of depth J is equal
to the number of admissible subtrees. Theorem 8.7 proves that there are more than
2^{4^{J−1}} such bases.

Theorem 8.7. The number B_J of wavelet packet bases in a full wavelet packet quad-tree
of depth J satisfies

2^{4^{J−1}} ≤ B_J ≤ 2^{(49/48) 4^{J−1}}.

Proof. This result is proved by induction, as in the one-dimensional case. The reader
can verify that B_J satisfies an induction relation similar to (8.16):

B_{J+1} = B_J^4 + 1.   (8.42)

Since B_0 = 1, B_1 = 2, and B_{J+1} ≥ B_J^4, we derive that B_J ≥ 2^{4^{J−1}}. Moreover,
for J ≥ 1,

log₂ B_{J+1} = 4 log₂ B_J + log₂(1 + B_J^{−4}) ≤ 4 log₂ B_J + 1/16,

which implies that

log₂ B_{J+1} ≤ 4^J + (1/16) Σ_{j=0}^{J−1} 4^j,

and thus B_J ≤ 2^{(49/48) 4^{J−1}}. ■
For an image of N = N₁N₂ pixels, if N₁ = N₂, then the wavelet packet quad-tree
has a depth at most log₂ N₁ = log₂ N^{1/2}. Thus, the number of wavelet packet bases
satisfies

2^{N/4} ≤ B_{log₂ N^{1/2}} ≤ 2^{(49/48)(N/4)}.   (8.43)
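The recurrence (8.42) is easy to iterate numerically. This quick check (ours) confirms the first values and the lower bound of Theorem 8.7 for small depths; Python integers make the very large values exact.

```python
# Iterate B_{J+1} = B_J^4 + 1 from B_0 = 1 and check the lower bound
# 2^(4^(J-1)) <= B_J of Theorem 8.7 for small J.
B = [1]
for _ in range(5):
    B.append(B[-1] ** 4 + 1)

assert B[:3] == [1, 2, 17]
for J in range(1, 6):
    assert B[J] >= 2 ** (4 ** (J - 1))
```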
Spatial and Frequency Localization
The spatial and frequency localization of two-dimensional wavelet packets is derived
from the time-frequency analysis performed in Section 8.1.2. If the conjugate mirror
filter h has K nonzero coefficients, we proved that ψ_j^p has a support of size
2^j(K − 1); thus ψ_j^p(x₁) ψ_j^q(x₂) has a square support of width 2^j(K − 1).

We showed that the Fourier transform of ψ_j^p has its energy mostly concentrated in

[−(k + 1)π2^{−j}, −kπ2^{−j}] ∪ [kπ2^{−j}, (k + 1)π2^{−j}],

where k = G[p] is specified by Theorem 8.3. Therefore, the Fourier transform of a
two-dimensional wavelet packet ψ_j^{p,q} has its energy mostly concentrated in

[k₁π2^{−j}, (k₁ + 1)π2^{−j}] × [k₂π2^{−j}, (k₂ + 1)π2^{−j}],   (8.44)

with k₁ = G[p] and k₂ = G[q], and in the three squares that are symmetric with
respect to the two axes ω₁ = 0 and ω₂ = 0. An admissible wavelet packet quad-tree
decomposes the positive-frequency quadrant into squares of dyadic sizes, as
illustrated in Figure 8.12. For example, the leaves of a full wavelet packet quad-tree
of depth j − L define a wavelet packet basis that decomposes the positive-frequency
quadrant into squares of constant width equal to π2^{−j}. This wavelet packet basis is
similar to a two-dimensional local cosine basis with windows of constant size.
FIGURE 8.12
A wavelet packet quad-tree decomposes the positive-frequency quadrant into squares of
progressively smaller sizes as we go down the tree.
8.2.2 Separable Filter Banks
The decomposition coefficients of an image in a separable wavelet packet basis
are computed with a separable extension of the filter bank algorithm described
in Section 8.1.4. Let b[n] be an input image with pixels at a distance 2^L. We
associate b[n] to a function f ∈ V_L² approximated at the scale 2^L, with decomposition
coefficients a_L[n] = ⟨f(x), φ_L²(x − 2^L n)⟩ that are defined like in (7.232):

b[n] = 2^{−L} a_L[n] ≈ f(2^L n).

The wavelet packet coefficients

d_j^{p,q}[n] = ⟨f, ψ_j^{p,q}(x − 2^j n)⟩

characterize the orthogonal projection of f in W_j^{p,q}. At the root, d_L^{0,0} = a_L.
Separable Filter Bank
From the separability of wavelet packet bases and the one-dimensional convolution
formula of Theorem 8.5, we derive that for any n = (n₁, n₂),

d_{j+1}^{2p,2q}[n] = d_j^{p,q} ⋆ h̄h̄[2n],   d_{j+1}^{2p+1,2q}[n] = d_j^{p,q} ⋆ ḡh̄[2n],   (8.45)

d_{j+1}^{2p,2q+1}[n] = d_j^{p,q} ⋆ h̄ḡ[2n],   d_{j+1}^{2p+1,2q+1}[n] = d_j^{p,q} ⋆ ḡḡ[2n].   (8.46)

Thus, the coefficients of a wavelet packet quad-tree are computed by iterating
these equations along the quad-tree's branches. The calculations are performed with
separable convolutions along the rows and columns of the image, as illustrated in
Figure 8.13.
At the reconstruction,

d_j^{p,q}[n] = ď_{j+1}^{2p,2q} ⋆ hh[n] + ď_{j+1}^{2p+1,2q} ⋆ gh[n] + ď_{j+1}^{2p,2q+1} ⋆ hg[n] + ď_{j+1}^{2p+1,2q+1} ⋆ gg[n].   (8.47)

The image a_L = d_L^{0,0} is reconstructed from the wavelet packet coefficients stored
at the leaves of any admissible quad-tree by repeating the partial reconstruction
(8.47) in the inside nodes of this quad-tree.
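A minimal separable implementation of one quad-tree stage in the spirit of (8.45)-(8.47), again with exact Haar filters and our own function names: each 2-D split is two 1-D splits, first along one image axis and then along the other.

```python
import numpy as np

def split1d(d, axis):
    # 1-D Haar split along one axis of a 2-D array
    ev = np.take(d, np.arange(0, d.shape[axis], 2), axis=axis)
    od = np.take(d, np.arange(1, d.shape[axis], 2), axis=axis)
    return (ev + od) / np.sqrt(2), (ev - od) / np.sqrt(2)

def merge1d(a, b, axis):
    # Inverse of split1d: interleave the reconstructed samples
    shape = list(a.shape); shape[axis] *= 2
    out = np.zeros(shape)
    sl = [slice(None)] * 2
    sl[axis] = slice(0, None, 2); out[tuple(sl)] = (a + b) / np.sqrt(2)
    sl[axis] = slice(1, None, 2); out[tuple(sl)] = (a - b) / np.sqrt(2)
    return out

def quad_split(d):
    # One quad-tree stage: split along axis 0 (p index), then axis 1 (q index)
    lo, hi = split1d(d, 0)
    return [split1d(lo, 1), split1d(hi, 1)]   # [(ll, lh), (hl, hh)]

def quad_merge(parts):
    (ll, lh), (hl, hh) = parts
    return merge1d(merge1d(ll, lh, 1), merge1d(hl, hh, 1), 0)
```

Splitting a square image yields four children of a quarter the size each, and merging them back recovers the parent exactly.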
Finite Images
If the image a_L has N = 2^{−2L} pixels, the one-dimensional convolution border
problems are solved with one of the two approaches described in Sections 7.5.1
and 7.5.3. Each wavelet packet image d_j^{p,q} includes 2^{−2j} pixels. At depth j − L,
there are N wavelet packet coefficients in {d_j^{p,q}}_{0≤p,q<2^{j−L}}. Thus, a quad-tree
of maximum depth log₂ N^{1/2} includes N log₂ N^{1/2} coefficients. If h and g have
K nonzero coefficients, the one-dimensional convolutions that implement (8.45) and
(8.46) require 2K 2^{−2j} multiplications and additions. Thus, all wavelet packet
coefficients at depth j + 1 − L are computed from wavelet packet coefficients
located at depth j − L with
FIGURE 8.13
(a) Wavelet packet decomposition implementing (8.45) and (8.46) with one-dimensional
convolutions along the rows and columns of d_j^{p,q}. (b) Wavelet packet reconstruction
implementing (8.47).
2KN calculations. The N log₂ N^{1/2} wavelet packet coefficients of a full tree of depth
log₂ N^{1/2} are therefore obtained with KN log₂ N multiplications and additions. The
numerical complexity of reconstructing a_L from a wavelet packet basis depends
on the number of inside nodes of the corresponding quad-tree. The worst case is a
reconstruction from the leaves of a full quad-tree of depth log₂ N^{1/2}, which requires
KN log₂ N multiplications and additions.
8.3 BLOCK TRANSFORMS
Wavelet packet bases are designed by dividing the frequency axis in intervals
of varying sizes. These bases are particularly well adapted to decomposing signals that have different behavior in different frequency intervals. If f has properties that vary in time, it is then more appropriate to decompose f in a block basis
that segments the time axis in intervals with sizes that are adapted to the signal
structures. Section 8.3.1 explains how to generate a block basis of L 2 (R) from any
basis of L 2 [0, 1]. The block cosine bases described in Sections 8.3.2 and 8.3.3 are
used for signal and image compression.
8.3 Block Transforms
8.3.1 Block Bases
Block orthonormal bases are obtained by dividing the time axis in consecutive
intervals [a_p, a_{p+1}] with

lim_{p→−∞} a_p = −∞  and  lim_{p→+∞} a_p = +∞.

The size l_p = a_{p+1} − a_p of each interval is arbitrary. Let g = 1_{[0,1]}. An interval is
covered by the dilated rectangular window

g_p(t) = 1_{[a_p,a_{p+1}]}(t) = g((t − a_p)/l_p).   (8.48)

Theorem 8.8 constructs a block orthogonal basis of L²(R) from a single orthonormal
basis of L²[0, 1].

Theorem 8.8. If {e_k}_{k∈Z} is an orthonormal basis of L²[0, 1], then

{g_{p,k}(t) = g_p(t) (1/√l_p) e_k((t − a_p)/l_p)}_{(p,k)∈Z²}   (8.49)

is a block orthonormal basis of L²(R).

Proof. One can verify that the dilated and translated family

{(1/√l_p) e_k((t − a_p)/l_p)}_{k∈Z}   (8.50)

is an orthonormal basis of L²[a_p, a_{p+1}]. If p ≠ q, then ⟨g_{p,k}, g_{q,k}⟩ = 0 since their
supports do not overlap. Thus, the family (8.49) is orthonormal. To expand a signal f
in this family, it is decomposed as a sum of separate blocks

f(t) = Σ_{p=−∞}^{+∞} f(t) g_p(t),

and each block f(t)g_p(t) is decomposed in the basis (8.50). ■
Block Fourier Basis
A block basis is constructed with the Fourier basis of L²[0, 1]:

{e_k(t) = exp(i2πkt)}_{k∈Z}.

The time support of each block Fourier vector g_{p,k} is [a_p, a_{p+1}], of size l_p. The
Fourier transform of g = 1_{[0,1]} is

ĝ(ω) = (sin(ω/2)/(ω/2)) exp(−iω/2)
and

ĝ_{p,k}(ω) = √l_p ĝ(l_pω − 2kπ) exp(−i2kπa_p/l_p).

It is centered at 2kπl_p^{−1} and has a slow asymptotic decay proportional to l_p^{−1}|ω|^{−1}.
Because of this poor frequency localization, even though a signal f is smooth,
its decomposition in a block Fourier basis may include large high-frequency
coefficients. This can also be interpreted as an effect of periodization.
Discrete Block Bases
For all p ∈ Z, we suppose that a_p ∈ Z. Discrete block bases are built with discrete
rectangular windows having supports on intervals [a_p, a_{p+1} − 1]:

g_p[n] = 1_{[a_p, a_{p+1}−1]}(n).

Since dilations are not defined in a discrete framework, we generally cannot derive
bases of intervals of varying sizes from a single basis. Thus, Theorem 8.9 supposes
that we can construct an orthonormal basis of C^l for any l > 0. The proof is
straightforward.

Theorem 8.9. Suppose that {e_{k,l}}_{0≤k<l} is an orthogonal basis of C^l for any l > 0.
The family

{g_{p,k}[n] = g_p[n] e_{k,l_p}[n − a_p]}_{0≤k<l_p, p∈Z}   (8.51)

is a block orthonormal basis of ℓ²(Z).

A discrete block basis is constructed with discrete Fourier bases

{e_{k,l}[n] = (1/√l) exp(i2πkn/l)}_{0≤k<l}.

The resulting block Fourier vectors g_{p,k} have sharp transitions at the window
border, and thus are not well localized in frequency. As in the continuous case, the
decomposition of smooth signals f may produce large-amplitude, high-frequency
coefficients because of border effects.
Block Bases of Images
General block bases of images are constructed by partitioning the plane R² into
rectangles {[a_p, b_p] × [c_p, d_p]}_{p∈Z} of arbitrary length l_p = b_p − a_p and width
w_p = d_p − c_p. Let {e_k}_{k∈Z} be an orthonormal basis of L²[0, 1] and g = 1_{[0,1]}.
We denote

g_{p,k,j}(x, y) = (1/√(l_p w_p)) g((x − a_p)/l_p) g((y − c_p)/w_p) e_k((x − a_p)/l_p) e_j((y − c_p)/w_p).

The family {g_{p,k,j}}_{(k,j)∈Z²} is an orthonormal basis of L²([a_p, b_p] × [c_p, d_p]), and
thus {g_{p,k,j}}_{(p,k,j)∈Z³} is an orthonormal basis of L²(R²).

For discrete images, we define discrete windows that cover each rectangle:

g_p = 1_{[a_p, b_p−1]×[c_p, d_p−1]}.

If {e_{k,l}}_{0≤k<l} is an orthogonal basis of C^l for any l > 0, then

{g_{p,k,j}[n₁, n₂] = g_p[n₁, n₂] e_{k,l_p}[n₁ − a_p] e_{j,w_p}[n₂ − c_p]}_{(k,j,p)∈Z³}

is a block basis of ℓ²(Z²).
8.3.2 Cosine Bases
If f ∈ L²[0, 1] and f(0) ≠ f(1), even though f might be a smooth function, the
Fourier coefficients

⟨f(u), e^{i2πku}⟩ = ∫₀¹ f(u) e^{−i2πku} du

have a relatively large amplitude at high frequencies 2kπ. Indeed, the Fourier series
expansion

f(t) = Σ_{k=−∞}^{+∞} ⟨f(u), e^{i2πku}⟩ e^{i2πkt}

is a function of period 1, equal to f over [0, 1], and that is therefore discontinuous
if f(0) ≠ f(1). This shows that the restriction of a smooth function to an interval
generates large Fourier coefficients. As a consequence, block Fourier bases are
rarely used. A cosine I basis reduces this border effect by restoring a periodic
extension f̃ of f, which is continuous if f is continuous. Thus, high-frequency
cosine I coefficients have a smaller amplitude than Fourier coefficients.
Cosine I Basis
We define f̃ to be the function of period 2 that is symmetric about 0 and equal
to f over [0, 1]:

f̃(t) = f(t)   for t ∈ [0, 1],
f̃(t) = f(−t)  for t ∈ (−1, 0).   (8.52)

If f is continuous over [0, 1], then f̃ is continuous over R, as shown in Figure 8.14.
However, if f has a nonzero right derivative at 0 or left derivative at 1, then f̃ is
nondifferentiable at integer points.

The Fourier expansion of f̃ over [0, 2] can be written as a sum of sine and cosine
terms:

f̃(t) = Σ_{k=0}^{+∞} a[k] cos(2kπt/2) + Σ_{k=1}^{+∞} b[k] sin(2kπt/2).
FIGURE 8.14
The function f̃(t) is an extension of f(t); it is symmetric about 0 and of period 2.
The sine coefficients b[k] are zero because f̃ is even. Since f(t) = f̃(t) over [0, 1],
this proves that any f ∈ L²[0, 1] can be written as a linear combination of the cosines
{cos(kπt)}_{k∈N}. One can verify that this family is orthogonal over [0, 1]. Therefore,
it is an orthogonal basis of L²[0, 1], as stated by Theorem 8.10.

Theorem 8.10: Cosine I. The family

{λ_k √2 cos(kπt)}_{k∈N}  with  λ_k = 2^{−1/2} if k = 0, and 1 if k ≠ 0,

is an orthonormal basis of L²[0, 1]. ■
Block Cosine Basis
Let us divide the real line with square windows g_p = 1_{[a_p,a_{p+1}]}. Theorem 8.8
proves that

{g_{p,k}(t) = g_p(t) √(2/l_p) λ_k cos(kπ(t − a_p)/l_p)}_{k∈N, p∈Z}

is a block basis of L²(R). The decomposition coefficients of a smooth function have
a faster decay at high frequencies in a block cosine basis than in a block Fourier
basis, because cosine bases correspond to a smoother signal extension beyond the
intervals [a_p, a_{p+1}].
Cosine IV Basis
Other cosine bases are constructed from the Fourier series, with different extensions
of f beyond [0, 1]. The cosine IV basis appears in fast numerical computations of
cosine I coefficients. It is also used to construct local cosine bases with smooth
windows in Section 8.4.2.

Any f ∈ L²[0, 1] is extended into a function f̃ of period 4, which is symmetric
about 0 and antisymmetric about 1 and −1:

f̃(t) = f(t)        if t ∈ [0, 1],
f̃(t) = f(−t)       if t ∈ (−1, 0),
f̃(t) = −f(2 − t)   if t ∈ [1, 2),
f̃(t) = −f(2 + t)   if t ∈ [−2, −1).

If f(1) ≠ 0, the antisymmetry at 1 creates a function f̃ that is discontinuous at
t = 2n + 1 for any n ∈ Z, as shown in Figure 8.15. Therefore, this extension is less
regular than the cosine I extension (8.52).
FIGURE 8.15
A cosine IV extends f(t) into a signal f̃(t) of period 4, which is symmetric with
respect to 0 and antisymmetric with respect to 1.
Since f̃ is 4-periodic, it can be decomposed as a sum of sines and cosines of
period 4:

f̃(t) = Σ_{k=0}^{+∞} a[k] cos(2kπt/4) + Σ_{k=1}^{+∞} b[k] sin(2kπt/4).

The symmetry about 0 implies that

b[k] = (1/2) ∫_{−2}^{2} f̃(t) sin(2kπt/4) dt = 0.

For even frequencies, the antisymmetry about 1 and −1 yields

a[2k] = (1/2) ∫_{−2}^{2} f̃(t) cos(2(2k)πt/4) dt = 0.

Thus, the only nonzero components are cosines of odd frequencies:

f̃(t) = Σ_{k=0}^{+∞} a[2k + 1] cos((2k + 1)2πt/4).   (8.53)
Since f(t) = f̃(t) over [0, 1], this proves that any f ∈ L²[0, 1] is decomposed as a
sum of such cosine functions. One can verify that the restriction of these cosine
functions to [0, 1] is orthogonal in L²[0, 1], which implies Theorem 8.11.

Theorem 8.11: Cosine IV. The family

{√2 cos((k + 1/2)πt)}_{k∈N}

is an orthonormal basis of L²[0, 1]. ■
The cosine transform IV is not used in block transforms because it has the same
drawbacks as a block Fourier basis. Block cosine IV coefficients of a smooth f have
a slow decay at high frequencies, because such a decomposition corresponds to
a discontinuous extension of f beyond each block. Section 8.4.2 explains how to
avoid this issue with smooth windows.
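The parity argument behind (8.53) can be verified numerically. This sketch (ours) samples the period-4 extension f̃ on a midpoint grid for an arbitrary smooth f and checks that all sine coefficients and all even-frequency cosine coefficients vanish, while odd-frequency cosines carry the signal.

```python
import numpy as np

M = 128
t = (np.arange(M) + 0.5) / M         # midpoint samples of [0, 1)
f = np.exp(-t)                       # arbitrary smooth test function
# Period-4 cosine IV extension: even about 0, odd about +-1.
# Segments [0,1), [1,2), [2,3), [3,4) of one period:
x = np.concatenate([f, -f[::-1], -f, f[::-1]])
ts = (np.arange(4 * M) + 0.5) / M
# Fourier coefficients of period 4
a = np.array([np.sum(x * np.cos(2 * np.pi * k * ts / 4))
              for k in range(2 * M)]) / (2 * M)
b = np.array([np.sum(x * np.sin(2 * np.pi * k * ts / 4))
              for k in range(1, 2 * M)]) / (2 * M)

assert np.max(np.abs(b)) < 1e-9          # no sine terms (even about 0)
assert np.max(np.abs(a[0::2])) < 1e-9    # no even-frequency cosines
assert np.max(np.abs(a[1::2])) > 1e-3    # odd cosines carry f
```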
8.3.3 Discrete Cosine Bases
Discrete cosine bases are derived from the discrete Fourier basis with the same
approach as in the continuous-time case. To simplify notations, the sampling distance
is normalized to 1. If the sampling distance was originally N^{−1}, then the frequency
indexes that appear in this section must be multiplied by N.
Discrete Cosine I
A signal f[n] defined for 0 ≤ n < N is extended by symmetry with respect to −1/2
into a signal f̃[n] of size 2N:

f̃[n] = f[n]        for 0 ≤ n < N,
f̃[n] = f[−n − 1]   for −N ≤ n ≤ −1.   (8.54)

The 2N discrete Fourier transform of f̃ can be written as a sum of sine and cosine
terms:

f̃[n] = Σ_{k=0}^{N−1} a[k] cos((kπ/N)(n + 1/2)) + Σ_{k=0}^{N−1} b[k] sin((kπ/N)(n + 1/2)).

Since f̃ is symmetric about −1/2, then necessarily b[k] = 0 for 0 ≤ k < N. Moreover,
f[n] = f̃[n] for 0 ≤ n < N, so any signal f ∈ C^N can be written as a sum of these
cosine functions. The reader can also verify that these discrete cosine signals are
orthogonal in C^N. Thus, we obtain Theorem 8.12.
Theorem 8.12: Cosine I. The family

{λ_k √(2/N) cos((kπ/N)(n + 1/2))}_{0≤k<N}  with  λ_k = 2^{−1/2} if k = 0, and 1 otherwise,

is an orthonormal basis of C^N. ■
This theorem proves that any f ∈ C^N can be decomposed into

f[n] = (2/N) Σ_{k=0}^{N−1} f̂_I[k] λ_k cos((kπ/N)(n + 1/2)),   (8.55)

where

f̂_I[k] = ⟨f[n], λ_k cos((kπ/N)(n + 1/2))⟩ = λ_k Σ_{n=0}^{N−1} f[n] cos((kπ/N)(n + 1/2))   (8.56)

is the discrete cosine transform I (DCT-I) of f. The next section describes a fast
discrete cosine transform that computes f̂_I with O(N log₂ N) operations.
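The point of this construction, smaller high-frequency coefficients than the Fourier expansion when f[0] ≠ f[N − 1], can be observed directly. A sketch (ours) compares the high-frequency energy of the DFT and of the orthonormal cosine family of Theorem 8.12 on a ramp; in common signal processing nomenclature this orthonormal family is the DCT-II.

```python
import numpy as np

N = 64
n = np.arange(N)
f = n / (N - 1.0)                    # smooth ramp, f[0] != f[N-1]

# Orthonormal cosine matrix of Theorem 8.12
k = n[:, None]
lam = np.where(k == 0, 2 ** -0.5, 1.0)
C = lam * np.sqrt(2.0 / N) * np.cos(np.pi * k / N * (n[None, :] + 0.5))
assert np.allclose(C @ C.T, np.eye(N))   # orthonormality

dct_coef = C @ f
dft_coef = np.fft.fft(f) / np.sqrt(N)    # orthonormal DFT

# Energy above |omega| = pi/2: DFT indices N/4..3N/4, cosine indices k >= N/2.
hf_dft = np.sum(np.abs(dft_coef[N // 4: 3 * N // 4 + 1]) ** 2)
hf_dct = np.sum(dct_coef[N // 2:] ** 2)
# The periodized extension is discontinuous, the symmetric one is not:
assert hf_dct < 0.01 * hf_dft
```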
Discrete Block Cosine Transform
Let us divide the integer set Z with discrete windows g_p[n] = 1_{[a_p, a_{p+1}−1]}(n),
with a_p ∈ Z. Theorem 8.9 proves that the corresponding block basis

{g_{p,k}[n] = g_p[n] λ_k √(2/l_p) cos((kπ/l_p)(n + 1/2 − a_p))}_{0≤k<l_p, p∈Z}

is an orthonormal basis of ℓ²(Z). Over each block of size l_p = a_{p+1} − a_p, the fast
DCT-I algorithm computes all coefficients with O(l_p log₂ l_p) operations. Section 10.5.1
describes the JPEG image compression standard, which decomposes images in a
separable block cosine basis. A block cosine basis is used as opposed to a block
Fourier basis, because it yields smaller-amplitude, high-frequency coefficients, which
improves the coding performance.
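As a 1-D sketch of such a block transform (ours, not the JPEG code path), the signal is cut into length-L blocks and each block is multiplied by the orthonormal cosine matrix of Theorem 8.12; orthogonality per block makes the whole transform orthogonal and invertible.

```python
import numpy as np

def dct_matrix(L):
    # Orthonormal cosine matrix of Theorem 8.12 for block size L
    n = np.arange(L)
    lam = np.where(n[:, None] == 0, 2 ** -0.5, 1.0)
    return lam * np.sqrt(2.0 / L) * np.cos(np.pi * n[:, None] / L * (n[None, :] + 0.5))

def block_dct(x, L):
    # Apply the cosine transform independently on each block
    C = dct_matrix(L)
    return np.concatenate([C @ x[i:i + L] for i in range(0, len(x), L)])

def block_idct(y, L):
    # Orthonormality: the inverse per block is the transpose
    C = dct_matrix(L)
    return np.concatenate([C.T @ y[i:i + L] for i in range(0, len(y), L)])
```

The transform conserves energy and block_idct inverts block_dct exactly (assuming the signal length is a multiple of L).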
Discrete Cosine IV
To construct a discrete cosine IV basis, a signal f of N samples is extended into a
signal f̃ of period 4N, which is symmetric with respect to −1/2 and antisymmetric
with respect to N − 1/2 and −N + 1/2. As in (8.53), the decomposition of f̃ over
a family of sines and cosines of period 4N has no sine terms and no cosine terms
of even frequency. Since f̃[n] = f[n] for 0 ≤ n < N, we derive that f can also be
written as a linear expansion of these odd-frequency cosines, which are orthogonal
in C^N. Thus, we obtain Theorem 8.13.

Theorem 8.13: Cosine IV. The family

{√(2/N) cos((π/N)(k + 1/2)(n + 1/2))}_{0≤k<N}

is an orthonormal basis of C^N. ■
This theorem proves that any f ∈ C^N can be decomposed into

f[n] = (2/N) Σ_{k=0}^{N−1} f̂_IV[k] cos((π/N)(k + 1/2)(n + 1/2)),   (8.57)

where

f̂_IV[k] = Σ_{n=0}^{N−1} f[n] cos((π/N)(k + 1/2)(n + 1/2))   (8.58)

is the discrete cosine transform IV (DCT-IV) of f.
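A direct numerical check (ours) of Theorem 8.13 and of the inversion formula (8.57): the orthonormal DCT-IV matrix is symmetric in (k, n), so it is its own inverse, and the unnormalized pair (8.57)-(8.58) differs from it only by the factor 2/N.

```python
import numpy as np

N = 32
n = np.arange(N)
# Orthonormal DCT-IV matrix of Theorem 8.13; note it is symmetric in (k, n)
C4 = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n[:, None] + 0.5) * (n[None, :] + 0.5))
assert np.allclose(C4, C4.T)
assert np.allclose(C4 @ C4, np.eye(N))   # symmetric + orthonormal => involution

rng = np.random.default_rng(2)
f = rng.standard_normal(N)
fhat = (C4 @ f) / np.sqrt(2.0 / N)       # the unnormalized f_IV of (8.58)
frec = (2.0 / N) * (C4 @ fhat) / np.sqrt(2.0 / N)   # the sum (8.57)
assert np.allclose(frec, f)              # (8.57) inverts (8.58)
```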
8.3.4 Fast Discrete Cosine Transforms
The DCT-IV of a signal of size N is related to the discrete Fourier transform (DFT)
of a complex signal of size N/2 with a formula introduced by Duhamel, Mahieux,
and Petit [40, 236]. By computing this DFT with the fast Fourier transform (FFT)
described in Section 3.3.3, we need O(N log₂ N) operations to compute the DCT-IV.
The DCT-I coefficients are then calculated through an induction relation with the
DCT-IV, due to Wang [479].
Fast DCT-IV
To clarify the relation between a DCT-IV and a DFT, we split f[n] in two half-size
signals of even and odd indices:

b[n] = f[2n],   c[n] = f[N − 1 − 2n].

The DCT-IV (8.58) is rewritten as

f̂_IV[k] = Σ_{n=0}^{N/2−1} b[n] cos((2n + 1/2)(k + 1/2)π/N) + Σ_{n=0}^{N/2−1} c[n] cos((N − 1 − 2n + 1/2)(k + 1/2)π/N)

        = Σ_{n=0}^{N/2−1} b[n] cos((n + 1/4)(k + 1/2)2π/N) + (−1)^k Σ_{n=0}^{N/2−1} c[n] sin((n + 1/4)(k + 1/2)2π/N).
Thus, the even-frequency indices can be expressed as a real part,

f̂_IV[2k] = Re[ exp(−ikπ/N) Σ_{n=0}^{N/2−1} (b[n] + ic[n]) exp(−iπ(n + 1/4)/N) exp(−i2πkn/(N/2)) ],   (8.59)

whereas the odd coefficients correspond to an imaginary part,

f̂_IV[N − 2k − 1] = −Im[ exp(−ikπ/N) Σ_{n=0}^{N/2−1} (b[n] + ic[n]) exp(−iπ(n + 1/4)/N) exp(−i2πkn/(N/2)) ].   (8.60)
For 0 ≤ n < N/2, we denote

g[n] = (b[n] + ic[n]) exp(−iπ(n + 1/4)/N).

The DFT ĝ[k] of g[n] is computed with an FFT of size N/2. Equations (8.59) and
(8.60) prove that

f̂_IV[2k] = Re[ exp(−ikπ/N) ĝ[k] ],
and

f̂_IV[N − 2k − 1] = −Im[ exp(−ikπ/N) ĝ[k] ].
The DCT-IV coefficients f̂_IV[k] are obtained with one FFT of size N/2 plus O(N)
operations, which makes a total of O(N log₂ N) operations. To normalize the DCT-IV,
the resulting coefficients must be multiplied by √(2/N). An efficient implementation
of the DCT-IV with a split-radix FFT requires [40]

μ_DCT-IV(N) = (N/2) log₂ N + N   (8.61)

real multiplications and

α_DCT-IV(N) = (3N/2) log₂ N   (8.62)

additions.

The inverse DCT-IV of f̂_IV is given by (8.57). Up to the proportionality constant
2/N, this sum is the same as (8.58), where f̂_IV and f are interchanged. This proves
that the inverse DCT-IV is computed with the same fast algorithm as the forward
DCT-IV.
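The identities (8.59)-(8.60) translate directly into code. This sketch (ours) computes the unnormalized DCT-IV of (8.58) with a single complex FFT of size N/2 and checks it against the direct O(N²) sum; N is assumed even.

```python
import numpy as np

def dct4_direct(f):
    # Direct evaluation of (8.58), O(N^2)
    N = len(f)
    n = np.arange(N)
    return np.array([np.sum(f * np.cos(np.pi / N * (k + 0.5) * (n + 0.5)))
                     for k in range(N)])

def dct4_fft(f):
    # (8.59)-(8.60): one complex FFT of size N/2
    f = np.asarray(f, float)
    N = len(f)
    b = f[0::2]                      # b[n] = f[2n]
    c = f[N - 1::-2]                 # c[n] = f[N-1-2n]
    n = np.arange(N // 2)
    g = (b + 1j * c) * np.exp(-1j * np.pi * (n + 0.25) / N)
    rot = np.exp(-1j * np.pi * np.arange(N // 2) / N) * np.fft.fft(g)
    out = np.empty(N)
    out[0::2] = rot.real             # even indices, (8.59)
    out[N - 1::-2] = -rot.imag       # indices N - 2k - 1, (8.60)
    return out
```

Multiplying the output by √(2/N) gives the orthonormal coefficients of Theorem 8.13.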
Fast DCT-I
A DCT-I is calculated with an induction relation that involves the DCT-IV. Regrouping
the terms f[n] and f[N − 1 − n] of a DCT-I (8.56) yields

f̂_I[2k] = λ_k Σ_{n=0}^{N/2−1} (f[n] + f[N − 1 − n]) cos((kπ/(N/2))(n + 1/2)),   (8.63)

f̂_I[2k + 1] = Σ_{n=0}^{N/2−1} (f[n] − f[N − 1 − n]) cos(((k + 1/2)π/(N/2))(n + 1/2)).   (8.64)

The even-index coefficients of the DCT-I are equal to the DCT-I of the signal f[n] +
f[N − 1 − n] of length N/2. The odd coefficients are equal to the DCT-IV of the signal
f[n] − f[N − 1 − n] of length N/2. Thus, the number of multiplications of a DCT-I is
related to the number of multiplications of a DCT-IV by the induction relation

μ_DCT-I(N) = μ_DCT-I(N/2) + μ_DCT-IV(N/2),   (8.65)

while the number of additions is

α_DCT-I(N) = α_DCT-I(N/2) + α_DCT-IV(N/2) + N.   (8.66)

Since the number of multiplications and additions of a DCT-IV is O(N log₂ N), this
induction relation proves that the number of multiplications and additions of this
algorithm is also O(N log₂ N).
If the DCT-IV is implemented with a split-radix FFT, by inserting (8.61) and
(8.62) in the recurrence equations (8.65) and (8.66), we derive that the number
of multiplications and additions to compute a DCT-I of size N is

μ_DCT-I(N) = (N/2) log₂ N + 1   (8.67)

and

α_DCT-I(N) = (3N/2) log₂ N − N + 1.   (8.68)
The inverse DCT-I is computed with a similar recursive algorithm. Applied to $\hat f_I$, it is obtained by computing the inverse DCT-IV of the odd-index coefficients $\hat f_I[2k+1]$ with (8.64) and an inverse DCT-I of size $N/2$ applied to the even-index coefficients $\hat f_I[2k]$ with (8.63). From the values $f[n] + f[N-1-n]$ and $f[n] - f[N-1-n]$, we recover $f[n]$ and $f[N-1-n]$. The inverse DCT-IV is identical to the forward DCT-IV up to a multiplicative constant. Thus, the inverse DCT-I requires the same number of operations as the forward DCT-I.
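The recursive structure of (8.63)-(8.64) can be sketched as follows. The DCT-I and DCT-IV kernels ($\cos[\pi k(n+1/2)/N]$ and $\cos[\pi(k+1/2)(n+1/2)/N]$) are inferred from the recursion rather than copied from (8.55)-(8.56), and the DCT-IV step is written as a naive $O(N^2)$ product instead of the fast algorithm:

```python
import numpy as np

def dct_iv(f):
    # naive O(N^2) DCT-IV: sum_n f[n] cos(pi (k+1/2)(n+1/2) / N)
    N = len(f)
    m = np.arange(N) + 0.5
    return np.cos(np.pi * np.outer(m, m) / N) @ f

def dct_i(f):
    # recursive DCT-I from (8.63)-(8.64): even coefficients are a half-size
    # DCT-I of f[n] + f[N-1-n], odd coefficients a DCT-IV of f[n] - f[N-1-n]
    f = np.asarray(f, dtype=float)
    N = len(f)
    if N == 1:
        return f.copy()
    rev = f[::-1][:N // 2]                 # f[N-1-n] for n = 0 .. N/2 - 1
    out = np.empty(N)
    out[0::2] = dct_i(f[:N // 2] + rev)
    out[1::2] = dct_iv(f[:N // 2] - rev)
    return out

# compare with the direct transform  hat_f[k] = sum_n f[n] cos(pi k (n+1/2) / N)
f = np.random.default_rng(0).standard_normal(16)
N = len(f)
direct = np.cos(np.pi * np.outer(np.arange(N), np.arange(N) + 0.5) / N) @ f
assert np.allclose(dct_i(f), direct)
```

The recursion terminates on signals of length 1, so $N$ must be a power of 2.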
8.4 LAPPED ORTHOGONAL TRANSFORMS
Cosine and Fourier block bases are computed with discontinuous rectangular windows that divide the real line in disjoint intervals. Multiplying a signal with a
rectangular window creates discontinuities that produce large-amplitude coefficients at high frequencies. To avoid these discontinuity artifacts, it is necessary to
use smooth windows.
The Balian-Low theorem (5.20) proves that for any $u_0$ and $\xi_0$, there exists no
differentiable window $g$ of compact support such that
\[
\big\{ g(t - n u_0) \, \exp(i k \xi_0 t) \big\}_{(n,k) \in \mathbb{Z}^2}
\]
is an orthonormal basis of $L^2(\mathbb{R})$. This negative result discouraged any research in
this direction, until Malvar discovered in discrete signal processing that one could
create orthogonal bases with smooth windows modulated by a cosine IV basis [368,
369]. This result was independently rediscovered for continuous-time functions by
Coifman and Meyer [181] with a different approach that we shall follow here. The
roots of these new orthogonal bases are lapped projectors, which split signals in
orthogonal components with overlapping supports [43]. Section 8.4.1 introduces
these lapped projectors; the construction of continuous-time and discrete lapped orthogonal bases is explained in the following sections. The particular case of local cosine bases is studied in more detail.
8.4.1 Lapped Projectors
Block transforms compute the restriction of $f$ to consecutive intervals $[a_p, a_{p+1}]$ and decompose this restriction in an orthogonal basis of $[a_p, a_{p+1}]$. Formally, the restriction of $f$ to $[a_p, a_{p+1}]$ is an orthogonal projection on the space $W_p$ of functions with a support included in $[a_p, a_{p+1}]$. To avoid the discontinuities introduced by this projection, we introduce new orthogonal projectors that perform a smooth deformation of $f$.
Projectors on Half Lines
Let us first construct two orthogonal projectors that decompose any $f \in L^2(\mathbb{R})$ in two orthogonal components $P^+ f$ and $P^- f$ with supports that are, respectively, $[-1, +\infty)$ and $(-\infty, 1]$. For this purpose we consider a monotone increasing profile function $\beta$ such that
\[
\beta(t) = \begin{cases} 0 & \text{if } t < -1 \\ 1 & \text{if } t > 1 \end{cases} \tag{8.69}
\]
and
\[
\forall t \in [-1,1], \quad \beta^2(t) + \beta^2(-t) = 1. \tag{8.70}
\]
A naive definition
\[
P^+ f(t) = \beta^2(t) \, f(t) \quad \text{and} \quad P^- f(t) = \beta^2(-t) \, f(t)
\]
satisfies the support conditions but does not define orthogonal functions. Since the supports of $P^+ f(t)$ and $P^- f(t)$ overlap only on $[-1, 1]$, the orthogonality is obtained by creating functions having a different symmetry with respect to $0$ on $[-1, 1]$:
\[
P^+ f(t) = \beta(t) \left[ \beta(t) f(t) + \beta(-t) f(-t) \right] = \beta(t) \, p(t) \tag{8.71}
\]
and
\[
P^- f(t) = \beta(-t) \left[ \beta(-t) f(t) - \beta(t) f(-t) \right] = \beta(-t) \, q(t). \tag{8.72}
\]
The functions $p(t)$ and $q(t)$ are, respectively, even and odd, and since $\beta(t)\beta(-t)$ is even, it follows that
\[
\langle P^+ f, P^- f \rangle = \int_{-1}^{1} \beta(t) \, \beta(-t) \, p(t) \, q^*(t) \, dt = 0. \tag{8.73}
\]
Clearly, $P^+ f$ belongs to the space $W^+$ of functions $f \in L^2(\mathbb{R})$ such that there exists $p(t) = p(-t)$ with
\[
f(t) = \begin{cases} 0 & \text{if } t < -1 \\ \beta(t) \, p(t) & \text{if } t \in [-1, 1]. \end{cases}
\]
Similarly, $P^- f$ is in the space $W^-$ composed of $f \in L^2(\mathbb{R})$ such that there exists $q(t) = -q(-t)$ with
\[
f(t) = \begin{cases} 0 & \text{if } t > 1 \\ \beta(-t) \, q(t) & \text{if } t \in [-1, 1]. \end{cases}
\]
Functions in $W^+$ and $W^-$ may have an arbitrary behavior on $[1, +\infty)$ and $(-\infty, -1]$, respectively. Theorem 8.14 characterizes $P^+$ and $P^-$. We denote the identity operator by $Id$.

Theorem 8.14: Coifman, Meyer. Operators $P^+$ and $P^-$ are orthogonal projectors on $W^+$ and $W^-$, respectively. Spaces $W^+$ and $W^-$ are orthogonal and
\[
P^+ + P^- = Id. \tag{8.74}
\]
Proof. To verify that $P^+$ is a projector we show that any $f \in W^+$ satisfies $P^+ f = f$. If $t < -1$, then $P^+ f(t) = f(t) = 0$, and if $t > 1$, then $P^+ f(t) = f(t)$. If $t \in [-1, 1]$, then $f(t) = \beta(t) \, p_0(t)$, and inserting (8.71) yields
\[
P^+ f(t) = \beta(t) \left[ \beta^2(t) \, p_0(t) + \beta^2(-t) \, p_0(-t) \right] = \beta(t) \, p_0(t),
\]
because $p_0(t)$ is even and $\beta(t)$ satisfies (8.70). Projector $P^+$ is proved to be orthogonal by showing that it is self-adjoint:
\[
\langle P^+ f, g \rangle = \int_{-1}^{1} \beta^2(t) \, f(t) \, g^*(t) \, dt + \int_{-1}^{1} \beta(t) \, \beta(-t) \, f(-t) \, g^*(t) \, dt + \int_{1}^{+\infty} f(t) \, g^*(t) \, dt.
\]
A change of variable $t' = -t$ in the second integral verifies that this formula is symmetric in $f$ and $g$, and thus $\langle P^+ f, g \rangle = \langle f, P^+ g \rangle$. Identical derivations prove that $P^-$ is an orthogonal projector on $W^-$.
The orthogonality of $W^-$ and $W^+$ is proved in (8.73). To verify (8.74), for $f \in L^2(\mathbb{R})$ we compute
\[
P^+ f(t) + P^- f(t) = f(t) \left[ \beta^2(t) + \beta^2(-t) \right] = f(t). \qquad \blacksquare
\]
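A quick numerical check of the projectors (8.71)-(8.72), using the profile $\beta(t) = \sin[\pi(1+t)/4]$ on $[-1,1]$ (one admissible choice of profile; the function names are local to this sketch):

```python
import numpy as np

# Half-line projectors (8.71)-(8.72) with the profile beta(t) = sin(pi(1+t)/4).
def beta(t):
    return np.where(t < -1, 0.0, np.where(t > 1, 1.0, np.sin(np.pi * (1 + t) / 4)))

def p_plus(f, t):
    return beta(t) * (beta(t) * f(t) + beta(-t) * f(-t))

def p_minus(f, t):
    return beta(-t) * (beta(-t) * f(t) - beta(t) * f(-t))

t = np.linspace(-4, 4, 8001)                     # symmetric grid around 0
f = lambda u: np.exp(-u ** 2) * np.cos(3 * u)
fp, fm = p_plus(f, t), p_minus(f, t)

assert np.allclose(fp + fm, f(t))                # P+ + P- = Id, equation (8.74)
assert abs(np.dot(fp, fm)) < 1e-10               # orthogonality, equation (8.73)
assert np.allclose(fp[t < -1], 0)                # support of P+f is [-1, +inf)
assert np.allclose(fm[t > 1], 0)                 # support of P-f is (-inf, 1]
```

The orthogonality sum cancels pairwise on a grid symmetric about 0 because the integrand $\beta(t)\beta(-t)p(t)q^*(t)$ is odd.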
These half-line projectors are generalized by decomposing signals in two orthogonal components with supports that are, respectively, $[a - \eta, +\infty)$ and $(-\infty, a + \eta]$. For this purpose, we scale and translate the profile function $\beta\big(\frac{t-a}{\eta}\big)$ so that it increases from $0$ to $1$ on $[a - \eta, a + \eta]$, as illustrated in Figure 8.16. The symmetry with respect to $0$, which transforms $f(t)$ in $f(-t)$, becomes a symmetry with respect to $a$, which transforms $f(t)$ in $f(2a - t)$. The resulting projectors are
\[
P_{a,\eta}^+ f(t) = \beta\Big(\frac{t-a}{\eta}\Big) \left[ \beta\Big(\frac{t-a}{\eta}\Big) f(t) + \beta\Big(\frac{a-t}{\eta}\Big) f(2a - t) \right] \tag{8.75}
\]
and
\[
P_{a,\eta}^- f(t) = \beta\Big(\frac{a-t}{\eta}\Big) \left[ \beta\Big(\frac{a-t}{\eta}\Big) f(t) - \beta\Big(\frac{t-a}{\eta}\Big) f(2a - t) \right]. \tag{8.76}
\]
A straightforward extension of Theorem 8.14 proves that $P_{a,\eta}^+$ is an orthogonal projector on the space $W_{a,\eta}^+$ of functions $f \in L^2(\mathbb{R})$ such that there exists $p(t) = p(2a - t)$ with
\[
f(t) = \begin{cases} 0 & \text{if } t < a - \eta \\ \beta(\eta^{-1}(t - a)) \, p(t) & \text{if } t \in [a - \eta, \, a + \eta]. \end{cases} \tag{8.77}
\]
FIGURE 8.16
A multiplication with $\beta\big(\frac{t-a}{\eta}\big)$ and $\beta\big(\frac{a-t}{\eta}\big)$ restricts the support of functions to $[a - \eta, +\infty)$ and $(-\infty, a + \eta]$.
Similarly, $P_{a,\eta}^-$ is an orthogonal projector on the space $W_{a,\eta}^-$ composed of $f \in L^2(\mathbb{R})$ such that there exists $q(t) = -q(2a - t)$ with
\[
f(t) = \begin{cases} 0 & \text{if } t > a + \eta \\ \beta(\eta^{-1}(a - t)) \, q(t) & \text{if } t \in [a - \eta, \, a + \eta]. \end{cases} \tag{8.78}
\]
Spaces $W_{a,\eta}^+$ and $W_{a,\eta}^-$ are orthogonal and
\[
P_{a,\eta}^+ + P_{a,\eta}^- = Id. \tag{8.79}
\]
Projectors on Intervals
A lapped projector splits a signal in two orthogonal components that overlap on $[a - \eta, \, a + \eta]$. Repeating such projections at different locations performs a signal decomposition into orthogonal pieces with supports that overlap. Let us divide the time axis in overlapping intervals:
\[
I_p = [a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]
\]
with
\[
\lim_{p \to -\infty} a_p = -\infty \quad \text{and} \quad \lim_{p \to +\infty} a_p = +\infty. \tag{8.80}
\]
To ensure that $I_{p-1}$ and $I_{p+1}$ do not intersect for any $p \in \mathbb{Z}$, we impose that
\[
a_{p+1} - \eta_{p+1} \geq a_p + \eta_p,
\]
and thus
\[
l_p = a_{p+1} - a_p \geq \eta_{p+1} + \eta_p. \tag{8.81}
\]
The support of $f$ is restricted to $I_p$ by the operator
\[
P_p = P_{a_p, \eta_p}^+ \, P_{a_{p+1}, \eta_{p+1}}^-. \tag{8.82}
\]
Since $P_{a_p, \eta_p}^+$ and $P_{a_{p+1}, \eta_{p+1}}^-$ are orthogonal projectors on $W_{a_p, \eta_p}^+$ and $W_{a_{p+1}, \eta_{p+1}}^-$, it follows that $P_p$ is an orthogonal projector on
\[
W_p = W_{a_p, \eta_p}^+ \cap W_{a_{p+1}, \eta_{p+1}}^-. \tag{8.83}
\]
Let us divide $I_p$ in two overlapping intervals $O_p$ and $O_{p+1}$, and a central interval $C_p$:
\[
I_p = [a_p - \eta_p, \, a_{p+1} + \eta_{p+1}] = O_p \cup C_p \cup O_{p+1} \tag{8.84}
\]
with
\[
O_p = [a_p - \eta_p, \, a_p + \eta_p] \quad \text{and} \quad C_p = [a_p + \eta_p, \, a_{p+1} - \eta_{p+1}].
\]
The space $W_p$ is characterized by introducing a window $g_p$ with support $I_p$, which has a raising profile on $O_p$ and a decaying profile on $O_{p+1}$:
\[
g_p(t) = \begin{cases} 0 & \text{if } t \notin I_p \\ \beta(\eta_p^{-1}(t - a_p)) & \text{if } t \in O_p \\ 1 & \text{if } t \in C_p \\ \beta(\eta_{p+1}^{-1}(a_{p+1} - t)) & \text{if } t \in O_{p+1}. \end{cases} \tag{8.85}
\]
This window is illustrated in Figure 8.17. It follows from (8.77), (8.78), and (8.83) that $W_p$ is the space of functions $f \in L^2(\mathbb{R})$ that can be written as
\[
f(t) = g_p(t) \, h(t) \quad \text{with} \quad h(t) = \begin{cases} h(2a_p - t) & \text{if } t \in O_p \\ -h(2a_{p+1} - t) & \text{if } t \in O_{p+1}. \end{cases} \tag{8.86}
\]
FIGURE 8.17
Each window $g_p$ has a support $[a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$, with an increasing profile over $[a_p - \eta_p, \, a_p + \eta_p]$ and a decreasing profile over $[a_{p+1} - \eta_{p+1}, \, a_{p+1} + \eta_{p+1}]$.
The function $h$ is symmetric with respect to $a_p$ and antisymmetric with respect to $a_{p+1}$, with an arbitrary behavior in $C_p$. Projector $P_p$ on $W_p$ defined in (8.82) can be rewritten as
\[
P_p f(t) = \begin{cases} P_{a_p, \eta_p}^+ f(t) & \text{if } t \in O_p \\ f(t) & \text{if } t \in C_p \\ P_{a_{p+1}, \eta_{p+1}}^- f(t) & \text{if } t \in O_{p+1} \end{cases} \;=\; g_p(t) \, h_p(t), \tag{8.87}
\]
where $h_p(t)$ is calculated by inserting (8.75) and (8.76):
\[
h_p(t) = \begin{cases} g_p(t) f(t) + g_p(2a_p - t) f(2a_p - t) & \text{if } t \in O_p \\ f(t) & \text{if } t \in C_p \\ g_p(t) f(t) - g_p(2a_{p+1} - t) f(2a_{p+1} - t) & \text{if } t \in O_{p+1}. \end{cases} \tag{8.88}
\]
Theorem 8.15 derives a decomposition of the identity.

Theorem 8.15. Operator $P_p$ is an orthogonal projector on $W_p$. If $p \neq q$, then $W_p$ is orthogonal to $W_q$ and
\[
\sum_{p=-\infty}^{+\infty} P_p = Id. \tag{8.89}
\]
Proof. If $p \neq q$ and $|p - q| > 1$, then functions in $W_p$ and $W_q$ have supports that do not overlap, so these spaces are orthogonal. If $q = p + 1$, then
\[
W_p = W_{a_p, \eta_p}^+ \cap W_{a_{p+1}, \eta_{p+1}}^- \quad \text{and} \quad W_{p+1} = W_{a_{p+1}, \eta_{p+1}}^+ \cap W_{a_{p+2}, \eta_{p+2}}^-.
\]
Since $W_{a_{p+1}, \eta_{p+1}}^-$ is orthogonal to $W_{a_{p+1}, \eta_{p+1}}^+$, it follows that $W_p$ is orthogonal to $W_{p+1}$.

To prove (8.89), we first verify that
\[
P_p + P_{p+1} = P_{a_p, \eta_p}^+ \, P_{a_{p+2}, \eta_{p+2}}^-. \tag{8.90}
\]
This is shown by decomposing $P_p$ and $P_{p+1}$ with (8.87) and inserting
\[
P_{a_{p+1}, \eta_{p+1}}^+ + P_{a_{p+1}, \eta_{p+1}}^- = Id.
\]
As a consequence,
\[
\sum_{p=n}^{m-1} P_p = P_{a_n, \eta_n}^+ \, P_{a_m, \eta_m}^-. \tag{8.91}
\]
For any $f \in L^2(\mathbb{R})$,
\[
\| f - P_{a_n, \eta_n}^+ P_{a_m, \eta_m}^- f \|^2 \leq \int_{-\infty}^{a_n + \eta_n} |f(t)|^2 \, dt + \int_{a_m - \eta_m}^{+\infty} |f(t)|^2 \, dt,
\]
and inserting (8.80) proves that
\[
\lim_{\substack{n \to -\infty \\ m \to +\infty}} \| f - P_{a_n, \eta_n}^+ P_{a_m, \eta_m}^- f \|^2 = 0.
\]
The summation (8.91) thus implies (8.89). $\blacksquare$
Discretized Projectors
Projectors $P_p$ that restrict the signal support to $[a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$ are easily extended for discrete signals. Suppose that $\{a_p\}_{p \in \mathbb{Z}}$ are half integers, which means that $a_p + 1/2 \in \mathbb{Z}$. The windows $g_p(t)$ defined in (8.85) are uniformly sampled: $g_p[n] = g_p(n)$. As in (8.86) we define the space $W_p \subset \ell^2(\mathbb{Z})$ of discrete signals $f[n] = g_p[n] \, h[n]$ with
\[
h[n] = \begin{cases} h[2a_p - n] & \text{if } n \in O_p \\ -h[2a_{p+1} - n] & \text{if } n \in O_{p+1}. \end{cases} \tag{8.92}
\]
The orthogonal projector $P_p$ on $W_p$ is defined by an expression identical to (8.87) and (8.88):
\[
P_p f[n] = g_p[n] \, h_p[n] \tag{8.93}
\]
with
\[
h_p[n] = \begin{cases} g_p[n] f[n] + g_p[2a_p - n] f[2a_p - n] & \text{if } n \in O_p \\ f[n] & \text{if } n \in C_p \\ g_p[n] f[n] - g_p[2a_{p+1} - n] f[2a_{p+1} - n] & \text{if } n \in O_{p+1}. \end{cases} \tag{8.94}
\]
Finally, we prove as in Theorem 8.15 that if $p \neq q$, then $W_p$ is orthogonal to $W_q$ and
\[
\sum_{p=-\infty}^{+\infty} P_p = Id. \tag{8.95}
\]
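A numerical sketch of the discrete projectors (8.93)-(8.95); the toy geometry (constant interval length, constant overlap) and all variable names are choices made here, not notation from the text:

```python
import numpy as np

# Discrete lapped projectors (8.93)-(8.94) on a toy grid of integers.
beta = lambda t: np.where(t < -1, 0.0, np.where(t > 1, 1.0, np.sin(np.pi * (1 + t) / 4)))

L, eta, nwin = 16, 4, 6
a = [q * L - 0.5 for q in range(nwin + 1)]      # half-integer boundaries a_p
n = np.arange(-L, nwin * L + L)                 # integer grid covering all supports
i0 = L                                          # array index of sample n = 0

def window(p):
    # g_p of (8.85): rises on O_p, equals 1 on C_p, decays on O_{p+1}
    return np.minimum(beta((n - a[p]) / eta), beta((a[p + 1] - n) / eta))

def project(p, f):
    # P_p f = g_p h_p with h_p folded as in (8.94)
    g, h = window(p), f.copy()
    for c, sgn in ((a[p], 1.0), (a[p + 1], -1.0)):
        O = np.abs(n - c) < eta                 # overlap region around the boundary c
        m = (2 * c - n[O] + i0).astype(int)     # array indices of mirrored samples
        h[O] = g[O] * f[O] + sgn * g[m] * f[m]
    return g * h

rng = np.random.default_rng(1)
f = rng.standard_normal(len(n))
Pf = [project(p, f) for p in range(nwin)]

assert np.allclose(project(2, Pf[2]), Pf[2])    # P_p is a projector
assert abs(np.dot(Pf[2], Pf[3])) < 1e-10        # W_p and W_{p+1} are orthogonal
interior = (n > a[1]) & (n < a[nwin - 1])       # away from the two border windows
assert np.allclose(sum(Pf)[interior], f[interior])   # sum over p of P_p = Id (8.95)
```

The identity only holds where every overlap region has both of its windows present, hence the restriction to the interior of the grid.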
8.4.2 Lapped Orthogonal Bases
An orthogonal basis of $L^2(\mathbb{R})$ is defined from a basis $\{e_k\}_{k \in \mathbb{N}}$ of $L^2[0,1]$ by multiplying a translation and dilation of each vector with a smooth window $g_p$ defined in (8.85). A local cosine basis of $L^2(\mathbb{R})$ is derived from a cosine IV basis of $L^2[0,1]$.

The support of $g_p$ is $[a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$ with $l_p = a_{p+1} - a_p$, as illustrated in Figure 8.17. The design of these windows also implies symmetry and quadrature properties on overlapping intervals:
\[
g_p(t) = g_{p+1}(2a_{p+1} - t) \quad \text{for } t \in [a_{p+1} - \eta_{p+1}, \, a_{p+1} + \eta_{p+1}]
\]
and
\[
g_p^2(t) + g_{p+1}^2(t) = 1 \quad \text{for } t \in [a_{p+1} - \eta_{p+1}, \, a_{p+1} + \eta_{p+1}]. \tag{8.96}
\]
Each $e_k \in L^2[0,1]$ is extended over $\mathbb{R}$ into a function $\tilde e_k$ that is symmetric with respect to $0$ and antisymmetric with respect to $1$. The resulting $\tilde e_k$ has period $4$ and is defined over $[-2, 2]$ by
\[
\tilde e_k(t) = \begin{cases} e_k(t) & \text{if } t \in [0, 1] \\ e_k(-t) & \text{if } t \in (-1, 0) \\ -e_k(2 - t) & \text{if } t \in [1, 2) \\ -e_k(2 + t) & \text{if } t \in [-2, -1). \end{cases}
\]
Theorem 8.16 derives an orthonormal basis of $L^2(\mathbb{R})$.

Theorem 8.16: Coifman, Malvar, Meyer. Let $\{e_k\}_{k \in \mathbb{N}}$ be an orthonormal basis of $L^2[0,1]$. The family
\[
\left\{ g_{p,k}(t) = g_p(t) \, \frac{1}{\sqrt{l_p}} \, \tilde e_k\Big(\frac{t - a_p}{l_p}\Big) \right\}_{k \in \mathbb{N}, \, p \in \mathbb{Z}} \tag{8.97}
\]
is an orthonormal basis of $L^2(\mathbb{R})$.
Proof. Since $\tilde e_k(l_p^{-1}(t - a_p))$ is symmetric with respect to $a_p$ and antisymmetric with respect to $a_{p+1}$, it follows from (8.86) that $g_{p,k} \in W_p$ for all $k \in \mathbb{N}$. Theorem 8.15 proves that the spaces $W_p$ and $W_q$ are orthogonal for $p \neq q$ and that $L^2(\mathbb{R}) = \bigoplus_{p=-\infty}^{+\infty} W_p$. To prove that (8.97) is an orthonormal basis of $L^2(\mathbb{R})$, we thus need to show that
\[
\left\{ g_{p,k}(t) = g_p(t) \, \frac{1}{\sqrt{l_p}} \, \tilde e_k\Big(\frac{t - a_p}{l_p}\Big) \right\}_{k \in \mathbb{N}} \tag{8.98}
\]
is an orthonormal basis of $W_p$.

Let us first prove that any $f \in W_p$ can be decomposed over this family. Such a function can be written as $f(t) = g_p(t) \, h(t)$, where the restriction of $h$ to $[a_p, a_{p+1}]$ is arbitrary, and $h$ is, respectively, symmetric and antisymmetric with respect to $a_p$ and $a_{p+1}$. Since $\{\tilde e_k\}_{k \in \mathbb{N}}$ is an orthonormal basis of $L^2[0,1]$, clearly
\[
\left\{ \frac{1}{\sqrt{l_p}} \, \tilde e_k\Big(\frac{t - a_p}{l_p}\Big) \right\}_{k \in \mathbb{N}} \tag{8.99}
\]
is an orthonormal basis of $L^2[a_p, a_{p+1}]$. The restriction of $h$ to $[a_p, a_{p+1}]$ can therefore be decomposed in this basis. This decomposition remains valid for all $t \in [a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$ since $h(t)$ and the $l_p^{-1/2} \tilde e_k(l_p^{-1}(t - a_p))$ have the same symmetry with respect to $a_p$ and $a_{p+1}$. Therefore, $f(t) = h(t) \, g_p(t)$ can be decomposed over the family (8.98). Lemma 8.1 finishes the proof by showing that the orthogonality of functions in (8.98) is a consequence of the orthogonality of (8.99) in $L^2[a_p, a_{p+1}]$.
Lemma 8.1. If $f_b(t) = h_b(t) \, g_p(t) \in W_p$ and $f_c(t) = h_c(t) \, g_p(t) \in W_p$, then
\[
\langle f_b, f_c \rangle = \int_{a_p - \eta_p}^{a_{p+1} + \eta_{p+1}} f_b(t) \, f_c^*(t) \, dt = \int_{a_p}^{a_{p+1}} h_b(t) \, h_c^*(t) \, dt. \tag{8.100}
\]
Let us evaluate
\[
\langle f_b, f_c \rangle = \int_{a_p - \eta_p}^{a_{p+1} + \eta_{p+1}} h_b(t) \, h_c^*(t) \, g_p^2(t) \, dt. \tag{8.101}
\]
We know that $h_b(t)$ and $h_c(t)$ are symmetric with respect to $a_p$, so
\[
\int_{a_p - \eta_p}^{a_p + \eta_p} h_b(t) \, h_c^*(t) \, g_p^2(t) \, dt = \int_{a_p}^{a_p + \eta_p} h_b(t) \, h_c^*(t) \left[ g_p^2(t) + g_p^2(2a_p - t) \right] dt.
\]
Since $g_p^2(t) + g_p^2(2a_p - t) = 1$ over this interval, we obtain
\[
\int_{a_p - \eta_p}^{a_p + \eta_p} h_b(t) \, h_c^*(t) \, g_p^2(t) \, dt = \int_{a_p}^{a_p + \eta_p} h_b(t) \, h_c^*(t) \, dt. \tag{8.102}
\]
The functions $h_b(t)$ and $h_c(t)$ are antisymmetric with respect to $a_{p+1}$, so $h_b(t) \, h_c^*(t)$ is symmetric about $a_{p+1}$. Thus, we prove similarly that
\[
\int_{a_{p+1} - \eta_{p+1}}^{a_{p+1} + \eta_{p+1}} h_b(t) \, h_c^*(t) \, g_p^2(t) \, dt = \int_{a_{p+1} - \eta_{p+1}}^{a_{p+1}} h_b(t) \, h_c^*(t) \, dt. \tag{8.103}
\]
Since $g_p(t) = 1$ for $t \in [a_p + \eta_p, \, a_{p+1} - \eta_{p+1}]$, inserting (8.102) and (8.103) in (8.101) proves the lemma property (8.100). $\blacksquare$
Theorem 8.16 is similar to the block basis theorem (8.8), but it has the advantage of using smooth windows $g_p$ as opposed to the rectangular windows that are indicator functions of $[a_p, a_{p+1}]$. It yields smooth functions $g_{p,k}$ only if the extension $\tilde e_k$ of $e_k$ is a smooth function. This is the case for the cosine IV basis $\{e_k(t) = \sqrt{2} \cos[(k + 1/2)\pi t]\}_{k \in \mathbb{N}}$ of $L^2[0,1]$ defined in Theorem 8.11. Indeed, $\cos[(k + 1/2)\pi t]$ has a natural symmetric and antisymmetric extension with respect to $0$ and $1$ over $\mathbb{R}$. Corollary 8.1 derives a local cosine basis.
Corollary 8.1. The family of local cosine functions
\[
\left\{ g_{p,k}(t) = g_p(t) \, \sqrt{\frac{2}{l_p}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{t - a_p}{l_p} \right] \right\}_{k \in \mathbb{N}, \, p \in \mathbb{Z}} \tag{8.104}
\]
is an orthonormal basis of $L^2(\mathbb{R})$.
Cosine–Sine I Basis
Other bases can be constructed with functions having a different symmetry. To maintain the orthogonality of the windowed basis, we must ensure that consecutive windows $g_p$ and $g_{p+1}$ are multiplied by functions that have an opposite symmetry with respect to $a_{p+1}$. For example, we can multiply $g_{2p}$ with functions that are symmetric with respect to both ends $a_{2p}$ and $a_{2p+1}$, and multiply $g_{2p+1}$ with functions that are antisymmetric with respect to $a_{2p+1}$ and $a_{2p+2}$. Such bases can be constructed with the cosine I basis $\{\sqrt{2} \, \lambda_k \cos(\pi k t)\}_{k \in \mathbb{N}}$ defined in Theorem 8.10, with $\lambda_0 = 2^{-1/2}$ and $\lambda_k = 1$ for $k \neq 0$, and with the sine I family $\{\sqrt{2} \sin(\pi k t)\}_{k \in \mathbb{N}^*}$, which is also an orthonormal basis of $L^2[0,1]$. The reader can verify that if
\[
g_{2p,k}(t) = g_{2p}(t) \, \sqrt{\frac{2}{l_{2p}}} \, \lambda_k \cos\left[ \pi k \, \frac{t - a_{2p}}{l_{2p}} \right],
\]
\[
g_{2p+1,k}(t) = g_{2p+1}(t) \, \sqrt{\frac{2}{l_{2p+1}}} \, \sin\left[ \pi k \, \frac{t - a_{2p+1}}{l_{2p+1}} \right],
\]
then $\{g_{p,k}\}_{k \in \mathbb{N}, \, p \in \mathbb{Z}}$ is an orthonormal basis of $L^2(\mathbb{R})$.
Lapped Transforms in Frequency
Lapped orthogonal projectors can also divide the frequency axis in separate overlapping intervals. This is done by decomposing the Fourier transform $\hat f(\omega)$ of $f(t)$ over a local cosine basis $\{g_{p,k}(\omega)\}_{p \in \mathbb{Z}, k \in \mathbb{N}}$ defined on the frequency axis. This is equivalent to decomposing $f(t)$ on its inverse Fourier transform $\{\frac{1}{2\pi} \hat g_{p,k}(-t)\}_{p \in \mathbb{Z}, k \in \mathbb{N}}$. As opposed to wavelet packets, which decompose signals in dyadic-frequency bands, this approach offers complete flexibility on the size of the frequency intervals $[a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$.

A signal decomposition in a Meyer wavelet or wavelet packet basis can be calculated with a lapped orthogonal transform applied in the Fourier domain. Indeed, the Fourier transform (7.87) of a Meyer wavelet has a compact support, and $\{|\hat\psi(2^j \omega)|\}_{j \in \mathbb{Z}}$ can be considered as a family of asymmetric windows, with supports that only overlap with adjacent windows with appropriate symmetry properties. These windows cover the whole frequency axis: $\sum_{j=-\infty}^{+\infty} |\hat\psi(2^j \omega)|^2 = 1$. As a result, the Meyer wavelet transform can be viewed as a lapped orthogonal transform applied in the Fourier domain. Thus, it can be efficiently implemented with the folding algorithm from Section 8.4.4.
8.4.3 Local Cosine Bases
The local cosine basis defined in (8.104) is composed of functions
\[
g_{p,k}(t) = g_p(t) \, \sqrt{\frac{2}{l_p}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{t - a_p}{l_p} \right]
\]
with a compact support $[a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$. The energy of their Fourier transforms is also well concentrated. Let $\hat g_p$ be the Fourier transform of $g_p$; then
\[
\hat g_{p,k}(\omega) = \frac{1}{2} \sqrt{\frac{2}{l_p}} \left[ e^{-i a_p \xi_{p,k}} \, \hat g_p(\omega - \xi_{p,k}) + e^{i a_p \xi_{p,k}} \, \hat g_p(\omega + \xi_{p,k}) \right],
\]
where
\[
\xi_{p,k} = \frac{\pi (k + 1/2)}{l_p}.
\]
The bandwidth of $\hat g_{p,k}$ around $\xi_{p,k}$ and $-\xi_{p,k}$ is equal to the bandwidth of $\hat g_p$. If the sizes $\eta_p$ and $\eta_{p+1}$ of the variation intervals of $g_p$ are proportional to $l_p$, then this bandwidth is proportional to $l_p^{-1}$.

For smooth functions $f$, we want to guarantee that the inner products $\langle f, g_{p,k} \rangle$ have a fast decay when the center frequency $\xi_{p,k}$ increases. The Parseval formula proves that
\[
\langle f, g_{p,k} \rangle = \frac{1}{4\pi} \sqrt{\frac{2}{l_p}} \int_{-\infty}^{+\infty} \hat f(\omega) \left[ e^{i a_p \xi_{p,k}} \, \hat g_p^*(\omega - \xi_{p,k}) + e^{-i a_p \xi_{p,k}} \, \hat g_p^*(\omega + \xi_{p,k}) \right] d\omega.
\]
The smoothness of $f$ implies that $|\hat f(\omega)|$ has a fast decay at large frequencies $\omega$. This integral therefore becomes small when $\xi_{p,k}$ increases if $g_p$ is a smooth window, because $|\hat g_p(\omega)|$ then also has a fast decay.
Window Design
The regularity of $g_p$ depends on the regularity of the profile $\beta$, as shown by (8.85). This profile must satisfy
\[
\beta^2(t) + \beta^2(-t) = 1 \quad \text{for } t \in [-1, 1], \tag{8.105}
\]
plus $\beta(t) = 0$ if $t < -1$ and $\beta(t) = 1$ if $t > 1$. One example is
\[
\beta_0(t) = \sin\left[ \frac{\pi}{4} (1 + t) \right] \quad \text{for } t \in [-1, 1],
\]
FIGURE 8.18
Heisenberg boxes of local cosine vectors define a regular grid over the time-frequency plane.
but its derivative at $t = \pm 1$ is nonzero, so this profile is not differentiable at $\pm 1$. Windows of higher regularity are constructed with a profile $\beta_k$ defined by induction for $k \geq 0$ by
\[
\beta_{k+1}(t) = \beta_k\left( \sin\frac{\pi t}{2} \right) \quad \text{for } t \in [-1, 1].
\]
For any $k \geq 0$, one can verify that $\beta_k$ satisfies (8.105) and has $2^k - 1$ vanishing derivatives at $t = \pm 1$. The resulting $\beta$ and $g_p$ are therefore $2^k - 1$ times continuously differentiable.
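The induction on profiles can be checked numerically; `beta_k` below is a hypothetical helper name for this sketch:

```python
import numpy as np

def beta_k(k, t):
    # beta_{k+1}(t) = beta_k(sin(pi t / 2)), starting from beta_0(t) = sin(pi(1+t)/4)
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    for _ in range(k):
        t = np.sin(np.pi * t / 2)
    return np.sin(np.pi * (1 + t) / 4)

t = np.linspace(-1, 1, 201)
for k in range(4):
    # the quadrature condition (8.105) is preserved by the induction
    assert np.allclose(beta_k(k, t) ** 2 + beta_k(k, -t) ** 2, 1)

# the profiles become flatter at the endpoints as k grows
eps = 1e-3
slopes = [(beta_k(k, -1 + eps) - beta_k(k, -1)) / eps for k in range(3)]
assert slopes[0] > slopes[1] > slopes[2] >= 0
```

The quadrature condition survives the composition because $\sin(\pi t/2)$ is odd, so the symmetry of $\beta_0$ about $0$ is preserved at every step.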
Heisenberg Box
A local cosine basis can be symbolically represented as an exact paving of the time-frequency plane. The time and frequency region of high-energy concentration for each local cosine vector $g_{p,k}$ is approximated by a Heisenberg rectangle
\[
[a_p, \, a_{p+1}] \times \left[ \xi_{p,k} - \frac{\pi}{2 l_p}, \; \xi_{p,k} + \frac{\pi}{2 l_p} \right],
\]
as illustrated in Figure 8.18. A local cosine basis $\{g_{p,k}\}_{k \in \mathbb{N}, \, p \in \mathbb{Z}}$ corresponds to a time-frequency grid that varies in time.
Figure 8.19(a) shows the decomposition of a digital recording of the sound
“grea” coming from the word “greasy.” The window sizes are adapted to the signal
structures with the best basis algorithm described in Section 12.2.2. High-amplitude
FIGURE 8.19
(a) The signal at the top is a recording of the sound "grea" in the word "greasy." This signal is decomposed in a local cosine basis with windows of varying sizes. The larger the amplitude of $|\langle f, g_{p,k} \rangle|$, the darker the gray level of the Heisenberg box. (b) Decomposition in a local cosine basis with small windows of constant size.
coefficients are along spectral lines in the time-frequency plane. Most Heisenberg
boxes appear in white, which indicates that the corresponding inner product is
nearly zero. Thus, this signal can be approximated with a few nonzero local cosine
vectors. Figure 8.19(b) decomposes the same signal in a local cosine basis composed
of small windows of constant size. The signal time-frequency structures do not
appear as well as in Figure 8.19(a).
Translation and Phase
Cosine modulations, as opposed to complex exponentials, do not provide easy access to phase information. The translation of a signal can induce important modifications of its decomposition coefficients in a cosine basis. Consider, for example,
\[
f(t) = g_{p,k}(t) = g_p(t) \, \sqrt{\frac{2}{l_p}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{t - a_p}{l_p} \right].
\]
Since the basis is orthogonal, $\langle f, g_{p,k} \rangle = 1$ and all other inner products are zero. After a translation by $\tau = l_p / (2k + 1)$,
\[
f_\tau(t) = f\left( t - \frac{l_p}{2k+1} \right) = g_p\left( t - \frac{l_p}{2k+1} \right) \sqrt{\frac{2}{l_p}} \, \sin\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{t - a_p}{l_p} \right].
\]
The opposite parity of sine and cosine implies that $\langle f_\tau, g_{p,k} \rangle \approx 0$. In contrast, $\langle f_\tau, g_{p,k-1} \rangle$ and $\langle f_\tau, g_{p,k+1} \rangle$ become nonzero. After translation, a signal component initially represented by a cosine of frequency $\pi(k + 1/2)/l_p$ is therefore spread over cosine vectors of different frequencies.
This example shows that the local cosine coefficients of a pattern are severely
modified by any translation. We are facing the same translation distortions as
observed in Section 5.1.5 for wavelets and time-frequency frames. This lack of
translation invariance makes it difficult to use these bases for pattern recognition.
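The coefficient spreading can be observed directly on a single DCT-IV window (dropping the window $g_p$, which does not affect the parity argument); the sizes below are arbitrary choices for this sketch:

```python
import numpy as np

# Spread of DCT-IV coefficients under a translation by tau = l/(2k+1).
l, k = 64, 10
n = np.arange(l)
C = np.sqrt(2 / l) * np.cos(np.pi / l * np.outer(n + 0.5, n + 0.5))  # DCT-IV matrix

atom = C[k]                                         # the k-th cosine IV vector
tau = l / (2 * k + 1)
shifted = np.sqrt(2 / l) * np.cos(np.pi / l * (k + 0.5) * (n + 0.5 - tau))  # a sine vector

c0, c1 = C @ atom, C @ shifted
assert np.allclose(c0, np.eye(l)[k])                # a single coefficient before translation
assert abs(c1[k]) < 0.1                             # nearly zero at k after translation
assert abs(c1[k - 1]) > 0.5 and abs(c1[k + 1]) > 0.5  # energy moved to k-1 and k+1
```

The phase $\pi(k+1/2)\tau/l = \pi/2$ turns the cosine into a sine exactly, which is why the coefficient at $k$ nearly vanishes while the neighbors $k \pm 1$ become large.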
8.4.4 Discrete Lapped Transforms
Lapped orthogonal bases are discretized by replacing the orthogonal basis of $L^2[0,1]$ with a discrete basis of $\mathbb{C}^N$, and uniformly sampling the windows $g_p$. Discrete local cosine bases are derived with discrete cosine IV bases.

Let $\{a_p\}_{p \in \mathbb{Z}}$ be a sequence of half integers, $a_p + 1/2 \in \mathbb{Z}$, with
\[
\lim_{p \to -\infty} a_p = -\infty \quad \text{and} \quad \lim_{p \to +\infty} a_p = +\infty.
\]
A discrete lapped orthogonal basis is constructed with the discrete projectors $P_p$ defined in (8.93). These operators are implemented with the sampled windows $g_p[n] = g_p(n)$. Suppose that $\{e_{l,k}[n]\}_{0 \leq k < l}$ is an orthogonal basis of signals defined for $0 \leq n < l$. These vectors are extended over $\mathbb{Z}$ with a symmetry with respect to $-1/2$ and an antisymmetry with respect to $l - 1/2$. The resulting extensions have a period $4l$ and are defined over $[-2l, \, 2l - 1]$ by
\[
\tilde e_{l,k}[n] = \begin{cases} e_{l,k}[n] & \text{if } n \in [0, \, l-1] \\ e_{l,k}[-1 - n] & \text{if } n \in [-l, \, -1] \\ -e_{l,k}[2l - 1 - n] & \text{if } n \in [l, \, 2l - 1] \\ -e_{l,k}[2l + n] & \text{if } n \in [-2l, \, -l - 1]. \end{cases}
\]
Theorem 8.17 proves that multiplying these vectors with the discrete windows $g_p[n]$ yields an orthonormal basis of $\ell^2(\mathbb{Z})$.

Theorem 8.17: Coifman, Malvar, Meyer. Suppose that $\{e_{l,k}\}_{0 \leq k < l}$ is an orthonormal basis of $\mathbb{C}^l$ for any $l > 0$. The family
\[
\left\{ g_{p,k}[n] = g_p[n] \, \tilde e_{l_p,k}[n - a_p] \right\}_{0 \leq k < l_p, \, p \in \mathbb{Z}} \tag{8.106}
\]
is a lapped orthonormal basis of $\ell^2(\mathbb{Z})$.

The proof of this theorem is identical to the proof of Theorem 8.16, since we have a discrete equivalent of the spaces $W_p$ and their projectors. It is also based on a discrete equivalent of Lemma 8.1, which is verified with the same derivations. Beyond the proof of Theorem 8.17, we shall see that Lemma 8.2 is important for quickly computing the decomposition coefficients $\langle f, g_{p,k} \rangle$. $\blacksquare$

Lemma 8.2. Any $f_b[n] = g_p[n] \, h_b[n] \in W_p$ and $f_c[n] = g_p[n] \, h_c[n] \in W_p$ satisfy
\[
\langle f_b, f_c \rangle = \sum_{a_p - \eta_p < n < a_{p+1} + \eta_{p+1}} f_b[n] \, f_c^*[n] = \sum_{a_p < n < a_{p+1}} h_b[n] \, h_c^*[n]. \tag{8.107}
\]
Theorem 8.17 is similar to the discrete block basis theorem (8.9), but constructs an orthogonal basis with smooth discrete windows $g_p[n]$. The discrete cosine IV bases
\[
\left\{ e_{l,k}[n] = \sqrt{\frac{2}{l}} \, \cos\left[ \frac{\pi}{l} \Big( k + \frac{1}{2} \Big) \Big( n + \frac{1}{2} \Big) \right] \right\}_{0 \leq k < l}
\]
have the advantage of including vectors that have a natural symmetric and antisymmetric extension with respect to $-1/2$ and $l - 1/2$. This produces a discrete local cosine basis of $\ell^2(\mathbb{Z})$.

Corollary 8.2. The family
\[
\left\{ g_{p,k}[n] = g_p[n] \, \sqrt{\frac{2}{l_p}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{n - a_p}{l_p} \right] \right\}_{0 \leq k < l_p, \, p \in \mathbb{Z}} \tag{8.108}
\]
is an orthonormal basis of $\ell^2(\mathbb{Z})$.
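A small numerical verification of Corollary 8.2 with three adjacent windows of equal length (a toy configuration, not the general one):

```python
import numpy as np

# Orthonormality check for the discrete local cosine family (8.108).
beta = lambda t: np.where(t < -1, 0.0, np.where(t > 1, 1.0, np.sin(np.pi * (1 + t) / 4)))

l, eta = 16, 4
a = [q * l - 0.5 for q in range(4)]            # half-integer boundaries a_p
n = np.arange(-eta, 3 * l + eta)               # grid covering the three supports

def g(p):
    # window (8.85) with constant overlap eta
    return np.minimum(beta((n - a[p]) / eta), beta((a[p + 1] - n) / eta))

def atoms(p):
    # the l vectors g_{p,k} of (8.108) for window p
    k = np.arange(l)[:, None]
    return g(p) * np.sqrt(2 / l) * np.cos(np.pi / l * (k + 0.5) * (n - a[p]))

B = np.vstack([atoms(0), atoms(1), atoms(2)])
Gram = B @ B.T
assert np.allclose(Gram[l:2*l, l:2*l], np.eye(l))   # middle window: orthonormal
assert np.allclose(Gram[l:2*l, :l], 0)              # orthogonal to the left window
assert np.allclose(Gram[l:2*l, 2*l:], 0)            # orthogonal to the right window
```

The cross terms cancel exactly on the grid because the boundaries are half integers, so mirrored samples pair up sample by sample.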
Fast Lapped Orthogonal Transform
A fast algorithm introduced by Malvar [40] replaces the calculation of the $\langle f, g_{p,k} \rangle$ by a computation of inner products in the original bases $\{e_{l,k}\}_{0 \leq k < l}$ with a folding procedure. In a discrete local cosine basis, these inner products are calculated with the fast DCT-IV algorithm.

To simplify notations, as in Section 8.4.1 we decompose $I_p = [a_p - \eta_p, \, a_{p+1} + \eta_{p+1}]$ into $I_p = O_p \cup C_p \cup O_{p+1}$ with
\[
O_p = [a_p - \eta_p, \, a_p + \eta_p] \quad \text{and} \quad C_p = [a_p + \eta_p, \, a_{p+1} - \eta_{p+1}].
\]
The orthogonal projector $P_p$ on the space $W_p$ generated by $\{g_{p,k}\}_{0 \leq k < l_p}$ was calculated in (8.93):
\[
P_p f[n] = g_p[n] \, h_p[n],
\]
where $h_p$ is a folded version of $f$:
\[
h_p[n] = \begin{cases} g_p[n] f[n] + g_p[2a_p - n] f[2a_p - n] & \text{if } n \in O_p \\ f[n] & \text{if } n \in C_p \\ g_p[n] f[n] - g_p[2a_{p+1} - n] f[2a_{p+1} - n] & \text{if } n \in O_{p+1}. \end{cases} \tag{8.109}
\]
Since $g_{p,k} \in W_p$,
\[
\langle f, g_{p,k} \rangle = \langle P_p f, g_{p,k} \rangle = \langle g_p h_p, \, g_p \tilde e_{l_p,k} \rangle.
\]
Since $\tilde e_{l_p,k}[n] = e_{l_p,k}[n]$ for $n \in [a_p, a_{p+1}]$, Lemma 8.2 derives that
\[
\langle f, g_{p,k} \rangle = \sum_{a_p < n < a_{p+1}} h_p[n] \, e_{l_p,k}[n] = \langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]}. \tag{8.110}
\]
This proves that the decomposition coefficients $\langle f, g_{p,k} \rangle$ can be calculated by folding $f$ into $h_p$ and computing the inner product with the orthogonal basis $\{e_{l_p,k}\}_{0 \leq k < l_p}$ defined over $[a_p, a_{p+1}]$.
For a discrete cosine basis, the DCT-IV coefficients
\[
\langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]} = \sum_{a_p < n < a_{p+1}} h_p[n] \, \sqrt{\frac{2}{l_p}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{n - a_p}{l_p} \right] \tag{8.111}
\]
are computed with the fast DCT-IV algorithm from Section 8.3.4, which requires $O(l_p \log_2 l_p)$ operations. The inverse lapped transform recovers $h_p[n]$ over $[a_p, a_{p+1}]$ from the $l_p$ inner products $\{\langle h_p, e_{l_p,k} \rangle_{[a_p, a_{p+1}]}\}_{0 \leq k < l_p}$. In a local cosine IV basis, this is done with the fast inverse DCT-IV, which is identical to the forward DCT-IV and requires $O(l_p \log_2 l_p)$ operations. The reconstruction of $f$ is done by applying (8.95), which proves that
\[
f[n] = \sum_{p=-\infty}^{+\infty} P_p f[n] = \sum_{p=-\infty}^{+\infty} g_p[n] \, h_p[n]. \tag{8.112}
\]
Let us denote $O_p^- = [a_p - \eta_p, \, a_p]$ and $O_p^+ = [a_p, \, a_p + \eta_p]$. The restriction of (8.112) to $[a_p, a_{p+1}]$ gives
\[
f[n] = \begin{cases} g_p[n] h_p[n] + g_{p-1}[n] h_{p-1}[n] & \text{if } n \in O_p^+ \\ h_p[n] & \text{if } n \in C_p \\ g_p[n] h_p[n] + g_{p+1}[n] h_{p+1}[n] & \text{if } n \in O_{p+1}^-. \end{cases}
\]
The symmetry of the windows guarantees that $g_{p-1}[n] = g_p[2a_p - n]$ and $g_{p+1}[n] = g_p[2a_{p+1} - n]$. Since $h_{p-1}[n]$ is antisymmetric with respect to $a_p$ and $h_{p+1}[n]$ is symmetric with respect to $a_{p+1}$, we can recover $f[n]$ on $[a_p, a_{p+1}]$ from the values of $h_{p-1}[n]$, $h_p[n]$, and $h_{p+1}[n]$ computed, respectively, on $[a_{p-1}, a_p]$, $[a_p, a_{p+1}]$, and $[a_{p+1}, a_{p+2}]$:
\[
f[n] = \begin{cases} g_p[n] h_p[n] - g_p[2a_p - n] h_{p-1}[2a_p - n] & \text{if } n \in O_p^+ \\ h_p[n] & \text{if } n \in C_p \\ g_p[n] h_p[n] + g_p[2a_{p+1} - n] h_{p+1}[2a_{p+1} - n] & \text{if } n \in O_{p+1}^-. \end{cases} \tag{8.113}
\]
This unfolding formula is implemented with $O(l_p)$ calculations. Thus, the inverse local cosine transform requires $O(l_p \log_2 l_p)$ operations to recover $f[n]$ on each interval $[a_p, a_{p+1}]$ of length $l_p$.
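The full pipeline, fold (8.109), DCT-IV coefficients (8.110)-(8.111), inverse DCT-IV, and unfold (8.113), can be sketched and checked by a round trip. The border handling approximates $\eta_0 = \eta_q = 0$ with a transition narrower than one sample, which is a shortcut of this sketch, and the naive DCT-IV matrix stands in for the fast algorithm:

```python
import numpy as np

# Fold (8.109), DCT-IV analysis (8.110)-(8.111), inverse DCT-IV, unfold (8.113).
beta = lambda t: np.where(t < -1, 0.0, np.where(t > 1, 1.0, np.sin(np.pi * (1 + t) / 4)))

l, eta, P = 16, 4, 4
N = P * l
a = [q * l - 0.5 for q in range(P + 1)]          # a_0 = -1/2 and a_P = N - 1/2
n = np.arange(N)
f = np.random.default_rng(0).standard_normal(N)

def win(p):
    # border transitions narrower than one sample emulate eta_0 = eta_q = 0
    ep = eta if p > 0 else 0.25
    eq = eta if p < P - 1 else 0.25
    return np.minimum(beta((n - a[p]) / ep), beta((a[p + 1] - n) / eq))

G = [win(p) for p in range(P)]

def fold(p):
    # h_p of (8.109), stored at its original sample positions
    h = f.copy()
    for c, sgn, active in ((a[p], 1.0, p > 0), (a[p + 1], -1.0, p < P - 1)):
        if active:
            O = np.abs(n - c) < eta
            m = (2 * c - n[O]).astype(int)       # mirrored sample indices
            h[O] = G[p][O] * f[O] + sgn * G[p][m] * f[m]
    return h

H = [fold(p) for p in range(P)]

# the folded segments feed an orthonormal (and involutive) DCT-IV matrix
C = np.sqrt(2 / l) * np.cos(np.pi / l * np.outer(np.arange(l) + 0.5, np.arange(l) + 0.5))
coefs = [C @ H[p][p * l:(p + 1) * l] for p in range(P)]

# (8.110): for an interior window these are exactly the inner products <f, g_{p,k}>
k = np.arange(l)[:, None]
atoms1 = G[1] * np.sqrt(2 / l) * np.cos(np.pi / l * (k + 0.5) * (n - a[1]))
assert np.allclose(atoms1 @ f, coefs[1])

# synthesis: inverse DCT-IV then the unfolding formula (8.113)
f_rec = np.zeros(N)
for p in range(P):
    h_seg = C @ coefs[p]                         # inverse DCT-IV recovers h_p
    assert np.allclose(h_seg, H[p][p * l:(p + 1) * l])
    idx = np.arange(p * l, (p + 1) * l)
    val = H[p][idx].copy()
    left = idx[idx < a[p] + eta]
    right = idx[idx > a[p + 1] - eta]
    if p > 0:
        m = (2 * a[p] - left).astype(int)
        val[:len(left)] = G[p][left] * H[p][left] - G[p][m] * H[p - 1][m]
    if p < P - 1:
        m = (2 * a[p + 1] - right).astype(int)
        val[len(val) - len(right):] = G[p][right] * H[p][right] + G[p][m] * H[p + 1][m]
    f_rec[idx] = val
assert np.allclose(f_rec, f)
```

Each unfolded interval only needs the folded values of its two neighbors, which is what makes the inverse transform local and $O(l_p)$ per interval besides the DCT-IV.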
Finite Signals
If $f[n]$ is defined for $0 \leq n < N$, the extremities of the first and last interval must be $a_0 = -1/2$ and $a_q = N - 1/2$. A fast local cosine algorithm needs $O(l_p \log_2 l_p)$ additions and multiplications to decompose or reconstruct the signal on each interval of length $l_p$. On the whole signal of length $N$, it thus needs a total of $O(N \log_2 L)$ operations, where $L = \sup_{0 \leq p < q} l_p$.

Since we do not know the values of $f[n]$ for $n < 0$, at the left border we set $\eta_0 = 0$. This means that $g_0[n]$ jumps from $0$ to $1$ at $n = 0$. The resulting transform on the left boundary is equivalent to a straight DCT-IV. Section 8.3.2 shows that since cosine IV vectors are even on the left boundary, the DCT-IV is equivalent to a symmetric signal extension followed by a discrete Fourier transform. This avoids creating discontinuity artifacts at the left border.

At the right border, we also set $\eta_q = 0$ to limit the support of $g_{q-1}$ to $[0, N-1]$. Section 8.3.2 explains that since cosine IV vectors are odd on the right boundary, the DCT-IV is equivalent to an antisymmetric signal extension. If $f[N-1] \neq 0$, this extension introduces a sharp signal transition that creates artificial high frequencies.
To reduce this border effect, we replace the cosine IV modulation
\[
g_{q-1,k}[n] = g_{q-1}[n] \, \sqrt{\frac{2}{l_{q-1}}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{n - a_{q-1}}{l_{q-1}} \right]
\]
with a cosine I modulation,
\[
g_{q-1,k}[n] = g_{q-1}[n] \, \sqrt{\frac{2}{l_{q-1}}} \, \lambda_k \cos\left[ \pi k \, \frac{n - a_{q-1}}{l_{q-1}} \right].
\]
The orthogonality with the other elements of the basis is maintained because these cosine I vectors, like cosine IV vectors, are even with respect to $a_{q-1}$. Since $\cos[\pi k (n - a_{q-1}) / l_{q-1}]$ is also symmetric with respect to $a_q = N - 1/2$, computing a DCT-I is equivalent to performing a symmetric signal extension at the right boundary, which avoids discontinuities. In the fast local cosine transform, we thus compute a DCT-I of the last folded signal $h_{q-1}$ instead of a DCT-IV. The reconstruction algorithm uses an inverse DCT-I to recover $h_{q-1}$ from these coefficients.
8.5 LOCAL COSINE TREES
Corollary 8.1 constructs local cosine bases for any segmentation of the time axis into intervals $[a_p, a_{p+1}]$ of arbitrary lengths. This result is more general than the construction of wavelet packet bases, which can only divide the frequency axis into dyadic intervals with a length proportional to a power of 2. However, Coifman and Meyer [181] showed that restricting the intervals to dyadic sizes has the advantage of creating a tree structure similar to a wavelet packet tree. "Best" local cosine bases can then be adaptively chosen with the fast dynamic programming algorithm described in Section 12.2.2.
8.5.1 Binary Tree of Cosine Bases
A local cosine tree includes orthogonal bases that segment the time axis in dyadic intervals. For any $j \geq 0$, the interval $[0, 1]$ is divided in $2^j$ intervals of length $2^{-j}$ by setting
\[
a_{p,j} = p \, 2^{-j} \quad \text{for} \quad 0 \leq p \leq 2^j.
\]
These intervals are covered by windows $g_{p,j}$ defined by (8.85) with a support $[a_{p,j} - \eta, \, a_{p+1,j} + \eta]$:
\[
g_{p,j}(t) = \begin{cases} \beta(\eta^{-1}(t - a_{p,j})) & \text{if } t \in [a_{p,j} - \eta, \, a_{p,j} + \eta] \\ 1 & \text{if } t \in [a_{p,j} + \eta, \, a_{p+1,j} - \eta] \\ \beta(\eta^{-1}(a_{p+1,j} - t)) & \text{if } t \in [a_{p+1,j} - \eta, \, a_{p+1,j} + \eta] \\ 0 & \text{otherwise}. \end{cases} \tag{8.114}
\]
To ensure that the support of $g_{p,j}$ is in $[0, 1]$ for $p = 0$ and $p = 2^j - 1$, we modify, respectively, the left and right sides of these windows by setting $g_{0,j}(t) = 1$ if $t \in [0, \eta]$, and $g_{2^j-1,j}(t) = 1$ if $t \in [1 - \eta, 1]$. It follows that $g_{0,0} = \mathbf{1}_{[0,1]}$. The size $\eta$ of the raising and decaying profiles of $g_{p,j}$ is independent of $j$. To guarantee that windows overlap only with their two neighbors, the length $a_{p+1,j} - a_{p,j} = 2^{-j}$ must be larger than the size $2\eta$ of the overlapping intervals, and thus
\[
\eta \leq 2^{-j-1}. \tag{8.115}
\]
Similar to wavelet packet trees, a local cosine tree is constructed by recursively dividing spaces built with local cosine bases. A tree node at depth $j$ and position $p$ is associated to the space $W_j^p$ generated by the local cosine family
\[
B_j^p = \left\{ g_{p,j}(t) \, \sqrt{\frac{2}{2^{-j}}} \, \cos\left[ \pi \Big( k + \frac{1}{2} \Big) \frac{t - a_{p,j}}{2^{-j}} \right] \right\}_{k \in \mathbb{N}}. \tag{8.116}
\]
Any $f \in W_j^p$ has a support in $[a_{p,j} - \eta, \, a_{p+1,j} + \eta]$ and can be written as $f(t) = g_{p,j}(t) \, h(t)$, where $h(t)$ is, respectively, symmetric and antisymmetric with respect to $a_{p,j}$ and $a_{p+1,j}$. Theorem 8.18 shows that $W_j^p$ is divided in two orthogonal spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ that are built over the two half intervals.
Theorem 8.18: Coifman, Meyer. For any $j \geq 0$ and $p < 2^j$, the spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ are orthogonal and
\[
W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1}. \tag{8.117}
\]
Proof. The orthogonality of $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ is proved by Theorem 8.15. We denote by $P_{p,j}$ the orthogonal projector on $W_j^p$. With the notation of Section 8.4.1, this projector is decomposed into two splitting projectors at $a_{p,j}$ and $a_{p+1,j}$:
\[
P_{p,j} = P_{a_{p,j}, \eta}^+ \, P_{a_{p+1,j}, \eta}^-.
\]
Equation (8.90) proves that
\[
P_{2p,j+1} + P_{2p+1,j+1} = P_{a_{2p,j+1}, \eta}^+ \, P_{a_{2p+2,j+1}, \eta}^- = P_{a_{p,j}, \eta}^+ \, P_{a_{p+1,j}, \eta}^- = P_{p,j}.
\]
This equality on orthogonal projectors implies (8.117). $\blacksquare$
The space $W_j^p$ located at node $(j, p)$ of a local cosine tree is therefore the sum of the two spaces $W_{j+1}^{2p}$ and $W_{j+1}^{2p+1}$ located at the children nodes. Since $g_{0,0} = \mathbf{1}_{[0,1]}$, it follows that $W_0^0 = L^2[0,1]$. The maximum depth $J$ of the binary tree is limited by the support condition $\eta \leq 2^{-J-1}$, and thus
\[
J \leq -\log_2(2\eta). \tag{8.118}
\]
Admissible Local Cosine Bases
As in a wavelet packet binary tree, many local cosine orthogonal bases are constructed from this local cosine tree. We call admissible binary tree any subtree of the local cosine tree with nodes that have either zero or two children. Let $\{j_i, p_i\}_{1 \leq i \leq I}$ be the indices at the leaves of a particular admissible binary tree. Applying the splitting property (8.117) along the branches of this subtree proves that
\[
L^2[0,1] = W_0^0 = \bigoplus_{i=1}^{I} W_{j_i}^{p_i}.
\]
FIGURE 8.20
An admissible binary tree of local cosine spaces divides the time axis in windows of dyadic lengths.
Thus, the union of local cosine bases $\bigcup_{i=1}^{I} \mathcal{B}_{j_i}^{p_i}$ is an orthogonal basis of $L^2[0,1]$. This can also be interpreted as a division of the time axis into windows of various lengths, as illustrated by Figure 8.20.
The number $B_J$ of different dyadic local cosine bases is equal to the number of different admissible subtrees of depth at most $J$. For $J = -\log_2(2\eta)$, Theorem 8.2 proves that
$$2^{1/(4\eta)} \leq B_J \leq 2^{3/(8\eta)}.$$
Figure 8.19 on page 421 shows the decomposition of a sound recording in two
dyadic local cosine bases selected from the binary tree. The basis in (a) is calculated
with the best basis algorithm of Section 12.2.2.
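The count $B_J$ can be checked numerically: a subtree of depth at most $J$ is either a single leaf or a root whose two children carry independent admissible subtrees of depth at most $J-1$, which gives the recursion $B_{J+1} = B_J^2 + 1$. The short Python sketch below (our code and variable names, not from the text) verifies the bounds above with $2^J = 1/(2\eta)$ leaves:

```python
# Number of admissible binary subtrees of depth at most J: either a single
# leaf, or a root whose two children carry independent admissible subtrees
# of depth at most J - 1, giving the recursion B_{J+1} = B_J**2 + 1.
def count_bases(J):
    B = 1                            # depth 0: the root alone
    for _ in range(J):
        B = B * B + 1
    return B

for J in range(1, 6):
    B = count_bases(J)
    lower = 2 ** (2 ** (J - 1))      # 2^(1/(4*eta)) with 2^J = 1/(2*eta)
    upper = 2 ** (3 * 2 ** (J - 2))  # 2^(3/(8*eta))
    assert lower <= B <= upper
```

For instance, a tree of depth 2 already yields $B_2 = 5$ different bases of $L^2[0,1]$.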
Choice of $\eta$
At all scales $2^j$, the windows $g_{p,j}$ of a local cosine tree have raising and decaying profiles of the same size $\eta$. Thus, these windows can be recombined independently from their scale. If $\eta$ is small compared to the interval size $2^{-j}$, then $g_{p,j}$ has a relatively sharp variation at its borders compared to the size of its support. Since $\eta$ is not proportional to $2^{-j}$, the energy concentration of $\hat g_{p,j}$ is not improved when the window size $2^{-j}$ increases. Even though $f$ may be very smooth over $[a_{p,j}, a_{p+1,j}]$, the border variations of the window create relatively large coefficients up to a frequency of the order of $\pi/\eta$.
To reduce the number of large coefficients we must increase $\eta$, but this also increases the minimum window size in the tree, which is $2^{-J} = 2\eta$. The choice of $\eta$ is therefore the result of a trade-off between window regularity and the maximum resolution of the time subdivision. There is no equivalent limitation in the construction of wavelet packet bases.
8.5.2 Tree of Discrete Bases
For discrete signals of size $N$, a binary tree of discrete cosine bases is constructed like a binary tree of continuous-time cosine bases. To simplify notations, the sampling distance is normalized to 1. If it is equal to $N^{-1}$, then frequency parameters must be multiplied by $N$.
The subdivision points are located at half integers:
$$a_{p,j} = p\,N\,2^{-j} - \frac{1}{2} \quad \text{for} \quad 0 \leq p \leq 2^j.$$
The discrete windows are obtained by sampling the windows $g_{p,j}(t)$ defined in (8.114): $g_{p,j}[n] = g_{p,j}(n)$. The same border modification is used to ensure that the support of all $g_{p,j}[n]$ is in $[0, N-1]$.
A node at depth $j$ and position $p$ in the binary tree corresponds to the space $W_j^p$ generated by the discrete local cosine family
$$\mathcal{B}_j^p = \left\{ g_{p,j}[n]\,\sqrt{\frac{2}{2^{-j}N}}\,\cos\!\left[\pi\left(k+\frac{1}{2}\right)\frac{n-a_{p,j}}{2^{-j}N}\right] \right\}_{0 \leq k < 2^{-j}N}.$$
Since $g_{0,0} = \mathbf{1}_{[0,N-1]}$, the space $W_0^0$ at the root of the tree includes any signal defined over $0 \leq n < N$, so $W_0^0 = \mathbb{C}^N$. As in Theorem 8.18, we verify that $W_j^p$ is orthogonal to $W_j^q$ for $p \neq q$ and that
$$W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1}. \tag{8.119}$$
The splitting property (8.119) implies that the union of the local cosine families $\mathcal{B}_j^p$ located at the leaves of an admissible subtree is an orthogonal basis of $W_0^0 = \mathbb{C}^N$. The minimum window size is limited by $2\eta \leq 2^{-j}N$, so the maximum depth of this binary tree is $J = \log_2\frac{N}{2\eta}$. Thus, one can construct more than $2^{2^{J-1}} = 2^{N/(4\eta)}$ different discrete local cosine bases within this binary tree.
Fast Calculations
The fast local cosine transform algorithm described in Section 8.4.4 requires $O(2^{-j}N \log_2(2^{-j}N))$ operations to compute the inner products of $f$ with the $2^{-j}N$ vectors in the local cosine family $\mathcal{B}_j^p$. The total number of operations to perform these computations at all nodes $(j,p)$ of the tree, for $0 \leq p < 2^j$ and $0 \leq j \leq J$, is therefore $O(NJ \log_2 N)$. The local cosine decompositions in Figure 8.19 are calculated with this fast algorithm. To improve the right border treatment, Section 8.4.4 explains that the last DCT-IV should be replaced by a DCT-I at each scale $2^j$. The signal $f$ is recovered from the local cosine coefficients at the leaves of any admissible binary tree with the fast local cosine reconstruction algorithm, which needs $O(N \log_2 N)$ operations.
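To make the block computation concrete, here is a schematic numpy sketch of one tree level (our code, with hypothetical function names): it splits a signal of size $N$ into $2^j$ dyadic windows and applies an orthonormal DCT-IV to each. For simplicity it uses rectangular windows ($\eta \to 0$, so no smooth folding at the borders) and a direct $O(L^2)$ matrix product per window rather than the fast algorithm of Section 8.4.4:

```python
import numpy as np

def dct4_matrix(L):
    """Orthonormal DCT-IV matrix: C[k, n] = sqrt(2/L) cos(pi (k + 1/2)(n + 1/2) / L)."""
    k = np.arange(L)[:, None]
    n = np.arange(L)[None, :]
    return np.sqrt(2.0 / L) * np.cos(np.pi * (k + 0.5) * (n + 0.5) / L)

def block_cosine_coefficients(f, j):
    """Cosine coefficients of f over the 2**j dyadic windows at depth j
    (rectangular windows, i.e., without the smooth overlapping profiles)."""
    N = len(f)
    L = N // 2 ** j                       # window length 2**(-j) * N
    C = dct4_matrix(L)
    return [C @ f[p * L:(p + 1) * L] for p in range(2 ** j)]

rng = np.random.default_rng(0)
f = rng.standard_normal(256)
coeffs = block_cosine_coefficients(f, 3)  # 8 windows of length 32

# The orthonormal DCT-IV matrix is symmetric and orthogonal, hence its own
# inverse: reapplying it to each coefficient block recovers the signal.
C = dct4_matrix(32)
rec = np.concatenate([C @ c for c in coeffs])
assert np.allclose(rec, f)
```

The self-inverse property of the DCT-IV is what makes the reconstruction step identical to the decomposition step.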
8.5.3 Image Cosine Quad-Tree
A local cosine binary tree is extended in two dimensions into a quad-tree, which recursively divides square image windows into four smaller windows. This separable approach is similar to the extension of wavelet packet bases in two dimensions, described in Section 8.2.
Let us consider square images of $N$ pixels. A node of the quad-tree is labeled by its depth $j$ and two indices $p$ and $q$. Let $g_{p,j}[n]$ be the discrete one-dimensional window defined in Section 8.5.2. At depth $j$, a node $(p,q)$ corresponds to a separable space
$$W_j^{p,q} = W_j^p \otimes W_j^q, \tag{8.120}$$
which is generated by a separable local cosine basis of $2^{-2j}N$ vectors
$$\mathcal{B}_j^{p,q} = \left\{ g_{p,j}[n_1]\, g_{q,j}[n_2]\, \frac{2}{2^{-j}N^{1/2}}\, \cos\!\left[\pi\left(k_1+\frac{1}{2}\right)\frac{n_1-a_{p,j}}{2^{-j}N^{1/2}}\right] \cos\!\left[\pi\left(k_2+\frac{1}{2}\right)\frac{n_2-a_{q,j}}{2^{-j}N^{1/2}}\right] \right\}_{0 \leq k_1,k_2 < 2^{-j}N^{1/2}}.$$
We know from (8.119) that
$$W_j^p = W_{j+1}^{2p} \oplus W_{j+1}^{2p+1} \quad \text{and} \quad W_j^q = W_{j+1}^{2q} \oplus W_{j+1}^{2q+1}.$$
Inserting these equations in (8.120) proves that $W_j^{p,q}$ is the direct sum of four orthogonal subspaces:
$$W_j^{p,q} = W_{j+1}^{2p,2q} \oplus W_{j+1}^{2p+1,2q} \oplus W_{j+1}^{2p,2q+1} \oplus W_{j+1}^{2p+1,2q+1}. \tag{8.121}$$
Space $W_j^{p,q}$ at node $(j,p,q)$ is therefore decomposed in the four subspaces located at the four children nodes of the quad-tree. This decomposition can also be interpreted as a division of the square window $g_{p,j}[n_1]\,g_{q,j}[n_2]$ into four subwindows of equal sizes, as illustrated in Figure 8.21. The space located at the root of the tree is
$$W_0^{0,0} = W_0^0 \otimes W_0^0. \tag{8.122}$$
It includes all images of $N$ pixels. The size $\eta$ of the raising and decaying profiles of the one-dimensional windows defines the maximum depth $J = \log_2\frac{N^{1/2}}{2\eta}$ of the quad-tree.
FIGURE 8.21
Functions in $W_j^{p,q}$ have a support located in a square region of the image. It is divided into the four subspaces $W_{j+1}^{2p,2q}$, $W_{j+1}^{2p+1,2q}$, $W_{j+1}^{2p,2q+1}$, and $W_{j+1}^{2p+1,2q+1}$ that cover smaller squares in the image.
FIGURE 8.22
The grid shows the support of the windows $g_{p,j}[n_1]\,g_{q,j}[n_2]$ of a "best" local cosine basis selected in the local cosine quad-tree.
Admissible Quad-Trees
An admissible subtree of this local cosine quad-tree has nodes with either zero or four children. Applying the decomposition property (8.121) along the branches of an admissible quad-tree proves that the spaces $W_{j_i}^{p_i,q_i}$ located at the leaves decompose $W_0^{0,0}$ in orthogonal subspaces. The union of the corresponding two-dimensional local cosine bases $\mathcal{B}_{j_i}^{p_i,q_i}$ is therefore an orthogonal basis of $W_0^{0,0}$. We proved in (8.42) that there are more than $2^{4^{J-1}} = 2^{N/(16\eta^2)}$ different admissible trees of maximum depth $J = \log_2\frac{N^{1/2}}{2\eta}$. These bases divide the image plane into squares of varying sizes. Figure 8.22 gives an example of image decomposition in a local cosine basis corresponding to an admissible quad-tree. This local cosine basis is selected with the best basis algorithm of Section 12.2.2.
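The quad-tree count obeys the same pattern as in one dimension, with four children instead of two: $A_{J+1} = A_J^4 + 1$ admissible subtrees of depth at most $J+1$. A small Python check of the lower bound (our code and notation, not from the text):

```python
# Admissible quad-subtrees of depth at most J: a single leaf, or a root whose
# four children carry independent admissible subtrees of depth at most J - 1.
def count_quad_bases(J):
    A = 1
    for _ in range(J):
        A = A ** 4 + 1
    return A

for J in range(1, 5):
    assert count_quad_bases(J) >= 2 ** (4 ** (J - 1))  # lower bound on the count
```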
Fast Calculations
The decomposition of an image $f[n]$ over a separable local cosine family $\mathcal{B}_j^{p,q}$ requires $O(2^{-2j}N \log_2(2^{-j}N))$ operations with a separable implementation of the fast one-dimensional local cosine transform. For a full local cosine quad-tree of depth $J$, these calculations are performed for $0 \leq p, q < 2^j$ and $0 \leq j \leq J$, which requires $O(NJ \log_2 N)$ multiplications and additions. The original image is recovered from the local cosine coefficients at the leaves of any admissible subtree with $O(N \log_2 N)$ computations.
8.6 EXERCISES
8.1 ² Prove the discrete splitting Theorem 8.6.
8.2 ² Meyer wavelet packets are calculated with a Meyer conjugate mirror filter (7.84). Compute the size of the frequency support of $\hat\psi_j^p$ as a function of $2^j$. Study the convergence of $\psi_{j,n}(t)$ when the scale $2^j$ goes to $+\infty$.
8.3 ¹ Extend the separable wavelet packet tree of Section 8.2.2 for discrete $p$-dimensional signals. Verify that the wavelet packet tree of a $p$-dimensional discrete signal of $N$ samples includes $O(N \log_2 N)$ wavelet packet coefficients that are calculated with $O(KN \log_2 N)$ operations if the conjugate mirror filter $h$ has $K$ nonzero coefficients.
8.4 ² Anisotropic wavelet packets $\psi_j^p[a - 2^{L-j}n_1]\,\psi_l^q[b - 2^{L-l}n_2]$ may have different scales $2^j$ and $2^l$ along the rows and columns. A decomposition over such wavelet packets is calculated with a filter bank that filters and subsamples the image rows $j - L$ times, whereas the columns are filtered and subsampled $l - L$ times. For an image $f[n]$ of $N$ pixels, show that a dictionary of anisotropic wavelet packets includes $O(N(\log_2 N)^2)$ different vectors. Compute the number of operations needed to decompose $f$ in this dictionary.
8.5 ² Hartley transform. Let $\mathrm{cas}(t) = \cos(t) + \sin(t)$. We define
$$\mathcal{B} = \left\{ g_k[n] = \frac{1}{\sqrt{N}}\,\mathrm{cas}\!\left(\frac{2\pi nk}{N}\right) \right\}_{0 \leq k < N}.$$
(a) Prove that $\mathcal{B}$ is an orthonormal basis of $\mathbb{C}^N$.
(b) For any signal $f[n]$ of size $N$, find a fast Hartley transform algorithm based on the FFT, which computes $\{\langle f, g_k\rangle\}_{0\leq k<N}$ with $O(N \log_2 N)$ operations.
8.6 ² Prove that $\{\sqrt{2}\,\sin[(k+1/2)\pi t]\}_{k\in\mathbb{N}}$ is an orthonormal basis of $L^2[0,1]$. Find a corresponding discrete orthonormal basis of $\mathbb{C}^N$.
8.7 ² Prove that $\{\sqrt{2}\,\sin(k\pi t)\}_{k\in\mathbb{N}^*}$ is an orthonormal basis of $L^2[0,1]$. Find a corresponding discrete orthonormal basis of $\mathbb{C}^N$.
8.8 ³ Lapped Fourier basis:
(a) Construct a lapped orthogonal basis $\{\tilde g_{p,k}\}_{(p,k)\in\mathbb{Z}^2}$ of $L^2(\mathbb{R})$ from the Fourier basis $\{\exp(i2\pi kt)\}_{k\in\mathbb{Z}}$ of $L^2[0,1]$.
(b) Explain why this local Fourier basis does not contradict the Balian-Low Theorem 5.20.
(c) Let $f \in L^2(\mathbb{R})$ be such that $|\hat f(\omega)| = O((1+|\omega|^p)^{-1})$ for some $p > 0$. Compute the rate of decay of $|\langle f, \tilde g_{p,k}\rangle|$ when the frequency index $|k|$ increases. Compare it with the rate of decay of $|\langle f, g_{p,k}\rangle|$, where $g_{p,k}$ is a local cosine vector (8.104). How do the two bases compare for signal-processing applications?
8.9 ³ Describe a fast algorithm to compute the Meyer orthogonal wavelet transform with a lapped transform applied in the Fourier domain. Calculate the numerical complexity of this algorithm for periodic signals of size $N$. Compare this result with the numerical complexity of the standard fast wavelet transform algorithm, where the convolutions with Meyer conjugate mirror filters are calculated with an FFT.
8.10 ³ Mirror wavelets. Let $(h, g)$ be a pair of conjugate mirror filters. A mirror wavelet packet first decomposes the input signal $a_0$ into $a_1$ and $d_1$ by filtering it with $\bar h[n] = h[-n]$ and $\bar g[n] = g[-n]$, respectively, and subsampling the output. The signals $a_1$ and $d_1$ are subdecomposed with an orthogonal wavelet transform filter bank tree. Decomposing $a_1$ and $d_1$ with a cascade of $j-2$ filterings with $\bar h$ and a filtering with $\bar g$ yields wavelet coefficients $\langle a_0, \psi_{j,m}\rangle$ and wavelet packet coefficients that we write as $\langle a_0, \tilde\psi_{j,m}\rangle$. Prove that the discrete Fourier transforms of these mirror wavelets satisfy $|\hat{\tilde\psi}_{j,m}[k]| = |\hat\psi_{j,m}[N/2-k]|$. Show that if the filters $\bar h$ and $\bar g$ that decompose $d_1$ are replaced by $h$ and $g$, then the resulting mirror wavelets satisfy $\tilde\psi_{j,m}[n] = (-1)^{n-1}\,\psi_{j,m}[1-n]$.
8.11 ² Arbitrary Walsh tilings:
(a) Prove that two Walsh wavelet packets $\psi_{j,n}^p$ and $\psi_{j',n'}^{p'}$ are orthogonal if their Heisenberg boxes, defined in Section 8.1.2, do not intersect in the time-frequency plane [71].
(b) A dyadic tiling of the time-frequency plane is an exact cover $\{[2^j n,\, 2^j(n+1)] \times [k\pi 2^{-j},\, (k+1)\pi 2^{-j}]\}_{(j,n,k)\in I}$, where the index set $I$ is adjusted to guarantee that the time-frequency boxes do not intersect and that they leave no hole. Prove that any such tiling corresponds to a Walsh orthonormal basis $\{\psi_{j,n}^p\}_{(p,j,n)\in I}$ of $L^2(\mathbb{R})$.
8.12 ³ Double tree. We want to construct a dictionary of block wavelet packet bases, which has the freedom to segment both the time and frequency axes. For this purpose, as in a local cosine basis dictionary, we construct a binary tree that divides $[0,1]$ in $2^j$ intervals $[p2^{-j}, (p+1)2^{-j}]$ corresponding to nodes indexed by $p$ at depth $j$ of the tree. At each of these nodes, we construct another tree of wavelet packet orthonormal bases of $L^2[p2^{-j}, (p+1)2^{-j}]$ [299].
(a) Define admissible subtrees in this double tree, with leaves corresponding to orthonormal bases of $L^2[0,1]$. Give an example of an admissible tree and draw the resulting tiling of the time-frequency plane.
(b) Give a recursive equation that relates the number of admissible subtrees of depth $J+1$ and of depth $J$. Give an upper bound and a lower bound for the total number of orthogonal bases in this double-tree dictionary.
(c) Can one find a basis in a double tree that is well adapted to implement an efficient transform code for audio signals? Justify your answer.
8.13 ³ An anisotropic local cosine basis for images is constructed with rectangular windows that have a width $2^j$ that may be different from their height $2^l$. Similar to a local cosine tree, such bases are calculated by progressively dividing windows, but the horizontal and vertical divisions of these windows are done independently. Show that a dictionary of anisotropic local cosine bases can be represented as a graph. Implement numerically an algorithm that decomposes images in a graph of anisotropic local cosine bases.
CHAPTER 9
Approximations in Bases
It is time to wonder why we are constructing so many different orthonormal bases.
In signal processing, orthogonal bases are of interest because they can provide
sparse representations of certain types of signals with few vectors. Compression
and denoising are applications studied in Chapters 10 and 11.
Approximation theory studies the error produced by different approximation
schemes. Classic sampling theorems are linear approximations that project the
analog signal over low-frequency vectors chosen a priori in a basis. The discrete
signal representation may be further reduced with a linear projection over the first
few vectors of an orthonormal basis. However, better nonlinear approximations
are obtained by choosing the approximation vectors depending on the signal. In
a wavelet basis, these nonlinear approximations locally adjust the approximation
resolution to the signal regularity.
Approximation errors depend on the signal regularity. For uniformly regular signals, linear and nonlinear approximations perform similarly, whether in a wavelet or in a Fourier basis. When the signal regularity is not uniform, nonlinear approximations in a wavelet basis can considerably reduce the error of linear approximations.
This is the case for piecewise regular signals or bounded variation signals and images.
Geometric approximations of piecewise regular images with regular edge curves
are studied with adaptive triangulations and curvelets.
9.1 LINEAR APPROXIMATIONS
Analog signals are discretized in Section 3.1.3 with inner products in a basis. In the
following sections we compute the resulting linear approximation error in wavelet
and Fourier bases, which depends on the uniform signal regularity. For signals modeled as realizations of a random vector, in Section 9.1.4 we prove that the optimal
basis is the Karhunen-Loève basis (principal components), which diagonalizes the
covariance matrix.
9.1.1 Sampling and Approximation Error
Approximation errors of linear sampling processes are related to the error of linear
approximations in an orthogonal basis. These errors are computed from the decay
of signal coefficients in this basis.
An analog signal $f(t)$ is discretized with a low-pass filter $\bar\phi_s(t)$ and a uniform sampling interval $s$:
$$f \star \bar\phi_s(ns) = \int_{-\infty}^{+\infty} f(u)\,\bar\phi_s(ns-u)\,du = \langle f(t), \phi_s(t-ns)\rangle, \tag{9.1}$$
with $\bar\phi_s(t) = \phi_s(-t)$. Let us consider an analog signal of compact support, normalized to $[0,1]$. At a resolution $N$ corresponding to $s = N^{-1}$, the discretization is performed over $N$ functions $\{\phi_n(t) = \phi_s(t-ns)\}_{0\leq n<N}$ that are modified at the boundaries to maintain their support in $[0,1]$. They define a Riesz basis of an approximation space $U_N \subset L^2[0,1]$. The best linear approximation of $f$ in $U_N$ is the orthogonal projection $f_N$ of $f$ in $U_N$, recovered with the biorthogonal basis $\{\tilde\phi_n(t)\}_{0\leq n<N}$:
$$f_N(t) = \sum_{n=0}^{N-1} \langle f, \phi_n\rangle\,\tilde\phi_n(t). \tag{9.2}$$
To compute the approximation error $f - f_N$, we introduce an orthonormal basis $\mathcal{B} = \{g_m\}_{m\in\mathbb{N}}$ of $L^2[0,1]$, with the first $N$ vectors $\{g_m\}_{0\leq m<N}$ defining an orthogonal basis of the same approximation space $U_N$. Fourier and wavelet bases provide such bases for many classic approximation spaces. The orthogonal projection $f_N$ of $f$ in $U_N$ can be decomposed on the first $N$ vectors of this basis:
$$f_N = \sum_{m=0}^{N-1} \langle f, g_m\rangle\, g_m.$$
Since $\mathcal{B}$ is an orthonormal basis of $L^2[0,1]$, $f = \sum_{m=0}^{+\infty} \langle f, g_m\rangle\, g_m$, so
$$f - f_N = \sum_{m=N}^{+\infty} \langle f, g_m\rangle\, g_m,$$
and the resulting approximation error is
$$\epsilon_l(N, f) = \|f - f_N\|^2 = \sum_{m=N}^{+\infty} |\langle f, g_m\rangle|^2. \tag{9.3}$$
The fact that $\|f\|^2 = \sum_{m=0}^{+\infty} |\langle f, g_m\rangle|^2 < +\infty$ implies that the error decays to zero:
$$\lim_{N\to+\infty} \epsilon_l(N, f) = 0.$$
However, the decay rate of $\epsilon_l(N, f)$ as $N$ increases depends on the decay of $|\langle f, g_m\rangle|$ as $m$ increases. Theorem 9.1 gives equivalent conditions on the decay of $\epsilon_l(N, f)$ and of $|\langle f, g_m\rangle|$.
Theorem 9.1. For any $s > 1/2$, there exist $A, B > 0$ such that if $\sum_{m=0}^{+\infty} |m|^{2s}\,|\langle f, g_m\rangle|^2 < +\infty$, then
$$A \sum_{m=0}^{+\infty} m^{2s}\,|\langle f, g_m\rangle|^2 \;\leq\; \sum_{N=0}^{+\infty} N^{2s-1}\,\epsilon_l(N, f) \;\leq\; B \sum_{m=0}^{+\infty} m^{2s}\,|\langle f, g_m\rangle|^2, \tag{9.4}$$
and thus $\epsilon_l(N, f) = o(N^{-2s})$.
Proof. By inserting (9.3), we compute
$$\sum_{N=0}^{+\infty} N^{2s-1}\,\epsilon_l(N, f) = \sum_{N=0}^{+\infty} \sum_{m=N}^{+\infty} N^{2s-1}\,|\langle f, g_m\rangle|^2 = \sum_{m=0}^{+\infty} |\langle f, g_m\rangle|^2 \sum_{N=0}^{m} N^{2s-1}.$$
For any $s > 1/2$,
$$\int_0^m x^{2s-1}\,dx \leq \sum_{N=0}^{m} N^{2s-1} \leq \int_1^{m+1} x^{2s-1}\,dx,$$
which implies that $\sum_{N=0}^{m} N^{2s-1} \sim m^{2s}$, and thus proves (9.4).
To verify that $\epsilon_l(N, f) = o(N^{-2s})$, observe that $\epsilon_l(m, f) \geq \epsilon_l(N, f)$ for $m \leq N$, so
$$\epsilon_l(N, f) \sum_{m=N/2}^{N-1} m^{2s-1} \leq \sum_{m=N/2}^{N-1} m^{2s-1}\,\epsilon_l(m, f) \leq \sum_{m=N/2}^{+\infty} m^{2s-1}\,\epsilon_l(m, f). \tag{9.5}$$
Since $\sum_{m=1}^{+\infty} m^{2s-1}\,\epsilon_l(m, f) < +\infty$, it follows that
$$\lim_{N\to+\infty} \sum_{m=N/2}^{+\infty} m^{2s-1}\,\epsilon_l(m, f) = 0.$$
Moreover, there exists $C > 0$ such that $\sum_{m=N/2}^{N-1} m^{2s-1} \geq C\,N^{2s}$, so (9.5) implies that $\lim_{N\to+\infty} \epsilon_l(N, f)\,N^{2s} = 0$. ■
This theorem proves that the linear approximation error of $f$ in the basis $\mathcal{B}$ decays faster than $N^{-2s}$ if $f$ belongs to the space
$$\mathbf{W}_{\mathcal{B},s} = \left\{ f \in \mathcal{H} \,:\, \sum_{m=0}^{+\infty} m^{2s}\,|\langle f, g_m\rangle|^2 < +\infty \right\}.$$
One can also prove that this linear approximation is asymptotically optimal over this space [20]. Indeed, there exists no linear or nonlinear approximation scheme with an error decay at least like $N^{-\alpha}$, with $\alpha > 2s$, for all $f \in \mathbf{W}_{\mathcal{B},s}$.
In the next sections we prove that if $\mathcal{B}$ is a Fourier or wavelet basis, then $\mathbf{W}_{\mathcal{B},s}$ is a Sobolev space, and therefore that linear approximations of Sobolev functions are optimal in Fourier and wavelet bases. However, we shall also see that for more complex functions, the linear approximation of $f$ from the first $N$ vectors of $\mathcal{B}$ is not always precise, because these vectors are not necessarily the best ones to approximate $f$. Nonlinear approximations calculated with vectors chosen adaptively depending on $f$ are studied in Section 9.2.
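Theorem 9.1 can be illustrated with synthetic coefficients (a toy example of ours, not from the text). Taking $|\langle f, g_m\rangle| = (m+1)^{-2}$, the error (9.3) is a tail sum that behaves like $N^{-3}/3$, which is indeed $o(N^{-2s})$ for every $s < 3/2$:

```python
# Synthetic coefficients |<f, g_m>| = (m + 1)**(-2): the linear approximation
# error (9.3) is the tail sum of squared coefficients, ~ N**(-3) / 3.
M = 100_000
c2 = [(m + 1) ** -4 for m in range(M)]        # squared coefficients

def approx_error(N):
    """epsilon_l(N, f) = sum over m >= N of |<f, g_m>|^2 (truncated at M)."""
    return sum(c2[N:])

for N in (100, 200, 400):
    assert abs(approx_error(N) * N ** 3 - 1 / 3) < 0.02
```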
9.1.2 Linear Fourier Approximations
The Shannon-Whittaker sampling theorem applies a perfect low-pass filter that keeps the signal's low frequencies. It is thus equivalent to a linear approximation over the lower frequencies of a Fourier basis. Linear Fourier approximations are asymptotically optimal for uniformly regular signals. The approximation error is related to Sobolev differentiability. It is also calculated for nonuniformly regular signals, such as discontinuous signals having a bounded total variation.
Theorem 3.6 proves (modulo a change of variable) that $\{e^{i2\pi mt}\}_{m\in\mathbb{Z}}$ is an orthonormal basis of $L^2[0,1]$. Thus, we can decompose $f \in L^2[0,1]$ in the Fourier series
$$f(t) = \sum_{m=-\infty}^{+\infty} \langle f(u), e^{i2\pi mu}\rangle\, e^{i2\pi mt}, \tag{9.6}$$
with
$$\langle f(u), e^{i2\pi mu}\rangle = \int_0^1 f(u)\, e^{-i2\pi mu}\,du.$$
The decomposition (9.6) defines a periodic extension of $f$ for all $t \in \mathbb{R}$. The decay of the Fourier coefficients $|\langle f(u), e^{i2\pi mu}\rangle|$ as $m$ increases depends on the regularity of this periodic extension.
The linear approximation of $f \in L^2[0,1]$ by the $N$ sinusoids of lowest frequencies is obtained by a linear filtering that sets all higher frequencies to zero:
$$f_N(t) = \sum_{|m|\leq N/2} \langle f(u), e^{i2\pi mu}\rangle\, e^{i2\pi mt}.$$
It projects $f$ in the space $U_N$ of functions having Fourier coefficients that are zero above the frequency $\pi N$.
Error Decay versus Sobolev Differentiability
The decay of the linear Fourier approximation error depends on the Sobolev differentiability of $f$. The regularity of $f$ can be measured by the number of times it is differentiable. Sobolev differentiability extends derivatives to noninteger orders with a Fourier decay condition. To avoid boundary issues, we first consider functions $f(t)$ defined for all $t \in \mathbb{R}$.
Recall that the Fourier transform of the derivative $f'(t)$ is $i\omega\,\hat f(\omega)$. The Plancherel formula proves that $f' \in L^2(\mathbb{R})$ if
$$\int_{-\infty}^{+\infty} |\omega|^2\,|\hat f(\omega)|^2\,d\omega = 2\pi \int_{-\infty}^{+\infty} |f'(t)|^2\,dt < +\infty.$$
This suggests replacing the usual pointwise definition of the derivative by a definition based on the Fourier transform. We say that $f \in L^2(\mathbb{R})$ is differentiable in the sense of Sobolev if
$$\int_{-\infty}^{+\infty} |\omega|^2\,|\hat f(\omega)|^2\,d\omega < +\infty. \tag{9.7}$$
This integral imposes that $|\hat f(\omega)|$ has a sufficiently fast decay when the frequency $\omega$ goes to $+\infty$. As in Section 2.3.1, the regularity of $f$ is measured from the asymptotic decay of its Fourier transform.
This definition is generalized for any $s > 0$. The space $\mathbf{W}^s(\mathbb{R})$ of $s$ times differentiable Sobolev functions is the space of functions $f \in L^2(\mathbb{R})$ having a Fourier transform that satisfies [67]
$$\int_{-\infty}^{+\infty} |\omega|^{2s}\,|\hat f(\omega)|^2\,d\omega < +\infty. \tag{9.8}$$
If $s > n + 1/2$, then one can verify (Exercise 9.7) that $f$ is $n$ times continuously differentiable.
Let $\mathbf{W}^s[0,1]$ be the space of functions in $L^2[0,1]$ that can be extended outside $[0,1]$ into a function $f \in \mathbf{W}^s(\mathbb{R})$. To avoid border problems at $t=0$ or at $t=1$, let us consider functions $f$ whose support is strictly included in $(0,1)$. A simple regular extension on $\mathbb{R}$ is obtained by setting the value of $f$ to 0 outside $[0,1]$, and $f \in \mathbf{W}^s[0,1]$ if this extension is in $\mathbf{W}^s(\mathbb{R})$. In this case, one can prove (it is not trivial) that the Sobolev integral condition (9.8) reduces to a discrete sum, meaning that $f \in \mathbf{W}^s[0,1]$ if and only if
$$\sum_{m=-\infty}^{+\infty} |m|^{2s}\,|\langle f(u), e^{i2\pi mu}\rangle|^2 < +\infty. \tag{9.9}$$
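The discrete Sobolev condition (9.9) can be probed numerically by approximating the Fourier coefficients with an FFT (a sketch of ours; the sampled periodic functions below merely illustrate the behavior of the sum). For a $\mathbf{C}^\infty$ function the weighted partial sums stabilize, while for a step they keep growing, consistent with a discontinuous function not being in $\mathbf{W}^s[0,1]$ for $s > 1/2$:

```python
import numpy as np

K = 2 ** 14
u = np.arange(K) / K

def fourier_coeffs(samples):
    """Approximate the Fourier series coefficients <f(u), e^{i 2 pi m u}> with an FFT."""
    return np.fft.fft(samples) / K        # negative frequencies sit at index K - m

def sobolev_sum(c, s, M):
    """Partial sum of |m|^(2s) |<f, e^{i 2 pi m u}>|^2 over 1 <= |m| <= M."""
    m = np.arange(1, M + 1)
    return np.sum(m ** (2 * s) * (np.abs(c[m]) ** 2 + np.abs(c[-m]) ** 2))

smooth = np.exp(np.cos(2 * np.pi * u))    # C-infinity periodic function
step = (u < 0.5).astype(float)            # discontinuous function

cs, cd = fourier_coeffs(smooth), fourier_coeffs(step)

# The weighted sum stabilizes for the smooth function ...
assert sobolev_sum(cs, 1.0, 2000) - sobolev_sum(cs, 1.0, 1000) < 1e-10
# ... but keeps growing roughly linearly for the step: (9.9) diverges for s = 1.
assert sobolev_sum(cd, 1.0, 2000) > 1.5 * sobolev_sum(cd, 1.0, 1000)
```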
For such functions, differentiable in the sense of Sobolev, Theorem 9.2 computes the approximation error
$$\epsilon_l(N, f) = \|f - f_N\|^2 = \int_0^1 |f(t) - f_N(t)|^2\,dt = \sum_{|m| > N/2} |\langle f(u), e^{i2\pi mu}\rangle|^2. \tag{9.10}$$
Theorem 9.2. Let $f \in L^2[0,1]$ be a function with support strictly included in $(0,1)$. Then $f \in \mathbf{W}^s[0,1]$ if and only if
$$\sum_{N=1}^{+\infty} N^{2s}\,\frac{\epsilon_l(N, f)}{N} < +\infty, \tag{9.11}$$
which implies $\epsilon_l(N, f) = o(N^{-2s})$. ■
The proof relies on the fact that functions in $\mathbf{W}^s[0,1]$ with a support in $(0,1)$ are characterized by (9.9); this theorem is therefore a consequence of Theorem 9.1. Thus, the linear Fourier approximation error decays quickly if and only if $f$ has a large regularity exponent $s$ in the sense of Sobolev.
Discontinuities and Bounded Variation
If $f$ is discontinuous, then $f \notin \mathbf{W}^s[0,1]$ for any $s > 1/2$. Thus, Theorem 9.2 proves that $\epsilon_l(N, f)$ can decay like $N^{-\alpha}$ only if $\alpha \leq 1$. For bounded variation functions, which are introduced in Section 2.3.3, Theorem 9.3 proves that $\epsilon_l(N, f) = O(N^{-1})$. A function has a bounded variation if
$$\|f\|_V = \int_0^1 |f'(t)|\,dt < +\infty.$$
The derivative must be taken in the sense of distributions because $f$ may be discontinuous. If $f = \mathbf{1}_{[0,1/2]}$, then $\|f\|_V = 2$. Recall that $a[N] \sim b[N]$ if $a[N] = O(b[N])$ and $b[N] = O(a[N])$.
Theorem 9.3.
■ If $\|f\|_V < +\infty$, then $\epsilon_l(N, f) = O(\|f\|_V^2\,N^{-1})$.
■ If $f = C\,\mathbf{1}_{[0,1/2]}$, then $\epsilon_l(N, f) \sim \|f\|_V^2\,N^{-1}$.
Proof. If $\|f\|_V < +\infty$, then
$$|\langle f(u), e^{i2\pi mu}\rangle| = \left|\int_0^1 f(u)\,e^{-i2\pi mu}\,du\right| = \left|\int_0^1 f'(u)\,\frac{e^{-i2\pi mu}}{-i2\pi m}\,du\right| \leq \frac{\|f\|_V}{2\pi|m|}.$$
Thus,
$$\epsilon_l(N, f) = \sum_{|m| > N/2} |\langle f(u), e^{i2\pi mu}\rangle|^2 \leq \frac{\|f\|_V^2}{4\pi^2} \sum_{|m| > N/2} \frac{1}{m^2} = O(\|f\|_V^2\,N^{-1}).$$
If $f = C\,\mathbf{1}_{[0,1/2]}$, then $\|f\|_V = 2C$ and
$$|\langle f(u), e^{i2\pi mu}\rangle| = \begin{cases} 0 & \text{if } m \neq 0 \text{ is even,} \\ C/(\pi|m|) & \text{if } m \text{ is odd,} \end{cases}$$
so $\epsilon_l(N, f) \sim C^2\,N^{-1}$. ■
This theorem shows that when $f$ is discontinuous with bounded variations, then $\epsilon_l(N, f)$ typically decays like $N^{-1}$. Figure 9.1(b) shows a bounded variation signal approximated by its lower-frequency Fourier coefficients. The approximation error is concentrated in the neighborhood of discontinuities, where the removal of high frequencies creates Gibbs oscillations (see Section 2.3.1).
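The $N^{-1}$ rate is simple to confirm with the exact coefficients $C/(\pi|m|)$ computed in the proof of Theorem 9.3 (the code below is ours): $N\,\epsilon_l(N,f)$ stays close to the constant $2C^2/\pi^2 \approx 0.2\,C^2$:

```python
import math

C = 1.0

def step_error(N, M=10 ** 6):
    """epsilon_l(N, f) for f = C 1_[0,1/2]: tail sum of |C/(pi m)|^2 over odd |m| > N/2."""
    total = 0.0
    for m in range(N // 2 + 1, M):
        if m % 2 == 1:                              # only odd frequencies contribute
            total += 2 * (C / (math.pi * m)) ** 2   # factor 2 counts m and -m
    return total

for N in (64, 128, 256):
    assert 0.15 < N * step_error(N) < 0.30          # epsilon_l(N, f) ~ N**(-1)
```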
Localized Approximations
To localize Fourier series approximations over intervals, we multiply $f$ by smooth windows that cover each of these intervals. The Balian-Low Theorem (5.20) proves that one cannot build local Fourier bases with smooth windows of compact support. However, in Section 8.4.2 we construct orthonormal bases by replacing the complex exponentials by cosine functions. For appropriate windows $g_p$ of compact support $[a_p - \eta_p,\, a_{p+1} + \eta_{p+1}]$, Corollary 8.1 constructs an orthonormal basis of $L^2(\mathbb{R})$:
$$\left\{ g_{p,k}(t) = g_p(t)\,\sqrt{\frac{2}{l_p}}\,\cos\!\left[\pi\left(k+\frac{1}{2}\right)\frac{t-a_p}{l_p}\right] \right\}_{k\in\mathbb{N},\,p\in\mathbb{Z}}.$$
FIGURE 9.1
(a) Original signal $f$. (b) Signal $f_N$ approximated from $N = 128$ lower-frequency Fourier coefficients, with $\|f - f_N\|/\|f\| = 8.63 \times 10^{-2}$. (c) Signal $f_N$ approximated from $N = 128$ larger-scale Daubechies 4 wavelet coefficients, with $\|f - f_N\|/\|f\| = 8.58 \times 10^{-2}$.
Writing $f$ in this local cosine basis is equivalent to segmenting it into several windowed components $f_p(t) = f(t)\,g_p(t)$, which are decomposed in a cosine IV basis. If $g_p$ is $\mathbf{C}^\infty$, the regularity of $g_p(t)f(t)$ is the same as the regularity of $f$ over $[a_p - \eta_p,\, a_{p+1} + \eta_{p+1}]$. Section 8.3.2 relates cosine IV coefficients to Fourier series coefficients. It follows from Theorem 9.2 that if $f_p \in \mathbf{W}^s(\mathbb{R})$, then the approximation
$$f_{p,N} = \sum_{k=0}^{N-1} \langle f, g_{p,k}\rangle\, g_{p,k}$$
yields an error
$$\epsilon_l(N, f_p) = \|f_p - f_{p,N}\|^2 = o(N^{-2s}).$$
Thus, the approximation error in a local cosine basis depends on the local regularity of $f$ over each window support.
9.1.3 Multiresolution Approximation Errors with Wavelets
Wavelets are constructed as bases of orthogonal complements of multiresolution approximation spaces. Thus, the projection error on multiresolution approximation spaces depends on the decay of wavelet coefficients. Linear approximations with wavelets behave essentially like Fourier approximations, but with a better treatment of boundaries. They are also asymptotically optimal for uniformly regular signals. The linear error decay is computed for Sobolev differentiable functions and for uniformly Lipschitz $\alpha$ functions.
Uniform Approximation Grid
Section 7.5 constructs multiresolution approximation spaces $U_N = V_L$ of $L^2[0,1]$ with their orthonormal basis of $N = 2^{-L}$ scaling functions $\{\phi_{L,n}(t)\}_{0\leq n<2^{-L}}$. These scaling functions $\phi_{L,n}(t) = \phi_L(t - 2^L n)$ with $\phi_L(t) = 2^{-L/2}\,\phi(2^{-L}t)$ are finite elements translated over a uniform grid, modified near 0 and 1 so that their support remains in $[0,1]$. The resulting projection of $f$ in such a space is
$$f_N = P_{V_L} f = \sum_{n=0}^{2^{-L}-1} \langle f, \phi_{L,n}\rangle\,\phi_{L,n}, \tag{9.12}$$
and
$$\langle f, \phi_{L,n}\rangle = \int f(t)\,\phi_L(t - 2^L n)\,dt = f \star \bar\phi_L(ns) \quad \text{with} \quad \bar\phi_L(t) = \phi_L(-t).$$
A different orthogonal basis of $V_L$ is obtained from wavelets at scales $2^j > 2^L$ and scaling functions at a large scale $2^J$:
$$\{\phi_{J,n}\}_{0\leq n<2^{-J}},\quad \{\psi_{j,n}\}_{L<j\leq J,\ 0\leq n<2^{-j}}. \tag{9.13}$$
The approximation (9.12) can also be written as a wavelet approximation:
$$f_N = P_{V_L} f = \sum_{j=L+1}^{J} \sum_{n=0}^{2^{-j}-1} \langle f, \psi_{j,n}\rangle\,\psi_{j,n} + \sum_{n=0}^{2^{-J}-1} \langle f, \phi_{J,n}\rangle\,\phi_{J,n}. \tag{9.14}$$
Since wavelets define an orthonormal basis of $L^2[0,1]$,
$$\{\phi_{J,n}\}_{0\leq n<2^{-J}},\quad \{\psi_{j,n}\}_{-\infty<j\leq J,\ 0\leq n<2^{-j}}, \tag{9.15}$$
the approximation error is the energy of the wavelet coefficients at scales smaller than $2^L$:
$$\epsilon_l(N, f) = \|f - f_N\|^2 = \sum_{j=-\infty}^{L} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2. \tag{9.16}$$
In the following we suppose that the wavelets $\psi_{j,n}$ are $\mathbf{C}^q$ ($q$ times differentiable) and have $q$ vanishing moments. The treatment of boundaries is the key difference with Fourier approximations. Fourier approximations consider that the signal is periodic, and if $f(0) \neq f(1)$, then the approximation of $f$ behaves as if $f$ were discontinuous. The periodic orthogonal wavelet bases of Section 7.5.1 do essentially the same. To improve this result, wavelets at the boundaries must keep their $q$ vanishing moments, which requires modifying them near the boundaries so that their supports remain in $[0,1]$. Section 7.5.3 constructs such wavelet bases. They take advantage of the regularity of $f$ over $[0,1]$ with no condition on $f(0)$ and $f(1)$. For mathematical analysis, we only use these wavelets, without explicitly writing their shape modifications at the boundaries, to simplify notations. In numerical experiments, the folding boundary solution of Section 7.5.2 is more often used because it has a simpler algorithmic implementation. Folded wavelets have one vanishing moment at the boundary, which is often sufficient in applications.
Approximation Error versus Sobolev Regularity
Like a Fourier basis, a wavelet basis provides an efficient approximation of uniformly regular signals. The decay of wavelet linear approximation errors is first related to differentiability in the sense of Sobolev. Let $\mathbf{W}^s[0,1]$ be the Sobolev space of functions that are restrictions over $[0,1]$ of $s$ times differentiable Sobolev functions of $\mathbf{W}^s(\mathbb{R})$ defined over $\mathbb{R}$. If $\psi$ has $q$ vanishing moments, then (6.11) proves that the wavelet transform is a multiscale differential operator of order at least $q$. To test the differentiability of $f$ up to order $s$, we need $q > s$. Theorem 9.4 gives a necessary and sufficient condition on wavelet coefficients so that $f \in \mathbf{W}^s[0,1]$.
Theorem 9.4. Let $0 < s < q$ be a Sobolev exponent. A function $f \in L^2[0,1]$ is in $\mathbf{W}^s[0,1]$ if and only if
$$\sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} 2^{-2sj}\,|\langle f, \psi_{j,n}\rangle|^2 < +\infty. \tag{9.17}$$
Proof. We give an intuitive justification but not a proof of this result. To simplify, we suppose that the support of $f$ is included in $(0,1)$. If we extend $f$ by zeros outside $[0,1]$, then $f \in \mathbf{W}^s(\mathbb{R})$, which means that
$$\int_{-\infty}^{+\infty} |\omega|^{2s}\,|\hat f(\omega)|^2\,d\omega < +\infty. \tag{9.18}$$
The low-frequency part of this integral always remains finite because $f \in L^2(\mathbb{R})$:
$$\int_{|\omega|\leq 2^{-J}\pi} |\omega|^{2s}\,|\hat f(\omega)|^2\,d\omega \leq 2^{-2sJ}\pi^{2s} \int_{|\omega|\leq 2^{-J}\pi} |\hat f(\omega)|^2\,d\omega \leq 2^{-2sJ}\pi^{2s}\,\|f\|^2.$$
The energy of $\hat\psi_{j,n}$ is essentially concentrated in the intervals $[-2^{-j}2\pi, -2^{-j}\pi] \cup [2^{-j}\pi, 2^{-j}2\pi]$. As a consequence,
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2 \sim \int_{2^{-j}\pi \leq |\omega| \leq 2^{-j+1}\pi} |\hat f(\omega)|^2\,d\omega.$$
Over this interval $|\omega| \sim 2^{-j}\pi$, so
$$\sum_{n=0}^{2^{-j}-1} 2^{-2sj}\,|\langle f, \psi_{j,n}\rangle|^2 \sim \int_{2^{-j}\pi \leq |\omega| \leq 2^{-j+1}\pi} |\omega|^{2s}\,|\hat f(\omega)|^2\,d\omega.$$
It follows that
$$\sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} 2^{-2sj}\,|\langle f, \psi_{j,n}\rangle|^2 \sim \int_{|\omega|\geq 2^{-J}\pi} |\omega|^{2s}\,|\hat f(\omega)|^2\,d\omega,$$
which explains why (9.18) is equivalent to (9.17). ■
This theorem proves that the Sobolev regularity of $f$ is equivalent to a fast decay of the wavelet coefficients $|\langle f, \psi_{j,n}\rangle|$ when the scale $2^j$ decreases. If $\psi$ has $q$ vanishing moments but is not $q$ times continuously differentiable, then $f \in \mathbf{W}^s[0,1]$ implies (9.17), but the opposite implication is not true. Theorem 9.5 uses the decay condition (9.17) to characterize the linear approximation error with $N$ wavelets.
Theorem 9.5. Let $0 < s < q$ be a Sobolev exponent. A function $f \in L^2[0,1]$ is in $\mathbf{W}^s[0,1]$ if and only if
$$\sum_{N=1}^{+\infty} N^{2s}\,\frac{\epsilon_l(N, f)}{N} < +\infty, \tag{9.19}$$
which implies $\epsilon_l(N, f) = o(N^{-2s})$.
Proof. Let us write the wavelets $\psi_{j,n} = g_m$ with $m = 2^{-j} + n$. One can verify that the Sobolev condition (9.17) is equivalent to
$$\sum_{m=0}^{+\infty} |m|^{2s}\,|\langle f, g_m\rangle|^2 < +\infty.$$
The proof ends by applying Theorem 9.1. ■
If the wavelet $\psi$ has $q$ vanishing moments but is not $q$ times continuously differentiable, then $f \in \mathbf{W}^s[0,1]$ implies (9.19), but the opposite implication is false. Theorem 9.5 proves that $f \in \mathbf{W}^s[0,1]$ if and only if the approximation error $\epsilon_l(N, f)$ decays slightly faster than $N^{-2s}$. The wavelet approximation error is of the same order as the Fourier approximation error calculated in (9.11). However, Fourier approximations impose that the support of $f$ is strictly included in $[0,1]$, whereas the wavelet approximation imposes neither this condition nor any other boundary condition, because of the finer wavelet treatment of boundaries previously explained.
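A quick numerical illustration of linear wavelet approximation (our code, using a plain orthonormal Haar transform instead of the boundary-adapted wavelets discussed above): keeping only the $N$ coarsest-scale coefficients of a uniformly regular signal, the error shrinks rapidly as $N$ grows:

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar transform of a signal of dyadic length."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation at the coarser scale
        d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail (wavelet) coefficients
        coeffs.append(d)
        x = a
    coeffs.append(x)                           # coarsest scaling coefficient
    return coeffs

def linear_error(f, N):
    """Energy of all coefficients outside the N coarsest-scale ones (N a power of 2)."""
    all_c = np.concatenate(haar_dwt(f)[::-1])  # coarsest scales first
    return float(np.sum(all_c[N:] ** 2))

t = (np.arange(1024) + 0.5) / 1024
f = np.sin(2 * np.pi * t)                      # a uniformly regular signal
errors = [linear_error(f, N) for N in (8, 16, 32, 64)]
assert all(e1 > e2 for e1, e2 in zip(errors, errors[1:]))
```

Since the transform is orthonormal, the discarded-coefficient energy is exactly the squared approximation error, which is the discrete analog of (9.16).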
Lipschitz Regularity
A different measure of uniform regularity is provided by Lipschitz exponents, which compute the error of a local polynomial approximation. A function $f$ is uniformly
9.1 Linear Approximations
Lipschitz $\alpha$ over $[0,1]$ if there exists $K > 0$ such that for any $v \in [0,1]$ one can find a polynomial $p_v$ of degree $\lfloor\alpha\rfloor$ such that
$$\forall t \in [0,1], \quad |f(t) - p_v(t)| \le K\, |t - v|^{\alpha} . \qquad (9.20)$$
The infimum of the $K$ that satisfy (9.20) is the homogeneous Hölder $\alpha$ norm $\|f\|_{\tilde{\mathbf{C}}^\alpha}$. The Hölder $\alpha$ norm of $f$ also imposes that $f$ is bounded:
$$\|f\|_{\mathbf{C}^\alpha} = \|f\|_{\tilde{\mathbf{C}}^\alpha} + \|f\|_{\infty} . \qquad (9.21)$$
The space $\mathbf{C}^\alpha[0,1]$ of functions $f$ such that $\|f\|_{\mathbf{C}^\alpha} < +\infty$ is called a Hölder space.
Theorem 9.6 characterizes the decay of wavelet coefficients.
Theorem 9.6. There exist $B \ge A > 0$ such that
$$A\, \|f\|_{\tilde{\mathbf{C}}^\alpha} \le \sup_{j \le J,\; 0\le n<2^{-j}} 2^{-j(\alpha+1/2)}\, |\langle f, \psi_{j,n}\rangle| \le B\, \|f\|_{\tilde{\mathbf{C}}^\alpha} . \qquad (9.22)$$
Proof. The proof of the equivalence between uniform Lipschitz regularity and the coefficient decay of a continuous wavelet transform is given in Theorem 6.3. This theorem
gives a nearly equivalent result in the context of orthonormal wavelet coefficients that
correspond to a sampling of a continuous wavelet transform computed with the same
mother wavelet. Thus, the theorem proof is an adaptation of the proof of Theorem 6.3.
This is illustrated by proving the right inequality of (9.22).
If $f$ is uniformly Lipschitz $\alpha$ on the support of $\psi_{j,n}$, then, since $\psi_{j,n}$ is orthogonal to the polynomial $p_{2^j n}$ approximating $f$ at $v = 2^j n$,
$$|\langle f, \psi_{j,n}\rangle| = |\langle f - p_{2^j n}, \psi_{j,n}\rangle| \le \|f\|_{\tilde{\mathbf{C}}^\alpha} \int 2^{-j/2}\, |\psi(2^{-j}(t - 2^j n))|\, |t - 2^j n|^\alpha\, dt . \qquad (9.23)$$
With a change of variable, we get
$$|\langle f, \psi_{j,n}\rangle| \le \|f\|_{\tilde{\mathbf{C}}^\alpha}\, 2^{j(\alpha+1/2)} \int |\psi(t)|\, |t|^\alpha\, dt ,$$
which proves the right inequality of (9.22). Observe that we do not use the wavelet
regularity in this proof.
The left inequality of (9.22) is proved by following the steps of the proof of the continuous wavelet transform Theorem 6.3, replacing integrals by discrete sums over the positions and scales of the orthogonal wavelets. The regularity of the wavelets plays an important role, as shown by the proof of Theorem 6.3. In this case, there is no boundary issue because
wavelets are adapted to the interval [0, 1] and keep their vanishing moments at the
boundaries.
■
Similar to Theorem 9.4 for Sobolev differentiability, this theorem proves that uniform Lipschitz regularity is characterized by the decay of the orthogonal wavelet coefficients when the scale $2^j$ decreases. Hölder and Sobolev spaces belong to the larger class of Besov spaces, defined in Section 9.2.3.
If $\psi$ has $q$ vanishing moments but is not $q$ times continuously differentiable, then the proof of Theorem 9.6 shows that the right inequality of (9.22) remains valid.
Theorem 9.7 derives the decay of the linear approximation error for wavelets having $q$ vanishing moments but that are not necessarily $\mathbf{C}^q$.
Theorem 9.7. If $f$ is uniformly Lipschitz $\alpha$, $0 < \alpha \le q$, over $[0,1]$, then $\epsilon_l(N, f) = O(\|f\|_{\tilde{\mathbf{C}}^\alpha}^2\, N^{-2\alpha})$.
Proof. Theorem 9.6 proves that
$$|\langle f, \psi_{j,n}\rangle| \le B\, \|f\|_{\tilde{\mathbf{C}}^\alpha}\, 2^{j(\alpha+1/2)} . \qquad (9.24)$$
There are $2^{-j}$ wavelet coefficients at scale $2^j$, so there are $2^{-k}$ wavelet coefficients at scales $2^j > 2^k$. The right inequality of (9.22) implies that
$$\epsilon_l(2^{-k}, f) = \sum_{j=-\infty}^{k} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2 \le B^2\, \|f\|_{\tilde{\mathbf{C}}^\alpha}^2 \sum_{j=-\infty}^{k} 2^{-j}\, 2^{j(2\alpha+1)} = \frac{B^2\, \|f\|_{\tilde{\mathbf{C}}^\alpha}^2\, 2^{2\alpha k}}{1 - 2^{-2\alpha}} .$$
For $k = -\log_2 N$, we derive that $\epsilon_l(N, f) = O(\|f\|_{\tilde{\mathbf{C}}^\alpha}^2\, 2^{2\alpha k}) = O(\|f\|_{\tilde{\mathbf{C}}^\alpha}^2\, N^{-2\alpha})$.
■
Discontinuity and Bounded Variation
If $f$ is not uniformly regular, then linear wavelet approximations perform poorly. If $f$ has a discontinuity in $(0,1)$, then $f \notin \mathbf{W}^s[0,1]$ for $s > 1/2$, so Theorem 9.5 proves that we cannot have $\epsilon_l(N,f) = O(N^{-\alpha})$ for $\alpha > 1$.
If $f$ has a bounded total variation norm $\|f\|_V$, then Theorem 9.14 will prove that wavelet approximation errors satisfy $\epsilon_l(N,f) = O(\|f\|_V^2\, N^{-1})$. The same Fourier approximation result was obtained in Theorem 9.3. If $f = C\, 1_{[0,1/2]}$, then one can verify that the wavelet approximation gives $\epsilon_l(N,f) \sim \|f\|_V^2\, N^{-1}$ (Exercise 9.10).
Figure 9.1 gives an example of a discontinuous signal with bounded variation that is approximated by its larger-scale wavelet coefficients. The largest-amplitude errors are in the neighborhood of singularities, where the scale should be refined. The relative approximation error $\|f - f_N\|/\|f\| = 8.56 \cdot 10^{-2}$ is almost the same as in a Fourier basis.
9.1.4 Karhunen-Loève Approximations
Suppose that signals are modeled as realizations of a random process $F$. We prove that the basis that minimizes the average linear approximation error is the Karhunen-Loève basis, which diagonalizes the covariance operator of $F$. To avoid the subtleties of diagonalizing an infinite-dimensional operator, we consider signals of finite dimension $P$, which means that $F[n]$ is a random vector of size $P$.
Section A.6 in the Appendix reviews the covariance properties of random vectors. If $F[n]$ does not have a zero mean, we subtract the expected value $E\{F[n]\}$ from $F[n]$ to get a zero mean. The random vector $F$ can be decomposed in an orthogonal basis $\{g_m\}_{0\le m<P}$:
$$F = \sum_{m=0}^{P-1} \langle F, g_m\rangle\, g_m .$$
Each coefficient
$$\langle F, g_m\rangle = \sum_{n=0}^{P-1} F[n]\, g_m^*[n]$$
is a random variable (see Section A.6 in the Appendix). The approximation from the first $N$ vectors of the basis is the orthogonal projection on the space $U_N$ generated by these vectors:
$$F_N = \sum_{m=0}^{N-1} \langle F, g_m\rangle\, g_m .$$
The resulting mean-square error is
$$E\{\epsilon_l(N, F)\} = E\{\|F - F_N\|^2\} = \sum_{m=N}^{P-1} E\{|\langle F, g_m\rangle|^2\} .$$
The error is related to the covariance of $F$, defined by
$$R_F[n,m] = E\{F[n]\, F^*[m]\} .$$
Let $K_F$ be the covariance operator represented by this matrix. It is symmetric and positive, and is thus diagonalized in an orthogonal basis, called a Karhunen-Loève basis. This basis is not unique if several eigenvalues are equal. Theorem 9.8 proves that a Karhunen-Loève basis is optimal for linear approximations.
Theorem 9.8. For all $N \ge 1$, the expected approximation error
$$E\{\epsilon_l(N, F)\} = E\{\|F - F_N\|^2\} = \sum_{m=N}^{P-1} E\{|\langle F, g_m\rangle|^2\}$$
is minimum if and only if $\{g_m\}_{0\le m<P}$ is a Karhunen-Loève basis that diagonalizes the covariance $K_F$ of $F$, with vectors indexed in order of decreasing eigenvalues:
$$\langle K_F\, g_m, g_m\rangle \ge \langle K_F\, g_{m+1}, g_{m+1}\rangle \quad \text{for} \quad 0 \le m < P-1 .$$
Proof. Let us first observe that
$$E\{\epsilon_l(F, N)\} = \sum_{m=N}^{P-1} \langle K_F\, g_m, g_m\rangle ,$$
because for any vector $z[n]$,
$$E\{|\langle F, z\rangle|^2\} = E\Big\{ \sum_{n=0}^{P-1} \sum_{m=0}^{P-1} F[n]\, F^*[m]\, z^*[n]\, z[m] \Big\} = \sum_{n=0}^{P-1} \sum_{m=0}^{P-1} R_F[n,m]\, z^*[n]\, z[m] = \langle K_F\, z, z\rangle . \qquad (9.25)$$
We now prove that (9.25) is minimum if the basis diagonalizes $K_F$. Let us consider an arbitrary orthonormal basis $\{h_m\}_{0\le m<P}$. The trace $\mathrm{tr}(K_F)$ of $K_F$ is independent of the basis:
$$\mathrm{tr}(K_F) = \sum_{m=0}^{P-1} \langle K_F\, h_m, h_m\rangle .$$
Thus, the basis that minimizes $\sum_{m=N}^{P-1} \langle K_F\, h_m, h_m\rangle$ maximizes $\sum_{m=0}^{N-1} \langle K_F\, h_m, h_m\rangle$. Let $\{g_m\}_{0\le m<P}$ be a basis that diagonalizes $K_F$:
$$K_F\, g_m = \sigma_m^2\, g_m \quad \text{with} \quad \sigma_m^2 \ge \sigma_{m+1}^2 \quad \text{for} \quad 0 \le m < P-1 .$$
The theorem is proved by verifying that for all $N \ge 0$,
$$\sum_{m=0}^{N-1} \langle K_F\, h_m, h_m\rangle \le \sum_{m=0}^{N-1} \langle K_F\, g_m, g_m\rangle = \sum_{m=0}^{N-1} \sigma_m^2 .$$
To relate $\langle K_F\, h_m, h_m\rangle$ to the eigenvalues $\{\sigma_i^2\}_{0\le i<P}$, we expand $h_m$ in the basis $\{g_i\}_{0\le i<P}$:
$$\langle K_F\, h_m, h_m\rangle = \sum_{i=0}^{P-1} |\langle h_m, g_i\rangle|^2\, \sigma_i^2 . \qquad (9.26)$$
Thus,
$$\sum_{m=0}^{N-1} \langle K_F\, h_m, h_m\rangle = \sum_{m=0}^{N-1} \sum_{i=0}^{P-1} |\langle h_m, g_i\rangle|^2\, \sigma_i^2 = \sum_{i=0}^{P-1} q_i\, \sigma_i^2$$
with
$$0 \le q_i = \sum_{m=0}^{N-1} |\langle h_m, g_i\rangle|^2 \le 1 \quad \text{and} \quad \sum_{i=0}^{P-1} q_i = N .$$
We evaluate
$$\sum_{m=0}^{N-1} \langle K_F\, h_m, h_m\rangle - \sum_{i=0}^{N-1} \sigma_i^2 = \sum_{i=0}^{P-1} q_i\, \sigma_i^2 - \sum_{i=0}^{N-1} \sigma_i^2 = \sum_{i=0}^{P-1} q_i\, \sigma_i^2 - \sum_{i=0}^{N-1} \sigma_i^2 + \sigma_{N-1}^2 \Big( N - \sum_{i=0}^{P-1} q_i \Big)$$
$$= \sum_{i=0}^{N-1} (\sigma_i^2 - \sigma_{N-1}^2)(q_i - 1) + \sum_{i=N}^{P-1} q_i\, (\sigma_i^2 - \sigma_{N-1}^2) .$$
Since the eigenvalues are listed in order of decreasing amplitude, each term of both sums is negative or zero; it follows that
$$\sum_{m=0}^{N-1} \langle K_F\, h_m, h_m\rangle - \sum_{m=0}^{N-1} \sigma_m^2 \le 0 .$$
Suppose that this last inequality is an equality. We finish the proof by showing that $\{h_m\}_{0\le m<P}$ must be a Karhunen-Loève basis. If $i < N$, then $\sigma_i^2 \ne \sigma_{N-1}^2$ implies $q_i = 1$. If $i \ge N$, then $\sigma_i^2 \ne \sigma_{N-1}^2$ implies $q_i = 0$. This is valid for all $N \ge 0$ only if $\langle h_m, g_i\rangle \ne 0$ implies $\sigma_i^2 = \sigma_m^2$. This means that the change of basis is performed inside each eigenspace of $K_F$, so $\{h_m\}_{0\le m<P}$ also diagonalizes $K_F$.
■
The eigenvectors $g_m$ of the covariance matrix are called principal components. Theorem 9.8 proves that a Karhunen-Loève basis yields the smallest expected linear error when approximating a class of signals by their projection on $N$ orthogonal vectors.
Theorem 9.8 has a simple geometric interpretation. The realizations of $F$ define a cloud of points in $\mathbb{C}^P$. The density of this cloud specifies the probability distribution of $F$. The vectors $g_m$ of the Karhunen-Loève basis give the directions of the principal axes of the cloud. Large eigenvalues $\sigma_m^2$ correspond to directions $g_m$ along which the cloud is highly elongated. Theorem 9.8 proves that projecting the realizations of $F$ on these principal components yields the smallest average error. If $F$ is a Gaussian random vector, the probability density is uniform along ellipsoids whose axes are proportional to $\sigma_m$ in the direction of $g_m$. Thus, these principal directions are truly the preferred directions of the process.
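Theorem 9.8 is easy to illustrate numerically. The sketch below (NumPy; the covariance model, the dimension $P$, the cut-off $N$, and the sample count are hypothetical choices made for illustration, not taken from the text) diagonalizes a covariance matrix to obtain a Karhunen-Loève basis, then compares the eigenvalue-tail error predicted by the theorem with a Monte Carlo estimate and with the error of the canonical Dirac basis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical zero-mean Gaussian process of dimension P with a smooth covariance.
P, N = 64, 8
t = np.arange(P)
K_F = np.exp(-np.abs(t[:, None] - t[None, :]) / 8.0)   # covariance matrix K_F

# Karhunen-Loeve basis: eigenvectors of K_F sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(K_F)                  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Theorem 9.8: the expected linear error from N principal components is the
# tail sum of the eigenvalues.
kl_err = eigvals[N:].sum()

# Monte Carlo check, and comparison with the canonical Dirac basis.
F = rng.multivariate_normal(np.zeros(P), K_F, size=20000)
coeffs = F @ eigvecs                                    # inner products <F, g_m>
mc_err = np.mean(np.sum(coeffs[:, N:] ** 2, axis=1))
dirac_err = np.mean(np.sum(F[:, N:] ** 2, axis=1))      # keep the first N Diracs

print(kl_err, mc_err, dirac_err)
```

The two error estimates agree up to Monte Carlo fluctuations, and the Karhunen-Loève error is well below the Dirac-basis error, as the theorem predicts for this elongated Gaussian cloud.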
Random-Shift Processes
If the process is not Gaussian, its probability distribution can have a complex geometry, and a linear approximation along the principal axes may not be efficient. As an example, we consider a random vector $F[n]$ of size $P$ that is a random shift modulo $P$ of a deterministic signal $f[n]$ of zero mean, $\sum_{n=0}^{P-1} f[n] = 0$:
$$F[n] = f[(n - Q) \bmod P] . \qquad (9.27)$$
The shift $Q$ is an integer random variable with a probability distribution that is uniform on $[0, P-1]$:
$$\Pr(Q = p) = \frac{1}{P} \quad \text{for} \quad 0 \le p < P .$$
This process has a zero mean:
$$E\{F[n]\} = \frac{1}{P} \sum_{p=0}^{P-1} f[(n-p) \bmod P] = 0 ,$$
and its covariance is
$$R_F[n,k] = E\{F[n]\, F[k]\} = \frac{1}{P} \sum_{p=0}^{P-1} f[(n-p) \bmod P]\, f[(k-p) \bmod P] = \frac{1}{P}\, f \circledast \bar f\, [n-k] ,$$
with $\bar f[n] = f[-n]$. Thus, $R_F[n,k] = R_F[n-k]$ with
$$R_F[k] = \frac{1}{P}\, f \circledast \bar f\, [k] . \qquad (9.28)$$
Since $R_F$ is $P$-periodic, $F$ is a circular stationary random vector, as defined in Section A.6 in the Appendix. The covariance operator $K_F$ is a circular convolution with $R_F$, and is therefore diagonalized in the discrete Fourier Karhunen-Loève basis $\left\{ \frac{1}{\sqrt P} \exp\left(\frac{i 2\pi m n}{P}\right) \right\}_{0\le m<P}$. The eigenvalues are given by the Fourier transform of $R_F$:
$$\sigma_m^2 = \hat R_F[m] = \frac{1}{P}\, |\hat f[m]|^2 . \qquad (9.29)$$
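Equation (9.29) can be verified directly on the example studied below, $f[n] = \delta[n] - \delta[n-1]$. In this NumPy sketch (the size $P$ is an arbitrary choice), the circulant covariance matrix built from $R_F$ has exactly the eigenvalues given by the discrete Fourier transform of $R_F$:

```python
import numpy as np

P = 16
f = np.zeros(P)
f[0], f[1] = 1.0, -1.0                 # f[n] = delta[n] - delta[n-1], zero mean

# Covariance R_F[k] = (1/P) (f circularly correlated with itself), eq. (9.28).
R = np.array([np.dot(f, np.roll(f, k)) for k in range(P)]) / P

# Circulant covariance matrix K_F[n, k] = R_F[n - k mod P].
K_F = np.array([np.roll(R, n) for n in range(P)])

# Eigenvalues: sigma_m^2 = R_F_hat[m] = |f_hat[m]|^2 / P, eq. (9.29).
sigma2 = np.fft.fft(R).real            # R is real and even, so its DFT is real
target = np.abs(np.fft.fft(f)) ** 2 / P

print(np.allclose(sigma2, target))                                      # True
print(np.allclose(np.sort(np.linalg.eigvalsh(K_F)), np.sort(sigma2)))   # True
```

The last comparison confirms that the Fourier vectors form a Karhunen-Loève basis for this random-shift process, here verified for this particular $f$.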
Theorem 9.8 proves that a linear approximation yields a minimum error in this Fourier basis. To better understand this result, let us consider an extreme case where $f[n] = \delta[n] - \delta[n-1]$. Theorem 9.8 guarantees that the Fourier Karhunen-Loève basis produces a smaller expected approximation error than does a canonical basis of Diracs $\{g_m[n] = \delta[n-m]\}_{0\le m<P}$. Indeed, we do not know a priori the abscissa of the nonzero coefficients of $F$, so there is no particular Dirac that is better adapted to perform the approximation. Since the Fourier vectors cover the whole support of $F$, they always absorb part of the signal energy:
$$E\left\{ \left| \left\langle F[n],\; \frac{1}{\sqrt P} \exp\left( \frac{i 2\pi m n}{P} \right) \right\rangle \right|^2 \right\} = \hat R_F[m] = \frac{4}{P}\, \sin^2\left( \frac{\pi m}{P} \right) .$$
Therefore, selecting $N$ higher-frequency Fourier coefficients yields a better mean-square approximation than choosing a priori $N$ Dirac vectors to perform the approximation.
The linear approximation of $F$ in a Fourier basis is not efficient because all the eigenvalues $\hat R_F[m]$ have the same order of magnitude. A simple nonlinear algorithm can improve this approximation. In a Dirac basis, $F$ is exactly reproduced by selecting the two Diracs corresponding to the largest-amplitude coefficients, whose positions $Q$ and $Q+1$ depend on each realization of $F$. A nonlinear algorithm that selects the largest-amplitude coefficients for each realization of $F$ is not efficient in a Fourier basis. Indeed, the realizations of $F$ do not have their energy concentrated over a few large-amplitude Fourier coefficients. This example shows that when $F$ is not a Gaussian process, a nonlinear approximation may be much more precise than a linear approximation, and the Karhunen-Loève basis is no longer optimal.
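This Dirac-versus-Fourier comparison can be simulated directly. In the NumPy sketch below (the number of realizations is an arbitrary choice), keeping the two largest Dirac coefficients reproduces each realization exactly, whereas the two best fixed Fourier vectors capture only a small fraction of the energy:

```python
import numpy as np

rng = np.random.default_rng(1)
P, M = 64, 2
f = np.zeros(P)
f[0], f[1] = 1.0, -1.0

# The M a-priori best Fourier vectors: largest expected energies 4 sin^2(pi m/P)/P.
expected = 4 * np.sin(np.pi * np.arange(P) / P) ** 2
keep = np.argsort(expected)[-M:]

dirac_errs, fourier_errs = [], []
for _ in range(200):
    F = np.roll(f, rng.integers(P))            # random-shift realization, eq. (9.27)
    # Nonlinear: keep the M largest-amplitude Dirac coefficients (positions Q, Q+1).
    idx = np.argsort(np.abs(F))[-M:]
    approx = np.zeros(P)
    approx[idx] = F[idx]
    dirac_errs.append(np.sum((F - approx) ** 2))
    # Linear: project on the M fixed Fourier vectors chosen above.
    Fhat = np.fft.fft(F, norm="ortho")         # orthonormal Fourier coefficients
    fourier_errs.append(np.sum(np.abs(Fhat) ** 2) - np.sum(np.abs(Fhat[keep]) ** 2))

print(np.mean(dirac_errs), np.mean(fourier_errs))
```

The adaptive Dirac selection has zero error on every realization, while the fixed Fourier projection loses almost all of the energy $\|F\|^2 = 2$, regardless of the shift $Q$.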
9.2 NONLINEAR APPROXIMATIONS
Digital images or sounds are signals discretized over spaces of a large dimension $N$, because the linear approximation error has a slow decay. Digital camera images have $N \ge 10^6$ pixels, whereas one second of a CD recording has $N = 40 \cdot 10^3$ samples. Sparse signal representations are obtained by projecting such signals over fewer vectors, selected adaptively in an orthonormal basis of discrete signals in $\mathbb{C}^N$. This is equivalent to performing a nonlinear approximation of the input analog signal in a basis of $\mathbf{L}^2[0,1]$.
Section 9.2.1 analyzes the properties of the resulting nonlinear approximation error. Sections 9.2.2 and 9.2.3 prove that nonlinear wavelet approximations are equivalent to adaptive grids, and can provide sparse representations of signals including singularities. Approximations of functions in Besov spaces and with bounded variation are studied in Section 9.2.3.
9.2.1 Nonlinear Approximation Error
The discretization of an input analog signal $f$ computes $N$ sample values $\{\langle f, \phi_n\rangle\}_{0\le n<N}$ that specify the projection $f_N$ of $f$ over an approximation space $U_N$ of dimension $N$. A nonlinear approximation further approximates this projection over a basis providing a sparse representation.
Let $\mathcal{B} = \{g_m\}_{m\in\mathbb{N}}$ be an orthonormal basis of $\mathbf{L}^2[0,1]$ or $\mathbf{L}^2[0,1]^2$, with the first $N$ vectors defining a basis of $U_N$. The orthogonal projection on $U_N$ can be written as
$$f_N(x) = \sum_{m=0}^{N-1} \langle f, g_m\rangle\, g_m(x),$$
and the linear approximation error is
$$\|f - f_N\|^2 = \sum_{m=N}^{+\infty} |\langle f, g_m\rangle|^2 .$$
Let us reproject $f_N$ over a subset of $M < N$ vectors $\{g_m\}_{m\in\Lambda}$ with $\Lambda \subset [0, N-1]$:
$$f_\Lambda(x) = \sum_{m\in\Lambda} \langle f, g_m\rangle\, g_m(x) .$$
The approximation error is the sum of the remaining coefficients:
$$\|f_N - f_\Lambda\|^2 = \sum_{m\notin\Lambda} |\langle f, g_m\rangle|^2 . \qquad (9.30)$$
The approximation set that minimizes this error is the set $\Lambda_T$ of the $M$ vectors corresponding to the largest inner-product amplitudes $|\langle f, g_m\rangle|$, and thus above a threshold $T$ that depends on $M$:
$$\Lambda_T = \{m : 0 \le m < N,\; |\langle f, g_m\rangle| \ge T\} \quad \text{with} \quad |\Lambda_T| = M . \qquad (9.31)$$
The minimum approximation error is the energy of the coefficients below $T$:
$$\epsilon_n(M, f) = \|f_N - f_{\Lambda_T}\|^2 = \sum_{m\notin\Lambda_T} |\langle f, g_m\rangle|^2 .$$
In the following, we often write $f_M = f_{\Lambda_T}$ for the best $M$-term approximation. The overall error is the sum of the linear error of the projection of $f$ on $U_N$ and the nonlinear approximation error:
$$\epsilon_n(M, f) = \|f - f_M\|^2 = \|f - f_N\|^2 + \|f_N - f_M\|^2 . \qquad (9.32)$$
If $N$ is large enough so that all coefficients above $T$ are among the first $N$,
$$T \ge \max_{m \ge N} |\langle f, g_m\rangle| \quad \text{and hence} \quad N > \max\{m : |\langle f, g_m\rangle| \ge T\}, \qquad (9.33)$$
then the $M$ largest signal coefficients are among the first $N$, and the nonlinear error (9.32) is the minimum error obtained from $M$ coefficients chosen anywhere in the infinite basis $\mathcal{B} = \{g_m\}_{m\in\mathbb{N}}$. In this case, the linear approximation space $U_N$ and $f_N$ do not play any explicit role in the error $\epsilon_n(M,f)$. If $|\langle f, g_m\rangle| \le C\, m^{-\beta}$ for some $\beta > 0$, then we can choose $N \ge C^{1/\beta}\, T^{-1/\beta}$. In the following, this condition is supposed to be satisfied.
Discrete Numerical Computations
The linear approximation space $U_N$ is important for discrete computations. A nonlinear approximation $f_{\Lambda_T}$ is computed by calculating the nonlinear approximation of the discretized signal $a[n] = \langle f, \phi_n\rangle$ for $0 \le n < N$, and performing a discrete-to-analog conversion.
Since both the discretization family $\{\phi_n\}_{0\le n<N}$ and the approximation basis $\{g_m\}_{0\le m<N}$ are orthonormal bases of $U_N$, $\{h_m[n] = \langle g_m, \phi_n\rangle\}_{0\le m<N}$ is an orthonormal basis of $\mathbb{C}^N$. Analog signal inner products in $\mathbf{L}^2[0,1]$ and their discretization in $\mathbb{C}^N$ are then equal:
$$\langle a[n], h_m[n]\rangle = \langle f(x), g_m(x)\rangle \quad \text{for} \quad 0 \le m < N . \qquad (9.34)$$
The nonlinear approximation of the signal $a[n]$ in the basis $\{h_m\}_{0\le m<N}$ of $\mathbb{C}^N$ is
$$a_{\Lambda_T}[n] = \sum_{m\in\Lambda_T} \langle a, h_m\rangle\, h_m[n] \quad \text{with} \quad \Lambda_T = \{m : |\langle a, h_m\rangle| \ge T\} .$$
It results from (9.34) that the analog conversion of this discrete signal is the nonlinear analog approximation:
$$f_{\Lambda_T}(x) = \sum_{n=0}^{N-1} a_{\Lambda_T}[n]\, \phi_n(x) = \sum_{m\in\Lambda_T} \langle f, g_m\rangle\, g_m(x) .$$
The number of operations to compute $a_{\Lambda_T}$ is dominated by the number of operations to compute the $N$ signal coefficients $\{\langle a, h_m\rangle\}_{0\le m<N}$, which takes $O(N \log_2 N)$ operations in a discrete Fourier basis, and $O(N)$ in a discrete wavelet basis. Thus, reducing $N$ decreases the number of operations and does not affect the nonlinear approximation error, as long as (9.33) is satisfied. Given this equivalence between discrete and analog nonlinear approximations, we now concentrate on analog functions to relate this error to their regularity.
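As a toy instance of this discrete procedure, the sketch below (NumPy; the test signal and the choice of the unitary DFT as the discrete basis $\{h_m\}$ are illustrative assumptions, not the text's setup) keeps the $M$ largest-amplitude coefficients and verifies that the resulting nonlinear error cannot exceed that of the a priori linear choice of the $M$ lowest frequencies:

```python
import numpy as np

N = 256
t = np.arange(N) / N
# Piecewise-smooth test signal: a smooth part plus a step at t = 0.4.
a = np.sin(4 * np.pi * t) + (t > 0.4)

# Orthonormal basis of C^N: here the unitary DFT (h_m = Fourier vectors).
coeffs = np.fft.fft(a, norm="ortho")

# Best M-term (nonlinear) approximation: keep the M largest |<a, h_m>|.
M = 32
order = np.argsort(np.abs(coeffs))[::-1]
nl = np.zeros_like(coeffs)
nl[order[:M]] = coeffs[order[:M]]
eps_nl = np.sum(np.abs(coeffs) ** 2) - np.sum(np.abs(nl) ** 2)

# Linear approximation: keep the M lowest frequencies a priori.
freqs = np.minimum(np.arange(N), N - np.arange(N))   # |frequency| of each bin
lin_idx = np.argsort(freqs)[:M]
eps_lin = np.sum(np.abs(coeffs) ** 2) - np.sum(np.abs(coeffs[lin_idx]) ** 2)

print(eps_nl <= eps_lin)   # thresholding is optimal among M-term choices
```

By construction the thresholded set minimizes the error over all subsets of $M$ coefficients, which is exactly the optimality of $\Lambda_T$ in (9.31).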
Approximation Error
To evaluate the nonlinear approximation error $\epsilon_n(M,f)$, the coefficients $\{|\langle f, g_m\rangle|\}_{m\in\mathbb{N}}$ are sorted in decreasing order. Let $f_B^r[k] = \langle f, g_{m_k}\rangle$ be the coefficient of rank $k$:
$$|f_B^r[k]| \ge |f_B^r[k+1]| \quad \text{for} \quad k \ge 1 .$$
The best $M$-term nonlinear approximation computed from the $M$ largest coefficients is
$$f_M = \sum_{k=1}^{M} f_B^r[k]\, g_{m_k} . \qquad (9.35)$$
The resulting error is
$$\epsilon_n(M, f) = \|f - f_M\|^2 = \sum_{k=M+1}^{+\infty} |f_B^r[k]|^2 .$$
Theorem 9.9 relates the decay of this approximation error as $M$ increases to the decay of $|f_B^r[k]|$ as $k$ increases.
Theorem 9.9. Let $s > 1/2$. If there exists $C > 0$ such that $|f_B^r[k]| \le C\, k^{-s}$, then
$$\epsilon_n(M, f) \le \frac{C^2}{2s-1}\, M^{1-2s} . \qquad (9.36)$$
Conversely, if $\epsilon_n(M,f)$ satisfies (9.36), then
$$|f_B^r[k]| \le \left(1 - \frac{1}{2s}\right)^{-s} C\, k^{-s} . \qquad (9.37)$$
Proof. Since
$$\epsilon_n(M, f) = \sum_{k=M+1}^{+\infty} |f_B^r[k]|^2 \le C^2 \sum_{k=M+1}^{+\infty} k^{-2s}$$
and
$$\sum_{k=M+1}^{+\infty} k^{-2s} \le \int_M^{+\infty} x^{-2s}\, dx = \frac{M^{1-2s}}{2s-1} , \qquad (9.38)$$
we derive (9.36).
Conversely, let $\alpha < 1$:
$$\epsilon_n(\alpha M, f) \ge \sum_{k=\alpha M+1}^{M} |f_B^r[k]|^2 \ge (1-\alpha)\, M\, |f_B^r[M]|^2 .$$
So if (9.36) is satisfied,
$$|f_B^r[M]|^2 \le \frac{\epsilon_n(\alpha M, f)}{1-\alpha}\, M^{-1} \le \frac{C^2\, \alpha^{1-2s}}{2s-1}\, \frac{M^{-2s}}{1-\alpha} .$$
For $\alpha = 1 - 1/(2s)$, we get (9.37) for $k = M$.
■
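The upper bound (9.36) is easy to check numerically on coefficients that decay exactly as $C\,k^{-s}$; the constants and the truncation length below are arbitrary illustrative choices:

```python
import numpy as np

# Sorted coefficients decaying as |f_B^r[k]| = C k^{-s}; the tail energy must
# satisfy the bound (9.36): eps_n(M, f) <= C^2 / (2s - 1) * M^{1 - 2s}.
C, s = 3.0, 1.5
k = np.arange(1, 10 ** 6 + 1)
coeff = C * k ** (-1.0 * s)

for M in (10, 100, 1000):
    tail = np.sum(coeff[M:] ** 2)               # sum over ranks k > M (truncated)
    bound = C ** 2 / (2 * s - 1) * M ** (1 - 2 * s)
    print(M, tail <= bound)                     # True: the integral bound (9.38) holds
```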
$\ell^p$ Spaces
Theorem 9.10 relates the decay of the sorted inner products to their $\ell^p$ norm
$$\|f\|_{\mathcal{B},p} = \left( \sum_{m=0}^{+\infty} |\langle f, g_m\rangle|^p \right)^{1/p} .$$
It derives a decay upper bound for the error $\epsilon_n(M,f)$.
Theorem 9.10. Let $p < 2$. If $\|f\|_{\mathcal{B},p} < +\infty$, then
$$|f_B^r[k]| \le \|f\|_{\mathcal{B},p}\, k^{-1/p} \qquad (9.39)$$
and $\epsilon_n(M,f) = o(M^{1-2/p})$.
Proof. We prove (9.39) by observing that
$$\|f\|_{\mathcal{B},p}^p = \sum_{n=1}^{+\infty} |f_B^r[n]|^p \ge \sum_{n=1}^{k} |f_B^r[n]|^p \ge k\, |f_B^r[k]|^p .$$
To show that $\epsilon_n(M,f) = o(M^{1-2/p})$, we set
$$S[k] = \sum_{n=k}^{2k-1} |f_B^r[n]|^p \ge k\, |f_B^r[2k]|^p .$$
Thus,
$$\epsilon_n(M, f) = \sum_{k=M+1}^{+\infty} |f_B^r[k]|^2 \le \sum_{k=M+1}^{+\infty} S[k/2]^{2/p}\, (k/2)^{-2/p} \le \sup_{k>M/2} |S[k]|^{2/p} \sum_{k=M+1}^{+\infty} (k/2)^{-2/p} .$$
Since $\|f\|_{\mathcal{B},p}^p = \sum_{n=1}^{+\infty} |f_B^r[n]|^p < +\infty$, it follows that $\lim_{M\to+\infty} \sup_{k>M/2} |S[k]| = 0$. Thus, we derive from (9.38) that $\epsilon_n(M,f) = o(M^{1-2/p})$. ■
This theorem specifies spaces of functions that are well approximated by a few vectors of an orthogonal basis $\mathcal{B}$. We denote
$$\mathcal{B}_{\mathcal{B},p} = \left\{ f \in \mathcal{H} : \|f\|_{\mathcal{B},p} < +\infty \right\} . \qquad (9.40)$$
If $f \in \mathcal{B}_{\mathcal{B},p}$, then Theorem 9.10 proves that $\epsilon_n(M,f) = o(M^{1-2/p})$. This is called a Jackson inequality [20]. Conversely, if $\epsilon_n(M,f) = O(M^{1-2/p})$, then the Bernstein inequality (9.37) for $s = 1/p$ shows that $f \in \mathcal{B}_{\mathcal{B},q}$ for any $q > p$. Section 9.2.3 studies the properties of the spaces $\mathcal{B}_{\mathcal{B},p}$ for wavelet bases.
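Inequality (9.39) follows from $k\,|f_B^r[k]|^p \le \|f\|_{\mathcal{B},p}^p$ and can be checked on any sequence with a finite $\ell^p$ norm. A quick NumPy verification on an arbitrary random sequence (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 1.0

# An arbitrary coefficient sequence with finite l^p norm, sorted in decreasing order.
c = np.abs(rng.normal(size=2000)) * np.exp(-np.arange(2000) / 100.0)
sorted_c = np.sort(c)[::-1]                      # |f_B^r[k]| for k = 1, 2, ...
norm_p = np.sum(sorted_c ** p) ** (1.0 / p)      # ||f||_{B,p}

k = np.arange(1, len(sorted_c) + 1)
print(np.all(sorted_c <= norm_p * k ** (-1.0 / p) + 1e-12))   # True, eq. (9.39)
```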
9.2.2 Wavelet Adaptive Grids
A nonlinear approximation in a wavelet orthonormal basis keeps the largest-amplitude coefficients. We saw in Section 6.1.3 that these coefficients occur near singularities. Thus, a wavelet nonlinear approximation defines an adaptive grid that refines the approximation scale in the neighborhood of the signal's sharp transitions. Such approximations are particularly well adapted to piecewise regular signals. The precision of nonlinear wavelet approximations is also studied for bounded variation functions and more general Besov space functions [209].
We consider a wavelet basis adapted to $\mathbf{L}^2[0,1]$, constructed in Section 7.5.3 with compactly supported wavelets that are $\mathbf{C}^q$ with $q$ vanishing moments:
$$\mathcal{B} = \left\{ \{\phi_{J,n}\}_{0\le n<2^{-J}},\; \{\psi_{j,n}\}_{-\infty<j\le J,\, 0\le n<2^{-j}} \right\} .$$
To simplify notation, we write $\phi_{J,n} = \psi_{J+1,n}$.
If the analog signal $f \in \mathbf{L}^2[0,1]$ is approximated at the scale $2^L$ with $N = 2^{-L}$ samples $\{\langle f, \phi_{L,n}\rangle\}_{0\le n<2^{-L}}$, then the corresponding $N$ wavelet coefficients $\{\langle f, \psi_{j,n}\rangle\}_{n,\, j>L}$ are computed with $O(N)$ operations with the fast wavelet transform algorithm of Section 7.3.1. The best nonlinear approximation of $f \in \mathbf{L}^2[0,1]$ from $M$ wavelet coefficients above $T$ at scales $2^j > 2^L$ is
$$f_M = \sum_{(j,n)\in\Lambda_T} \langle f, \psi_{j,n}\rangle\, \psi_{j,n} \quad \text{with} \quad \Lambda_T = \{(j,n) : j > L,\; |\langle f, \psi_{j,n}\rangle| \ge T\} .$$
The approximation error is $\epsilon_n(M,f) = \sum_{(j,n)\notin\Lambda_T} |\langle f, \psi_{j,n}\rangle|^2$. Theorem 9.11 proves that if $f$ is bounded, then for a sufficiently large $N$ the approximation support $\Lambda_T$ corresponds to all possible wavelet coefficients above $T$.
Theorem 9.11. If $f$ is bounded, then all wavelets producing coefficients above $T$ are in an approximation space $V_L$ of dimension $N = 2^{-L} = O(\|f\|_\infty^2\, T^{-2})$.
Proof. If $f$ is bounded, then
$$|\langle f, \psi_{j,n}\rangle| = \left| \int_0^1 f(t)\, 2^{-j/2}\, \psi(2^{-j} t - n)\, dt \right| \le 2^{j/2} \sup_t |f(t)| \int_0^1 |\psi(t)|\, dt = 2^{j/2}\, \|f\|_\infty\, \|\psi\|_1 . \qquad (9.41)$$
So $|\langle f, \psi_{j,n}\rangle| \ge T$ implies that $2^j \ge T^2\, \|f\|_\infty^{-2}\, \|\psi\|_1^{-2}$, which proves the theorem for $2^L = T^2\, \|f\|_\infty^{-2}\, \|\psi\|_1^{-2}$.
■
This theorem shows that for bounded signals, if the discretization scale $2^L$ is sufficiently small, then the nonlinear approximation error computed from the first $N = 2^{-L}$ wavelet coefficients is equal to the approximation error obtained by selecting the $M$ largest wavelet coefficients in the infinite-dimensional wavelet basis. In the following, we suppose that this condition is satisfied, and thus do not have to worry about the discretization scale.
Piecewise Regular Signals
Piecewise regular signals define a first simple model where nonlinear wavelet
approximations considerably outperform linear approximations. We consider signals with a finite number of singularities and that are uniformly regular between
singularities. Theorem 9.12 characterizes the linear and nonlinear wavelet approximation error decay for such signals.
Theorem 9.12. If $f$ has $K$ discontinuities on $[0,1]$ and is uniformly Lipschitz $\alpha$ between these discontinuities, with $1/2 < \alpha < q$, then
$$\epsilon_l(M, f) = O(K\, \|f\|_{\mathbf{C}^\alpha}^2\, M^{-1}) \quad \text{and} \quad \epsilon_n(M, f) = O(\|f\|_{\mathbf{C}^\alpha}^2\, M^{-2\alpha}) . \qquad (9.42)$$
Proof. We distinguish type I wavelets $\psi_{j,n}$, $n \in I_j$, whose support includes a point where $f$ is discontinuous, from type II wavelets, $n \in II_j$, whose support is included in a domain where $f$ is uniformly Lipschitz $\alpha$.
Let $C$ be the support size of $\psi$. At a scale $2^j$, each wavelet $\psi_{j,n}$ has a support of size $C 2^j$, translated by $2^j n$. Thus, there are at most $|I_j| \le C K$ type I wavelets $\psi_{j,n}$ whose supports include at least one of the $K$ discontinuities of $f$. Since $\|f\|_\infty \le \|f\|_{\mathbf{C}^\alpha}$, (9.41) shows for $\alpha = 0$ that there exists $B_0$ such that $|\langle f, \psi_{j,n}\rangle| \le B_0\, \|f\|_{\mathbf{C}^\alpha}\, 2^{j/2}$.
At fine scales $2^j$ there are many more type II wavelets, $n \in II_j$, but their number $|II_j|$ is smaller than the total number $2^{-j}$ of wavelets at this scale. Since $f$ is uniformly Lipschitz $\alpha$ on the support of $\psi_{j,n}$, the right inequality of (9.22) proves that there exists $B$ such that
$$|\langle f, \psi_{j,n}\rangle| \le B\, \|f\|_{\mathbf{C}^\alpha}\, 2^{j(\alpha+1/2)} . \qquad (9.43)$$
The linear approximation error from $M = 2^{-k}$ wavelets satisfies
$$\epsilon_l(M, f) = \sum_{j\le k} \left( \sum_{n\in I_j} |\langle f, \psi_{j,n}\rangle|^2 + \sum_{n\in II_j} |\langle f, \psi_{j,n}\rangle|^2 \right) \le \sum_{j\le k} \left( C K B_0^2\, \|f\|_{\mathbf{C}^\alpha}^2\, 2^j + 2^{-j}\, B^2\, \|f\|_{\mathbf{C}^\alpha}^2\, 2^{(2\alpha+1)j} \right)$$
$$\le 2\, C K B_0^2\, \|f\|_{\mathbf{C}^\alpha}^2\, 2^k + (1 - 2^{-2\alpha})^{-1}\, B^2\, \|f\|_{\mathbf{C}^\alpha}^2\, 2^{2\alpha k} .$$
This inequality proves that for $\alpha > 1/2$ the error term of the type I wavelets dominates, and $\epsilon_l(M,f) = O(K\, \|f\|_{\mathbf{C}^\alpha}^2\, M^{-1})$.
To compute the nonlinear approximation error $\epsilon_n(M,f)$, we evaluate the decay of the ordered wavelet coefficients. Let $f_B^r[k] = \langle f, \psi_{j_k,n_k}\rangle$ be the coefficient of rank $k$: $|f_B^r[k]| \ge |f_B^r[k+1]|$ for $k \ge 1$. Let $f_{B,I}^r[k]$ and $f_{B,II}^r[k]$ be the wavelet coefficients of rank $k$ among the type I and type II wavelets, respectively.
For $l > 0$, there are at most $l K C$ type I coefficients at scales $2^j > 2^{-l}$, and type I wavelet coefficients satisfy $|\langle f, \psi_{j,n}\rangle| \le B_0\, \|f\|_{\mathbf{C}^\alpha}\, 2^{-l/2}$ at scales $2^j \le 2^{-l}$. It results that
$$f_{B,I}^r[l K C] \le B_0\, \|f\|_{\mathbf{C}^\alpha}\, 2^{-l/2} ,$$
so $f_{B,I}^r[k] = O(\|f\|_{\mathbf{C}^\alpha}\, 2^{-k/(2KC)})$ has an exponential decay.
For $l \ge 0$, there are at most $2^l$ type II wavelet coefficients at scales $2^j > 2^{-l}$, and type II wavelet coefficients satisfy $|\langle f, \psi_{j,n}\rangle| \le B\, \|f\|_{\mathbf{C}^\alpha}\, 2^{-l(\alpha+1/2)}$ at scales $2^j \le 2^{-l}$. It results that
$$f_{B,II}^r[2^l] \le B\, \|f\|_{\mathbf{C}^\alpha}\, 2^{-l(\alpha+1/2)} .$$
It follows that $f_{B,II}^r[k] = O(\|f\|_{\mathbf{C}^\alpha}\, k^{-\alpha-1/2})$ for all $k > 0$.
Since the type I coefficients have a much faster decay than the type II coefficients, putting them together gives $f_B^r[k] = O(\|f\|_{\mathbf{C}^\alpha}\, k^{-\alpha-1/2})$. From the inequality (9.36) of Theorem 9.9, it results that $\epsilon_n(M,f) = O(\|f\|_{\mathbf{C}^\alpha}^2\, M^{-2\alpha})$.
■
Although there are few large wavelet coefficients created by the potential $K$ discontinuities, the theorem proof shows that the linear approximation error is dominated by these discontinuities. On the contrary, these few wavelet coefficients have a negligible impact on the nonlinear approximation error. Thus, the nonlinear error decays as if there were no such discontinuities and $f$ were uniformly Lipschitz $\alpha$ over its whole support.
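The linear/nonlinear gap described by Theorem 9.12 can be reproduced with the simplest wavelet. The sketch below (NumPy) uses a hand-rolled orthonormal Haar transform; Haar has only one vanishing moment, so the full rates of the theorem are not attained, but the effect of a single discontinuity is already visible: the $M$ largest coefficients yield a much smaller error than the $M$ coarsest-scale ones. The signal and $M$ are illustrative choices.

```python
import numpy as np

def haar(x):
    """Orthonormal discrete Haar transform; x must have power-of-two length."""
    x = np.asarray(x, dtype=float).copy()
    details = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # averages: next coarser scale
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # details: wavelet coefficients
        details.append(d)
        x = a
    details.append(x)                            # final scaling coefficient
    return np.concatenate(details[::-1])         # coarse scales first

N, M = 1024, 64
t = np.arange(N) / N
f = np.cos(2 * np.pi * t) + 2.0 * (t > 0.37)     # smooth part + one discontinuity

coeffs = haar(f)
energy = np.sum(coeffs ** 2)                     # equals ||f||^2 (orthonormality)

eps_lin = energy - np.sum(coeffs[:M] ** 2)       # keep the M coarsest coefficients
eps_nonlin = energy - np.sum(np.sort(coeffs ** 2)[::-1][:M])  # keep the M largest

print(eps_lin, eps_nonlin)    # the nonlinear error is far smaller
```

The linear approximation discards the large fine-scale coefficients in the cone of influence of the step, while the nonlinear approximation keeps them, exactly the adaptive-grid effect described above.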
Adaptive Grids
The approximation $f_M$ calculated from the $M$ largest-amplitude wavelet coefficients can be interpreted as an adaptive grid approximation, where the approximation scale is refined in the neighborhood of singularities.
A nonlinear approximation keeps all coefficients above a threshold, $|\langle f, \psi_{j,n}\rangle| \ge T$. In a region where $f$ is uniformly Lipschitz $\alpha$, since $|\langle f, \psi_{j,n}\rangle| \sim A\, 2^{j(\alpha+1/2)}$, the coefficients above $T$ are typically at scales
$$2^j > 2^l = \left( \frac{T}{A} \right)^{2/(2\alpha+1)} .$$
Setting all wavelet coefficients below the scale $2^l$ to zero is equivalent to computing a local approximation of $f$ at the scale $2^l$. The smaller the local Lipschitz regularity $\alpha$, the finer the approximation scale $2^l$.
Figure 9.2 shows the nonlinear wavelet approximation of a piecewise regular signal. Up and down Diracs correspond to positive and negative wavelet coefficients with an amplitude above $T$. The largest-amplitude wavelet coefficients are in the cone of influence of each singularity. The scale-space approximation support $\Lambda_T$ specifies the geometry of the sharp signal transitions. Since the approximation scale is refined in the neighborhood of each singularity, the singularities are much better restored than in the fixed-scale linear approximation shown in Figure 9.1. The nonlinear approximation error in this case is 17 times smaller than the linear approximation error.
Nonlinear wavelet approximations are nearly optimal compared to adaptive spline approximations. A spline approximation $\tilde f_M$ is calculated by choosing $K$ nodes $t_1 < t_2 < \cdots < t_K$ inside $[0,1]$. Over each interval $[t_k, t_{k+1}]$, $f$ is approximated by the closest polynomial of degree $r$. This polynomial spline $\tilde f_M$ is specified by $M = K(r+2)$ parameters, which are the node locations $\{t_k\}_{1\le k\le K}$ plus the $K(r+1)$ parameters of the $K$ polynomials of degree $r$. To reduce $\|f - \tilde f_M\|$, the nodes must be closely spaced where $f$ is irregular and farther apart where $f$ is smooth. However, finding the $M$ parameters that minimize $\|f - \tilde f_M\|$ is a difficult nonlinear optimization.
[Figure 9.2 comprises three panels: (a) the original signal $f(t)$ plotted for $t \in [0,1]$, (b) the scale-space distribution of the selected wavelet coefficients at scales $2^{-5}$ to $2^{-9}$, and (c) the nonlinear approximation $f_M(t)$.]
FIGURE 9.2
(a) Original signal $f$. (b) Each Dirac corresponds to one of the largest $M = 0.15\,N$ wavelet coefficients, calculated with a symmlet 4. (c) Nonlinear approximation $f_M$ recovered from the $M$ largest wavelet coefficients shown in (b), $\|f - f_M\|/\|f\| = 5.1 \cdot 10^{-3}$.
A spline wavelet basis of Battle-Lemarié gives nonlinear approximations that are also spline functions, but the nodes $t_k$ are restricted to dyadic locations $2^j n$, with a scale $2^j$ that is locally adapted to the signal regularity. The approximation is computed with $O(N)$ operations by projecting the signal in an approximation space of dimension $N = O(T^{-2})$. For large classes of signals, including balls of Besov spaces, the maximum approximation errors with wavelets or with optimized splines have the same decay rate when $M$ increases [210]. Therefore, the computational overhead of an optimized spline approximation is not worth it.
9.2.3 Approximations in Besov and Bounded Variation Spaces
Studying the performance of nonlinear wavelet approximations more precisely requires introducing new spaces. As previously, we write the coarse-scale scaling functions as $\phi_{J,n} = \psi_{J+1,n}$. The Besov space $\mathbf{B}_{\beta,\gamma}^s[0,1]$ is the set of functions $f \in \mathbf{L}^2[0,1]$ such that
$$\|f\|_{s,\beta,\gamma} = \left( \sum_{j=-\infty}^{J+1} \left[ 2^{-j(s+1/2-1/\beta)} \left( \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^\beta \right)^{1/\beta} \right]^\gamma \right)^{1/\gamma} < +\infty . \qquad (9.44)$$
Frazier, Jawerth [260], and Meyer [375] proved that $\mathbf{B}_{\beta,\gamma}^s[0,1]$ does not depend on the particular choice of wavelet basis, as long as the wavelets in the basis have $q > s$ vanishing moments and are in $\mathbf{C}^q$. The space $\mathbf{B}_{\beta,\gamma}^s[0,1]$ corresponds typically to functions that have a "derivative of order $s$" that is in $\mathbf{L}^\beta[0,1]$. The index $\gamma$ is a fine-tuning parameter, which is less important. We need $q > s$ because a wavelet with $q$ vanishing moments can test the differentiability of a signal only up to the order $q$. When the coarsest-scale scaling functions $\phi_{J,n} = \psi_{J+1,n}$ are removed from (9.44), the norm $\|f\|_{s,\beta,\gamma}$ is called a homogeneous Besov norm, which we shall write $\|f\|^*_{s,\beta,\gamma}$, and the corresponding homogeneous Besov space is $\tilde{\mathbf{B}}_{\beta,\gamma}^s[0,1]$.
If $\beta \ge 2$, then functions in $\mathbf{B}_{\beta,\gamma}^s[0,1]$ have a uniform regularity of order $s$. For $\beta = \gamma = 2$, Theorem 9.4 proves that $\mathbf{B}_{2,2}^s[0,1] = \mathbf{W}^s[0,1]$ is the space of $s$ times differentiable functions in the sense of Sobolev. Theorem 9.5 proves that this space is characterized by the decay of the linear approximation error $\epsilon_l(N,f)$, and that $\epsilon_l(N,f) = o(N^{-2s})$. Since $\epsilon_n(M,f) \le \epsilon_l(M,f)$, clearly $\epsilon_n(M,f) = o(M^{-2s})$. One can verify (Exercise 9.11) that nonlinear approximations do not improve linear approximations over Sobolev spaces. For $\beta = \gamma = +\infty$, Theorem 9.6 proves that the homogeneous Besov norm is a homogeneous Hölder norm:
$$\|f\|^*_{s,\infty,\infty} = \sup_{j\le J,\, n} 2^{-j(s+1/2)}\, |\langle f, \psi_{j,n}\rangle| \sim \|f\|_{\tilde{\mathbf{C}}^s} , \qquad (9.45)$$
and the corresponding space $\tilde{\mathbf{B}}_{\infty,\infty}^s[0,1]$ is the homogeneous Hölder space of functions that are uniformly Lipschitz $s$ on $[0,1]$.
For $\beta < 2$, functions in $\mathbf{B}_{\beta,\gamma}^s[0,1]$ are not necessarily uniformly regular. Therefore, the adaptivity of nonlinear approximations significantly improves the decay rate of the error. In particular, if $p = \beta = \gamma$ and $1/p = s + 1/2$, then the Besov norm is a simple $\ell^p$ norm:
$$\|f\|_{s,\beta,\gamma} = \left( \sum_{j=-\infty}^{J+1} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^p \right)^{1/p} .$$
Theorem 9.10 proves that if $f \in \mathbf{B}_{\beta,\gamma}^s[0,1]$, then $\epsilon_n(M,f) = o(M^{1-2/p})$. The smaller $p$ is, the faster the error decay. The proof of Theorem 9.12 shows that although $f$ may be discontinuous, if the number of discontinuities is finite and if $f$ is uniformly Lipschitz $\alpha$ between these discontinuities, then its sorted wavelet coefficients satisfy $|f_B^r[k]| = O(k^{-\alpha-1/2})$, so $f \in \mathbf{B}_{\beta,\gamma}^s[0,1]$ for $1/p < \alpha + 1/2$. This shows that these spaces include functions that are not $s$ times differentiable at all points. The linear approximation error $\epsilon_l(M,f)$ for $f \in \mathbf{B}_{\beta,\gamma}^s[0,1]$ can decrease arbitrarily slowly, because the $M$ wavelet coefficients at the largest scales may be arbitrarily small. A nonlinear approximation is much more efficient in these spaces.
Bounded Variation
Bounded variation functions are important examples of signals for which a nonlinear approximation yields a much smaller error than a linear approximation. The total variation norm is defined in (2.57) by
$$\|f\|_V = \int_0^1 |f'(t)|\, dt .$$
The derivative $f'$ must be understood in the sense of distributions, in order to include discontinuous functions. To compute the linear and nonlinear wavelet approximation errors for bounded variation signals, Theorem 9.13 computes an upper and a lower bound of $\|f\|_V$ from the modulus of the wavelet coefficients.
Theorem 9.13. Consider a wavelet basis constructed with a wavelet $\psi$ such that $\|\psi\|_V < +\infty$. There exist $A, B > 0$ such that for all $f \in \mathbf{L}^2[0,1]$,
$$\|f\|_V \le B \sum_{j=-\infty}^{J+1} \sum_{n=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,n}\rangle| = B\, \|f\|_{1,1,1} \qquad (9.46)$$
and
$$\|f\|_V \ge A \sup_{j\le J} \left( \sum_{n=0}^{2^{-j}-1} 2^{-j/2}\, |\langle f, \psi_{j,n}\rangle| \right) = A\, \|f\|^*_{1,1,\infty} . \qquad (9.47)$$
Proof. By decomposing $f$ in the wavelet basis
$$f = \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} \langle f, \psi_{j,n}\rangle\, \psi_{j,n} + \sum_{n=0}^{2^{-J}-1} \langle f, \phi_{J,n}\rangle\, \phi_{J,n} ,$$
we get
$$\|f\|_V \le \sum_{j=-\infty}^{J} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|\, \|\psi_{j,n}\|_V + \sum_{n=0}^{2^{-J}-1} |\langle f, \phi_{J,n}\rangle|\, \|\phi_{J,n}\|_V . \qquad (9.48)$$
The wavelet basis includes wavelets whose supports are inside $(0,1)$, and border wavelets, which are obtained by dilating and translating a finite number of mother wavelets. To simplify notation, we write the basis as if there were a single mother wavelet: $\psi_{j,n}(t) = 2^{-j/2}\, \psi(2^{-j} t - n)$. Thus, we verify with a change of variable that
$$\|\psi_{j,n}\|_V = \int_0^1 2^{-j/2}\, 2^{-j}\, |\psi'(2^{-j} t - n)|\, dt = 2^{-j/2}\, \|\psi\|_V .$$
Since $\phi_{J,n}(t) = 2^{-J/2}\, \phi(2^{-J} t - n)$, we also prove that $\|\phi_{J,n}\|_V = 2^{-J/2}\, \|\phi\|_V$. Thus, the inequality (9.46) is derived from (9.48).
Since has at least one vanishing moment, its primitive is a function with the same
support, which we suppose is included in [⫺K /2, K /2]. To prove (9.47) for j ⭐ J , we
make an integration by parts:
⫺j ⫺1
2
| f , j,n | ⫽
n⫽0
⫺j ⫺1 2
1
n⫽0
⫽
0
⫺j ⫺1 2
1
n⫽0
⭐ 2 j/2
0
f (t) 2⫺j/2 (2⫺j t ⫺ n) dt f ⬘(t) 2 j/2 (2⫺j t ⫺ n) dt ⫺j ⫺1 2
1
| f ⬘(t)| |(2⫺j t ⫺ n)| dt.
0
n⫽0
Since has a support in [⫺K /2, K /2],
⫺j ⫺1
2
| f , j,n | ⭐ 2 j/2 K sup |(t)|
t∈R
n⫽0
1
| f ⬘(t)| dt ⭐ A⫺1 2 j/2 f V .
(9.49)
0
This inequality proves (9.47).
■
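For the Haar wavelet, the scale-normalized $\ell^1$ bound of (9.47) can be checked numerically on sampled signals. The sketch below uses a pure-Python orthonormal Haar transform; the constant $\mathrm{TV}(f)/2$ plays the role of $A^{-1}\|f\|_V$ in a discrete analogue of the inequality, not the theorem's exact constant:

```python
import math

def haar_details(x):
    """One orthonormal Haar step: (approximations, details)."""
    a = [(x[2*k] + x[2*k+1]) / math.sqrt(2) for k in range(len(x) // 2)]
    d = [(x[2*k] - x[2*k+1]) / math.sqrt(2) for k in range(len(x) // 2)]
    return a, d

def total_variation(samples):
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

def per_scale_l1(x):
    """Scale-normalized l1 norms 2^{-l/2} * sum_n |d_l[n]| for each level l."""
    norms, level = [], 1
    while len(x) > 1:
        x, d = haar_details(x)
        norms.append(2 ** (-level / 2) * sum(abs(c) for c in d))
        level += 1
    return norms

N = 1024
f = [math.sin(8 * math.pi * n / N) + (2.0 if n > N // 3 else 0.0) for n in range(N)]
bound = total_variation(f) / 2   # uniform per-scale bound, in the spirit of (9.47)
assert all(s <= bound + 1e-9 for s in per_scale_l1(f))
```

The uniform bound across scales follows because one averaging step increases the discrete total variation by at most $\sqrt{2}$, while each detail sum is bounded by the variation divided by $\sqrt{2}$.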
This theorem shows that the total variation norm is bounded by two Besov norms:
$$A\, \|f\|^*_{1,1,\infty} \le \|f\|_V \le B\, \|f\|_{1,1,1}.$$
The lower bound is a homogeneous norm because the addition of a constant to $f$ does not modify $\|f\|_V$. The space $\mathbf{BV}[0,1]$ of bounded variation functions is therefore embedded in the corresponding Besov spaces:
$$B^1_{1,1}[0,1] \subset \mathbf{BV}[0,1] \subset B^1_{1,\infty}[0,1].$$
Theorem 9.14 derives linear and nonlinear wavelet approximation errors for bounded variation signals.
CHAPTER 9 Approximations in Bases
Theorem 9.14. For all $f \in \mathbf{BV}[0,1]$ and $M > 2q$,
$$\epsilon_l(M, f) = O(\|f\|_V^2\, M^{-1}) \qquad (9.50)$$
and
$$\epsilon_n(M, f) = O(\|f\|_V^2\, M^{-2}). \qquad (9.51)$$
Proof. Section 7.5.3 shows a wavelet basis of $L^2[0,1]$ with boundary wavelets keeping their $q$ vanishing moments; it has $2q$ different boundary wavelets and scaling functions, so the largest wavelet scale is $2^J = (2q)^{-1}$.

There are $2^{-j}$ wavelet coefficients at a scale $2^j$, so for any $L \le J$ there are $2^{-L}$ wavelet and scaling coefficients at scales $2^j > 2^L$. The resulting linear wavelet approximation error is
$$\epsilon_l(2^{-L}, f) = \sum_{j=-\infty}^{L} \sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2. \qquad (9.52)$$
We showed in (9.47) that
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle| \le A^{-1}\, 2^{j/2}\, \|f\|_V,$$
and thus that
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle|^2 \le A^{-2}\, 2^{j}\, \|f\|_V^2.$$
It results from (9.52) that
$$\epsilon_l(2^{-L}, f) \le 2\, A^{-2}\, 2^{L}\, \|f\|_V^2.$$
Setting $M = 2^{-L}$, we derive (9.50).
Let us now prove the nonlinear approximation error bound (9.51). Let $f_B^r[k]$ be the wavelet coefficient of rank $k$, excluding all the scaling coefficients $\langle f, \phi_{J,n}\rangle$, since we cannot control their value with $\|f\|_V$. We first show that there exists $B_0$ such that for all $f \in \mathbf{BV}[0,1]$,
$$|f_B^r[k]| \le B_0\, \|f\|_V\, k^{-3/2}. \qquad (9.53)$$
To take into account the fact that (9.53) does not apply to the $2^{-J}$ scaling coefficients $\langle f, \phi_{J,n}\rangle$, an upper bound of $\epsilon_n(M, f)$ is obtained by selecting the $2^{-J}$ scaling coefficients plus the $M - 2^{-J}$ biggest wavelet coefficients; thus,
$$\epsilon_n(M, f) \le \sum_{k=M-2^{-J}+1}^{+\infty} |f_B^r[k]|^2. \qquad (9.54)$$
For $M > 2q = 2^{-J}$, inserting (9.53) in (9.54) proves (9.51).

The upper bound (9.53) is proved by computing an upper bound of the number of coefficients larger than an arbitrary threshold $T$. At scale $2^j$, we denote by $f_B^r[j,k]$ the coefficient of rank $k$ among $\{\langle f, \psi_{j,n}\rangle\}_{0 \le n < 2^{-j}}$. The inequality (9.49) proves that for all $j \le J$,
$$\sum_{n=0}^{2^{-j}-1} |\langle f, \psi_{j,n}\rangle| \le A^{-1}\, 2^{j/2}\, \|f\|_V.$$
Thus, it follows from (9.39) that
$$|f_B^r[j,k]| \le A^{-1}\, 2^{j/2}\, \|f\|_V\, k^{-1} = C\, 2^{j/2}\, k^{-1}.$$
Thus, at scale $2^j$, the number $k_j$ of coefficients larger than $T$ satisfies
$$k_j \le \min\!\left(2^{-j},\; 2^{j/2}\, C\, T^{-1}\right).$$
The total number $k$ of coefficients larger than $T$ is
$$k = \sum_{j=-\infty}^{J} k_j \le \sum_{2^j \ge (C^{-1}T)^{2/3}} 2^{-j} + \sum_{2^j < (C^{-1}T)^{2/3}} 2^{j/2}\, C\, T^{-1} \le 6\, (C\, T^{-1})^{2/3}.$$
By choosing $T = |f_B^r[k]|$, since $C = A^{-1}\|f\|_V$, we get
$$|f_B^r[k]| \le 6^{3/2}\, A^{-1}\, \|f\|_V\, k^{-3/2},$$
which proves (9.53). ■
The asymptotic decay rates of the linear and nonlinear approximation errors in Theorem 9.14 cannot be improved. If $f \in \mathbf{BV}[0,1]$ has discontinuities, then $\epsilon_l(M, f)$ decays like $M^{-1}$ and $\epsilon_n(M, f)$ decays like $M^{-2}$. One can also prove [207] that this error-decay rate for bounded variation functions cannot be improved by any type of nonlinear approximation scheme. In this sense, wavelets are optimal for approximating bounded variation functions.
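These decay properties can be observed numerically. The sketch below (pure Python, Haar wavelets; all names are ours) compares, for a piecewise smooth signal with one jump, the error of keeping the $M$ coarsest-scale coefficients with the error of keeping the $M$ largest ones:

```python
import math

def haar_coeffs(x):
    """Orthonormal Haar transform; returns detail coefficients, finest first."""
    coeffs = []
    while len(x) > 1:
        a = [(x[2*k] + x[2*k+1]) / math.sqrt(2) for k in range(len(x) // 2)]
        d = [(x[2*k] - x[2*k+1]) / math.sqrt(2) for k in range(len(x) // 2)]
        coeffs.extend(d)
        x = a
    return coeffs

N = 4096
# A bounded variation signal: smooth sine plus one discontinuity.
f = [math.sin(2 * math.pi * n / N) + (1.0 if n > N // 2 else 0.0) for n in range(N)]
d = haar_coeffs(f)

def linear_error(M):
    """Drop the fine-scale coefficients, keeping the M coarsest."""
    return sum(c * c for c in d[:max(0, len(d) - M)])

def nonlinear_error(M):
    """Drop the smallest coefficients, keeping the M largest."""
    s = sorted((abs(c) for c in d), reverse=True)
    return sum(c * c for c in s[M:])

for M in (16, 64, 256):
    # The best M-term approximation always beats the fixed-scale selection.
    assert nonlinear_error(M) <= linear_error(M)
assert nonlinear_error(256) <= nonlinear_error(16)
```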
9.3 SPARSE IMAGE REPRESENTATIONS
Approximation of images is more complex than that of one-dimensional signals, because singularities often belong to geometrical structures such as edges or textures. Nonlinear wavelet approximation defines adaptive approximation grids that are numerically highly effective. These approximations are optimal for bounded variation images, but not for images having edges that are geometrically regular. Section 9.3.2 introduces a piecewise regular image model with regular edges and studies adaptive triangulation approximations. Section 9.3.3 proves that curvelet frames yield asymptotically optimal approximation errors for such piecewise $\mathbf{C}^2$ regular images.
9.3.1 Wavelet Image Approximations
Linear and nonlinear approximations of functions in $L^2[0,1]^d$ can be calculated in separable wavelet bases. We concentrate on the two-dimensional case for image processing, and compute approximation errors for bounded variation images.

Section 7.7.4 constructs a separable wavelet basis of $L^2[0,1]^2$ from a wavelet basis of $L^2[0,1]$, with separable products of wavelets and scaling functions. We suppose that all wavelets of the basis of $L^2[0,1]$ are $\mathbf{C}^q$ with $q$ vanishing moments. The wavelet basis of $L^2[0,1]^2$ includes three mother wavelets $\{\psi^l\}_{1\le l\le 3}$ that are dilated by $2^j$ and translated over a square grid of interval $2^j$ in $[0,1]^2$. Up to modifications near the borders, these wavelets can be written as
$$\psi^l_{j,n}(x) = \frac{1}{2^j}\, \psi^l\!\left( \frac{x_1 - 2^j n_1}{2^j},\; \frac{x_2 - 2^j n_2}{2^j} \right). \qquad (9.55)$$
They have $q$ vanishing moments in the sense that they are orthogonal to two-dimensional polynomials of a degree strictly smaller than $q$. If we limit the scales to $2^j \le 2^J$, we must complete the wavelet family with two-dimensional scaling functions
$$\phi^2_{J,n}(x) = \frac{1}{2^J}\, \phi^2\!\left( \frac{x_1 - 2^J n_1}{2^J},\; \frac{x_2 - 2^J n_2}{2^J} \right)$$
to obtain the orthonormal basis of $L^2[0,1]^2$:
$$\mathcal{B} = \{\phi^2_{J,n}\}_{2^J n \in [0,1)^2} \cup \{\psi^l_{j,n}\}_{j \le J,\; 2^j n \in [0,1)^2,\; 1\le l\le 3}.$$
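A separable wavelet step is easy to write explicitly in the Haar case. The sketch below (our own minimal code, not the book's implementation) computes the approximation and the three detail subbands corresponding to the three mother wavelets of (9.55) for one scale, and checks that the step conserves energy, as an orthonormal transform must:

```python
def haar2d_step(img):
    """One separable orthonormal 2D Haar step on an even-sized square image.

    Returns the coarse approximation and the three detail subbands that
    play the role of the three mother wavelets psi^1, psi^2, psi^3."""
    half = len(img) // 2
    def blk(i, j):
        return (img[2*i][2*j], img[2*i][2*j+1], img[2*i+1][2*j], img[2*i+1][2*j+1])
    def band(signs):
        return [[sum(s * v for s, v in zip(signs, blk(i, j))) / 2
                 for j in range(half)] for i in range(half)]
    a  = band((1, 1, 1, 1))      # scaling (average) coefficients
    d1 = band((1, -1, 1, -1))    # detail along the x2 (column) direction
    d2 = band((1, 1, -1, -1))    # detail along the x1 (row) direction
    d3 = band((1, -1, -1, 1))    # diagonal detail
    return a, (d1, d2, d3)

img = [[float((i * 7 + j * 3) % 5) for j in range(8)] for i in range(8)]
before = sum(v * v for row in img for v in row)
a, dets = haar2d_step(img)
after = sum(v * v for band_ in (a,) + dets for row in band_ for v in row)
assert abs(before - after) < 1e-9   # the step is orthonormal
```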
Linear Image Approximation
The linear discretization of an analog image in $L^2[0,1]^2$ can be defined by $N = 2^{-2L}$ samples $\{\langle f, \phi^2_{L,n}\rangle\}_{2^L n \in [0,1]^2}$, which characterize the orthogonal projection of $f$ on the approximation space $\mathbf{V}_L$. The precision of such linear approximations depends on the uniform image regularity.

Local image regularity can be measured with Lipschitz exponents. A function $f$ is uniformly Lipschitz $\alpha$ over a domain $\Omega \subset \mathbb{R}^2$ if there exists $K > 0$ such that for any $v \in \Omega$ one can find a polynomial $p_v$ of degree $\lfloor \alpha \rfloor$ such that
$$\forall x \in \Omega, \quad |f(x) - p_v(x)| \le K\, |x - v|^\alpha. \qquad (9.56)$$
The infimum of the $K$ satisfying (9.56) is the homogeneous Hölder $\alpha$ norm $\|f\|_{\tilde{\mathbf{C}}^\alpha}$. The Hölder $\alpha$ norm of $f$ also imposes that $f$ is bounded: $\|f\|_{\mathbf{C}^\alpha} = \|f\|_{\tilde{\mathbf{C}}^\alpha} + \|f\|_\infty$. We write $\mathbf{C}^\alpha[0,1]^2$ for the Hölder space of functions for which $\|f\|_{\mathbf{C}^\alpha} < +\infty$. Similar to Theorem 9.6 in one dimension, Theorem 9.15 bounds this homogeneous Hölder norm from the decay of wavelet coefficients, with $\mathbf{C}^q$ wavelets having $q$ vanishing moments.
Theorem 9.15. There exist $B \ge A > 0$ such that
$$A\, \|f\|_{\tilde{\mathbf{C}}^\alpha} \le \sup_{1\le l\le 3,\; j\le J,\; 0\le n < 2^{-j}} 2^{-j(\alpha+1)}\, |\langle f, \psi^l_{j,n}\rangle| \le B\, \|f\|_{\tilde{\mathbf{C}}^\alpha}. \qquad (9.57)$$
Proof. The proof is essentially the same as the proof of Theorem 9.6 in one dimension. We shall only prove the right inequality. If $f$ is uniformly Lipschitz $\alpha$ on the support of $\psi^l_{j,n}$, since $\psi^l_{j,n}$ is orthogonal to the polynomial $p_{2^j n}$ approximating $f$ at $v = 2^j n$, we get
$$|\langle f, \psi^l_{j,n}\rangle| = |\langle f - p_{2^j n}, \psi^l_{j,n}\rangle| \le \|f\|_{\tilde{\mathbf{C}}^\alpha} \int 2^{-j}\, |\psi^l(2^{-j}(x - 2^j n))|\, |x - 2^j n|^\alpha\, dx \le \|f\|_{\tilde{\mathbf{C}}^\alpha}\, 2^{(\alpha+1)j} \int |\psi^l(x)|\, |x|^\alpha\, dx,$$
which proves the right inequality of (9.57). The wavelet regularity is not used to prove this inequality. The left inequality requires that the wavelets are $\mathbf{C}^q$. ■
Theorem 9.16 computes the linear wavelet approximation error decay for images that are uniformly Lipschitz $\alpha$. It requires that wavelets have $q$ vanishing moments, but no regularity condition is needed.

Theorem 9.16. If $f$ is uniformly Lipschitz $0 < \alpha \le q$ over $[0,1]^2$, then $\epsilon_l(N, f) = O(\|f\|^2_{\tilde{\mathbf{C}}^\alpha}\, N^{-\alpha})$.

Proof. There are $3 \cdot 2^{-2j}$ wavelet coefficients at scale $2^j$, and $2^{-2k}$ wavelet and scaling coefficients at scales $2^j > 2^k$. The right inequality of (9.57) proves that
$$|\langle f, \psi^l_{j,n}\rangle| \le B\, \|f\|_{\tilde{\mathbf{C}}^\alpha}\, 2^{(\alpha+1)j}.$$
As a result,
$$\epsilon_l(2^{-2k}, f) = \sum_{j=-\infty}^{k} \sum_{l=1}^{3} \sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle|^2 \le 3\, B^2\, \|f\|^2_{\tilde{\mathbf{C}}^\alpha} \sum_{j=-\infty}^{k} 2^{-2j}\, 2^{j(2\alpha+2)} = \frac{3\, B^2\, \|f\|^2_{\tilde{\mathbf{C}}^\alpha}\, 2^{2\alpha k}}{1 - 2^{-2\alpha}}.$$
For $2k = -\log_2 N$, we derive that $\epsilon_l(N, f) = O(\|f\|^2_{\tilde{\mathbf{C}}^\alpha}\, 2^{2\alpha k}) = O(\|f\|^2_{\tilde{\mathbf{C}}^\alpha}\, N^{-\alpha})$. ■

One can prove that this decay rate is optimal, in the sense that no approximation scheme can improve the decay rate $N^{-\alpha}$ over all uniformly Lipschitz $\alpha$ functions [20].
Nonlinear Approximation of Piecewise Regular Images
If an image has singularities, then linear wavelet approximations introduce large errors. In one dimension, an isolated discontinuity creates a constant number of large wavelet coefficients at each scale. As a result, nonlinear wavelet approximations are marginally influenced by a finite number of isolated singularities. Theorem 9.12 proves that if $f$ is uniformly Lipschitz $\alpha$ between these singularities, then the asymptotic error decay behaves as if there were no singularity. In two dimensions, if $f$ is uniformly Lipschitz $\alpha$, then Theorem 9.16 proves that the linear approximation error from $M$ wavelets satisfies $\epsilon_l(M, f) = O(M^{-\alpha})$. Thus, we can
hope that piecewise regular images yield a nonlinear approximation error having the same asymptotic decay. Regretfully, this is wrong.

A piecewise regular image has discontinuities along curves of dimension 1, which create a nonnegligible number of high-amplitude wavelet coefficients. As a result, even though the function may be infinitely differentiable between discontinuities, the nonlinear approximation error $\epsilon_n(M, f)$ decays only like $M^{-1}$.

As an example, suppose that $f = C\, 1_\Omega$ is proportional to the indicator function of a set $\Omega$ whose border $\partial\Omega$ has a finite length, as shown in Figure 9.3. If the support of $\psi^l_{j,n}$ does not intersect the border $\partial\Omega$, then $\langle f, \psi^l_{j,n}\rangle = 0$ because $f$ is constant over the support of $\psi^l_{j,n}$. The wavelets $\psi^l_{j,n}$ have a square support of size proportional to $2^j$, which is translated on a grid of interval $2^j$. Since $\partial\Omega$ has a finite length $L$, there are on the order of $L\, 2^{-j}$ wavelets with supports that intersect $\partial\Omega$. Figure 9.3(b) illustrates the position of these coefficients.

Since $f$ is bounded, the result from (9.57) for $\alpha = 0$ is that $|\langle f, \psi^l_{j,n}\rangle| = O(C\, 2^j)$. Along the border, wavelet coefficients typically have an amplitude $|\langle f, \psi^l_{j,n}\rangle| \sim C\, 2^j$. Thus, the $M$ largest coefficients are typically at scales $2^j \ge L/M$. Selecting these $M$ largest coefficients yields an error of
$$\epsilon_n(M, f) \sim \sum_{j=-\infty}^{\log_2(L/M) - 1} L\, 2^{-j}\, C^2\, 2^{2j} = (C\, L)^2\, M^{-1}. \qquad (9.58)$$
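The $O(L\, 2^{-j})$ count of wavelets meeting the edge can be checked by counting the dyadic squares that straddle the boundary of a disk (a minimal sketch; the grid size and disk radius are arbitrary choices of ours):

```python
def boundary_squares(N, w):
    """Count w-by-w dyadic squares of an N-by-N grid containing both pixels
    inside and outside a centered disk of radius N/4."""
    r2 = (N / 4) ** 2
    inside = lambda i, j: (i - N / 2) ** 2 + (j - N / 2) ** 2 < r2
    count = 0
    for bi in range(0, N, w):
        for bj in range(0, N, w):
            vals = {inside(i, j) for i in range(bi, bi + w) for j in range(bj, bj + w)}
            if len(vals) == 2:   # the square straddles the boundary
                count += 1
    return count

N = 256
c8, c16 = boundary_squares(N, 8), boundary_squares(N, 16)
assert c8 > c16
# Halving the square width roughly doubles the number of squares that meet
# the boundary, consistent with the O(L 2^{-j}) count.
assert 1.4 < c8 / c16 < 2.6
```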
FIGURE 9.3
(a) Image $f = 1_\Omega$. (b) At the scale $2^j$, the wavelets $\psi^l_{j,n}$ have a square support of width proportional to $2^j$. This support is translated on a grid of interval $2^j$, which is indicated by the smaller dots. The darker dots correspond to wavelets whose support intersects the frontier of $\Omega$, for which $\langle f, \psi^l_{j,n}\rangle \neq 0$.
Thus, the large number of wavelet coefficients produced by the edges of $f$ limits the error decay to $M^{-1}$.

Two questions then arise. First, is the class of images having a wavelet approximation error that decays like $M^{-1}$ large enough to incorporate interesting image models? The following section proves that this class includes all bounded variation images. Second, is it possible to find sparse signal representations that are better than wavelets for approximating images having regular edges? We address this question in Section 9.3.2.
Bounded Variation Images
Bounded variation functions provide good models for large classes of images that do not have irregular textures. The total variation of $f$ is defined in Section 2.3.3 by
$$\|f\|_V = \int_0^1 \int_0^1 |\vec\nabla f(x)|\, dx. \qquad (9.59)$$
The partial derivatives of $\vec\nabla f$ must be taken in the general sense of distributions in order to include discontinuous functions. Let $\partial\Omega_t$ be the level set defined as the boundary of
$$\Omega_t = \{x \in \mathbb{R}^2 : f(x) > t\}.$$
Theorem 2.9 proves that the total variation depends on the length $H^1(\partial\Omega_t)$ of the level sets:
$$\|f\|_V = \int_0^1 \int_0^1 |\vec\nabla f(x)|\, dx = \int_{-\infty}^{+\infty} H^1(\partial\Omega_t)\, dt. \qquad (9.60)$$
It results that if $f = C\, 1_\Omega$, then $\|f\|_V = C\, L$, where $L$ is the length of the boundary of $\Omega$. Thus, indicator functions of sets have a bounded total variation that is proportional to the length of their "edges."
Linear and nonlinear approximation errors of bounded variation images are computed by evaluating the decay of their wavelet coefficients across scales. We denote by $f_B^r[k]$ the rank-$k$ wavelet coefficient of $f$, without including the $2^{-2J}$ scaling coefficients $\langle f, \phi^2_{J,n}\rangle$. Theorem 9.17 gives upper and lower bounds on $\|f\|_V$ from wavelet coefficients. Wavelets are supposed to have a compact support, and need only one vanishing moment.

Theorem 9.17: Cohen, DeVore, Petrushev, Xu. There exist $A, B_1, B_2 > 0$ such that if $\|f\|_V < +\infty$, then
$$\sum_{j=-\infty}^{J} \sum_{l=1}^{3} \sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle| + \sum_{2^J n \in [0,1]^2} |\langle f, \phi^2_{J,n}\rangle| \ge A\, \|f\|_V, \qquad (9.61)$$
$$\sup_{-\infty < j \le J,\; 1\le l\le 3} \left( \sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle| \right) \le B_1\, \|f\|_V, \qquad (9.62)$$
and
$$|f_B^r[k]| \le B_2\, \|f\|_V\, k^{-1}. \qquad (9.63)$$
Proof. In two dimensions, a wavelet's total variation does not depend on scale and position. Indeed, with a change of variable $x' = 2^{-j}x - n$, we get
$$\|\psi^l_{j,n}\|_V = \int |\vec\nabla \psi^l_{j,n}(x)|\, dx = \int |\vec\nabla \psi^l(x')|\, dx' = \|\psi^l\|_V.$$
Similarly, $\|\phi^2_{J,n}\|_V = \|\phi^2\|_V$. The inequalities (9.61) and (9.62) are proved with the same derivation steps as in Theorem 9.13 for one-dimensional bounded variation functions.

The proof of (9.63) is technical and can be found in [175]. The inequality (9.62) proves that wavelet coefficients have a bounded $\ell^1$ norm at each scale $2^j$. It results from (9.39) in Theorem 9.10 that ranked wavelet coefficients at each scale $2^j$ have a decay bounded by $B_1\, \|f\|_V\, k^{-1}$. The inequality (9.63) is finer, since it applies to the ranking of wavelet coefficients at all scales. ■
The Lena image is an example of a finite-resolution approximation of a bounded variation image. Figure 9.4 shows that its sorted wavelet coefficients $\log_2 |f_B^r[k]|$ decay with a slope that reaches $-1$ as $\log_2 k$ increases, which verifies that $|f_B^r[k]| = O(k^{-1})$. In contrast, the Mandrill image shown in Figure 10.7 does not have a bounded total variation, because of the fur texture. As a consequence, $\log_2 |f_B^r[k]|$ decays more slowly, in this case with a slope that reaches $-0.65$.

A function with finite total variation does not necessarily have a bounded amplitude, but images do have a bounded amplitude. Theorem 9.18 incorporates this hypothesis to compute linear approximation errors.
FIGURE 9.4
Sorted wavelet coefficients $\log_2 |f_B^r[k]|$ as a function of $\log_2 k$ for two images. (a) Lena image shown in Figure 9.5(a). (b) Mandrill image shown in Figure 10.7.
Theorem 9.18. If $\|f\|_V < +\infty$ and $\|f\|_\infty < +\infty$, then
$$\epsilon_l(M, f) = O(\|f\|_V\, \|f\|_\infty\, M^{-1/2}) \qquad (9.64)$$
and
$$\epsilon_n(M, f) = O(\|f\|_V^2\, M^{-1}). \qquad (9.65)$$

Proof. The linear approximation error from $M = 2^{-2m}$ wavelets is
$$\epsilon_l(2^{-2m}, f) = \sum_{j=-\infty}^{m} \sum_{l=1}^{3} \sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle|^2. \qquad (9.66)$$
We shall verify that there exists $B > 0$ such that for all $j$ and $l$,
$$\sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle|^2 \le B\, \|f\|_V\, \|f\|_\infty\, 2^j. \qquad (9.67)$$
Applying this upper bound to the sum (9.66) proves that
$$\epsilon_l(2^{-2m}, f) \le 6\, B\, \|f\|_V\, \|f\|_\infty\, 2^m,$$
from which (9.64) is derived.

The upper bound (9.67) is calculated with (9.62), which shows that there exists $B_2 > 0$ such that for all $j$ and $l$,
$$\sum_{2^j n \in [0,1]^2} |\langle f, \psi^l_{j,n}\rangle| \le B_2\, \|f\|_V. \qquad (9.68)$$
The amplitude of a wavelet coefficient can also be bounded:
$$|\langle f, \psi^l_{j,n}\rangle| \le \|f\|_\infty\, \|\psi^l_{j,n}\|_1 = \|f\|_\infty\, 2^j\, \|\psi^l\|_1,$$
where $\|\psi^l\|_1$ is the $L^1[0,1]^2$ norm of $\psi^l$. If $B_3 = \max_{1\le l\le 3} \|\psi^l\|_1$, this yields
$$|\langle f, \psi^l_{j,n}\rangle| \le B_3\, 2^j\, \|f\|_\infty. \qquad (9.69)$$
Since $\sum_n |a_n|^2 \le \sup_n |a_n| \sum_n |a_n|$, we get (9.67) from (9.68) and (9.69), with $B = B_2 B_3$.

The nonlinear approximation error is a direct consequence of the sorted coefficient decay (9.63). It results from (9.63) and (9.36) that $\epsilon_n(M, f) = O(\|f\|_V^2\, M^{-1})$. ■
This theorem shows that nonlinear wavelet approximations of bounded variation images can yield much smaller errors than linear approximations. The decay bounds of this theorem are tight, in the sense that one can find functions for which $\epsilon_l(M, f)$ and $\epsilon_n(M, f)$ decay, respectively, like $M^{-1/2}$ and $M^{-1}$. This is typically the case for bounded variation images including discontinuities along edges, for example, $f = C\, 1_\Omega$. The Lena image in Figure 9.5(a) is another example.

Figure 9.5(b) is a linear approximation calculated with the $M = N/16$ largest-scale wavelet coefficients. This approximation produces a uniform blur and creates Gibbs oscillations in the neighborhood of contours. Figure 9.5(c) gives the nonlinear
FIGURE 9.5
(a) Lena image $f$ of $N = 256^2$ pixels. (b) Linear approximation $f_M$ calculated from the $M = N/16$ Symmlet 4 wavelet coefficients at the largest scales: $\|f - f_M\|/\|f\| = 0.036$. (c) The support of the $M = N/16$ largest-amplitude wavelet coefficients, shown in black. (d) Nonlinear approximation $f_M$ calculated from the $M$ largest-amplitude wavelet coefficients: $\|f - f_M\|/\|f\| = 0.011$.
approximation support $\Lambda_T$ of the $M = N/16$ largest-amplitude wavelet coefficients. Large-amplitude coefficients are located where the image intensity varies sharply, in particular along the edges. The resulting nonlinear approximation is shown in Figure 9.5(d). The nonlinear approximation error is much smaller than the linear approximation error, with $\epsilon_n(M, f) \le \epsilon_l(M, f)/10$, and the image quality is indeed better. As in one dimension, this nonlinear wavelet approximation can be interpreted as an adaptive grid approximation, which refines the approximation resolution near edges and textures by keeping wavelet coefficients at smaller scales.
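The gap between the linear and nonlinear errors can be observed on a synthetic indicator image. The sketch below (a pure-Python 2D Haar transform on a 64-by-64 disk image; all names are ours) checks that the best $M$-term approximation beats the fixed-scale linear one:

```python
def haar2d_step(img):
    """One orthonormal separable 2D Haar step (even-sized square image)."""
    half = len(img) // 2
    def blk(i, j):
        return (img[2*i][2*j], img[2*i][2*j+1], img[2*i+1][2*j], img[2*i+1][2*j+1])
    def band(signs):
        return [[sum(s * v for s, v in zip(signs, blk(i, j))) / 2
                 for j in range(half)] for i in range(half)]
    return band((1, 1, 1, 1)), [band(s) for s in ((1,-1,1,-1), (1,1,-1,-1), (1,-1,-1,1))]

N = 64
disk = [[1.0 if (i - 32) ** 2 + (j - 32) ** 2 < 400 else 0.0 for j in range(N)]
        for i in range(N)]

coeffs = []                      # detail coefficients, finest level first
a = disk
while len(a) > 1:
    a, dets = haar2d_step(a)
    coeffs.extend(c for band in dets for row in band for c in row)

def linear_error(M):
    """Keep the M coarsest coefficients: drop the leading (fine) ones."""
    return sum(c * c for c in coeffs[:max(0, len(coeffs) - M)])

def nonlinear_error(M):
    """Keep the M largest coefficients: drop the smallest ones."""
    s = sorted((abs(c) for c in coeffs), reverse=True)
    return sum(c * c for c in s[M:])

M = N * N // 16
assert nonlinear_error(M) <= linear_error(M)      # best-M beats fixed-scale
assert nonlinear_error(2 * M) <= nonlinear_error(M)
```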
9.3.2 Geometric Image Models and Adaptive Triangulations
Bounded variation image models correspond to images that have level sets with a finite average length, but they do not imply any geometrical regularity of these level sets. The level sets and "edges" of many images, such as Lena, are often piecewise regular curves. This geometric regularity can be used to improve the sparsity of image representations.

When an image is uniformly Lipschitz $\alpha$, Theorem 9.16 proves that wavelet nonlinear approximations have an error $\epsilon_n(M, f) = O(M^{-\alpha})$, which is optimal. However, as soon as the image is discontinuous along an edge, the error decay rate drops to $\epsilon_n(M, f) = O(M^{-1})$, because edges create a number of large wavelet coefficients that is proportional to their length. This decay rate is improved by representations taking advantage of edge geometric regularities. We introduce a piecewise regular image model that incorporates the geometric regularity of edges. Approximations of piecewise regular images are studied with adaptive triangulations.
Piecewise $\mathbf{C}^\alpha$ Image Models
Piecewise regular image models include edges that are also piecewise regular. These edges are typically occlusion contours of objects in images. The regularity is measured in the sense of uniform Lipschitz regularity with Hölder norms $\|f\|_{\mathbf{C}^\alpha}$, defined in (9.20) and (9.56) for one- and two-dimensional functions, respectively. Edges are supposed to be a finite union of curves $e_k$ that are uniformly Lipschitz $\alpha$ in $[0,1]^2$; between edges, the image is supposed to be uniformly Lipschitz $\alpha$. To model the blur introduced by the optics or by diffraction phenomena, the image model incorporates a convolution with an unknown regular kernel $h_s$, whose scale $s$ is a parameter.

Definition 9.1. A function $f \in L^2[0,1]^2$ is said to be piecewise $\mathbf{C}^\alpha$ with a blurring scale $s \ge 0$ if $f = \bar{f} * h_s$, where $\bar{f}$ is uniformly Lipschitz $\alpha$ on $\Omega = [0,1]^2 - \{e_k\}_{1\le k < K}$. If $s > 0$, then $h_s(x) = s^{-2}\, h(s^{-1}x)$, where $h$ is a uniformly Lipschitz $\alpha$ kernel with a support in $[-1,1]^2$; if $s = 0$, then $h_0 = \delta$. The curves $e_k$ are uniformly Lipschitz $\alpha$ and do not intersect tangentially.

When the blurring scale is $s = 0$, then $f = \bar{f} * h_0 = \bar{f}$ is typically discontinuous along the edges. When $s > 0$, then $\bar{f} * h_s$ is blurred, and discontinuities along edges are diffused over a neighborhood of size $s$, as shown in Figure 9.6.
Approximations with Adapted Triangulations
Wavelet approximations are inefficient at recovering regular edges because it takes many wavelets to cover the edge at each scale, as shown by Figure 9.3. To improve
FIGURE 9.6
Adaptive triangulations for piecewise linear approximations of a piecewise $\mathbf{C}^2$ image, without (a) and with (b) a blurring kernel.
this approximation, it is necessary to use elongated approximation elements. For $\mathbf{C}^2$ piecewise regular images, it is proved that piecewise linear approximations over $M$ adapted triangles can reach the optimal $O(M^{-2})$ error decay, as if the image had no singularities.

For the solution of partial differential equations, the local optimization of anisotropic triangles was introduced by Babuška and Aziz [90]. Adaptive triangulations are indeed useful in numerical analysis, where shocks or boundary layers require anisotropic refinements of finite elements [77, 410, 434]. A planar triangulation $(V, T)$ of $[0,1]^2$ is composed of vertices $V = \{x_i\}_{0\le i < p}$ and disjoint triangular faces $T = \{T_k\}_{0\le k < M}$ that cover the image domain:
$$\bigcup_{k=0}^{M-1} T_k = [0,1]^2.$$
Let $M$ be the number of triangles. We consider an image approximation $\tilde{f}_M$ obtained with linear interpolations of the image values at the vertices. For each $x_i \in V$, $\tilde{f}_M(x_i) = f(x_i)$, and $\tilde{f}_M(x)$ is linear on each triangular face $T_k$. A function $f$ is well approximated with $M$ triangles if the shapes of these triangles are optimized to capture the regularity of $f$.
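Evaluating $\tilde{f}_M$ on one triangular face is a barycentric interpolation. A minimal sketch (our helper, assuming a non-degenerate triangle): linear interpolation from the vertex values reproduces affine functions exactly, which is why the error on a face is controlled by second derivatives:

```python
def linear_interp_triangle(p, tri, vals):
    """Evaluate at p the linear function taking values vals[k] at tri[k],
    using barycentric coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    l3 = 1.0 - l1 - l2
    return l1 * vals[0] + l2 * vals[1] + l3 * vals[2]

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
f = lambda x, y: 2 * x + 3 * y + 1       # an affine function is reproduced exactly
vals = [f(*v) for v in tri]
assert abs(linear_interp_triangle((0.25, 0.25), tri, vals) - f(0.25, 0.25)) < 1e-12
```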
Theorem 9.19 proves that adapted triangulations of piecewise $\mathbf{C}^2$ images lead to the optimal decay rate $O(M^{-2})$; its proof sketches the construction of a well-adapted triangulation.

Theorem 9.19. If $f$ is a piecewise $\mathbf{C}^2$ image, then there exists $C$ such that for any $M$ one can construct a triangulation $(V, T)$ with $M$ triangles over which the piecewise linear interpolation $\tilde{f}_M$ satisfies
$$\|f - \tilde{f}_M\|^2 \le C\, M^{-2}. \qquad (9.70)$$
Proof. A sketch of the proof is given. Let us first consider the case where the blurring scale is $s = 0$. Each edge curve $e_k$ is $\mathbf{C}^2$ and can be covered by a band $\Lambda_k$ of width $\epsilon^2$. This band is a union of straight tubes of width $\epsilon^2$ and length $\epsilon/C$, where $C$ is the maximum curvature of the edge curves. Each tube is decomposed in two elongated triangles, as illustrated in Figure 9.6. The number of such triangles is $2\, C\, L_k\, \epsilon^{-1}$, where $L_k$ is the length of $e_k$. Triangle vertices should also be appropriately adjusted at junctions or corners. Let $L = \sum_k L_k$ be the total length of the edges and $\Lambda = \cup_k \Lambda_k$ be the band covering all edges. This band is divided in $2\, C\, L\, \epsilon^{-1}$ triangles. Since $f$ is bounded, $\|f - \tilde{f}_M\|^2_{L^2(\Lambda)} \le \mathrm{area}(\Lambda)\, \|f\|^2_\infty \le L\, \epsilon^2\, \|f\|^2_\infty$.

The complement $\Lambda^c = [0,1]^2 - \Lambda$ is covered at the center by nearly equilateral triangles of surface on the order of $\epsilon$, as shown in Figure 9.6. There are $O(\epsilon^{-1})$ such triangles. Performing this packing requires us to use a boundary layer of triangles that connect the large, nearly isotropic triangles of width $\epsilon^{1/2}$ to the anisotropic triangles of size $\sim \epsilon^2 \times \epsilon$ along edges. One can verify that such a boundary layer can be constructed with $O(L\, C\, \epsilon^{-1})$ triangles. Since $f$ is $\mathbf{C}^2$ on $\Omega - \Lambda$, the approximation error of a linear interpolation $\tilde{f}_M$ over this triangulation satisfies $\|f - \tilde{f}_M\|^2_{L^\infty(\Lambda^c)} = O(\|f\|^2_{\mathbf{C}^2(\Lambda^c)}\, \epsilon^2)$. Thus, the total error satisfies
$$\|f - \tilde{f}_M\|^2 = O(L\, \|f\|^2_\infty\, \epsilon^2 + \|f\|^2_{\mathbf{C}^2}\, \epsilon^2)$$
with a number of triangles $M = O((C\, L + 1)\, \epsilon^{-1})$, which verifies (9.70) for $s = 0$.

Suppose now that $s > 0$. According to Definition 9.1, $f = \bar{f} * h_s$, so edges are diffused into sharp transitions along a tube of width $s$. Since $h$ is $\mathbf{C}^2$, within this tube $f$ is $\mathbf{C}^2$, but it has large-amplitude derivatives if $s$ is small. This defines an overall band $\Lambda$ of surface $L\, s$ where the derivatives of $f$ are potentially large. The triangulation of the domain $[0,1]^2 - \Lambda$ can be treated similarly to the case $s = 0$. Thus, we concentrate on the triangulation of the band $\Lambda$, and show that one can find a triangulation with $O(\epsilon^{-1})$ triangles that yields an error in $O(\epsilon^2)$ over the band.

Band $\Lambda$ has a surface $L\, s$ and can thus be covered by $L\, \epsilon^{-1}$ triangles of surface $s\, \epsilon$. Let us compute the aspect ratio of these triangles that minimizes the resulting error. A blurred piecewise $\mathbf{C}^2$ image $f = \bar{f} * h_s$ has an anisotropic regularity at a point $x$ close to an edge curve $e_k$ where $\bar{f}$ is discontinuous. Let $\tau_1(x)$ be the unit vector that is tangent to $e_k$ at point $x$, and $\tau_2(x)$ be the perpendicular vector. In the system of coordinates $(\tau_1, \tau_2)$, for any $u = u_1\, \tau_1 + u_2\, \tau_2$ in the neighborhood of $x$, one can prove that [342, 365]
$$\frac{\partial^{i_1+i_2} f}{\partial u_1^{i_1}\, \partial u_2^{i_2}}(u) = O(s^{-i_1/2 - i_2}). \qquad (9.71)$$
Thus, for $s$ small, the derivatives are much larger along $\tau_2$ than along $\tau_1$. The error of a local linear approximation can be computed with a Taylor expansion
$$f(x + \Delta) = f(x) + \langle \vec\nabla_x f, \Delta\rangle + \frac{1}{2} \langle H_x(f)\, \Delta, \Delta\rangle + o(|\Delta|^2),$$
where $H_x(f) \in \mathbb{R}^{2\times 2}$ is the symmetric Hessian tensor of second derivatives. Let us decompose $\Delta = \Delta_1\, \tau_1 + \Delta_2\, \tau_2$:
$$|f(x + \Delta) - (f(x) + \langle \vec\nabla_x f, \Delta\rangle)| = O(s^{-1} |\Delta_1|^2 + s^{-2} |\Delta_2|^2 + s^{-3/2} |\Delta_1|\, |\Delta_2|). \qquad (9.72)$$
For a triangle of length $\Delta_1$ and width $\Delta_2$, we want to minimize the error for a given surface $s\, \epsilon \sim \Delta_1 \Delta_2$, which is obtained with $\Delta_2/\Delta_1 = \sqrt{s}$. It results that $\Delta_1 \sim s^{1/4}\, \epsilon^{1/2}$ and $\Delta_2 \sim s^{3/4}\, \epsilon^{1/2}$, which yields a linear approximation error (9.72) on this triangle:
$$|f(x + \Delta) - (f(x) + \langle \vec\nabla_x f, \Delta\rangle)| = O(s^{-1/2}\, \epsilon).$$
This gives
$$\|f - \tilde{f}_M\|^2_{L^2(\Lambda)} \le \mathrm{area}(\Lambda)\, \|f - \tilde{f}_M\|^2_\infty = O(L\, \epsilon^2).$$
It proves that the $O(\epsilon^{-1})$ triangles produce an error of $O(\epsilon^2)$ on band $\Lambda$. Since the same result is valid on $[0,1]^2 - \Lambda$, this yields (9.70). ■
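The optimal aspect ratio $\Delta_2/\Delta_1 = \sqrt{s}$ can be recovered numerically by minimizing the bound (9.72) at fixed triangle area. This is a brute-force sketch, with the constants in (9.72) set to 1, which does not affect the minimizer:

```python
import math

def error_bound(d1, d2, s):
    """The anisotropic linearization error bound of (9.72), constants set to 1."""
    return d1 * d1 / s + d2 * d2 / s ** 2 + d1 * d2 / s ** 1.5

def best_ratio(s, area, steps=2000):
    """Brute-force the ratio d2/d1 minimizing the bound at fixed area d1*d2."""
    best_err, best_r = float("inf"), None
    for k in range(1, steps):
        r = 10.0 * k / steps            # candidate ratios in (0, 10)
        d1 = math.sqrt(area / r)
        d2 = r * d1                     # so that d1 * d2 == area
        e = error_bound(d1, d2, s)
        if e < best_err:
            best_err, best_r = e, r
    return best_r

s = 0.04
r = best_ratio(s, area=1e-4)
assert abs(r - math.sqrt(s)) < 0.05     # optimum near sqrt(s) = 0.2
```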
This theorem proves that an adaptive triangulation can yield the optimal decay rate $O(M^{-2})$ for a piecewise $\mathbf{C}^2$ image. The decay rate of the error is independent of the blurring scale $s$, which may be zero or not. However, the triangulation depends on $s$ and on the edge geometry.

Wherever $f$ is uniformly $\mathbf{C}^2$, it is approximated over large, nearly isotropic triangles of size $O(M^{-1/2})$. In the neighborhood of edges, to introduce an error $O(M^{-2})$, triangles must be narrow in the direction of the discontinuity and as long as possible, to cover the discontinuity with a minimum number of triangles. Since edges are $\mathbf{C}^2$, they can be covered with triangles of width and length proportional to $M^{-2}$ and $M^{-1}$, as illustrated in Figure 9.7.

If the image is blurred at a scale $s$, then the discontinuities are diffused over a neighborhood of size $s$ where the image has sharp transitions. Theorem 9.19 shows that the tube of width $s$ around each edge should be covered with triangles of width and length proportional to $s^{3/4} M^{-1/2}$ and $s^{1/4} M^{-1/2}$, as illustrated in Figure 9.8.
Algorithms to Design Adapted Triangulations
It is difficult to adapt the triangulation according to Theorem 9.19, because the geometry of the edges and the blurring scale $s$ are unknown. There is currently no adaptive triangulation algorithm that guarantees finding an approximation with an error decay of $O(M^{-2})$ for all piecewise $\mathbf{C}^2$ images. Most algorithms use greedy strategies
FIGURE 9.7
(a) Adapted triangle, of width $\sim M^{-2}$ and length $\sim M^{-1}$, and (b) finite element, of width $\sim M^{-\alpha}$, to approximate a discontinuous function around a singularity curve.
FIGURE 9.8
Aspect ratio of triangles for the approximation of a blurred contour: width $\sim s^{3/4} M^{-1/2}$ and length $\sim s^{1/4} M^{-1/2}$ along a transition band of width $s$.
that iteratively construct a triangulation by progressively reducing the approximation error. Some algorithms progressively increase the number of triangles, while others decimate a dense triangulation.

Delaunay refinement algorithms, introduced by Ruppert [421] and Chew [160], proceed by iteratively inserting points to improve the triangulation precision. Each point is added at a circumcenter of one triangle, and is chosen to reduce the approximation error as much as possible. These algorithms control the shape of triangles, and are used to compute isotropic triangulations of images, where the size of the triangles varies to match the local image regularity. Extensions of these vertex insertion methods capture anisotropic regularity by locally modifying the notion of circumcenter [337] or with a local optimization of the vertex location [77].

Triangulation-thinning algorithms start with a dense triangulation of the domain and progressively remove either a vertex, an edge, or a face, so as to increase the approximation error as slowly as possible, until $M$ triangles remain [238, 268, 305]. These methods do not control the shape of the resulting triangles, but can create effective anisotropic approximation meshes. They have been used for image compression [205, 239].
These adaptive triangulations are most often used to mesh the interior of a two- or three-dimensional domain having a complex boundary, or to mesh a two-dimensional surface embedded in three-dimensional space.
Approximation of Piecewise $\mathbf{C}^\alpha$ Images
Adaptive triangulations could be generalized to higher-order approximations of piecewise $\mathbf{C}^\alpha$ images to obtain an error in $O(M^{-\alpha})$ with $M$ finite elements, for $\alpha > 2$. This would require computing polynomial approximations of order $p = \alpha - 1$ on each finite element, and the support of these finite elements should also approximate the geometry of the edges at an order $p$. To produce an error $\|f - f_M\|^2 = O(M^{-\alpha})$, it is indeed necessary to cover edge curves with $O(M)$ finite elements of width $O(M^{-\alpha})$, as illustrated in Figure 9.7. However, this gets extremely complicated, and has never been implemented numerically. Section 12.2.4 introduces bandlet approximations, which reach the $O(M^{-\alpha})$ error decay for piecewise $\mathbf{C}^\alpha$ images by choosing a best basis in a dictionary of orthonormal bases.
9.3.3 Curvelet Approximations
Candès and Donoho [135] showed that a simple thresholding of curvelet coefficients yields nearly optimal approximations of piecewise $\mathbf{C}^2$ images, as opposed to a complex triangulation optimization.

Section 5.5.2 introduced curvelets, defined with a translation, rotation, and anisotropic stretching
$$c_{2^j, u, \theta}(x_1, x_2) = c_{2^j}(R_\theta(x - u)) \quad \text{where} \quad c_{2^j}(x_1, x_2) \approx 2^{-3j/4}\, c(2^{-j/2} x_1,\; 2^{-j} x_2), \qquad (9.73)$$
where $R_\theta$ is the planar rotation of angle $\theta$. A curvelet $c^\theta_{j,m}$ is elongated in the direction $\theta$, with a width proportional to the square of its length. This parabolic scaling corresponds to the scaling of the triangles used by Theorem 9.19 to approximate a discontinuous image along $\mathbf{C}^2$ edges.

A tight frame $\mathcal{D} = \{c^\theta_{j,m}\}_{j,m,\theta}$ of curvelets $c^\theta_{j,m}(x) = c_{2^j,\theta}(x - u_m^{(j,\theta)})$ is obtained with $2^{-\lceil j/2 \rceil + 2}$ equispaced angles $\theta$ at each scale $2^j$, and a translation grid defined by
$$\forall\, m = (m_1, m_2) \in \mathbb{Z}^2, \quad u_m^{(j,\theta)} = R_\theta(2^{j/2} m_1,\; 2^j m_2). \qquad (9.74)$$
This construction yields a tight frame of $L^2[0,1]^2$ with periodic boundary conditions [135].
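The parabolic scaling of these frame parameters can be tabulated directly. A small sketch of the bookkeeping in (9.74), using the angle count quoted above (the helper name is ours; $j \le 0$ indexes fine scales):

```python
import math

def curvelet_grid(j):
    """Number of angles and translation steps at scale 2^j, following the
    parameters above: 2^{-ceil(j/2)+2} angles, and a grid with step 2^{j/2}
    along the curvelet and 2^j across it."""
    n_angles = 2 ** (-math.ceil(j / 2) + 2)
    return n_angles, (2 ** (j / 2), 2 ** j)

# Refining the scale by two octaves doubles the number of orientations.
a_fine, _ = curvelet_grid(-6)
a_coarse, _ = curvelet_grid(-4)
assert a_fine == 2 * a_coarse

# The translation step across the curvelet is the square of the step along it.
_, (length_step, width_step) = curvelet_grid(-6)
assert abs(width_step - length_step ** 2) < 1e-12
```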
An $M$-term thresholding curvelet approximation is defined by
$$f_M = A^{-1} \sum_{(j,\theta,m) \in \Lambda_T} \langle f, c^\theta_{j,m}\rangle\, c^\theta_{j,m} \quad \text{with} \quad \Lambda_T = \left\{ (j, \theta, m) : |\langle f, c^\theta_{j,m}\rangle| > T \right\},$$
where $M = |\Lambda_T|$ is the number of approximation curvelets. Since curvelets define a tight frame with a frame bound $A > 0$,
$$\|f - f_M\|^2 \le A^{-1} \sum_{(j,\theta,m) \notin \Lambda_T} |\langle f, c^\theta_{j,m}\rangle|^2. \qquad (9.75)$$
Although the frame is tight, this is not an equality, because the thresholded curvelet coefficients of $f$ are not the curvelet coefficients of $f_M$, due to the frame redundancy.
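The bound (9.75) only needs the frame inequality $\|\sum_k a_k \varphi_k\|^2 \le A \sum_k |a_k|^2$. It can be illustrated with the simplest redundant tight frame of the plane, three unit vectors at 120 degrees with frame bound $A = 3/2$ (a toy stand-in for the curvelet frame; all names are ours):

```python
import math

# Mercedes-Benz frame: three unit vectors at 120 degrees, a tight frame of
# R^2 with frame bound A = 3/2.
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
         for k in range(3)]
A = 1.5
dot = lambda u, v: u[0] * v[0] + u[1] * v[1]

f = (1.2, -0.7)
coeffs = [dot(f, e) for e in frame]
assert abs(sum(c * c for c in coeffs) - A * dot(f, f)) < 1e-12   # tightness

# Threshold: keep the two largest coefficients, reconstruct with the dual
# frame vectors A^{-1} e_k, and check the error bound of (9.75).
kept = sorted(range(3), key=lambda k: -abs(coeffs[k]))[:2]
fM = tuple(sum(coeffs[k] * frame[k][i] for k in kept) / A for i in (0, 1))
diff = (f[0] - fM[0], f[1] - fM[1])
dropped = sum(coeffs[k] ** 2 for k in range(3) if k not in kept)
assert dot(diff, diff) <= dropped / A + 1e-12
```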
Theorem 9.20 shows that a thresholding curvelet approximation of a piecewise $\mathbf{C}^2$ image has an error with a nearly optimal asymptotic decay.
Theorem 9.20: Candès, Donoho. Let f be a piecewise C2 image. An M-term curvelet
approximation satisfies
f ⫺ fM 2 ⫽ O(M ⫺2 (log M)3 ).
(9.76)
Proof. The detailed proof can be found in [135]. We give the main ideas by analyzing how
curvelet atoms interact with regular parts and with edges in a piecewise regular image.
The blurring f ⫽ f̄ ∗ hs is nearly equivalent to translating curvelet coefficients from a scale
2 j to a scale 2 j ⫹ s, and thus does not introduce any difficulty in the approximation. In
the following, we suppose that s ⫽ 0.
The approximation error (9.76) is computed with an upper bound on the energy
of curvelet coefficients (9.75). This sum is divided in three sets of curvelet coefficients.
Type I curvelets have a support mostly concentrated where the image is uniformly C2 ;
these coefficients are small. Type II curvelets are in a neighborhood of an edge, and are
elongated along the direction of the tangent to the edge; these coefficients are large.
Type III curvelets are in a neighborhood of an edge with an angle different from the tangent angle; these coefficients become small quickly as the difference of angles increases. An
upper bound of the error is computed by selecting the largest M/3 curvelet coefficients
for each type of curvelet coefficient and by computing the energy of the leftover curvelet
coefficients for each type of coefficient.
A type I curvelet is located at a position u_m^{(j,θ)} at a distance larger than K 2^{j/2} from edges. Since curvelets have vanishing moments, keeping type I curvelets at scales larger than 2^j is equivalent to implementing a linear wavelet approximation at a scale 2^j. Theorem 9.16 shows that for linear wavelet approximations of C² images, keeping the M/3 larger-scale coefficients yields an error that decreases like O(M^{−2}).
For type II curvelets in the neighborhood of an edge, with an angle aligned with the local direction of the tangent to the edge, the sampling interval in this direction is also 2^{j/2}. Thus, there are O(L 2^{−j/2}) type II curvelets along the edge curves of length L in the image. These curvelet coefficients are bounded:

|⟨f, c^θ_{j,m}⟩| ≤ ‖f‖_∞ ‖c_{2^j}‖_1 = ‖f‖_∞ O(2^{3j/4})

because c_{2^j}(x_1, x_2) ≈ 2^{−3j/4} c(2^{−j} x_1, 2^{−j/2} x_2). If we keep the M/3 larger-scale type II curvelets, since there are O(L 2^{−j/2}) such curvelets at each scale, it amounts to keeping them at scales above 2^l = O(L² M^{−2}). The leftover type II curvelets have an energy

O(L 2^{−l/2}) ‖f‖²_∞ O(2^{3l/2}) = O(M^{−2}).
Type III curvelet coefficients ⟨f, c^θ_{j,m}⟩ are located in the neighborhood of an edge with an angle θ that deviates from the local orientation θ_0 of the edge tangent. These coefficients have a fast decay as |θ − θ_0| increases. This is shown in the Fourier domain by verifying that the region where the energy of the localized edge patch is concentrated becomes increasingly disjoint from the Fourier support of ĉ^θ_{j,m} as |θ − θ_0| increases. The error analysis of these curvelet coefficients is the most technical aspect of the proof. One can prove that selecting the M/3 largest type III curvelets yields an error among type III curvelets that is O(M^{−2} (log₂ M)³) [135]. This error dominates the overall approximation error and yields (9.76).
■
The curvelet approximation rate is optimal up to a (log₂ M)³ factor. The beauty
of this result comes from its simplicity. Unlike an optimal triangulation that must
adapt the geometry of each element, curvelets define a fixed frame with coefficients that are selected by a simple thresholding. However, this simplicity has
a downside. Curvelet approximations are optimal for piecewise C^α images for α = 2, but they are not optimal if α > 2 or α < 1. In particular, curvelets are
not optimal for bounded variation functions and their nonlinear approximation
error does not have the M^{−1} decay of wavelets. Irregular geometric structures
and isolated singularities are approximated with more curvelets than wavelets.
Section 12.2.4 studies approximations in bandlet dictionaries, which are adapted to
the unknown geometric image regularity.
In most images, curvelet frame approximations are not as effective as wavelet
orthonormal bases because they have a redundancy factor A that is at least 5, and
because most images include some structures that are more irregular than just
piecewise C2 elements. Section 11.3.2 describes curvelet applications to noise
removal.
9.4 EXERCISES
9.1 ² Suppose that {g_m}_{m∈Z} and {g̃_m}_{m∈Z} are two dual frames of L²[0, 1].
(a) Let f_N = Σ_{m=0}^{N−1} ⟨f, g_m⟩ g̃_m. Prove that the result (9.4) of Theorem 9.1 remains valid.
(b) Let ⟨f, g_{m_k}⟩ be the coefficient of rank k: |⟨f, g_{m_k}⟩| ≥ |⟨f, g_{m_{k+1}}⟩|. Prove that if |⟨f, g_{m_k}⟩| = O(k^{−s}) with s > 1/2, then the best approximation f_M of f with M frame vectors satisfies ‖f − f_M‖² = O(M^{1−2s}).
9.2 ¹ Color images. A color pixel is represented by red, green, and blue components (r, g, b), which are considered as orthogonal coordinates in a
three-dimensional color space. The red r[n1 , n2 ], green g[n1 , n2 ], and blue
b[n1 , n2 ] image pixels are modeled as values taken by, respectively, three
random variables R, G, and B, which are the three coordinates of a color
vector. Estimate numerically the 3 × 3 covariance matrix of this color random vector from several images and compute the Karhunen-Loève basis
that diagonalizes it. Compare the color images reconstructed from the two
Karhunen-Loève color channels of highest variance with a reconstruction
from the red and green channels.
9.3 ² Suppose that B = {g_m}_{m∈Z} and B̃ = {g̃_m}_{m∈Z} are two dual frames of L²[0, 1]. Let f_N = Σ_{m=0}^{N−1} ⟨f, g_m⟩ g̃_m. Prove that the result (9.4) of Theorem 9.1 remains valid even though B is not an orthonormal basis.
9.4 ² Let f = (f_k)_{0≤k<K} be a multichannel signal where each f_k is a signal of size N. We write ‖f‖²_F = Σ_{k=0}^{K−1} ‖f_k‖². Let f_M = (f_{k,M})_{0≤k<K} be the multichannel signal obtained by projecting all f_k on the same M vectors of an orthonormal basis B = {g_m}_{0≤m<N}. Prove that the best M-term approximation f_M that minimizes ‖f − f_M‖²_F is obtained by selecting the M vectors g_m ∈ B that maximize Σ_{k=0}^{K−1} |⟨f_k, g_m⟩|².
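As a quick numerical check of this selection rule (with made-up coefficient values), the following sketch keeps the M indices of largest energy summed across channels:

```python
def best_multichannel_support(coeffs, M):
    """Keep the M indices m maximizing sum_k |<f_k, g_m>|^2, the optimal
    common support of Exercise 9.4 (coeffs[k][m] are the coefficients of
    channel k in the orthonormal basis)."""
    N = len(coeffs[0])
    energy = [sum(c[m] ** 2 for c in coeffs) for m in range(N)]
    ranked = sorted(range(N), key=lambda m: -energy[m])
    return sorted(ranked[:M])

# Two channels, four basis vectors (hypothetical coefficient values).
coeffs = [[3.0, 0.1, 2.0, 0.0],
          [0.5, 0.2, 2.5, 0.1]]
support = best_multichannel_support(coeffs, 2)   # m = 0 and m = 2 carry most energy
```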
9.5 ¹ Verify that for f = C 1_{[0,1/2]}, a linear approximation with the N largest-scale wavelets over [0, 1] produces an error that satisfies ε_l(N, f) ∼ ‖f‖²_V N^{−1}.
9.6 ² Prove that for any f ∈ L²[0, 1], if ‖f‖_V < +∞, then ‖f‖_∞ < +∞. Verify that one can find an image f ∈ L²[0, 1]² such that ‖f‖_V < +∞ and ‖f‖_∞ = +∞.
9.7 ² Prove that if f ∈ W^s(R) with s > p + 1/2, then f ∈ C^p.
9.8 ² The family of discrete polynomials {p_k[n] = n^k}_{0≤k<N} is a basis of C^N.
(a) Implement numerically a Gram-Schmidt algorithm that orthogonalizes {p_k}_{0≤k<N}.
(b) Let f be a signal of size N. Compute the polynomial f_k of degree k that minimizes ‖f − f_k‖. Perform numerical experiments on signals f that are uniformly smooth and piecewise smooth. Compare the approximation error with the error obtained by approximating f with the k lower-frequency Fourier coefficients.
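Part (a) can be sketched in a few lines with the classical Gram-Schmidt recursion (numerically fragile for large N or k; a QR factorization is preferable in practice):

```python
def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize the vectors in order."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

# Discrete polynomials p_k[n] = n^k sampled on n = 0, ..., N-1.
N = 8
polys = [[float(n ** k) for n in range(N)] for k in range(4)]
q = gram_schmidt(polys)
```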
9.9 ³ Let f be a function with a finite total variation ‖f‖_V on [0, 1]. For a quantization step Δ, [0, 1] is divided into consecutive intervals [t_k, t_{k+1}] with m Δ ≤ f(t) ≤ (m + 1) Δ for t ∈ [t_k, t_{k+1}]. In each [t_k, t_{k+1}], f(t) is approximated by its average a_k = (t_{k+1} − t_k)^{−1} ∫_{t_k}^{t_{k+1}} f(t) dt. Let M be the total number of such intervals and f_M(t) = a_k for t ∈ [t_k, t_{k+1}] be a piecewise constant approximation of f. Prove that there exists a constant C such that ‖f − f_M‖² ≤ C ‖f‖²_V M^{−2}.
9.10 ³ Let α[M] be a decreasing sequence such that lim_{M→+∞} α[M] = 0. By using (9.61), prove that there exists a bounded variation function f ∈ L²[0, 1]² such that ε_l(f, M) ≥ α[M] (the amplitude of f is not bounded).
9.11 ¹ Consider a wavelet basis of L²[0, 1] constructed with wavelets having q > s vanishing moments and that are C^q. Construct functions f ∈ W^s[0, 1] for which the linear and nonlinear approximation errors in this basis are identical: ε_l(f, M) = ε_n(f, M) for any M ≥ 0.
9.12 ² Let f(t) be a piecewise polynomial signal of degree 3 defined on [0, 1], with K discontinuities. We denote by f_{l,M} and f_{n,M}, respectively, the linear and nonlinear approximations of f from M vectors chosen from a Daubechies wavelet basis of L²[0, 1] with four vanishing moments.
(a) Give upper bounds as functions of K and M of ‖f − f_{l,M}‖² and ‖f − f_{n,M}‖².
(b) The Piece-polynomial signal f in WAVELAB is piecewise polynomial of degree 3. Decompose it in a Daubechies wavelet basis with four vanishing moments, and compute ‖f − f_K‖ and ‖f − f̃_K‖ as a function of K. Verify your analytic formula.
9.13 ² Let f[n] be defined over [0, N]. We denote by f_{p,k}[n] the signal that is piecewise constant on [0, k], takes at most p different values, and minimizes

ε_{p,k} = ‖f − f_{p,k}‖²_{[0,k]} = Σ_{n=0}^{k} |f[n] − f_{p,k}[n]|².

(a) Compute as a function of f[n] the value a_{l,k} that minimizes c_{l,k} = Σ_{n=l}^{k} |f[n] − a_{l,k}|².
(b) Prove that

ε_{p,k} = min_{l∈[0,k−1]} {ε_{p−1,l} + c_{l,k}}.

Derive a bottom-up algorithm that progressively computes f_{p,k} for 0 ≤ k ≤ N and 1 ≤ p ≤ K, and obtains f_{K,N} with O(K N²) operations. Implement this algorithm in WAVELAB.
(c) Compute the nonlinear approximation of f with the K largest-amplitude Haar wavelet coefficients and the resulting approximation error. Compare this error with ‖f − f_{K,N}‖ as a function of K for the lady and the Piece-polynomial signals in WAVELAB. Explain your results.
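The bottom-up dynamic program of part (b) can be sketched as follows. Segment costs come from prefix sums in O(1), giving the O(K N²) total; the segment convention [l + 1, k] is one plausible reading of the recurrence:

```python
def piecewise_constant_dp(f, K):
    """Bottom-up dynamic program: eps[p][k] is the minimal squared error
    for approximating f[0..k] with at most p constant pieces."""
    N = len(f)
    # Prefix sums of f and f^2 give any segment cost in O(1).
    S = [0.0] * (N + 1)
    S2 = [0.0] * (N + 1)
    for n, v in enumerate(f):
        S[n + 1] = S[n] + v
        S2[n + 1] = S2[n] + v * v

    def cost(l, k):
        """min_a sum_{n=l}^{k} (f[n] - a)^2, reached at the segment mean."""
        s = S[k + 1] - S[l]
        s2 = S2[k + 1] - S2[l]
        return s2 - s * s / (k - l + 1)

    INF = float("inf")
    eps = [[INF] * N for _ in range(K + 1)]
    for k in range(N):
        eps[1][k] = cost(0, k)
    for p in range(2, K + 1):
        for k in range(N):
            best = eps[p - 1][k]          # using fewer pieces is allowed
            for l in range(k):
                best = min(best, eps[p - 1][l] + cost(l + 1, k))
            eps[p][k] = best
    return eps

# Three true pieces: with p = 3 the approximation is exact.
f = [1.0, 1.0, 1.0, 5.0, 5.0, 2.0]
eps = piecewise_constant_dp(f, 3)
```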
9.14 ² Approximation of oscillatory functions:
(a) Let f(t) = a(t) exp[iφ(t)]. If a(t) and φ′(t) remain nearly constant on the support of ψ_{j,n}, then show with an approximate calculation that

⟨f, ψ_{j,n}⟩ ≈ a(2^j n) √(2^j) ψ̂(2^j φ′(2^j n)).   (9.77)

(b) Let f(t) = sin(t^{−1}) 1_{[−1/π, 1/π]}(t). Show that the ℓ^p norm of the wavelet coefficients of f is finite if and only if p < 1. Use the approximate formula (9.77).
(c) Compute an upper bound of the nonlinear approximation error ε_n(f, M) of sin(t^{−1}) from M wavelet coefficients. Verify your theoretical estimate with a numerical calculation in WAVELAB.
CHAPTER 10
Compression
Reducing a liter of orange juice to a few grams of concentrated powder is what lossy
compression is about. The taste of the restored beverage is similar to the taste of
orange juice but has often lost some subtlety. In this book we are more interested in sounds and images, but we face the same trade-off between quality and compression.
Saving money for data storage and improving transmissions through channels with
limited bandwidth are major compression applications.
A transform coder decomposes a signal in an orthogonal basis and quantizes the
decomposition coefficients. The distortion of the restored signal is minimized by
optimizing the quantization, the basis, and the bit allocation. The basic information
theory necessary for understanding quantization properties is introduced. Distortion rate theory is first studied at high bit rates, in a Bayes framework, where signals
are realizations of a random vector that has a probability distribution that is known a
priori. This applies to audio coding, where signals are often modeled with Gaussian
processes.
High signal-compression factors are obtained in sparse representations, where
few nonzero coefficients are kept. Most of the bits are devoted to coding the geometry of these nonzero coefficients, and the distortion is dominated by the resulting nonlinear approximation term. Wavelet image transform codes illustrate these properties.
JPEG and JPEG-2000 image-compression standards are described.
10.1 TRANSFORM CODING
When production models are available, as in speech signals, compression algorithms
can code parameters that produce a signal approximation. When no such model
is available, as in general audio signals or images, then transform codes provide
efficient compression algorithms with sparse representations in orthonormal bases.
Section 10.1.1 reviews different types of compression algorithms and Section 10.1.2
concentrates on transform codes.
10.1.1 Compression State of the Art
Speech
Speech coding is used for telephony, where it may be of limited quality but good
intelligibility, and for higher-quality teleconferencing. Telephone speech is limited
to the frequency band of 200–3400 Hz and is sampled at 8 kHz. A pulse code
modulation (PCM), which quantizes each sample on 8 bits, produces a code with 64 kb/s (64 × 10³ bits per second). This can be considerably reduced by removing
some of the speech redundancy.
The production of speech signals is well understood. Model-based analysis–
synthesis codes give intelligible speech at 2 kb/s. This is widely used for defense
telecommunications [316, 460]. Digital cellular telephony uses 8 kb/s or less to
reproduce more natural voices. Linear predictive codes (LPCs) restore speech signals by filtering white noise or a pulse train with linear filters defined by parameters
that are estimated and coded. For higher bit rates, the quality of LPC speech production is enhanced by exciting the linear filters with waveforms chosen from a larger
family. These code-excited linear prediction (CELP) codes provide nearly perfect
telephone quality at 16 kb/s.
Audio
Audio signals include speech but also music and all types of sounds. On a compact
disc, the audio signal is limited to a maximum frequency of 20 kHz. It is sampled
at 44.1 kHz and each sample is coded on 16 bits. The bit rate of the resulting PCM
code is 706 kb/s. For compact discs and digital audio tapes, signals must be coded
with hardly any noticeable distortion. This is also true for multimedia CD-ROM and
digital television sounds.
No models are available for general audio signals. At present, the best compression is achieved by transform coders that decompose the signal in a local
time-frequency basis. To reduce perceived distortion, perceptual coders [317] adapt
the quantization of time-frequency coefficients to our hearing sensitivity. Compact
disc–quality sounds are restored with 128 kb/s; nearly perfect audio signals are
obtained with 64 kb/s.
Images
A gray-level image typically has 512 × 512 pixels, each coded with 8 bits. Like audio signals, images include many types of structures that are difficult to model. Currently,
the best image-compression algorithms are the JPEG and JPEG-2000 compression
standards, which are transform codes in cosine bases and wavelet bases.
The efficiency of these bases comes from their ability to construct precise
nonlinear image approximations with few nonzero vectors. With fewer than 1
bit/pixel, visually perfect images are reconstructed. At 0.25 bit/pixel, the image
remains of good quality.
Video
Applications of digital video range from low-quality videophones, teleconferencing,
and Internet video browsing, to high-resolution television. The most effective compression algorithms remove time redundancy with a motion compensation. Local
image displacements are measured from one frame to the next, and are coded as
motion vectors. Each frame is predicted from a previous one by compensating for
the motion. An error image is calculated and compressed with a transform code.
The MPEG video-compression standards are based on such motion compensation
[344] with a JPEG-type compression of prediction error images.
Standard-definition television (SDTV) format has interlaced images of 720 × 486 (NTSC) or 720 × 576 pixels (PAL) with, respectively, 60 and 50 images per second. MPEG-2 codes these images with typically 5 Mb/s. Internet videos are often smaller images of 320 × 240 pixels with typically 20 or 30 images per second, and are often coded with 200 to 300 kb/s for real-time browsing. The full high-definition television (HDTV) format corresponds to images of 1920 × 1080 pixels. MPEG-2 codes these images with 12 to 24 Mb/s. With MPEG-4, the bit rate goes down to 12 Mb/s or less.
10.1.2 Compression in Orthonormal Bases
A transform coder decomposes signals in an orthonormal basis B = {g_m}_{0≤m<N} and
optimizes the compression of the decomposition coefficients. The performance of
such a transform code is first studied from a Bayes point of view, by supposing
that the signal is the realization of a random process F [n] of size N , which has a
probability distribution that is known a priori.
Let us decompose F over B:

F = Σ_{m=0}^{N−1} F_B[m] g_m.

Each coefficient F_B[m] is a random variable defined by

F_B[m] = ⟨F, g_m⟩ = Σ_{n=0}^{N−1} F[n] g*_m[n].

To center the variations of F_B[m] at zero, we code F_B[m] − E{F_B[m]} and store the mean value E{F_B[m]}. This is equivalent to supposing that F_B[m] has a zero mean.
Quantization
To construct a finite code, each coefficient FB [m] is approximated by a quantized
variable F̃B [m], which takes its values over a finite set of real numbers. A scalar
quantization approximates each FB [m] independently. If the coefficients FB [m]
are highly dependent, quantizer performance is improved by vector quantizers
that approximate the vector of N coefficients {F_B[m]}_{0≤m<N} together [27]. Scalar
quantizers require fewer computations and are thus more often used. If the basis
{g_m}_{0≤m<N} can be chosen so that the coefficients F_B[m] are nearly independent,
the improvement of a vector quantizer becomes marginal. After quantization, the
reconstructed signal is
F̃ = Σ_{m=0}^{N−1} F̃_B[m] g_m.
Distortion Rate
Let us evaluate the distortion introduced by this quantization. Ultimately, we want to
restore a signal that is perceived as nearly identical to the original signal. Perceptual
transform codes are optimized with respect to our sensitivity to degradations in
audio signals and images [317]. However, distances that evaluate perceptual errors
are highly nonlinear and thus difficult to manipulate mathematically. A mean-square norm often does not properly quantify the perceived distortion, but reducing a mean-square distortion generally enhances the coder performance. Weighted mean-square distances can provide better measurements of perceived errors and are optimized like a standard mean-square norm.
In the following, we try to minimize the average coding distortion, evaluated with a mean-square norm. Since the basis is orthogonal, this distortion can be written as

d = E{‖F − F̃‖²} = Σ_{m=0}^{N−1} E{|F_B[m] − F̃_B[m]|²}.
The average number of bits allocated to encode a quantized coefficient F̃_B[m] is denoted by R_m. For a given R_m, a scalar quantizer is designed to minimize E{|F_B[m] − F̃_B[m]|²}. The total mean-square distortion d depends on the average total bit budget

R = Σ_{m=0}^{N−1} R_m.
The function d(R) is called the distortion rate. For a given R, the bit allocation
{R_m}_{0≤m<N} must be adjusted in order to minimize d(R).
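As a numerical illustration (a plain uniform quantizer, not the optimized quantizers studied next), the mean-square distortion of independently quantized Gaussian coefficients is close to the classical high-resolution value Δ²/12 per coefficient, and it adds across the basis exactly as in the sum above:

```python
import random

def uniform_quantize(x, delta):
    """Midtread uniform scalar quantizer with step delta."""
    return delta * round(x / delta)

random.seed(0)
N, delta, trials = 4, 0.5, 5000
d = 0.0
for _ in range(trials):
    F = [random.gauss(0.0, 1.0) for _ in range(N)]     # coefficients F_B[m]
    Fq = [uniform_quantize(x, delta) for x in F]       # quantized coefficients
    d += sum((a - b) ** 2 for a, b in zip(F, Fq))      # squared error in an orthonormal basis
d /= trials
high_res = N * delta ** 2 / 12    # high-resolution approximation of d
```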
Choice of Basis
The distortion rate of an optimized transform code depends on the orthonormal
basis B. We see in Section 10.3.2 that the Karhunen-Loève basis minimizes d(R) for
high-resolution quantizations of signals that are realizations of a Gaussian process.
However, this is not true when the process is non-Gaussian.
To achieve a high compression rate, the transform code must produce many
zero-quantized coefficients, and thus define a sparse signal representation. Section
10.4 shows that d(R) then depends on the precision of nonlinear approximations
in the basis B.
10.2 DISTORTION RATE OF QUANTIZATION
Quantized coefficients take their values over a finite set and can thus be coded with
a finite number of bits. Section 10.2.1 reviews entropy codes of random sources.
Section 10.2.2 studies the optimization of scalar quantizers, in order to reduce the
mean-square error for a given bit allocation.
10.2.1 Entropy Coding
Let X be a random source that takes its values among a finite alphabet of K symbols
A = {x_k}_{1≤k≤K}. The goal is to minimize the average bit rate needed to store the
values of X. We consider codes that associate to each symbol xk a binary word wk
of length lk . A sequence of values produced by the source X is coded by aggregating
the corresponding binary words.
All symbols x_k can be coded with binary words of the same size l_k = ⌈log₂ K⌉ bits. However, the average code length may be reduced with a variable-length code using smaller binary words for symbols that occur frequently. Let us denote
by p_k the probability of occurrence of a symbol x_k:

p_k = Pr{X = x_k}.

The average bit rate to code each symbol emitted by the source X is

R_X = Σ_{k=1}^{K} l_k p_k.   (10.1)
We want to optimize the code words {w_k}_{1≤k≤K} in order to minimize R_X.
Prefix Code
Codes with words of varying lengths are not always uniquely decodable. Let us consider the code that associates to {x_k}_{1≤k≤4} the code words

{w_1 = 0, w_2 = 10, w_3 = 110, w_4 = 101}.   (10.2)

The sequence 1010 can either correspond to w_2 w_2 or to w_4 w_1. To guarantee that any aggregation of code words is uniquely decodable, the prefix condition imposes that no code word may be the prefix (beginning) of another one. The code (10.2) does not satisfy this condition since w_2 is the prefix of w_4. The code

{w_1 = 0, w_2 = 10, w_3 = 110, w_4 = 111}
satisfies this prefix condition. Any code that satisfies the prefix condition is clearly
uniquely decodable.
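Greedy bit-by-bit decoding works for any prefix code: as soon as the bits read so far match a code word, that match is the only possibility. A small sketch with the prefix code above:

```python
def decode_prefix(bits, codebook):
    """Greedy decoder for a prefix code: because no code word is the
    prefix of another, the first match read is the only possible one."""
    inverse = {w: s for s, w in codebook.items()}
    symbols, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    assert word == "", "bit string ended in the middle of a code word"
    return symbols

# The prefix code {w1 = 0, w2 = 10, w3 = 110, w4 = 111} from the text.
code = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}
decoded = decode_prefix("0101110111", code)   # x1 x2 x4 x1 x4
```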
A prefix code is characterized by a binary tree that has K leaves corresponding to the symbols {x_k}_{1≤k≤K}. Figure 10.1 shows an example for a prefix code of K = 6
symbols. The left and right branches of the binary tree are, respectively, coded by
0 and 1. The binary code word wk associated to xk is the succession of 0 and 1
FIGURE 10.1
Prefix tree corresponding to a code with six symbols. The code word w_k of each leaf is indicated below it (w_1 = 111, w_2 = 110, w_3 = 101, w_4 = 100, w_5 = 01, w_6 = 00).
corresponding, respectively, to the left and right branches along the path from the
root to the leaf xk . The binary code produced by such a binary tree is always a
prefix code. Indeed, wm is a prefix of wk if and only if xm is an ancestor of xk in
the binary tree. This is not possible since both symbols correspond to a leaf of the
tree. Conversely, we can verify that any prefix code can be represented by such a
binary tree.
The length l_k of the code word w_k is the depth in the binary tree of the corresponding leaf. Thus, the optimization of a prefix code is equivalent to the construction of an optimal binary tree that distributes the depth of the leaves in order to minimize

R_X = Σ_{k=1}^{K} l_k p_k.   (10.3)

Therefore, higher-probability symbols should correspond to leaves higher in the tree.
Shannon Entropy
The Shannon theorem [429] proves that entropy is a lower bound for the average
bit rate R_X of any prefix code.
Theorem 10.1: Shannon. Let X be a source with symbols {x_k}_{1≤k≤K} that occur with probabilities {p_k}_{1≤k≤K}. The average bit rate R_X of a prefix code satisfies

R_X ≥ H(X) = −Σ_{k=1}^{K} p_k log₂ p_k.   (10.4)

Moreover, there exists a prefix code such that

R_X ≤ H(X) + 1,   (10.5)

and H(X) is called the entropy of X.
Proof. This theorem is based on the Kraft inequality given by Lemma 10.1.
Lemma 10.1: Kraft. Any prefix code satisfies

Σ_{k=1}^{K} 2^{−l_k} ≤ 1.   (10.6)
Conversely, if {l_k}_{1≤k≤K} is a positive sequence that satisfies (10.6), then there exists a sequence of binary words {w_k}_{1≤k≤K} of lengths {l_k}_{1≤k≤K} that satisfies the prefix condition.
To prove (10.6), we construct a full binary tree T whose leaves are at the depth m = max{l_1, l_2, . . . , l_K}. Inside this tree, we can locate the node n_k at depth l_k that codes the binary word w_k. We denote by T_k the subtree whose root is n_k, as illustrated in Figure 10.2. This subtree has a depth m − l_k and thus contains 2^{m−l_k} nodes at the level m of T. There are 2^m nodes at the depth m of T, and the prefix condition implies that the subtrees T_1, . . . , T_K have no node in common, so

Σ_{k=1}^{K} 2^{m−l_k} ≤ 2^m,

which proves (10.6).
Conversely, we consider {l_k}_{1≤k≤K} that satisfies (10.6), with l_1 ≤ l_2 ≤ · · · ≤ l_K and m = max{l_1, l_2, . . . , l_K}. Again, we construct a full binary tree T with leaves at depth m. Let S_1 be the first 2^{m−l_1} nodes at level m, S_2 be the next 2^{m−l_2} nodes, and so on, as illustrated in Figure 10.2. Since Σ_{k=1}^{K} 2^{m−l_k} ≤ 2^m, the sets {S_k}_{1≤k≤K} have fewer than 2^m elements in total and can thus be constructed at level m of the tree. The nodes of a set S_k are the leaves of a subtree T_k of T. The root n_k of T_k is at depth l_k and corresponds to a binary word w_k. By construction, all these subtrees T_k are distinct, so {w_k}_{1≤k≤K} is a prefix code where each code word w_k has length l_k. This finishes the proof of the lemma.
To prove the two inequalities (10.4) and (10.5) of the theorem, we consider the minimization of

R_X = Σ_{k=1}^{K} p_k l_k
FIGURE 10.2
The leaves at depth m of tree T are regrouped as sets S_k of 2^{m−l_k} nodes that are the leaves of a subtree T_k, having its root n_k at depth l_k. Here, m = 4 and l_1 = 2, so S_1 has 2² nodes.
under the Kraft inequality constraint

Σ_{k=1}^{K} 2^{−l_k} ≤ 1.

If we admit noninteger values for l_k, we can verify with Lagrange multipliers that the minimum is reached for l_k = −log₂ p_k. The value of this minimum is the entropy lower bound:

R_X = Σ_{k=1}^{K} p_k l_k = −Σ_{k=1}^{K} p_k log₂ p_k = H(X),

which proves (10.4).
To guarantee that l_k is an integer, the Shannon code is defined by

l_k = ⌈−log₂ p_k⌉,   (10.7)

where ⌈x⌉ is the smallest integer larger than x. Since l_k ≥ −log₂ p_k, the Kraft inequality is satisfied:

Σ_{k=1}^{K} 2^{−l_k} ≤ Σ_{k=1}^{K} 2^{log₂ p_k} = 1.

Lemma 10.1 proves that there exists a prefix code with binary words w_k that have lengths l_k. For this code,

R_X = Σ_{k=1}^{K} p_k l_k ≤ Σ_{k=1}^{K} p_k (−log₂ p_k + 1) = H(X) + 1,

which proves (10.5). ■
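The Shannon lengths (10.7) are easy to check numerically: with the six probabilities used in Example 10.1 below, they satisfy the Kraft inequality and give a rate between H(X) and H(X) + 1:

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Shannon code (10.7): l_k = ceil(-log2 p_k)."""
    return [ceil(-log2(p)) for p in probs]

probs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.4]        # probabilities of Example 10.1
lengths = shannon_lengths(probs)
kraft = sum(2.0 ** -l for l in lengths)          # Kraft sum, at most 1
H = -sum(p * log2(p) for p in probs)             # entropy (10.4)
R = sum(p * l for p, l in zip(probs, lengths))   # average bit rate
```

Here R = 2.9 bits, above the Huffman rate of 2.35 bits computed in Example 10.1: the Shannon code realizes the bound (10.5) but is generally not optimal.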
The entropy H(X) measures the uncertainty as to the outcome of the random variable X, and

0 ≤ H(X) ≤ log₂ K.
The maximum value log₂ K corresponds to a sequence with a uniform probability distribution p_k = 1/K for 1 ≤ k ≤ K. Since no value is more probable than any other, the uncertainty as to the outcome of X is maximum. The minimum entropy value H(X) = 0 corresponds to a source where one symbol x_k occurs with probability 1.
There is no uncertainty as to the outcome of X because we know in advance that it
will be equal to xk .
Huffman Code
The entropy lower bound H(X) is nearly reachable with an optimized prefix code. The Huffman algorithm is a dynamic-programming algorithm that constructs a binary tree that minimizes the average bit rate R_X = Σ_{k=1}^{K} p_k l_k. This tree is called an optimal prefix-code tree. Theorem 10.2 gives an induction rule that constructs the tree from the bottom up by aggregating lower-probability symbols.
Theorem 10.2: Huffman. Let us consider K symbols with their probabilities of occurrence sorted in increasing order p_k ≤ p_{k+1}:

{(x_1, p_1), (x_2, p_2), (x_3, p_3), . . . , (x_K, p_K)}.   (10.8)

We aggregate the two lower-probability symbols x_1 and x_2 in a single symbol x_{1,2} of probability

p_{1,2} = p_1 + p_2.

An optimal prefix-code tree for the K symbols (10.8) is obtained by constructing an optimal prefix-code tree for the K − 1 symbols

{(x_{1,2}, p_{1,2}), (x_3, p_3), . . . , (x_K, p_K)},   (10.9)

and by dividing the leaf x_{1,2} into two children nodes corresponding to x_1 and x_2.
The proof of this theorem [27, 307] is left to the reader. The Huffman rule reduces the construction of an optimal prefix-code tree of K symbols (10.8) to the construction of an optimal code of K − 1 symbols (10.9) plus an elementary operation. The Huffman algorithm iterates this regrouping K − 1 times to grow an optimal prefix-code tree progressively from bottom to top. The Shannon Theorem 10.1 proves that the average bit rate of the Huffman optimal prefix code satisfies

H(X) ≤ R_X ≤ H(X) + 1.   (10.10)

As explained in the proof of Theorem 10.1, the bit rate may be up to 1 bit more than the entropy lower bound because this lower bound is obtained with l_k = −log₂ p_k, which is generally not possible since l_k must be an integer. In particular, lower bit rates are achieved when one symbol has a probability close to 1.
EXAMPLE 10.1
We construct the Huffman code with six symbols {x_k}_{1≤k≤6} of probabilities

p_1 = 0.05, p_2 = 0.1, p_3 = 0.1, p_4 = 0.15, p_5 = 0.2, p_6 = 0.4.

The symbols x_1 and x_2 are the lowest-probability symbols, which are regrouped in a symbol x_{1,2} with a probability of p_{1,2} = p_1 + p_2 = 0.15. At the next iteration, the lowest probabilities are p_3 = 0.1 and p_{1,2} = 0.15, so we regroup x_{1,2} and x_3 in a symbol x_{1,2,3} with a probability of 0.25. The next two lowest-probability symbols are x_4 and x_5, which are regrouped in a symbol x_{4,5} with a probability of 0.35. We then group x_{4,5} and x_{1,2,3}, which yields x_{1,2,3,4,5} with a probability of 0.6, which is finally aggregated with x_6. This finishes the tree, as illustrated in Figure 10.3. The resulting average bit rate (10.3) is R_X = 2.35, whereas the entropy is H(X) = 2.28. This Huffman code is better than the prefix code in Figure 10.1, which has an average bit rate of R_X = 2.4.

FIGURE 10.3
Prefix tree grown with the Huffman algorithm for a set of K = 6 symbols x_k, with probabilities p_k indicated at the leaves of the tree.
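The merging rule of Theorem 10.2 translates directly into a heap-based program. The sketch below recomputes Example 10.1; tie-breaking may produce a tree different from Figure 10.3, but every Huffman tree attains the same minimal rate R_X = 2.35:

```python
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Repeatedly merge the two least probable nodes (Theorem 10.2) and
    return each symbol's code length, i.e. its depth in the final tree."""
    depth = [0] * len(probs)
    # Heap entries: (probability, tie-breaker, symbols under this node).
    heap = [(p, k, [k]) for k, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for k in s1 + s2:      # every symbol under the merged node
            depth[k] += 1      # moves one level deeper
        heapq.heappush(heap, (p1 + p2, count, s1 + s2))
        count += 1
    return depth

probs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.4]
lengths = huffman_code_lengths(probs)
R = sum(p * l for p, l in zip(probs, lengths))   # average bit rate (10.3)
H = -sum(p * log2(p) for p in probs)             # entropy (10.4)
```

Since a Huffman tree is a full binary tree, its lengths satisfy the Kraft inequality (10.6) with equality.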
Block Coding
As mentioned earlier, the inequality (10.10) shows that a Huffman code may require 1 bit above the entropy because the length l_k of each binary word must be an integer, whereas the optimal value −log₂ p_k is generally a real number. To reduce this overhead, the symbols are coded together in blocks of size n.
Let us consider a block of n independent random variables X⃗ = X_1, . . . , X_n, where each X_k takes its values in the alphabet A = {x_k}_{1≤k≤K} with the same probability distribution as X. The vector X⃗ can be considered as a random variable taking its values in the alphabet A^n of size K^n. To each block of symbols s ∈ A^n, we associate a binary word of length l(s). The average number of bits per symbol for such a code is

R_X = (1/n) Σ_{s∈A^n} p(s) l(s).

Theorem 10.3 proves that the resulting Huffman code has a bit rate that converges to the entropy of X as n increases.
Theorem 10.3. The Huffman code for a block of size n requires an average number of bits per symbol that satisfies

H(X) ≤ R_X ≤ H(X) + 1/n.   (10.11)
Proof. The entropy of X⃗ considered as a random variable is

H(X⃗) = −Σ_{s∈A^n} p(s) log₂ p(s).

Denote by R_X⃗ the average number of bits to code each block X⃗. Applying (10.10) shows that with a Huffman code, R_X⃗ satisfies

H(X⃗) ≤ R_X⃗ ≤ H(X⃗) + 1.   (10.12)

Since the random variables X_i that compose X⃗ are independent,

p(s) = p(s_1, . . . , s_n) = Π_{i=1}^{n} p(s_i).

Thus, we derive that H(X⃗) = n H(X), and since R_X = R_X⃗ /n, we obtain (10.11) from (10.12). ■
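The key step H(X⃗) = n H(X) for independent symbols can be verified by brute force on a tiny two-symbol source (the probabilities below are made up):

```python
from math import log2
from itertools import product

probs = {"a": 0.7, "b": 0.3}   # a made-up two-symbol source
n = 3
H1 = -sum(p * log2(p) for p in probs.values())

# Entropy of the block of n independent symbols, by direct enumeration
# of the K^n sequences and their product probabilities.
Hn = 0.0
for block in product(probs, repeat=n):
    p = 1.0
    for s in block:
        p *= probs[s]
    Hn -= p * log2(p)
```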
Coding the symbols together in blocks is equivalent to coding each symbol x_k with an average number of bits l_k that is not an integer. This explains why block coding can nearly reach the entropy lower bound. The Huffman code can also be adaptively modified for long sequences in which the probability of occurrence of the symbols may vary [18]. The probability distribution is computed from the histogram of the N most recent symbols that were decoded. The next N symbols are coded with a new Huffman code calculated from the updated probability distribution. However, recomputing the Huffman code after updating the probability distribution is computationally expensive. Arithmetic codes have a causality structure that makes it easier to adapt the code to a varying probability distribution.
Arithmetic Code
Like a block Huffman code, an arithmetic code [411] records the symbols {x_k}_{1≤k≤K} in blocks to be coded. However, an arithmetic code is more structured; it progressively constructs the code of a whole block as each symbol is taken into account. When the probability p_k of each symbol x_k is not known, an adaptive arithmetic code progressively learns the probability distribution of the source and adapts the encoding.
We consider a block of symbols s = s_1, s_2, . . . , s_n produced by a random vector X⃗ = X_1, . . . , X_n of n independent random variables. Each X_k has the same probability distribution p(x) as the source X, with p(x_j) = p_j. An arithmetic code represents each s by an interval [a_n, a_n + b_n] included in [0, 1], whose length is equal to the probability of occurrence of this sequence:

b_n = Π_{k=1}^{n} p(s_k).
This interval is defined by induction as follows. We initialize a_0 = 0 and b_0 = 1. Let [a_i, a_i + b_i] be the interval corresponding to the first i symbols s_1, . . . , s_i. Suppose that the next symbol s_{i+1} is equal to x_j, so that p(s_{i+1}) = p_j. The new interval [a_{i+1}, a_{i+1} + b_{i+1}] is a subinterval of [a_i, a_i + b_i] with a size reduced by the factor p_j:

a_{i+1} = a_i + b_i Σ_{k=1}^{j−1} p_k   and   b_{i+1} = b_i p_j.
CHAPTER 10 Compression
The final interval $[a_n, a_n + b_n]$ characterizes the sequence $s_1, \ldots, s_n$ unambiguously because the $K^n$ different blocks of symbols $s$ correspond to $K^n$ different intervals that make a partition of $[0, 1]$. Since these intervals are nonoverlapping, $[a_n, a_n + b_n]$ is characterized by coding a number $c_n \in [a_n, a_n + b_n]$ in binary form. The binary expressions of the chosen numbers $c_n$ for the $K^n$ intervals define a prefix code, so that a sequence of such numbers is uniquely decodable. The value of $c_n$ is progressively calculated by adding refinement bits when $[a_i, a_i + b_i]$ is reduced to the next subinterval $[a_{i+1}, a_{i+1} + b_{i+1}]$, until $[a_n, a_n + b_n]$.
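As a concrete illustration, the interval construction can be sketched in a few lines of Python (a didactic floating-point version, not the finite-precision implementation of [488]; the function names are ours):

```python
import math

def arithmetic_interval(symbols, p, alphabet):
    """Compute [a_n, a_n + b_n] for a block of symbols, with probabilities p
    aligned with alphabet. Floats stand in for exact arithmetic."""
    a, b = 0.0, 1.0
    for s in symbols:
        j = alphabet.index(s)
        a += b * sum(p[:j])      # a_{i+1} = a_i + b_i * sum_{k<j} p_k
        b *= p[j]                # b_{i+1} = b_i * p_j
    return a, b

def digits_needed(b):
    """One choice of binary digit count d_n, compatible with
    -log2(b_n) <= d_n <= -log2(b_n) + 2."""
    return math.ceil(-math.log2(b)) + 1
```

For instance, with probabilities $(1/2, 1/4, 1/4)$, the block "ab" is mapped to the interval $[0.25, 0.375)$, whose length $0.125$ is indeed $p(a)\,p(b)$.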
There are efficient implementations that avoid numerical errors caused by the finite precision of arithmetic calculations when computing $c_n$ [488]. The resulting binary number $c_n$ has $d_n$ digits with
$$-\log_2 b_n \le d_n \le -\log_2 b_n + 2.$$
Since $\log_2 b_n = \sum_{i=1}^{n} \log_2 p(s_i)$ and $H(X) = -E\{\log_2 p(X)\}$, one can verify that the average number of bits per symbol of this arithmetic code satisfies
$$H(X) \le R_X \le H(X) + \frac{2}{n}. \qquad (10.13)$$
When the successive values Xk of the blocks are not independent, the upper and
lower bounds (10.13) remain valid because the successive symbols are encoded as
if they were independent.
An arithmetic code has a causal structure in the sense that the first $i$ symbols of a sequence $s_1, \ldots, s_i, s_{i+1}, \ldots, s_n$ are specified by an interval $[a_i, a_i + b_i]$ that does not depend on the values of the last $n - i$ symbols. Since the sequence is progressively coded and decoded, one can implement an adaptive version that progressively learns the probability distribution $p(x)$ [377, 412]. When coding $s_{i+1}$, this probability distribution can be approximated by the histogram (cumulative distribution) $p_i(x)$ of the first $i$ symbols. The subinterval of $[a_i, a_i + b_i]$ associated to $s_{i+1}$ is calculated with this estimated probability distribution. Suppose that $s_{i+1} = x_j$; we denote $p_i(x_j) = p_{i,j}$. The new interval is defined by
$$a_{i+1} = a_i + b_i \sum_{k=1}^{j-1} p_{i,k} \quad \text{and} \quad b_{i+1} = b_i\, p_{i,j}. \qquad (10.14)$$
The decoder is able to recover $s_{i+1}$ by recovering the first $i$ symbols of the sequence and computing the cumulative probability distribution $p_i(x)$ of these symbols. The interval $[a_{i+1}, a_{i+1} + b_{i+1}]$ is then calculated from $[a_i, a_i + b_i]$ with (10.14). The initial distribution $p_0(x)$ can be set to be uniform.
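A sketch of this adaptive update (10.14) in Python, using add-one counts so that the initial distribution $p_0$ is uniform (the helper name is ours):

```python
def adaptive_intervals(symbols, alphabet):
    """Successive intervals [a_i, a_i + b_i] with probabilities estimated
    from the histogram of the first i symbols (uniform initial counts)."""
    counts = {x: 1 for x in alphabet}        # p_0 uniform via add-one counts
    a, b = 0.0, 1.0
    for s in symbols:
        total = sum(counts.values())
        # cumulative probability of symbols preceding s in the alphabet
        cum = sum(counts[x] for x in alphabet[:alphabet.index(s)]) / total
        p = counts[s] / total
        a, b = a + b * cum, b * p            # update (10.14)
        counts[s] += 1                       # the decoder applies the same update
    return a, b
```

Because the count update happens only after a symbol is coded, the decoder can reproduce exactly the same sequence of estimated distributions, which is the causality property used in the text.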
If the symbols of the block are produced by independent random variables, then as $i$ increases, the estimated probability distribution $p_i(x)$ converges to the probability distribution $p(x)$ of the source. As the total block size $n$ increases to $+\infty$, one can prove that the average bit rate of this adaptive arithmetic code converges to the entropy of the source. This result also remains valid under weaker Markov-chain hypotheses [412].
10.2 Distortion Rate of Quantization
Noise Sensitivity
Huffman and arithmetic codes are more compact than a simple fixed-length code of size $\log_2 K$, but they are also more sensitive to errors. For a fixed-length code, a single bit error modifies the value of only one symbol. In contrast, a single bit error in a variable-length code may modify the whole symbol sequence. In noisy transmissions where such errors might occur, it is necessary to use an error-correction code that introduces a slight redundancy in order to suppress the transmission errors [18].
10.2.2 Scalar Quantization
If the source $X$ has arbitrary real values, it cannot be coded with a finite number of bits. A scalar quantizer $Q$ approximates $X$ by $\tilde X = Q(X)$, which takes its values over a finite set. We study the optimization of such a quantizer in order to minimize the number of bits needed to code $\tilde X$ for a given mean-square error
$$d = E\{(X - \tilde X)^2\}.$$
Suppose that $X$ takes its values in $[a, b]$, which may correspond to the whole real axis. We decompose $[a, b]$ into $K$ intervals $\{(y_{k-1}, y_k]\}_{1 \le k \le K}$ of variable length, with $y_0 = a$ and $y_K = b$. A scalar quantizer approximates all $x \in (y_{k-1}, y_k]$ by $x_k$:
$$\forall x \in (y_{k-1}, y_k], \quad Q(x) = x_k.$$
The intervals $(y_{k-1}, y_k]$ are called quantization bins. Rounding to the nearest integer is a simple example, where the quantization bins $(y_{k-1}, y_k] = (k - \tfrac{1}{2}, k + \tfrac{1}{2}]$ have size $1$ and $x_k = k$ for any $k \in \mathbb{Z}$.
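The rounding example generalizes to arbitrary bin edges; here is a small Python sketch (names are ours) that builds a scalar quantizer from the edges $y_0, \ldots, y_K$ and values $x_1, \ldots, x_K$:

```python
import bisect

def make_quantizer(y, x):
    """Scalar quantizer with bins (y[k-1], y[k]] mapped to values x[k-1]."""
    def Q(t):
        k = bisect.bisect_left(y, t)   # index of the first edge >= t
        return x[k - 1]
    return Q

# Rounding to the nearest integer: bins (k - 1/2, k + 1/2] with x_k = k.
round_q = make_quantizer([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5], [-2, -1, 0, 1, 2])
```

Note the half-open convention: a point exactly on an edge, such as $t = 1/2$, belongs to the bin $(-1/2, 1/2]$ and is quantized to $0$.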
High-Resolution Quantizer
Let $p(x)$ be the probability density of the random source $X$. The mean-square quantization error is
$$d = E\{(X - \tilde X)^2\} = \int_{-\infty}^{+\infty} \big(x - Q(x)\big)^2\, p(x)\, dx. \qquad (10.15)$$
A quantizer is said to have a high resolution if $p(x)$ is approximately constant on each quantization bin $(y_{k-1}, y_k]$ of size $\Delta_k = y_k - y_{k-1}$. This is the case if the sizes $\Delta_k$ are sufficiently small relative to the rate of variation of $p(x)$, so that one can neglect these variations in each quantization bin. We then have
$$p(x) = \frac{p_k}{\Delta_k} \quad \text{for } x \in (y_{k-1}, y_k], \qquad (10.16)$$
where
$$p_k = \Pr\{X \in (y_{k-1}, y_k]\}.$$
Theorem 10.4 computes the mean-square error under this high-resolution
hypothesis.
Theorem 10.4. For a high-resolution quantizer, the mean-square error $d$ is minimized when $x_k = (y_k + y_{k-1})/2$, which yields
$$d = \frac{1}{12} \sum_{k=1}^{K} p_k\, \Delta_k^2. \qquad (10.17)$$
Proof. The quantization error (10.15) can be rewritten as
$$d = \sum_{k=1}^{K} \int_{y_{k-1}}^{y_k} (x - x_k)^2\, p(x)\, dx.$$
Replacing $p(x)$ by its expression (10.16) gives
$$d = \sum_{k=1}^{K} \frac{p_k}{\Delta_k} \int_{y_{k-1}}^{y_k} (x - x_k)^2\, dx. \qquad (10.18)$$
One can verify that each integral is minimum for $x_k = (y_k + y_{k-1})/2$, which yields (10.17). ■
Uniform Quantizer
The uniform quantizer is an important special case where all quantization bins have the same size:
$$y_k - y_{k-1} = \Delta \quad \text{for } 1 \le k \le K.$$
For a high-resolution uniform quantizer, the average quadratic distortion (10.17) becomes
$$d = \frac{\Delta^2}{12} \sum_{k=1}^{K} p_k = \frac{\Delta^2}{12}. \qquad (10.19)$$
It is independent of the probability density $p(x)$ of the source.
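A quick numerical check of (10.19): for a small $\Delta$, the empirical quantization error on Gaussian samples is close to $\Delta^2/12$, even though the density is far from uniform globally (a simulation sketch; names are ours):

```python
import random

def uniform_quantize(t, delta):
    """High-resolution uniform quantizer: round to the nearest multiple of delta."""
    return delta * round(t / delta)

# Empirical check of d = delta^2 / 12 on a smooth density (Gaussian here).
random.seed(0)
delta = 0.05
samples = [random.gauss(0.0, 1.0) for _ in range(200000)]
d_emp = sum((t - uniform_quantize(t, delta)) ** 2 for t in samples) / len(samples)
d_theory = delta ** 2 / 12
```

The agreement holds because within each bin of size $\Delta = 0.05$ the Gaussian density is nearly constant, which is exactly the high-resolution hypothesis.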
Entropy Constrained Quantizer
We want to minimize the number of bits required to code the quantized values $\tilde X = Q(X)$ for a fixed distortion $d = E\{(X - \tilde X)^2\}$. The Shannon theorem (10.1) proves that the minimum average number of bits to code $\tilde X$ is the entropy $H(\tilde X)$. Huffman and arithmetic codes produce bit rates close to this entropy lower bound. Thus, we design a quantizer that minimizes $H(\tilde X)$.
The quantized source $\tilde X$ takes $K$ possible values $\{x_k\}_{1 \le k \le K}$ with probabilities
$$p_k = \Pr(\tilde X = x_k) = \Pr\big(X \in (y_{k-1}, y_k]\big) = \int_{y_{k-1}}^{y_k} p(x)\, dx.$$
Its entropy is
$$H(\tilde X) = -\sum_{k=1}^{K} p_k \log_2 p_k.$$
For a high-resolution quantizer, Theorem 10.5 by Gish and Pierce [273] relates $H(\tilde X)$ to the differential entropy of $X$, defined by
$$H_d(X) = -\int_{-\infty}^{+\infty} p(x) \log_2 p(x)\, dx. \qquad (10.20)$$
Theorem 10.5: Gish, Pierce. If $Q$ is a high-resolution quantizer with respect to $p(x)$, then
$$H(\tilde X) \ge H_d(X) - \frac{1}{2} \log_2(12\, d). \qquad (10.21)$$
This inequality is an equality if and only if $Q$ is a uniform quantizer.
Proof. By definition, a high-resolution quantizer satisfies (10.16), so $p_k = p(x)\, \Delta_k$ for $x \in (y_{k-1}, y_k]$. Thus,
$$H(\tilde X) = -\sum_{k=1}^{K} p_k \log_2 p_k = -\sum_{k=1}^{K} \int_{y_{k-1}}^{y_k} p(x) \log_2 p(x)\, dx - \sum_{k=1}^{K} p_k \log_2 \Delta_k = H_d(X) - \frac{1}{2} \sum_{k=1}^{K} p_k \log_2 \Delta_k^2. \qquad (10.22)$$
The Jensen inequality for a concave function $\phi(x)$ proves that if $p_k \ge 0$ with $\sum_{k=1}^{K} p_k = 1$, then for any $\{a_k\}_{1 \le k \le K}$,
$$\sum_{k=1}^{K} p_k\, \phi(a_k) \le \phi\Big(\sum_{k=1}^{K} p_k\, a_k\Big). \qquad (10.23)$$
If $\phi(x)$ is strictly concave, the inequality is an equality if and only if all the $a_k$ with $p_k \ne 0$ are equal. Since $\log_2(x)$ is strictly concave, we derive from (10.17) and (10.23) that
$$\frac{1}{2} \sum_{k=1}^{K} p_k \log_2(\Delta_k^2) \le \frac{1}{2} \log_2\Big(\sum_{k=1}^{K} p_k\, \Delta_k^2\Big) = \frac{1}{2} \log_2(12\, d).$$
Inserting this in (10.22) proves that
$$H(\tilde X) \ge H_d(X) - \frac{1}{2} \log_2(12\, d).$$
This inequality is an equality if and only if all $\Delta_k$ are equal, which corresponds to a uniform quantizer. ■
This theorem proves that for a high-resolution quantizer, the minimum average bit rate $R_X = H(\tilde X)$ is achieved by a uniform quantizer, and
$$R_X = H_d(X) - \frac{1}{2} \log_2(12\, d). \qquad (10.24)$$
In this case, $d = \Delta^2/12$, so
$$R_X = H_d(X) - \log_2 \Delta. \qquad (10.25)$$
The distortion rate is obtained by inverting (10.24):
$$d(R_X) = \frac{2^{2 H_d(X)}}{12}\, 2^{-2 R_X}. \qquad (10.26)$$
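As a numerical sanity check of (10.25), one can compare the empirical entropy of a finely quantized unit Gaussian with the prediction $H_d(X) - \log_2 \Delta$, using the closed form $H_d = \log_2 \sqrt{2\pi e}$ for $\sigma = 1$ (a simulation sketch; names are ours):

```python
import math
import random
from collections import Counter

sigma, delta = 1.0, 0.05
Hd = math.log2(sigma * math.sqrt(2 * math.pi * math.e))   # differential entropy

# Predicted bit rate of a high-resolution uniform quantizer, (10.25):
R_pred = Hd - math.log2(delta)

# Empirical entropy of the quantized source:
random.seed(1)
counts = Counter(round(random.gauss(0, sigma) / delta) for _ in range(400000))
n = sum(counts.values())
R_emp = -sum(c / n * math.log2(c / n) for c in counts.values())
```

With $\Delta = 0.05$ the predicted rate is about $2.05 + 4.32 \approx 6.37$ bits, and the empirical entropy of the quantized samples matches it closely.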
10.3 HIGH BIT RATE COMPRESSION
Section 10.3.1 studies the distortion rate performance of a transform code computed with high-resolution quantizers. For Gaussian processes, Section 10.3.2 proves that the optimal basis is the Karhunen-Loève basis. An application to audio compression is studied in Section 10.3.3.
10.3.1 Bit Allocation
Let us optimize the transform code of a random vector $F[n]$ decomposed in an orthonormal basis $B = \{g_m\}_{0 \le m < N}$:
$$F = \sum_{m=0}^{N-1} F_B[m]\, g_m.$$
Each $F_B[m]$ is a zero-mean source that is quantized into $\tilde F_B[m]$ with an average bit budget $R_m$. For a high-resolution quantization, Theorem 10.5 proves that the error $d_m = E\{|F_B[m] - \tilde F_B[m]|^2\}$ is minimized with a uniform scalar quantization, and $R_m = H_d(F_B[m]) - \log_2 \Delta_m$, where $\Delta_m$ is the bin size.
In many applications, the overall bit budget $R$ is fixed by some memory or transmission bandwidth constraints. Thus, we need to optimize the choice of the quantization steps $\{\Delta_m\}_{0 \le m < N}$ to minimize the total distortion
$$d = \sum_{m=0}^{N-1} d_m$$
for a fixed bit budget
$$R = \sum_{m=0}^{N-1} R_m.$$
The following bit allocation theorem (Theorem 10.6) proves that the transform code is optimized when all $\Delta_m$ are equal, by minimizing the distortion rate Lagrangian
$$\mathcal{L}(R, d) = d + \lambda R = \sum_{m=0}^{N-1} (d_m + \lambda R_m). \qquad (10.27)$$
Theorem 10.6. For a fixed bit budget $R$ with a high-resolution quantization, the total distortion $d$ is minimum for
$$\Delta_m^2 = 2^{2 \bar H_d}\, 2^{-2 \bar R} \quad \text{for } 0 \le m < N, \qquad (10.28)$$
with
$$\bar R = \frac{R}{N} \quad \text{and} \quad \bar H_d = \frac{1}{N} \sum_{m=0}^{N-1} H_d(F_B[m]).$$
The resulting distortion rate is
$$d(\bar R) = \frac{N}{12}\, 2^{2 \bar H_d}\, 2^{-2 \bar R}. \qquad (10.29)$$
Proof. For uniform high-resolution quantizations, (10.26) proves that
$$d_m(R_m) = \frac{1}{12}\, 2^{2 H_d(F_B[m])}\, 2^{-2 R_m} \qquad (10.30)$$
is a convex function of $R_m$, and the bit budget condition can be written as
$$R = \sum_{m=0}^{N-1} R_m = \sum_{m=0}^{N-1} H_d(F_B[m]) - \sum_{m=0}^{N-1} \frac{1}{2} \log_2(12\, d_m). \qquad (10.31)$$
The minimization of $d = \sum_{m=0}^{N-1} d_m$ under the equality constraint $\sum_{m=0}^{N-1} R_m = R$ is thus a convex minimization, obtained by finding the multiplier $\lambda$ that minimizes the distortion rate Lagrangian $\mathcal{L}(R, d) = \sum_{m=0}^{N-1} \big(d_m(R_m) + \lambda R_m\big)$. At the minimum, the relative variation of the distortion rate for each coefficient is constant and equal to the Lagrange multiplier:
$$\lambda = -\frac{\partial d_m(R_m)}{\partial R_m} = 2\, d_m \log_e 2 \quad \text{for } 0 \le m < N.$$
Thus,
$$\frac{\Delta_m^2}{12} = d_m = \frac{d}{N} = \frac{\lambda}{2 \log_e 2}. \qquad (10.32)$$
Since $d_m = d/N$, the bit budget condition (10.31) becomes
$$R = \sum_{m=0}^{N-1} H_d(F_B[m]) - \frac{N}{2} \log_2\Big(\frac{12\, d}{N}\Big).$$
Inverting this equation gives the expression (10.29) of $d(\bar R)$, and inserting this result in (10.32) yields (10.28). ■
This theorem shows that the transform code is optimized if it introduces the same expected error $d_m = \Delta_m^2/12 = d/N$ along each direction $g_m$ of the basis $B$. Then, the number of bits $R_m$ used to encode $F_B[m]$ depends only on its differential entropy:
$$R_m = H_d(F_B[m]) - \frac{1}{2} \log_2\Big(\frac{12\, d}{N}\Big). \qquad (10.33)$$
Let $\sigma_m^2$ be the variance of $F_B[m]$, and let $F_B[m]/\sigma_m$ be the normalized random variable of variance $1$. A simple calculation shows that
$$H_d(F_B[m]) = H_d(F_B[m]/\sigma_m) + \log_2 \sigma_m.$$
The "optimal bit allocation" $R_m$ in (10.33) may become negative if the variance $\sigma_m^2$ is too small, which is clearly not an admissible solution. In practice, $R_m$ must be a positive integer, but with this condition the resulting optimal solution has no simple analytic expression (Exercise 10.8). If we neglect the integer bit constraint, (10.33) gives the optimal bit allocation as long as $R_m \ge 0$.
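Neglecting the integer and positivity constraints, the allocation (10.28)-(10.33) is straightforward to compute once the differential entropies are known; a minimal sketch (names are ours):

```python
import math

def allocate_bits(Hd_list, R):
    """Optimal high-resolution allocation: the common squared bin size (10.28)
    and the per-coefficient rates (10.33). Negative R_m for low-variance
    coefficients are not handled here."""
    N = len(Hd_list)
    Rbar = R / N
    Hbar = sum(Hd_list) / N
    delta2 = 2 ** (2 * Hbar) * 2 ** (-2 * Rbar)        # common bin size squared
    Rm = [h - 0.5 * math.log2(delta2) for h in Hd_list]
    return delta2, Rm
```

Since $R_m = H_d(F_B[m]) - (\bar H_d - \bar R)$, the rates automatically sum back to the budget $R$, and coefficients with larger differential entropy receive proportionally more bits.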
Weighted Mean-Square Error
We mentioned that a mean-square error often does not measure the perceived distortion of images or audio signals very well. When the vectors $g_m$ are localized in time and frequency, a mean-square norm sums the errors at all times and frequencies with equal weights. Thus, it hides the temporal and frequency properties of the error $F - \tilde F$. Better norms can be constructed by emphasizing certain frequencies more than others in order to match our audio or visual sensitivity, which varies with the signal frequency. A weighted mean-square norm is defined by
$$d = \sum_{m=0}^{N-1} \frac{d_m}{w_m^2}, \qquad (10.34)$$
where $\{w_m^2\}_{0 \le m < N}$ are constant weights.
Theorem 10.6 applies to weighted mean-square errors by observing that
$$d = \sum_{m=0}^{N-1} d_m^w,$$
where $d_m^w = d_m / w_m^2$ is the quantization error of $F_B^w[m] = F_B[m]/w_m$. Theorem 10.6 proves that bit allocation is optimized by uniformly quantizing all $F_B^w[m]$ with the same bin size $\Delta$. This implies that the coefficients $F_B[m]$ are uniformly quantized with a bin size $\Delta_m = \Delta\, w_m$; it follows that $d_m = w_m^2\, d/N$. As expected, larger weights increase the error in the corresponding direction. The uniform quantization $Q_{\Delta_m}$ with bins of size $\Delta_m$ can be computed from a quantizer $Q$ that associates to any real number its closest integer:
$$Q_{\Delta_m}\big(F_B[m]\big) = \Delta_m\, Q\Big(\frac{F_B[m]}{\Delta_m}\Big) = \Delta\, w_m\, Q\Big(\frac{F_B[m]}{\Delta\, w_m}\Big). \qquad (10.35)$$
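Equation (10.35) translates directly into code (a sketch; the function name and example weights are ours):

```python
def quantize_weighted(coeffs, weights, delta):
    """Uniform quantization of each coefficient with bin size delta * w_m,
    as in (10.35)."""
    out = []
    for c, w in zip(coeffs, weights):
        dm = delta * w             # bin size in direction m
        out.append(dm * round(c / dm))
    return out
```

With a base bin size $\Delta = 0.1$, a weight $w_m = 1$ quantizes finely while a weight $w_m = 10$ produces a coarse bin of size $1$, illustrating how larger weights concentrate the error in the directions where it is perceptually tolerable.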
10.3.2 Optimal Basis and Karhunen-Loève
Transform code performance depends on the choice of an orthonormal basis $B$. For high-resolution quantizations, (10.29) proves that the distortion rate $d(\bar R)$ is optimized by choosing a basis $B$ that minimizes the average differential entropy
$$\bar H_d = \frac{1}{N} \sum_{m=0}^{N-1} H_d(F_B[m]).$$
In general, we do not know how to compute this optimal basis because the probability density of $F_B[m] = \langle F, g_m \rangle$ may depend on $g_m$ in a complicated way.
Gaussian Process
If $F$ is a Gaussian random vector, then the coefficients $F_B[m]$ are Gaussian random variables in any basis. In this case, the probability density of $F_B[m]$ depends only on the variance $\sigma_m^2$:
$$p_m(x) = \frac{1}{\sigma_m \sqrt{2\pi}} \exp\Big(\frac{-x^2}{2 \sigma_m^2}\Big).$$
With a direct integration, we verify that
$$H_d\big(F_B[m]\big) = -\int_{-\infty}^{+\infty} p_m(x) \log_2 p_m(x)\, dx = \log_2 \sigma_m + \log_2 \sqrt{2\pi e}.$$
Inserting this expression in (10.29) yields
$$d(\bar R) = N\, \frac{\pi e}{6}\, \rho^2\, 2^{-2\bar R}, \qquad (10.36)$$
where $\rho^2$ is the geometrical mean of all variances:
$$\rho^2 = \Big(\prod_{m=0}^{N-1} \sigma_m^2\Big)^{1/N}.$$
Therefore, the basis must be chosen to minimize $\rho^2$.

Theorem 10.7. The geometrical mean variance $\rho^2$ is minimized in a Karhunen-Loève basis of $F$.
Proof. Let $K$ be the covariance operator of $F$,
$$\sigma_m^2 = \langle K g_m, g_m \rangle.$$
Observe that
$$\log_2 \rho^2 = \frac{1}{N} \sum_{m=0}^{N-1} \log_2 \langle K g_m, g_m \rangle. \qquad (10.37)$$
Lemma 10.2 proves that since $C(x) = \log_2(x)$ is strictly concave, $\sum_{m=0}^{N-1} \log_2 \langle K g_m, g_m \rangle$ is minimum if and only if $\{g_m\}_{0 \le m < N}$ diagonalizes $K$, and thus if it is a Karhunen-Loève basis.

Lemma 10.2. Let $K$ be a covariance operator and $\{g_m\}_{0 \le m < N}$ be an orthonormal basis. If $C(x)$ is strictly concave, then $\sum_{m=0}^{N-1} C\big(\langle K g_m, g_m \rangle\big)$ is minimum if and only if $K$ is diagonal in this basis.

To prove this lemma, let us consider a Karhunen-Loève basis $\{h_m\}_{0 \le m < N}$ that diagonalizes $K$. As in (9.26), by decomposing $g_m$ in the basis $\{h_i\}_{0 \le i < N}$, we obtain
$$\langle K g_m, g_m \rangle = \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2\, \langle K h_i, h_i \rangle. \qquad (10.38)$$
Since $\sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2 = 1$, applying the Jensen inequality (A.2) to the concave function $C(x)$ proves that
$$C\big(\langle K g_m, g_m \rangle\big) \ge \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2\, C\big(\langle K h_i, h_i \rangle\big). \qquad (10.39)$$
Thus,
$$\sum_{m=0}^{N-1} C\big(\langle K g_m, g_m \rangle\big) \ge \sum_{m=0}^{N-1} \sum_{i=0}^{N-1} |\langle g_m, h_i \rangle|^2\, C\big(\langle K h_i, h_i \rangle\big).$$
Since $\sum_{m=0}^{N-1} |\langle g_m, h_i \rangle|^2 = 1$, we derive that
$$\sum_{m=0}^{N-1} C\big(\langle K g_m, g_m \rangle\big) \ge \sum_{i=0}^{N-1} C\big(\langle K h_i, h_i \rangle\big).$$
This inequality is an equality if and only if (10.39) is an equality for all $m$. Since $C(x)$ is strictly concave, this is possible only if all the values $\langle K h_i, h_i \rangle$ such that $\langle g_m, h_i \rangle \ne 0$ are equal. Thus, we derive that $g_m$ belongs to an eigenspace of $K$ and is also an eigenvector of $K$. Therefore, $\{g_m\}_{0 \le m < N}$ diagonalizes $K$ as well. ■
Together with the distortion rate (10.36), this result proves that a high bit rate transform code of a Gaussian process is optimized in a Karhunen-Loève basis. The Karhunen-Loève basis diagonalizes the covariance matrix, which means that the decomposition coefficients $F_B[m] = \langle F, g_m \rangle$ are uncorrelated. If $F$ is a Gaussian random vector, then the coefficients $F_B[m]$ are jointly Gaussian. In this case, being uncorrelated implies that they are independent. The optimality of a Karhunen-Loève basis is therefore quite intuitive, since it produces coefficients $F_B[m]$ that are independent. The independence of the coefficients justifies using a scalar quantization rather than a vector quantization.
Coding Gain
The Karhunen-Loève basis $\{g_m\}_{0 \le m < N}$ of $F$ is a priori not well structured. The decomposition coefficients $\{\langle f, g_m \rangle\}_{0 \le m < N}$ of a signal $f$ are thus computed with $N^2$ multiplications and additions, which is often too expensive in real-time coding applications. Transform codes often approximate this Karhunen-Loève basis by a more structured basis that admits a faster decomposition algorithm. The performance of a basis is evaluated by the coding gain [34]
$$G = \frac{E\{\|F\|^2\}}{N \rho^2} = \frac{N^{-1} \sum_{m=0}^{N-1} \sigma_m^2}{\big(\prod_{m=0}^{N-1} \sigma_m^2\big)^{1/N}}. \qquad (10.40)$$
Theorem 10.7 proves that $G$ is maximum in a Karhunen-Loève basis.
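The coding gain (10.40) is the ratio of the arithmetic to the geometric mean of the coefficient variances; here is a small sketch for two correlated samples with covariance $\begin{pmatrix}1 & r\\ r & 1\end{pmatrix}$, whose Karhunen-Loève variances are the eigenvalues $1 \pm r$ (the example and names are ours):

```python
import math

def coding_gain(variances):
    """G = arithmetic mean / geometric mean of coefficient variances (10.40)."""
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return arith / geo

# Two correlated samples with covariance [[1, r], [r, 1]] (r = 0.9):
r = 0.9
G_canonical = coding_gain([1.0, 1.0])      # variances in the canonical basis
G_kl = coding_gain([1 + r, 1 - r])         # KL variances are the eigenvalues
```

In the canonical basis the variances are equal, so $G = 1$; in the Karhunen-Loève basis the gain is $1/\sqrt{1 - r^2} \approx 2.3$ for $r = 0.9$, reflecting the energy compaction achieved by decorrelation.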
Non-Gaussian Processes
When $F$ is not Gaussian, the coding gain $G$ no longer measures the coding performance of the basis. Indeed, the distortion rate (10.29) depends on the average differential entropy factor $2^{2 \bar H_d}$, which is not proportional to $\rho^2$. Therefore, the Karhunen-Loève basis is not optimal.
Circular stationary processes with piecewise smooth realizations are examples of non-Gaussian processes that are not well compressed in their Karhunen-Loève basis, which is the discrete Fourier basis. In Section 10.4 we show that wavelet bases yield better distortion rates because they can approximate these signals with few nonzero coefficients.
10.3.3 Transparent Audio Code
The compact disc standard samples high-quality audio signals at 44.1 kHz. Samples are quantized with 16 bits, producing a PCM of 706 kb/s. Audio codes must be "transparent," which means that they should not introduce errors that can be heard by an "average" listener.
Sounds are often modeled as realizations of Gaussian processes. This justifies the use of a Karhunen-Loève basis to minimize the distortion rate of transform codes. To approximate the Karhunen-Loève basis, we observe that many audio signals are locally stationary over a sufficiently small time interval. This means that over this time interval, the signal can be approximated by a realization of a stationary process. One can show [192] that the Karhunen-Loève basis of locally stationary processes is well approximated by a local cosine basis with appropriate window sizes. The local stationarity hypothesis is not always valid, especially for attacks of musical instruments, but bases of local time-frequency atoms remain efficient for most audio segments.
Bases of time-frequency atoms are also well adapted to matching the quantization errors with our hearing sensitivity. Instead of optimizing a mean-square error as in Theorem 10.6, perceptual coders [317] adapt the quantization so that errors fall below an auditory threshold, which depends on each time-frequency atom $g_m$.
Audio Masking
A large-amplitude stimulus often makes us less sensitive to smaller stimuli of a similar nature. This is called a masking effect. In a sound, a small-amplitude quantization error may not be heard if it is added to a strong signal component in the same frequency neighborhood. Audio masking takes place in critical frequency bands $[\omega_c - \Delta\omega/2,\, \omega_c + \Delta\omega/2]$ that have been measured with psychophysical experiments [425]. A strong narrow-band signal having a frequency energy in the interval $[\omega_c - \Delta\omega/2,\, \omega_c + \Delta\omega/2]$ decreases the hearing sensitivity within this frequency interval. However, it does not influence the sensitivity outside this frequency range. In the frequency interval [0, 20 kHz], there are approximately 25 critical bands. Below 700 Hz, the bandwidths of critical bands are on the order of 100 Hz. Above 700 Hz, the bandwidths increase proportionally to the center frequency $\omega_c$:
$$\Delta\omega \approx \begin{cases} 100 & \text{for } \omega_c \le 700, \\ 0.15\, \omega_c & \text{for } 700 \le \omega_c \le 20{,}000. \end{cases} \qquad (10.41)$$
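The piecewise model (10.41) as a small helper function (a sketch; the function name is ours):

```python
def critical_bandwidth(fc):
    """Approximate critical bandwidth (Hz) around center frequency fc,
    following the piecewise model (10.41), valid on [0, 20 kHz]."""
    if not 0 <= fc <= 20000:
        raise ValueError("model valid on [0, 20 kHz]")
    return 100.0 if fc <= 700 else 0.15 * fc
```

For example, the band around 1 kHz is about 150 Hz wide, while the band around 10 kHz is about 1.5 kHz wide, consistent with the roughly 25 bands covering [0, 20 kHz].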
The masking effect also depends on the nature of the sound, particularly its tonality. A tone is a signal with a narrow frequency support, as opposed to a noiselike signal whose frequency spectrum is spread out. A tone has a different masking influence than a noise-type signal; this difference must be taken into account [442].
Adaptive Quantization
To take advantage of audio masking, transform codes are implemented in orthogonal bases of local time-frequency atoms $\{g_m\}_{0 \le m < N}$, with frequency supports inside critical bands. To measure the effect of audio masking at different times, the signal energy is computed in each critical band. This is done with an FFT over short time intervals, on the order of 10 ms, where signals are considered to be approximately stationary. The signal tonality is estimated by measuring the spread of its Fourier transform. The maximum admissible quantization error in each critical band is estimated depending on both the total signal energy in the band and the signal tonality. This estimation is done with approximate formulas that are established with psychophysical experiments [335]. For each vector $g_m$ having a Fourier transform inside a given critical band, the inner product $\langle f, g_m \rangle$ is uniformly quantized according to the maximum admissible error. Quantized coefficients are then entropy coded.
Although the SNR may be as low as 13 dB, such an algorithm produces a nearly transparent audio code because the quantization error is below the perceptual threshold in each critical band. The most important degradations introduced by such transform codes are pre-echoes. During a silence, the signal remains zero, but it can suddenly reach a large amplitude due to a beginning speech or a musical attack. In a short time interval containing this attack, the signal energy may be quite large in each critical band. By quantizing the coefficients $\langle f, g_m \rangle$ we introduce an error both in the silent part and in the attack. The error is not masked during the silence and will clearly be heard. It is perceived as a "pre-echo" of the attack. This pre-echo is due to the temporal variation of the signal, which does not respect the local stationarity hypothesis. However, it can be detected and removed with postprocessing.
Choice of Basis
The MUSICAM (Masking-pattern Universal Subband Integrated Coding and Multiplexing) coder [203] used in the MPEG-1 standard [121] is the simplest perceptual subband coder. It decomposes the signal in 32 equal frequency bands of 750 Hz bandwidth, with a filter bank constructed with frequency-modulated windows of 512 samples. This decomposition is similar to a signal expansion in a local cosine basis, but the modulated windows used in MUSICAM are not orthogonal. The quantization levels are adapted in each frequency band in order to take into account the masking properties of the signal. Quantized coefficients are not entropy coded. This system compresses audio signals up to 128 kb/s without audible impairment. It is often used for digital radio transmissions where small defects are admissible.
MP3 is the standard MPEG-1 layer 3, which operates on a large range of audio quality: telephone quality is obtained at 8 kb/s on signals sampled at 2.5 kHz; FM radio quality is obtained at 60 kb/s for a stereo signal sampled at 11 kHz; and CD quality is obtained for bit rates going from 112 to 128 kb/s for stereo signals sampled at 22.1 kHz. To maintain compatibility with the previous standards, the signal is decomposed both with a filter bank and a local cosine basis. The size of the windows can be adapted as in the MPEG-2 AAC (Advanced Audio Coding) standard described next. The size of the quantization bins is computed with a perceptual masking model. The parameters of the model are not specified in the standard and are thus transmitted with the signal. A Huffman code stores the quantized coefficients. For a pair of stereo signals $f_1$ and $f_2$, the compression is improved by coding an average signal $a = (f_1 + f_2)/2$ and a difference signal $d = (f_1 - f_2)/2$, and by adapting the perceptual model for the quantization of each of these signals.
MPEG-2 AAC is a standard that offers a better audio quality for a given compression ratio. The essential difference with MP3 is that it directly uses a decomposition in a local cosine basis. In order to reduce pre-echo distortions, and to adapt the basis to the stationarity intervals of the signal, the size $2^j$ of the windows can vary from 256 to 2048. However, on each interval of 2048 samples, the size of the window must remain constant, as illustrated in Figure 10.4. Each window has a raising and a decaying profile that is as large as possible, while overlapping only the two adjacent windows. The profile used by the standard is
$$\beta(t) = \sin\Big(\frac{\pi}{4}(1 + t)\Big).$$
Section 8.4.1 explains in (8.85) how to construct the windows $g_p(t)$ on each interval from this profile. The discrete windows of the local cosine basis are obtained with a uniform sampling: $g_p[n] = g_p(n)$. The choice of windows can also be interpreted as a best-basis choice, further studied in Section 12.2.3. However, as opposed to the bases of the local cosine trees from Section 8.5, the windows have raising and decaying profiles of varying sizes, which are best adapted to the segmentation. The strategy to choose the window sizes is not imposed by the standard, and the

FIGURE 10.4
Succession of windows of various sizes on intervals of 2048 samples, which satisfy the orthogonality constraints of local cosine bases.
code transmits the selected window size. A best-basis algorithm must measure the coding efficiency for windows of different sizes for a given audio segment. In the neighborhood of an attack, it is necessary to choose small windows in order to reduce the pre-echo. As in MP3, the local cosine coefficients are quantized with a perceptual model that depends on the signal energy in each critical frequency band, and an entropy code is applied.
The AC systems produced by Dolby are similar to MPEG-2 AAC. The signal is decomposed in a local cosine basis, and the window size can be adapted to the local signal content. After a perceptual quantization, a Huffman entropy code is used. These coders operate on a variety of bit rates from 64 kb/s to 192 kb/s.
To best match human perception, transform code algorithms have also been developed in wavelet packet bases, with a frequency decomposition that matches the critical frequency bands [483]. Sinha and Tewfik [442] propose the wavelet packet basis shown in Figure 10.5, which is an $M = 4$ wavelet basis. The properties of $M$-band wavelet bases are explained in Section 8.1.3. These four wavelets have a bandwidth of 1/4, 1/5, 1/6, and 1/7 octaves, respectively. The lower-frequency interval [0, 700] is decomposed with eight wavelet packets of the same bandwidth in order to match the critical frequency bands (10.41). These wavelet packet coefficients are quantized with perceptual models and are entropy coded. Nearly transparent audio codes are obtained at 70 kb/s.
FIGURE 10.5
Wavelet packet tree that decomposes the frequency interval [0, 22 kHz] into 24 frequency bands covered by $M = 4$ wavelets dilated over six octaves, plus 8 low-frequency bands of the same bandwidth. The frequency bands are indicated at the leaves in kHz.
Wavelets produce smaller pre-echo distortions compared to local cosine bases.
At the sound attack, the largest wavelet coefficients appear at fine scales. Because
fine-scale wavelets have a short support, a quantization error creates a distortion
that is concentrated near the attack. However, these bases have the disadvantage
of introducing a bigger coding delay than local cosine bases. The coding delay is
approximately equal to the maximum time support of the vector used in the basis.
It is typically larger for wavelets and wavelet packets than for local cosine vectors.
Choice of Filter
Wavelet and wavelet packet bases are constructed with a filter bank of conjugate mirror filters. For perceptual audio coding, the Fourier transform of each wavelet or wavelet packet must have its energy well concentrated in a single critical band. Second-order lobes that may appear in other frequency bands should have a negligible amplitude. Indeed, a narrow-frequency tone creates large-amplitude coefficients for all wavelets with a frequency support covering this tone, as shown in Figure 10.6. Quantizing the wavelet coefficients is equivalent to adding small wavelets with amplitude equal to the quantization error. If the wavelets excited by the tone have important second-order lobes in other frequency intervals, the quantization errors introduce some energy in these frequency intervals that is not masked by the energy of the tone, thereby introducing audible distortion.
To create wavelets and wavelet packets with small second-order frequency lobes, the transfer function of the corresponding conjugate mirror filter $\hat h(\omega)$ must have a zero of high order at $\omega = \pi$. Theorem 7.7 proves that conjugate mirror filters with $p$ zeros at $\omega = \pi$ have at least $2p$ nonzero coefficients, and correspond to wavelets of size $2p - 1$. Thus, increasing $p$ produces a longer coding delay. Numerical experiments [442] show that increasing $p$ up to 30 can enhance the perceptual quality of the audio code, but the resulting filters have at least 60 nonzero coefficients.
FIGURE 10.6
A high-energy, narrow-frequency tone can excite a wavelet having a Fourier transform with second-order lobes outside the critical band of width $\Delta\omega$. The quantization then creates audible distortion.
10.4 SPARSE SIGNAL COMPRESSION
Sparse representations provide high signal-compression factors by only coding a
few nonzero coefficients. The high-resolution quantization hypothesis is not valid
anymore. Section 10.4.1 shows that the distortion is dominated by the nonlinear
approximation term, and most of the bits are devoted to coding the position of
nonzero coefficients. These results are illustrated by a wavelet image transform
code. Section 10.4.2 refines such transform codes with an embedding strategy with
progressive coding capabilities.
The performance of transform codes was studied from a Bayes point of view, by
considering signals as realizations of a random vector that has a known probability
distribution. However, there is no known stochastic model that incorporates the
diversity of complex signals such as nonstationary textures and edges in images.
Classic processes and, in particular, Gaussian processes or homogeneous Markov
random fields are not appropriate. This section introduces a different framework
where the distortion rate is computed with deterministic signal models.
10.4.1 Distortion Rate and Wavelet Image Coding
The signal is considered as a deterministic vector $f \in \mathbb{C}^N$ that is decomposed in an orthonormal basis $B = \{g_m\}_{0 \le m < N}$:
$$f = \sum_{m=0}^{N-1} f_B[m]\, g_m \quad \text{with} \quad f_B[m] = \langle f, g_m \rangle.$$
A transform code quantizes all coefficients and reconstructs
$$\tilde f = \sum_{m=0}^{N-1} Q\big(f_B[m]\big)\, g_m. \qquad (10.42)$$
Let $R$ be the number of bits used to code the $N$ quantized coefficients $Q(f_B[m])$. The coding distortion is
$$d(R, f) = \|f - \tilde f\|^2 = \sum_{m=0}^{N-1} \big|f_B[m] - Q(f_B[m])\big|^2. \qquad (10.43)$$
We denote by $p(x)$ the histogram of the $N$ coefficients $f_B[m]$, normalized so that $\int p(x)\, dx = 1$. The quantizer approximates each $x \in (y_{k-1}, y_k]$ by $Q(x) = x_k$. The proportion of quantized coefficients equal to $x_k$ is
$$p_k = \int_{y_{k-1}}^{y_k} p(x)\, dx. \qquad (10.44)$$
Suppose that the quantized coefficients of $f$ can take at most $K$ different quantized values $x_k$. A variable-length code represents the quantized values equal to $x_k$ with an average of $l_k$ bits, where the lengths $l_k$ are specified independently from $f$. It is implemented with a prefix code, or an arithmetic code over blocks of quantized values that are large enough so that the $l_k$ can be assumed to take any real values that satisfy the Kraft inequality (10.6), $\sum_{k=1}^{K} 2^{-l_k} \le 1$. Encoding a signal with $K$ symbols requires a total number of bits
$$R = N \sum_{k=1}^{K} p_k\, l_k. \qquad (10.45)$$
The bit budget $R$ reaches its minimum for $l_k = -\log_2 p_k$ and thus,
$$R \ge H(f) = -N \sum_{k=1}^{K} p_k \log_2 p_k. \qquad (10.46)$$
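A small numerical check of (10.45)-(10.46) (a sketch; names are ours): the entropy lengths $l_k = -\log_2 p_k$ minimize the bit budget among admissible lengths.

```python
import math

def bit_budget(p, l, N):
    """R = N * sum_k p_k * l_k for quantized-value proportions p and lengths l."""
    return N * sum(pk * lk for pk, lk in zip(p, l))

def entropy_bound(p, N):
    """H(f) = -N * sum_k p_k log2 p_k, the minimum of R over admissible lengths."""
    return -N * sum(pk * math.log2(pk) for pk in p if pk > 0)
```

For the proportions $(1/2, 1/4, 1/4)$ with $N = 4$ coefficients, the entropy lengths $(1, 2, 2)$ give $R = 6$ bits, equal to the bound, while any other Kraft-admissible choice, such as $(2, 2, 1)$, costs more.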
In practice, we do not know in advance the values of p_k, which depend on the
signal f. An adaptive variable-length code, as explained in Section 10.2.1, computes
an estimate p̃_k of p_k with an empirical histogram of the already coded coefficients. It
sets l_k = -\log_2 p̃_k and updates the estimate p̃_k, and thus l_k, as the coding progresses.
Under appropriate ergodicity assumptions, the estimated p̃_k converge to p_k and R
to the entropy H. The N quantized signal coefficients can be modeled as values taken
by a random variable X with a probability distribution equal to the histogram p(x).
The distortion (10.43) can then be rewritten as

    d(R, f) = N E\{ |X - Q(X)|^2 \},

and the bit budget of an adaptive variable-length code converges to the entropy:
R = H(Q(X)) = H(f).
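These quantities can be checked numerically. The following sketch is our own construction, not the book's code: a synthetic Laplacian sample stands in for the coefficient histogram p(x), a rounding quantizer plays the role of Q, and we evaluate the distortion (10.43) together with the entropy bound (10.46).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
f_B = rng.laplace(scale=1.0, size=N)       # peaked coefficient histogram

delta = 0.5
indices = np.round(f_B / delta).astype(int)
q = indices * delta                        # uniform quantizer Q(f_B[m])

d = np.sum((f_B - q) ** 2)                 # distortion (10.43)

_, counts = np.unique(indices, return_counts=True)
p = counts / N                             # histogram proportions p_k
H = -N * np.sum(p * np.log2(p))            # entropy bound (10.46)

print(f"distortion per coefficient: {d / N:.4f}")
print(f"entropy bound: {H / N:.2f} bits/coefficient")
```

Since |x - Q(x)| ≤ Δ/2 for this quantizer, the measured distortion per coefficient stays below Δ²/4, and approaches Δ²/12 when the high-resolution hypothesis holds.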
If the high-resolution quantization assumption is valid, which means that p(x) is
nearly constant over each quantization interval, then Theorem 10.5 proves that d(R, f)
is minimum if and only if Q is a uniform quantizer. The resulting distortion rate
computed in (10.26) is

    d(R, f) = \frac{N}{12} 2^{2 H_d(f)} 2^{-2R/N},    (10.47)

with H_d(f) = -\int p(x) \log_2 p(x) dx. It predicts an exponential decay of the distortion rate.
The high-resolution quantization assumption is valid if the quantization bins are small
enough, which means that R/N is sufficiently large.
Coded sequences of quantized coefficients are often not homogeneous and
ergodic sources. Adaptive variable-length codes can then produce a bit budget that
is below the entropy (10.46). For example, the wavelet coefficients of an image
often have a larger amplitude at large scales. When coding coefficients from large
to fine scales, an adaptive arithmetic code progressively adapts the estimated probability distribution. Thus, it produces a total bit budget that is often smaller than
the entropy H( f ) obtained with a fixed code globally optimized for the N wavelet
coefficients.
Wavelet Image Code
A simple wavelet image code is introduced to illustrate the properties of low bit
rate transform coding in sparse representations. The image is decomposed in a
separable wavelet basis. All wavelet coefficients are uniformly quantized and coded
with an adaptive arithmetic code. Figure 10.7 shows examples of coded images
with R/N = 0.5 bit/pixel.

FIGURE 10.7
Result of a wavelet transform code with an adaptive arithmetic coding using R̄ = 0.5 bit/pixel for
images of N = 512² pixels: (a) Lena, (b) GoldHill, (c) boats, and (d) mandrill.
The peak signal-to-noise ratio (PSNR) is defined by

    PSNR(R, f) = 10 \log_{10} \frac{N \, 255^2}{d(R, f)}.
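A minimal sketch of such a transform code, using a one-level orthonormal Haar transform as a simple stand-in for the book's separable wavelet basis (the helper names `haar2d` and `psnr_from_coeffs` are ours). Because the basis is orthonormal, ‖f - f̃‖² equals the coefficient-domain quantization error, so the PSNR can be evaluated without inverting the transform.

```python
import numpy as np

def haar2d(x):
    # One decomposition level of a 2-D orthonormal Haar transform:
    # averages and differences along each axis, scaled by 1/sqrt(2).
    for axis in (0, 1):
        a = np.moveaxis(x, axis, 0)
        s = (a[0::2] + a[1::2]) / np.sqrt(2.0)
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        x = np.moveaxis(np.concatenate([s, d]), 0, axis)
    return x

def psnr_from_coeffs(c, q, n_pixels):
    # In an orthonormal basis, ||f - f~||^2 equals the coefficient-domain
    # quantization error, so the PSNR needs no inverse transform.
    d = np.sum((c - q) ** 2)
    return 10 * np.log10(n_pixels * 255.0 ** 2 / d)

rng = np.random.default_rng(1)
f = rng.integers(0, 256, (64, 64)).astype(float)
c = haar2d(f)                       # orthonormal transform coefficients
delta = 8.0
q = np.round(c / delta) * delta     # uniform quantization
print(f"PSNR: {psnr_from_coeffs(c, q, f.size):.1f} db")
```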
FIGURE 10.8
PSNR as a function of R/N : (a) Lena image (solid line) and boats image (dotted line);
(b) GoldHill image (solid line) and mandrill image (dotted line).
The high-resolution distortion rate formula (10.47) predicts that there exists a
constant K such that

    PSNR(R, f) = (20 \log_{10} 2) R̄ + K    with    R̄ = R/N.

Figure 10.8 shows that PSNR(R, f) has indeed a linear growth for R̄ ≥ 1, but not for
R̄ < 1.
At low bit rates R̄ ≤ 1, the quantization interval Δ is relatively large. The normalized
histogram p(x) of wavelet coefficients in Figure 10.9 has a narrow peak in
the neighborhood of x = 0. Thus, p(x) is poorly approximated by a constant in the
zero bin [-Δ/2, Δ/2], where Q(x) = 0. The high-resolution quantization hypothesis
is not valid in this zero bin, which explains why the distortion rate formula
(10.47) is incorrect. For the mandrill image, the high-resolution hypothesis remains
FIGURE 10.9
Normalized histograms of orthogonal wavelet coefficients for each image: (a) Lena, (b) boats, (c) GoldHill,
and (d) mandrill.
valid up to R̄ = 0.5 because the histogram of its wavelet coefficients is wider in the
neighborhood of x = 0.
Geometric Bit Budget
If the basis B is chosen so that many coefficients f_B[m] = \langle f, g_m \rangle are close to zero,
then the histogram p(x) has a sharp high-amplitude peak at x = 0, as in the wavelet
histograms shown in Figure 10.9. The distortion rate is calculated at low bit rates,
where the high-resolution quantization hypothesis does not apply.
The bit budget R is computed by considering separately the set of significant
coefficients

    Λ_{Δ/2} = {m : Q(f_B[m]) ≠ 0} = {m : |f_B[m]| ≥ Δ/2}.
This set is the approximation support that specifies the geometry of a sparse transform
coding. Figure 10.10 shows the approximation supports of the quantized
wavelet coefficients that code the four images in Figure 10.7. The total bit budget R
to code all quantized coefficients is divided into the number of bits R_0 needed
to code the coefficients quantized to zero, plus the number of bits R_1 to code the
significant coefficients:

    R = R_0 + R_1.
The bit budget R_0 can also be interpreted as a geometric bit budget that codes the
position of the significant coefficients, and thus Λ_{Δ/2}. Let M = |Λ_{Δ/2}| ≤ N be the number
of significant coefficients, which is coded with \log_2 N bits. There are \binom{N}{M} different sets
of M coefficients chosen among N. To code an approximation support Λ_{Δ/2} without
any other prior information requires a number of bits

    R_0 = \log_2 N + \log_2 \binom{N}{M} ∼ M \left( 1 + \log_2 \frac{N}{M} \right).

Theorem 10.8 shows that the number of bits R_1 to code the M significant coefficients is typically
proportional to M with a variable-length code. If M ≪ N, then the overall bit budget
R = R_0 + R_1 is dominated by the geometric bit budget R_0.
This approximation support Λ_{Δ/2} can be coded with an entropy coding of the
binary significance map

    b[m] = 0 if Q(f_B[m]) = 0,  1 if Q(f_B[m]) ≠ 0.    (10.48)

The proportions of 0 and 1 in the significance map are, respectively, p_0 = (N - M)/N
and p_1 = M/N. An arithmetic code of this significance map yields a bit budget of

    R_0 ≥ H_0 = -N (p_0 \log_2 p_0 + p_1 \log_2 p_1),    (10.49)

which is of the same order as \log_2 \binom{N}{M} (Exercise 10.5).
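The different estimates of the geometric bit budget can be compared numerically. The sketch below (our construction, with an image-sized example) evaluates the exact count \log_2 \binom{N}{M}, the approximation M(1 + \log_2(N/M)), and the entropy N H_0 of (10.49).

```python
import math

# Geometric bit budget for a 512 x 512 image with M significant coefficients.
N, M = 512 * 512, 10_000

exact = math.log2(math.comb(N, M))        # log2 of the number of supports
approx = M * (1 + math.log2(N / M))       # equivalent order of magnitude

p1 = M / N
p0 = 1 - p1
H0 = -N * (p0 * math.log2(p0) + p1 * math.log2(p1))   # entropy (10.49)

print(f"log2 C(N,M)      = {exact:,.0f} bits")
print(f"M(1+log2(N/M))   = {approx:,.0f} bits")
print(f"N H0             = {H0:,.0f} bits")
```

All three agree to within a small factor, which is the point of (10.49).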
FIGURE 10.10
Significance maps of quantized wavelet coefficients for images coded with R̄ = 0.5 bit/pixel:
(a) Lena, (b) GoldHill, (c) boats, and (d) mandrill.
Distortion and Nonlinear Approximation
The distortion d(R, f) is calculated by separating the significant coefficients in Λ_{Δ/2}
from the other coefficients, for which Q(f_B[m]) = 0:

    d(R, f) = \sum_{m=0}^{N-1} | f_B[m] - Q(f_B[m]) |^2    (10.50)
            = \sum_{m ∉ Λ_{Δ/2}} | f_B[m] |^2 + \sum_{m ∈ Λ_{Δ/2}} | f_B[m] - Q(f_B[m]) |^2.    (10.51)
Let f_M = \sum_{m ∈ Λ_{Δ/2}} f_B[m] g_m be the best M-term approximation of f from the M
significant coefficients above Δ/2. The first sum of d(R, f) can be rewritten as a
nonlinear approximation error:

    \| f - f_M \|^2 = \sum_{m ∉ Λ_{Δ/2}} | f_B[m] |^2.

Since 0 ≤ |x - Q(x)| ≤ Δ/2, (10.51) implies that

    \| f - f_M \|^2 ≤ d(R, f) ≤ \| f - f_M \|^2 + M \frac{Δ^2}{4}.    (10.52)
Theorem 10.8 shows that the nonlinear approximation error \| f - f_M \|^2 dominates
the distortion rate behavior [208].
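The sandwich bound (10.52) can be verified empirically. In this sketch (ours), synthetic Laplacian coefficients stand in for f_B[m] and a rounding quantizer of step Δ plays the role of Q.

```python
import numpy as np

# Empirical check of (10.52): with significant coefficients defined by
# |c| >= delta/2, the distortion of a uniform rounding quantizer is
# sandwiched between ||f - f_M||^2 and ||f - f_M||^2 + M delta^2 / 4.
rng = np.random.default_rng(2)
c = rng.laplace(scale=1.0, size=4096)          # coefficients f_B[m]
delta = 1.0
q = np.round(c / delta) * delta                 # uniform quantizer

significant = np.abs(c) >= delta / 2
M = int(np.count_nonzero(significant))

err_M = np.sum(c[~significant] ** 2)            # ||f - f_M||^2
d = np.sum((c - q) ** 2)                        # total distortion

assert err_M <= d <= err_M + M * delta ** 2 / 4
print(M, err_M, d)
```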
In Section 9.2.1 we prove that nonlinear approximation errors depend on
the decay of the sorted coefficients of f in B. We denote by f_B^r[k] = f_B[m_k]
the coefficient of rank k, defined by |f_B^r[k]| ≥ |f_B^r[k+1]| for 1 ≤ k ≤ N. We write
|f_B^r[k]| ∼ C k^{-s} if there exist two constants A, B > 0 independent of C, k, and N such
that A C k^{-s} ≤ |f_B^r[k]| ≤ B C k^{-s}. Theorem 10.8 computes the resulting distortion
rate [363].
Theorem 10.8: Falzon, Mallat. Let Q be a uniform quantizer. There exists a variable-length
code such that for all s > 1/2 and C > 0, if |f_B^r[k]| ∼ C k^{-s}, then

    d(R, f) ∼ C^2 R^{1-2s} \left( 1 + \log_2 \frac{N}{R} \right)^{2s-1}    for    R ≤ N.    (10.53)
Proof. Since the sorted coefficients satisfy |f_B^r[k]| ∼ C k^{-s} and |f_B^r[M]| ∼ Δ, we derive that

    M ∼ C^{1/s} Δ^{-1/s}.    (10.54)

Since s > 1/2, the approximation error is

    \| f - f_M \|^2 = \sum_{k=M+1}^{N} |f_B^r[k]|^2 ∼ \sum_{k=M+1}^{N} C^2 k^{-2s} ∼ C^2 M^{1-2s}.    (10.55)

But (10.54) shows that M Δ^2 ∼ C^2 M^{1-2s}, so (10.52) yields

    d(R, f) ∼ C^2 M^{1-2s}.    (10.56)

We now relate the bit budget R = R_0 + R_1 to M. The number of bits R_0 to code M and
the significance set Λ_{Δ/2} of size M is

    R_0 = \log_2 N + \log_2 \binom{N}{M} ∼ M \left( 1 + \log_2 \frac{N}{M} \right).

Let us decompose R_1 = R_a + R_s, where R_a is the number of bits that code the amplitudes
of the M significant coefficients of f, and R_s is the number of bits that code
their signs. Clearly, 0 ≤ R_s ≤ M. The amplitudes of the coefficients are coded with a logarithmic
variable-length code, which does not depend on the distribution of these coefficients.
Let p_j be the fraction of the M significant coefficients such that |Q(f_B^r[k])| = jΔ, and thus
|f_B^r[k]| ∈ [(j - 1/2)Δ, (j + 1/2)Δ). Since |f_B^r[k]| ∼ C k^{-s},

    p_j M ∼ C^{1/s} Δ^{-1/s} (j - 1/2)^{-1/s} - C^{1/s} Δ^{-1/s} (j + 1/2)^{-1/s} ∼ s^{-1} C^{1/s} Δ^{-1/s} j^{-1/s-1}.

But M ∼ C^{1/s} Δ^{-1/s}, so

    p_j ∼ s^{-1} j^{-1/s-1}.    (10.57)

Let us consider a logarithmic variable-length code

    l_j = \log_2 (π^2/6) + 2 \log_2 j,

which satisfies the Kraft inequality (10.6) because

    \sum_{j=1}^{+∞} 2^{-l_j} = \frac{6}{π^2} \sum_{j=1}^{+∞} j^{-2} = 1.

This variable-length code produces a bit budget

    R_a = M \sum_{j=1}^{+∞} p_j l_j ∼ M s^{-1} \sum_{j=1}^{+∞} j^{-1-1/s} \left( \log_2 (π^2/6) + 2 \log_2 j \right) ∼ M.    (10.58)

As a result, R_1 = R_s + R_a ∼ M, and thus

    R = R_0 + R_1 ∼ M \left( 1 + \log_2 \frac{N}{M} \right).    (10.59)

Inverting this equation gives

    M ∼ R \left( 1 + \log_2 \frac{N}{R} \right)^{-1},

and since d(R, f) ∼ C^2 M^{1-2s} in (10.56), this implies (10.53).  ■
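The Kraft equality used for the logarithmic code can be checked numerically, since 2^{-l_j} = 6/(π² j²):

```python
import math

# Check that l_j = log2(pi^2/6) + 2 log2 j satisfies the Kraft
# inequality with equality: sum_j 2^{-l_j} = (6/pi^2) sum_j j^{-2} = 1.
total = sum(2.0 ** -(math.log2(math.pi ** 2 / 6) + 2 * math.log2(j))
            for j in range(1, 1_000_000))
print(total)  # -> approaches 1 as the number of terms grows
```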
The equivalence sign ∼ means that lower and upper bounds of d(R, f) are
obtained by multiplying the right-hand expression of (10.53) by two constants A, B > 0
that are independent of C, R, and N. It thus specifies the increase of d(R, f) as R
decreases. Theorem 10.8 proves that at low bit rates, the distortion is proportional to
R^{1-2s}, as opposed to 2^{-2R/N} in the high bit rate distortion formula (10.47). The bit
budget is dominated by the geometric bit budget R_0, which codes the significance
map. At low bit rates, to minimize the distortion one must therefore find a basis B that yields
the smallest M-term approximation error.
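The scaling of Theorem 10.8 can be illustrated with an idealized coder (assumptions ours: exact power-law coefficients and the geometric bit budget only). For |f_B^r[k]| = C k^{-s}, thresholding at Δ/2 keeps M ∼ (2C/Δ)^{1/s} coefficients, with d ∼ C² M^{1-2s} and R ∼ M(1 + log₂(N/M)).

```python
import numpy as np

# Idealized power-law coefficients |f_r[k]| = C k^{-s}.
C, s, N = 100.0, 1.0, 2 ** 16
k = np.arange(1, N + 1)
coeffs = C * k ** (-s)

results = []
for delta in (4.0, 2.0, 1.0, 0.5):
    M = int(np.count_nonzero(coeffs >= delta / 2))   # significant coefficients
    d = float(np.sum(coeffs[M:] ** 2))               # nonlinear approximation error
    R = M * (1 + np.log2(N / M))                     # geometric bit budget estimate
    results.append((delta, M, R, d))
    print(f"delta={delta:4.1f}  M={M:6d}  R={R:9.0f}  d={d:8.2f}")
```

For s = 1, the product d · M stays close to C², the discrete analog of d ∼ C² M^{1-2s}.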
Twice-Larger Zero Bin
Since the high-resolution quantization hypothesis does not hold, a uniform quantizer
does not minimize the distortion rate. The wavelet coefficient histograms in
Figure 10.9 are highly peaked and can be modeled in a first approximation by
Laplacian distributions having an exponential decay p(x) = (λ/2) e^{-λ|x|}. For Laplacian
distributions, one can prove [394] that optimal quantizers that minimize the
distortion rate with an entropy coder have a zero bin [-Δ, Δ] that is twice as large
as the other quantization bins, which have a constant size Δ.

Doubling the size of the zero bin often improves the distortion at low bit rates.
This is valid for wavelet transform codes but also for image transform codes in block
cosine bases [363]. It reduces the proportion of significant coefficients and improves
the bit budget by a factor that is not offset by the increase of the quantization error.
Enlarging the zero bin further, however, increases the quantization error too much
and degrades the overall distortion rate. The quantizer thus becomes

    Q(x) = 0 if |x| < Δ,  sign(x) (⌊|x|/Δ⌋ + 1/2) Δ if |x| ≥ Δ,    (10.60)

and the significance map becomes Λ_Δ = {m : |f_B[m]| ≥ Δ}. Theorem 10.8 remains
valid for this quantizer with a twice-larger zero bin. Modifying the size of the other
quantization bins has only a marginal effect. One can verify that the distortion rate equivalence
(10.53) also holds for a nonuniform quantizer that is adjusted to minimize
the distortion rate.
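A sketch of the quantizer (10.60), with the floor brackets as reconstructed here (the helper name `deadzone_quantize` is ours):

```python
import numpy as np

# Quantizer (10.60): a twice-larger zero bin [-delta, delta], while all
# other bins have size delta with midpoint reconstruction values.
def deadzone_quantize(x, delta):
    x = np.asarray(x, dtype=float)
    q = np.sign(x) * (np.floor(np.abs(x) / delta) + 0.5) * delta
    return np.where(np.abs(x) < delta, 0.0, q)

x = np.array([-2.7, -1.0, -0.4, 0.0, 0.6, 1.0, 2.3])
print(deadzone_quantize(x, 1.0))  # -> [-2.5 -1.5  0.   0.   0.   1.5  2.5]
```

Everything with |x| < Δ falls into the zero bin, so the proportion of significant coefficients drops compared to a uniform quantizer with zero bin [-Δ/2, Δ/2].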
Bounded Variation Images
Section 2.3.3 explains that large classes of images have a bounded total variation
that is proportional to the length of their contours. Theorem 9.17 proves
that the sorted wavelet coefficients f_B^r[k] of bounded variation images satisfy
|f_B^r[k]| = O(‖f‖_V k^{-1}). If the image is discontinuous along an edge curve, then
it creates large-amplitude wavelet coefficients and |f_B^r[k]| ∼ ‖f‖_V k^{-1}. This decay
property is verified by the wavelet coefficients of the Lena and boats images, which
can be considered as discretizations of bounded variation functions. Theorem 10.8
derives for s = 1 that

    d(R, f) ∼ ‖f‖_V^2 R^{-1} \left( 1 + \log_2 \frac{N}{R} \right).    (10.61)

Figure 10.11(a) shows the PSNR computed numerically from a wavelet transform
code of the Lena and boats images. Since PSNR = -10 \log_{10} d(R, f) + K, it results
from (10.61) that it increases almost linearly as a function of \log_2 R̄, with a slope of
10 \log_{10} 2 ≈ 3 db/bit for R̄ = R/N.
More Irregular Images
The mandrill and GoldHill are examples of images that do not have a bounded
variation. This appears in the fact that their sorted wavelet coefficients satisfy
|f_B^r[k]| ∼ C k^{-s} for s < 1. Since PSNR = -10 \log_{10} d(R, f) + K, it results from (10.53)
that it increases with a slope of (2s - 1) 10 \log_{10} 2 as a function of \log_2 R̄. For the
GoldHill image, s ≈ 0.8, so the PSNR increases by 1.8 db/bit. Mandrill is even more
irregular, with s ≈ 2/3, so at low bit rates R̄ < 1/4 the PSNR increases by only 1 db/bit.
Such images can be modeled as the discretization of functions in Besov spaces with
a regularity index s/2 + 1/2 smaller than 1.
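The quoted slopes follow directly from (10.53):

```python
import math

# Predicted PSNR slope (2s - 1) * 10 log10(2) db/bit for the decay
# exponents quoted in the text.
slopes = {name: (2 * s - 1) * 10 * math.log10(2)
          for name, s in [("Lena/boats (s=1)", 1.0),
                          ("GoldHill (s=0.8)", 0.8),
                          ("mandrill (s=2/3)", 2 / 3)]}
for name, slope in slopes.items():
    print(f"{name}: {slope:.1f} db/bit")
```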
FIGURE 10.11
PSNR as a function of log2(R̄): (a) Lena image (solid line) and boats image (dotted line);
(b) GoldHill image (solid line) and mandrill image (dotted line).
Distortion Rate for Analog Signals
The input signal f[n] is most often the discretization of an analog signal f̄(x), and can
be written as f[n] = \langle f̄, φ_n \rangle, where {φ_n}_{0 ≤ n < N} is a Riesz basis of an approximation
space U_N. To simplify explanations, we suppose that this basis is orthonormal. Let
f̄_N be the orthogonal projection of f̄ in U_N, which can be recovered from f[n]. At
the end of the processing chain, the coded signal f̃[n] is converted into an analog
signal in U_N:

    f̄̃_N(x) = \sum_{n=0}^{N-1} f̃[n] φ_n(x).

Since {φ_n}_{0 ≤ n < N} is orthonormal, the norms over analog and discrete signals are
equal: ‖f̄_N - f̄̃_N‖ = ‖f - f̃‖.
The overall analog distortion rate d(R, f̄) = ‖f̄ - f̄̃_N‖^2 satisfies

    d(R, f̄) = ‖f̄_N - f̄‖^2 + ‖f̄_N - f̄̃_N‖^2 = ‖f̄_N - f̄‖^2 + ‖f - f̃‖^2.

The sampling resolution N can be chosen so that the linear approximation error
‖f̄_N - f̄‖^2 is smaller than, or of the same order as, the compression error ‖f - f̃‖^2. For
most functions, such as bounded variation functions or Besov space functions,
the linear approximation error satisfies ‖f̄_N - f̄‖^2 = O(N^{-β}) for some β > 0. If the discrete
distortion rate ‖f - f̃‖^2 satisfies the decay (10.53) of Theorem 10.8, then for
N = R^{(2s-1)/β}, we get

    d(R, f̄) = O( R^{1-2s} |\log_2 R|^{2s-1} ).    (10.62)
This result applies to bounded variation images coded in wavelet bases. Donoho
[215] proved that for such functions the decay exponent R^{-1} cannot be improved
by any other signal coder. In that sense, a wavelet transform coding is optimal for
bounded variation images. Functions in Besov spaces coded in wavelet bases also
have a distortion rate that satisfies (10.62) for an exponent s that depends on the
space. The optimality of wavelet transform codes in Besov spaces is also studied
in [173].
10.4.2 Embedded Transform Coding
For rapid transmission or fast browsing from a database, a coarse signal approximation should be quickly provided, and progressively enhanced as more bits are
transmitted. Embedded coders offer this flexibility by grouping the bits in order
of significance. The decomposition coefficients are sorted and the first bits of the
largest coefficients are sent first. A signal approximation can be reconstructed at
any time from the bits already transmitted.
Embedded coders store geometric bit planes and can thus take advantage of any
prior information about the location of large versus small coefficients. Such prior
information is available for natural images decomposed on wavelet bases. Section
10.5.2 explains how JPEG-2000 uses this prior information to optimize the coding
of wavelet coefficients.
Embedding
The decomposition coefficients f_B[m] = \langle f, g_m \rangle are partially ordered by grouping
them in index sets Θ_k defined for any k ∈ Z by

    Θ_k = {m : 2^k ≤ |f_B[m]| < 2^{k+1}} = Λ_{2^k} - Λ_{2^{k+1}},

which are the differences between two significance maps for twice-larger quantization
steps. The set Θ_k is coded with a binary significance map b_k[m]:

    b_k[m] = 0 if m ∉ Θ_k,  1 if m ∈ Θ_k.    (10.63)
An embedded algorithm quantizes f_B[m] uniformly with a quantization step
Δ = 2^n that is progressively reduced. Let m ∈ Θ_k with k ≥ n. The amplitude
|Q(f_B[m])| of the quantized number is represented in base 2 by a binary string
with nonzero digits between bit k and bit n. Bit k is necessarily 1 because
2^k ≤ |Q(f_B[m])| < 2^{k+1}. Thus, k - n bits are sufficient to specify this amplitude, to
which is added 1 bit for the sign.
The embedded coding is initiated with the largest quantization step that produces
at least one nonzero quantized coefficient. Each coding iteration with a reduced
quantization step 2^n is called a bit plane coding pass. In the loop, to reduce the
quantization step from 2^{n+1} to 2^n, the algorithm first codes the significance map
b_n[m]. It then codes the sign of f_B[m] for m ∈ Θ_n. Afterwards, the code stores the
nth bit of all amplitudes |Q(f_B[m])| for m ∈ Θ_k with k > n. If necessary, the coding
precision is improved by decreasing n and continuing the encoding. The different
steps of the algorithm can be summarized as follows [422]:

1. Initialization: Store the index n of the first nonempty set Θ_n:

    n = sup_m ⌊ \log_2 |f_B[m]| ⌋.    (10.64)

2. Significance coding: Store the significance map b_n[m] for m ∉ Θ_{n+1}.
3. Sign coding: Code the sign of f_B[m] for m ∈ Θ_n.
4. Quantization refinement: Store the nth bit of all coefficients |f_B[m]| > 2^{n+1}.
These are coefficients that belong to some set Θ_k for k > n, having a position
already stored. Their nth bit is stored in the order in which their position was
recorded in the previous passes.
5. Precision refinement: Decrease n by 1 and go to step 2.
This algorithm may be stopped at any time in the loop, providing a code for
any specified number of bits. The decoder then restores the significant coefficients
up to a precision Δ = 2^n. In general, only part of the coefficients are coded with a
precision 2^n. Valid truncation points of a bit stream correspond to the end of the
quantization refinement step of a given pass, so that all coefficients are coded with
the same precision.
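The five steps above can be sketched as follows. This is a raw-bit version of our own (the helper name `embedded_encode` is ours); a real embedded coder would feed these significance, sign, and refinement bits to an adaptive arithmetic coder.

```python
import numpy as np

def embedded_encode(coeffs, n_planes):
    # Emit raw bits for n_planes bit-plane coding passes, following the
    # five steps of the algorithm (significance, sign, refinement).
    c = np.asarray(coeffs)
    n = int(np.floor(np.log2(np.max(np.abs(c)))))      # initialization (10.64)
    bits, significant = [], []
    for _ in range(n_planes):
        t = 2 ** n
        new = [m for m in range(len(c))
               if m not in significant and abs(c[m]) >= t]
        # significance coding: b_n[m] for coefficients not yet significant
        for m in range(len(c)):
            if m not in significant:
                bits.append(1 if m in new else 0)
        # sign coding for the newly significant coefficients
        bits.extend(0 if c[m] >= 0 else 1 for m in new)
        # quantization refinement: nth bit of previously significant ones
        bits.extend((int(abs(c[m])) >> n) & 1 for m in significant)
        significant.extend(new)
        n -= 1                                          # precision refinement
    return bits

print(embedded_encode([9, -3, 0, 5], 3))
```

Stopping after any pass yields a valid truncated code: the decoder knows every coefficient to the current precision 2^n.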
Distortion Rate
The distortion rate is analyzed when the algorithm is stopped at step 4. All
coefficients above Δ = 2^n are uniformly quantized with a bin size Δ = 2^n. The zero-quantization
bin [-Δ, Δ] is therefore twice as big as the other quantization bins,
which improves the coder's efficiency, as previously explained.

Once the algorithm stops, we denote by M the number of significant coefficients
above Δ = 2^n. The total number of bits of the embedded code is

    R = R_0^e + R_1^e,

where R_0^e is the number of bits needed to code all significance maps b_k[m] for k ≥ n,
and R_1^e is the number of bits used to code the amplitudes of the quantized significant
coefficients Q(f_B[m]), knowing that m ∈ Θ_k for k > n.
To appreciate the efficiency of this embedding strategy, let us compare the bit
budget R_0^e + R_1^e to the number of bits R_0 + R_1 used by the direct transform code of
Section 10.4.1. The value R_0 is the number of bits that code the overall significance
map

    b[m] = 0 if |f_B[m]| ≤ Δ,  1 if |f_B[m]| > Δ,    (10.65)

and R_1 is the number of bits that code the quantized significant coefficients.
An embedded strategy codes Q(f_B[m]) knowing that m ∈ Θ_k, and thus that 2^k ≤
|Q(f_B[m])| < 2^{k+1}, whereas a direct transform code knows only that |Q(f_B[m])| >
Δ = 2^n. Thus, fewer bits are needed for embedded codes:

    R_1^e ≤ R_1.    (10.66)

However, this improvement may be offset by the supplement of bits needed to
code the significance maps {b_k[m]}_{k>n} of the sets {Θ_k}_{k>n}. A direct transform code
records a single significance map b[m], which specifies Λ_{2^n} = ∪_{k≥n} Θ_k. It provides
less information and is therefore coded with fewer bits:

    R_0^e ≥ R_0.    (10.67)

An embedded code brings an improvement over a direct transform code if

    R_0^e + R_1^e ≤ R_0 + R_1.
This happens if there is some prior information about the position of large coefficients
|f_B[m]| versus smaller ones. An appropriate coder can then reduce the
number of bits needed to encode the partial ordering of all coefficients provided by
the significance maps {b_k[m]}_{k>n}. The use of such prior information produces an
overhead of R_0^e relative to R_0 that is smaller than the gain of R_1^e relative to R_1. This
is the case for most images coded with embedded transform codes implemented in
wavelet bases [422] and in the block cosine I basis [492].

Figure 10.12 compares the PSNR of the SPIHT wavelet-embedded code by Said
and Pearlman [422] with the PSNR of the direct wavelet transform code, described
in Section 10.4.1, that performs an entropy coding of the significance map and of the
significant coefficients. For any quantization step, both transform codes yield
the same distortion but the embedded code reduces the bit budget:

    R_0^e + R_1^e ≤ R_0 + R_1.
FIGURE 10.12
Comparison of the PSNR obtained with an embedded wavelet transform code (dotted line) and
a direct wavelet transform code (solid line): (a) Lena image, and (b) GoldHill image.
As a consequence, the PSNR curve of the embedded code is a translation to the
left of the PSNR curve of the direct transform code. For a fixed bit budget per pixel
1 ≥ R/N ≥ 2^{-8}, the embedded SPIHT coder gains about 1 db on the Lena image
and 1/2 db on the GoldHill image. GoldHill is an image with more texture; its
wavelet representation is not as sparse as that of the Lena image, and its distortion
rate therefore has a slower decay.
10.5 IMAGE-COMPRESSION STANDARDS
The JPEG image-compression standard is a transform code in an orthogonal block
cosine basis, described in Section 10.5.1. It is still the most common image standard,
used by digital cameras and for the transmission of photographic images on the Internet
and on cellular phones. JPEG-2000 is the most recent compression standard; it performs
a transform coding in a wavelet basis, as summarized in Section 10.5.2. It is mostly used
for professional imaging applications.
10.5.1 JPEG Block Cosine Coding
The JPEG image-compression standard [478] is a transform coding in a block cosine
I basis. Its implementation is relatively simple, which makes it particularly attractive
for consumer products.
Theorem 8.12 proves that the following cosine I family is an orthogonal basis of
an image block of L × L pixels:

    g_{k,j}[n, m] = \frac{2}{L} λ_k λ_j \cos\left[ \frac{kπ}{L} \left( n + \frac{1}{2} \right) \right] \cos\left[ \frac{jπ}{L} \left( m + \frac{1}{2} \right) \right],    0 ≤ k, j < L,    (10.68)

with

    λ_p = 1/\sqrt{2} if p = 0,  1 otherwise.    (10.69)
In the JPEG standard, images of N pixels are divided into N/64 blocks of 8 × 8 pixels.
Each image block is expanded in this separable cosine basis with a fast separable
DCT-I transform.
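The basis (10.68) can be written as an explicit orthogonal matrix, so the separable expansion of a block reduces to two matrix products. This sketch (our construction, not a fast DCT) builds the L × L matrix and checks orthonormality and perfect reconstruction.

```python
import numpy as np

def dct_matrix(L=8):
    # Row k is lambda_k sqrt(2/L) cos(k pi / L (n + 1/2)), so that the
    # rows form an orthonormal family -- the 1-D factor of (10.68).
    n = np.arange(L)
    D = np.sqrt(2.0 / L) * np.cos(np.pi * np.outer(n, n + 0.5) / L)
    D[0] /= np.sqrt(2.0)                       # lambda_0 = 1/sqrt(2)
    return D

D = dct_matrix(8)
assert np.allclose(D @ D.T, np.eye(8))         # orthonormal rows

# Separable expansion of one 8 x 8 block: transform rows, then columns.
block = np.arange(64, dtype=float).reshape(8, 8)
coeffs = D @ block @ D.T
assert np.allclose(D.T @ coeffs @ D, block)    # perfect reconstruction
```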
JPEG quantizes the block cosine coefficients uniformly. In each block of 64 pixels,
a significance map gives the position of zero versus nonzero quantized coefficients.
Lower-frequency coefficients are located in the upper left of each block, whereas
high-frequency coefficients are in the lower right, as illustrated in Figure 10.13. Many
image blocks have significant coefficients only at low frequencies and thus in the
upper left of each block. To take advantage of this prior knowledge, JPEG codes the
significance map with a run-length code. Each block of 64 coefficients is scanned in
zig-zag order as indicated in Figure 10.13. In this scanning order, JPEG registers the
size of the successive runs of coefficients quantized to zero, which are efficiently
coded together with the values of the following nonzero quantized coefficients.
FIGURE 10.13
A block of 64 cosine coefficients has the zero-frequency (DC) coefficient at the upper left. The
run-length makes a zig-zag scan from low to high frequencies.
Insignificant high-frequency coefficients often produce a long sequence of zeros at
the end of the block, which is coded with an end-of-block (EOB) symbol.
In each block i, there is one cosine vector g_{0,0}^i[n, m] of frequency zero, which
is equal to 1/8 over the block and 0 outside. The inner product \langle f, g_{0,0}^i \rangle is proportional
to the average of the image over the block. Let DC^i = Q(\langle f, g_{0,0}^i \rangle) be the
quantized zero-frequency coefficient. Since the blocks are small, these averages are
often close for adjacent blocks, and JPEG codes the differences DC^i - DC^{i-1}.
Weighted Quantization
Our visual sensitivity depends on the frequency of the image content. We are typically less sensitive to high-frequency oscillatory patterns than to low-frequency
variations. To minimize the visual degradation of the coded images, JPEG performs
a quantization with intervals that are proportional to weights specified in a table,
which is not imposed by the standard. This is equivalent to optimizing a weighted
mean-square error (10.34). Table 10.1 is an example of an 8 ⫻ 8 weight matrix that
is used in JPEG [478]. The weights at the lowest frequencies, corresponding to the
upper left portion of Table 10.1, are roughly 10 times smaller than at the highest
frequencies, corresponding to the bottom right portion.
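Weighted quantization amounts to a coefficient-dependent step. In this sketch (ours, for illustration only), the 4 × 4 matrix is the upper-left corner of a JPEG-style weight table, and the step for coefficient (k, j) is proportional to w[k, j].

```python
import numpy as np

# Upper-left (low-frequency) corner of a JPEG-style weight table.
w = np.array([[16, 11, 10, 16],
              [12, 12, 14, 19],
              [14, 13, 16, 24],
              [14, 17, 22, 29]], dtype=float)

def weighted_quantize(coeffs, w, q=1.0):
    # Quantization step proportional to the weight of each frequency:
    # larger weights (higher frequencies) are quantized more coarsely.
    steps = q * w
    return np.round(coeffs / steps) * steps

print(weighted_quantize(np.full((4, 4), 20.0), w))
```

This is equivalent to a uniform quantization of the weighted coefficients f_B[m]/w_{k,j}, hence to minimizing a weighted mean-square error as in (10.34).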
Distortion Rate
At 0.25 to 0.5 bit/pixel, the quality of JPEG images is moderate. At 0.2 bit/pixel,
Figure 10.14 shows that there are blocking effects due to the discontinuities of the
square windows. At 0.75 to 1 bit/pixel, images coded with the JPEG standard are
of excellent quality. Above 1 bit/pixel, the visual image quality is perfect. The JPEG
standard is often used for R̄ ∈ [0.5, 1].
At low bit rates, the artifacts at the block borders are reduced by replacing
the block cosine basis by a local cosine basis [40, 87], designed in Section 8.4.4.
If the image is smooth over a block, a local cosine basis creates lower-amplitude,
high-frequency coefficients, which slightly improves the coder performance. The
quantization errors for smoothly overlapping windows also produce more regular
gray-level image fluctuations at the block borders. However, the improvement has
not been significant enough to motivate replacing the block cosine basis by a local
cosine basis in the JPEG standard.

Table 10.1 Matrix of Weights w_{k,j}

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

Note: These weights are used to quantize the block cosine coefficient corresponding
to each cosine vector g_{k,j} [69]. The order is the same as in Figure 10.13.
Implementation of JPEG
The baseline JPEG standard [478] uses an intermediate representation that combines
run-length and amplitude values. In each block, the 63 (nonzero frequency)
quantized coefficients indicated in Figure 10.13 are integers that are scanned in
zig-zag order. A JPEG code is a succession of symbols S_1 = (L, B) of 8 bits, followed
by symbols S_2. The L variable is the length of a consecutive run of zeros, coded on
4 bits; its value is thus limited to the interval [0, 15]. Actual zero runs can have a
length greater than 15. The symbol S_1 = (15, 0) is interpreted as a run length of size
16 followed by another run length. When the run of zeros includes the last (63rd)
coefficient of the block, a special end-of-block (EOB) symbol S_1 = (0, 0) is used, which terminates
the coding of the block. For high compression rates, the last run of zeros may be
very long. The EOB symbol stops the coding at the beginning of this last run of
zeros.

The B variable of S_1 is coded on 4 bits and gives the number of bits used to
code the value of the next nonzero coefficient. Since the image gray-level values
are in the interval [0, 2^8], one can verify that the amplitude of the block cosine
coefficients remains in [-2^{10}, 2^{10} - 1]. For any integer in this interval, Table 10.2
gives the number of bits used by the code. For example, 70 is coded on B = 7
bits. There are 2^7 different numbers that are coded with 7 bits. If B is nonzero,
after the symbol S_1, the symbol S_2 of length B specifies the amplitude of the next
nonzero coefficient. This variable-length code is a simplified entropy code:
high-amplitude coefficients appear less often and are thus coded with more bits.
For DC coefficients (zero frequency), the differential values DC^i - DC^{i-1} remain
in the interval [-2^{11}, 2^{11} - 1]. They are also coded with a succession of two symbols.
FIGURE 10.14
Image compression with JPEG: left column, 0.5 bit/pixel; right column, 0.2 bit/pixel.
Table 10.2 The Value of Coefficients Coded on B Bits

    B    Range of Values
    1    -1, 1
    2    -3, -2, 2, 3
    3    -7 ... -4, 4 ... 7
    4    -15 ... -8, 8 ... 15
    5    -31 ... -16, 16 ... 31
    6    -63 ... -32, 32 ... 63
    7    -127 ... -64, 64 ... 127
    8    -255 ... -128, 128 ... 255
    9    -511 ... -256, 256 ... 511
    10   -1023 ... -512, 512 ... 1023

Note: The values belong to sets of 2^B values that are indicated in the second column.
In this case, S_1 is reduced to the variable B, which gives the number of bits of the next
symbol S_2 that codes DC^i - DC^{i-1}.

For both the DC and the other coefficients, the S_1 symbols are encoded with a
Huffman entropy code. JPEG does not impose the Huffman tables, which may vary
depending on the type of image. An arithmetic entropy code can also be used. For
coefficients that are not zero frequency, the L and B variables are lumped together
because their values are often correlated, and the entropy code of S_1 takes advantage
of this correlation.
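The intermediate symbols of one block can be sketched as follows (the helper name `ac_symbols` is ours; the amplitude category B coincides with the bit length of |v|, as in Table 10.2):

```python
# Intermediate symbols for one block's 63 zig-zag-ordered AC
# coefficients: (run-of-zeros, B, amplitude) triples, with (15, 0)
# standing for a run of 16 zeros and (0, 0) for the end-of-block.
def ac_symbols(zigzag_ac):
    symbols, run = [], 0
    last_nonzero = max((i for i, v in enumerate(zigzag_ac) if v), default=-1)
    for v in zigzag_ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:
                symbols.append((15, 0, None))   # run of 16 zeros
                run = 0
        else:
            B = abs(v).bit_length()             # category from Table 10.2
            symbols.append((run, B, v))
            run = 0
    if last_nonzero < len(zigzag_ac) - 1:
        symbols.append((0, 0, None))            # EOB: trailing zeros
    return symbols

print(ac_symbols([5, 0, 0, -1] + [0] * 59))
# -> [(0, 3, 5), (2, 1, -1), (0, 0, None)]
```

In the actual standard, each (L, B) pair is Huffman coded and followed by the B raw bits of S_2; only the symbol construction is sketched here.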
10.5.2 JPEG-2000 Wavelet Coding
The JPEG-2000 image-compression standard is a transform code in a wavelet basis.
It typically introduces less distortion than JPEG, but this improvement is moderate
above 1 bit/pixel. At low bit rates, JPEG-2000 degrades more progressively than
JPEG. JPEG-2000 is implemented with an embedded algorithm that provides scalable
codes for progressive transmission. Regions of interest can also be defined to
improve the resolution on specific image parts. Yet, the algorithmic complexity
overhead of the JPEG-2000 algorithm has mostly limited its applications to professional
image processing, such as medical imaging, professional photography, or
digital cinema.
Although mostly inspired by the EBCOT algorithm by Taubman and Marcellin
[65], JPEG-2000 is the result of several years of research to optimize wavelet
image codes. It gives elegant solutions to key issues of wavelet image coding that will
be reviewed together with the description of the standard. Taubman and Marcellin's
book [65] explains all the details. JPEG-2000 brings an improvement of typically
more than 1 db relative to an adaptive arithmetic entropy coding of wavelet coefficients,
as described in Section 10.4.1. Good visual-quality images are obtained in
Figure 10.15 with 0.2 bit/pixel, which considerably improves the JPEG compression
results shown in Figure 10.14. At 0.05 bit/pixel, JPEG-2000 recovers a decent
approximation, which is not possible with JPEG.
Choice of Wavelet
To optimize the transform code one must choose a wavelet basis that produces as
many zero-quantized coefficients as possible. A two-dimensional separable wavelet
basis is constructed from a one-dimensional wavelet basis generated by a mother
wavelet ψ. The wavelet choice does not modify the asymptotic behavior of the
distortion rate (10.61), but it influences the multiplicative constant. Three criteria
may influence the choice of ψ: the number of vanishing moments, the support size,
and the regularity.

High-amplitude coefficients occur when the supports of the wavelets overlap a
sharp transition such as an edge. The number of high-amplitude wavelet coefficients
created by an edge is proportional to the width of the wavelet support, which
should thus be as small as possible. Over smooth regions, wavelet coefficients are
small at fine scales if the wavelet has enough vanishing moments to take advantage
of the image regularity. However, Theorem 7.9 shows that the support size of ψ
increases proportionally to its number of vanishing moments. The choice of an
optimal wavelet is therefore a trade-off between the number of vanishing moments
and the support size.
The wavelet regularity is important for reducing the visibility of artifacts. A quantization error adds to the image a wavelet multiplied by the amplitude of the quantization error. If the wavelet is irregular, the artifact is more visible because it looks like an edge or a texture patch [88]. This is the case for Haar wavelets. Continuously differentiable wavelets produce errors that are less visible, but more regularity often does not further improve visual quality.
To avoid creating large-amplitude coefficients at the image border, it is best to use the folding technique from Section 7.5.2, which is much more efficient than the periodic extension from Section 7.5.1. However, it requires using wavelets that are symmetric or antisymmetric. Besides Haar, there is no symmetric or antisymmetric wavelet of compact support that generates an orthonormal basis. Biorthogonal wavelet bases that are nearly orthogonal can be constructed with symmetric or antisymmetric wavelets. Therefore, they are used more often for image compression.
Overall, many numerical studies have shown that the symmetric 9/7 biorthogonal wavelets in Figure 7.15 give the best distortion rate performance for wavelet image-transform codes. They provide an appropriate trade-off between the vanishing-moment, support, and regularity requirements. This biorthogonal wavelet basis is nearly orthogonal and thus introduces no numerical instability, and it has an efficient lifting implementation, described in Section 7.8.5. JPEG-2000 also offers the choice of the symmetric 5/3 biorthogonal wavelets, which can be implemented with fewer lifting operations and with exact integer operations. The 5/3 wavelet coefficients can be coded as scaled integers and thus can be restored exactly if the quantization step is sufficiently small. This provides a lossless coding mode to JPEG-2000, that is, a coding with no error. Lossless coding yields much smaller compression ratios, typically of about 1.7 for most images [65].

FIGURE 10.15
JPEG-2000 transform coding: left column, 0.2 bit/pixel; right column, 0.05 bit/pixel.
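The reversible transform behind this lossless mode can be sketched with the two integer lifting steps of the 5/3 wavelet (a minimal sketch with our own function names; boundaries are handled by simple index clamping rather than the standard's symmetric extension):

```python
def lift_53(x):
    """One level of the integer 5/3 lifting transform (our sketch)."""
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus the floored average of its
    # even neighbors (integer arithmetic keeps it exactly invertible).
    d = [odd[i] - ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
         for i in range(len(odd))]
    # Update step: smooth the even samples with the neighboring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
         for i in range(len(even))]
    return s, d

def unlift_53(s, d):
    """Exact inverse: undo the update, then the predict step."""
    even = [s[i] - ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
            for i in range(len(s))]
    odd = [d[i] + ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
           for i in range(len(d))]
    x = [0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd
    return x
```

Because each lifting step is undone by subtracting exactly what was added, the round trip is lossless for any integer input, which is the property the lossless mode relies on.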
Intra- and Cross-Scale Correlation
The significance maps in Figure 10.10 show that significant coefficients tend to be
aggregated along contours or in textured regions. Indeed, wavelet coefficients have
a large amplitude where the signal has sharp transitions. At each scale and for each
direction, a wavelet image coder can take advantage of the correlation between neighboring wavelet coefficient amplitudes, induced by the geometric image regularity.
This was not done by the wavelet coder from Section 10.4.1, which makes a binary
encoding of each coefficient independently from its neighbors. Taking advantage
of this intrascale amplitude correlation is an important source of improvement for
JPEG-2000.
Figure 10.10 also shows that wavelet coefficient amplitudes are often correlated across scales. If a wavelet coefficient is large and thus significant, the coarser-scale coefficient located at the same position is also often significant. Indeed, the wavelet coefficient amplitude often increases when the scale increases. If an image f is uniformly Lipschitz α in the neighborhood of (x₀, y₀), then (6.58) proves that for wavelets ψ^l_{j,p,q} located in this neighborhood, there exists A ≥ 0 such that

|⟨f, ψ^l_{j,p,q}⟩| ≤ A 2^{j(α+1)}.

The worst singularities are often discontinuities, so α ≥ 0. This means that in the neighborhood of singularities without oscillations, the amplitude of wavelet coefficients decreases when the scale 2^j decreases. This property is not always valid, in particular for oscillatory patterns. High-frequency oscillations create coefficients at large scales 2^j that are typically smaller than at the fine scale that matches the period of oscillation.
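This cross-scale behavior can be checked numerically with a discrete orthonormal Haar transform (a toy illustration with our own signals and names, not an experiment from the text):

```python
import math

def haar_detail_levels(x):
    """Orthonormal Haar detail coefficients, finest level first."""
    levels, s, r2 = [], list(x), math.sqrt(2.0)
    while len(s) > 1:
        # Each step splits the running approximation into details and
        # a coarser approximation, both normalized by sqrt(2).
        d = [(s[2 * i] - s[2 * i + 1]) / r2 for i in range(len(s) // 2)]
        s = [(s[2 * i] + s[2 * i + 1]) / r2 for i in range(len(s) // 2)]
        levels.append(d)
    return levels

step = [0.0] * 64 + [1.0] * 64                         # one discontinuity
smooth = [math.sin(2 * math.pi * n / 128) for n in range(128)]

step_lv = haar_detail_levels(step)
smooth_lv = haar_detail_levels(smooth)
```

For both signals the largest coefficient amplitudes sit at the coarsest scales, in agreement with the bound |⟨f, ψ⟩| ≤ A 2^{j(α+1)} for α ≥ 0.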
To take advantage of such correlations across scales, wavelet zero-trees have been
introduced by Lewis and Knowles [348]. Shapiro [432] used this zero-tree structure
to code the embedded significance maps of wavelet coefficients by relating these
coefficients across scales with quad-trees. This was further improved by Said and
Pearlman [422] with a set partitioning technique. Yet, for general natural images,
the coding improvement obtained by algorithms using cross-scale correlation of
wavelet coefficient amplitude seems to be marginal compared to approaches that
concentrate on intrascale correlation due to geometric structures. This approach
was, therefore, not retained by the JPEG-2000 expert group.
Weighted Quantization and Regions of Interest
Visual distortions introduced by quantization errors of wavelet coefficients depend on the scale 2^j. Errors at large scales are more visible than at fine scales [481]. This can be taken into account by quantizing the wavelet coefficients with intervals Δ_j = Δ w_j that depend on the scale 2^j. For R̄ ≤ 1 bit/pixel, w_j = 2^{−j} is appropriate for the three finest scales. The distortion in (10.34) shows that choosing such weights is equivalent to minimizing a weighted mean-square error.

Such a weighted quantization is implemented as in (10.35) by quantizing the weighted wavelet coefficients f_B[m]/w_j with a uniform quantizer. The weights are inverted during the decoding process. JPEG-2000 supports a general weighting scheme that codes weighted coefficients w[m] f_B[m], where w[m] can be designed to emphasize some region of interest Ω ⊂ [0, 1]² in the image. The weights are set to w[m] = w > 1 for the wavelet coefficients f_B[m] = ⟨f, ψ^l_{j,p,q}⟩ where the support of ψ^l_{j,p,q} intersects Ω. As a result, the wavelet coefficients inside Ω are given a higher priority during the coding stage, and the region Ω is coded first within the compressed stream. This provides a mechanism to code regions of interest in images more precisely, such as a face in a crowd.
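The weight-then-quantize mechanism can be sketched as follows (a hypothetical minimal version with our own names; the actual standard codes bit planes of the weighted coefficients rather than rounding them directly):

```python
def quantize(coeffs, weights, step):
    """Uniformly quantize weighted coefficients w[m] * f_B[m]."""
    return [round(c * w / step) for c, w in zip(coeffs, weights)]

def dequantize(q, weights, step):
    """Invert the weights at decoding, as the standard prescribes."""
    return [qi * step / w for qi, w in zip(q, weights)]

coeffs = [0.8, -2.3, 0.1, 4.0]
weights = [1.0, 4.0, 1.0, 4.0]   # w = 4 emphasizes a region of interest
step = 1.0

rec = dequantize(quantize(coeffs, weights, step), weights, step)
```

Coefficients with weight w are reproduced with an effective quantization step of step/w, so the region of interest is coded with w times finer precision at the same nominal step.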
Overview of the JPEG-2000 Coder
The JPEG-2000 compression standard [65] implements a generic embedded coder,
described in Section 10.4.2, and takes advantage of the intrascale dependencies
between wavelet coefficients of natural images. The three primitive operations of
the algorithm, which code the significance, the sign, and the amplitude refinement,
are implemented with a binary adaptive arithmetic coder. This coder exploits the
aggregation of large-amplitude wavelet coefficients by creating a small number of
context tokens that depend on the neighbors of each coefficient. The coding of a
binary symbol uses a conditional probability that depends on the context value of
this coefficient.
Wavelet coefficients are subdivided in squares (code blocks) that are coded independently. Each code block S_k is a square of L × L coefficients {f_B[m]}_{m∈S_k}, with typical sizes L = 32 or L = 64. The coding of each code block S_k follows the generic embedded coder detailed in Section 10.4.2 and generates a binary stream c^k. This stream is composed of substreams c^k = (c^k_1, c^k_2, . . .), where each c^k_n corresponds to the bit plane of the nth bit. The whole set of substreams {c^k_n}_{k,n} is reorganized in a global embedded binary stream c that minimizes the rate distortion for any bit budget. The square segmentation improves the scalability of the bit stream, since the generated code can be truncated optimally at a very large number of points in the stream. It also provides a more efficient, parallel, and memory-friendly implementation of the wavelet coder. Figure 10.16 illustrates the successive coding steps of JPEG-2000.
Conditional Coding of Bit Planes
The coefficients f_B[m] in each code block are processed sequentially using an ordering of the positions m ∈ S_k. The specific choice of ordering used in the JPEG-2000 standard is depicted in Figure 10.17. It scans each square in bands of size 4 × L coefficients.

FIGURE 10.16
Overview of the JPEG-2000 compression process: the wavelet coefficients of a code block S_k are decomposed into bit planes, coded into substreams c^k_1, c^k_2, c^k_3, . . . , and packed into the output stream c.

FIGURE 10.17
JPEG-2000 scans square blocks of L × L wavelet coefficients, as bands of size 4 × L, with a 3 × 3 context window.

The conditional coding of coefficients uses an instantaneous significance σ[m] for each bit plane n. If the coefficient has not yet been processed, then σ[m] carries the value of the previous pass: σ[m] = 1 if |f_B[m]| ≥ 2^{n+1}, and σ[m] = 0 otherwise. If the coefficient has already been processed, then σ[m] = 1 if |f_B[m]| ≥ 2^n, and σ[m] = 0 otherwise. Observe that σ[m] can be computed by both the coder and the decoder from the information that has already been processed.
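This state can be sketched in a few lines (our naming, with the convention that σ[m] = 1 marks a significant coefficient; the standard maintains it incrementally with state bits rather than recomputing it from magnitudes):

```python
def significance(abs_coeff, n, processed):
    """Instantaneous significance sigma[m] while coding bit plane n.

    `processed` is True once the coefficient has been visited in the
    current pass, so the threshold drops from 2^(n+1) to 2^n.
    """
    threshold = 2 ** n if processed else 2 ** (n + 1)
    return 1 if abs_coeff >= threshold else 0

def classify(abs_coeff, n):
    """Which primitive operation codes this coefficient at plane n?"""
    if abs_coeff >= 2 ** (n + 1):
        return "refinement"      # already significant: code its nth bit
    return "significance"        # still coding its significance bit
```

Both the coder and the decoder can evaluate these functions from already-transmitted bits, which is what makes the context-based arithmetic coding decodable.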
In the following, we explain how the three primitive operations—significance,
sign, and amplitude refinement coding—of the generic embedded coder are
implemented using a conditional arithmetic coder.
Significance Coding
If the coefficient of index m was insignificant at the previous bit planes, meaning |f_B[m]| < 2^{n+1}, JPEG-2000 encodes the significance bit b_n[m] ∈ {0, 1}. JPEG-2000 takes advantage of redundancies between neighboring coefficients in the square S_k. A conditional arithmetic coder stores the value of b_n[m] by using a conditional probability distribution,

p_{x,c} = P(b_n[m] = x | C_0[m] = c)  for x ∈ {0, 1} and c ∈ {0, . . . , C_0^max − 1},

where the context C_0[m] for a coefficient f_B[m] = ⟨f, ψ^l_{j,p,q}⟩ at position (p, q) is evaluated by both the compressor and decompressor from a set of fixed rules using the significances σ[(p, q) + ε] of the eight neighbors ε = (ε₁, ε₂), ε₁, ε₂ ∈ {−1, 0, 1}, in a 3 × 3 context window around the position (p, q). These rules can be found in [65].
Sign Coding
If the coefficient of index m has been coded as significant, which means that b_n[m] = 1, then JPEG-2000 encodes its sign s[m] = sign(f_B[m]) ∈ {+1, −1} with an adaptive arithmetic coder that also uses a conditional probability:

P(s[m] = x | C_s[m] = c)  for x ∈ {+1, −1} and c ∈ {0, . . . , C_s^max − 1}.

The context depends on the 3 × 3 neighboring coefficients, which can be significant and positive, significant and negative, or insignificant, thus allowing 81 unique configurations. JPEG-2000 uses a reduced set of only C_s^max = 5 context values for C_s[m] that are calculated from horizontal and vertical sign-agreement indicators [65].
Amplitude Refinement Coding
If the coefficient is already significant from the previous bit plane, which means that |f_B[m]| ≥ 2^{n+1}, then JPEG-2000 codes γ_n[m], the nth bit of |f_B[m]|, to refine the amplitude of the quantized coefficient, with an adaptive arithmetic coder using a conditional probability:

P(γ_n[m] = x | C_a[m] = c)  for x ∈ {0, 1} and c ∈ {0, . . . , C_a^max − 1}.

Figure 10.9 shows that the histograms of wavelet coefficients of natural images are usually highly peaked around zero. For a small quantized value Q(|f_B[m]|), the event γ_n[m] = 0 is thus more likely than the event γ_n[m] = 1. In contrast, for a large quantized value Q(|f_B[m]|), both events have approximately the same probability. JPEG-2000 uses a context C_a[m] that discriminates these two cases by checking whether |f_B[m]| ≥ 2^{n+2} (one already knows that |f_B[m]| ≥ 2^{n+1}). This context also takes into account the significance of the neighboring coefficients by using the significance context C_0[m].
Optimal Truncation Points and Substream Packing
The bit plane coding algorithm generates an embedded bit stream c^k for each code block {f_B[m]}_{m∈S_k} of wavelet coefficients. Each bit stream c^k has a set of valid truncation points {R^k_{n_max}, R^k_{n_max−1}, . . .} that correspond to the end of the coding pass for each bit plane. The quantity R^k_n − R^k_{n+1} is the number of bits of the substream c^k_n generated by the coding of the bit plane of index n of the code block S_k. The goal of the rate distortion optimization is to pack together all the substreams {c^k_n}_{k,n} in an optimized order so as to minimize the resulting global distortion when the code is interrupted at a truncation point.

JPEG-2000 offers a mechanism of fractional bit plane coding that further increases the number of these valid truncation points. These fractional coding passes do not code the whole significance map b_n at once, but rather begin by coding only those coefficient positions (p, q) that have at least one significant neighbor (p, q) + ε such that σ[(p, q) + ε] = 1. The sign and magnitude refinement passes are then applied, and afterward the coefficients that have not been processed in the current bit plane are encoded. In the following, we assume a set of valid truncation points {R^k_n}_n for each code block S_k.
Assuming an orthogonal wavelet transform, the distortion at the truncation point R^k_n, after the coding of the bit plane of index n, is

d^k_n = ∑_{m∈S_k} |Q(f_B[m]) − f_B[m]|²,   (10.70)

where Q is the quantizer of bin size 2^n. This distortion can be computed by the encoder from the already coded bit planes of indexes greater than n. JPEG-2000 uses the biorthogonal 9/7 wavelet transform, which is close to being orthogonal, so the distortion computation (10.70), although not exact, is accurate enough in practice.

For a given number of bits R, we must find for each code block S_k an optimized truncation point R^k_{n_k} of index n_k that solves

min_{{n_k}_k} ∑_k d^k_{n_k}  subject to  ∑_k R^k_{n_k} ≤ R.   (10.71)

This optimal set {n_k}_k depends on the desired total number of bits R. To obtain an embedded stream c, the rate distortion optimization (10.71) must be computed for an increasing set of ℓ_max bit budgets {R^(ℓ)}_{0≤ℓ<ℓ_max}. For each number of bits R^(ℓ), (10.71) is solved with R = R^(ℓ), defining a set {n_k^(ℓ)}_k of optimal truncation points. The final stream c is obtained by successively appending, for each ℓ, the substreams c^k_n of the newly coded bit planes n_k^(ℓ) ≤ n < n_k^(ℓ−1).
Rate Distortion Optimization
To build the final stream c, the optimization (10.71) is solved for a large number of bit budgets R = R^(ℓ). As in Section 10.3.1, following a classic distortion rate approach [392, 435], the constrained minimization is replaced by a Lagrangian optimization. The resulting rate distortion Lagrangian over all square blocks S_k is

L({d^k_{n_k}}_k, {R^k_{n_k}}_k) = ∑_k (d^k_{n_k} + λ R^k_{n_k}).

A set of indexes {n_k}_k that minimizes L is necessarily a minimizer of (10.71) for a bit budget R = ∑_k R^k_{n_k}. The optimization (10.71) is performed by minimizing L for many values of λ and by retaining the smallest value of λ that guarantees ∑_k R^k_{n_k} ≤ R.

For a fixed λ, to minimize L, an independent minimization is first done over each block S_k of the partial Lagrangian d^k_n + λ R^k_n. In each block S_k, this gives truncation points that are associated to a value of λ. These truncation points are then globally recombined across all blocks S_k by ordering them according to their Lagrange variable λ, which provides the global sequence of all truncation points and defines the final embedded JPEG-2000 stream.
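The per-block Lagrangian minimization and the sweep over λ can be sketched as follows (structure and names are ours; a real encoder restricts the search to the convex hull of each block's rate-distortion points):

```python
def truncate_block(points, lam):
    """Pick the truncation point minimizing d + lam * R for one block.

    `points` is a list of (rate, distortion) pairs, one per valid
    truncation point of the block.
    """
    return min(points, key=lambda p: p[1] + lam * p[0])

def allocate(blocks, budget, lams):
    """Sweep lambda values in increasing order and keep the smallest
    lambda whose total rate fits the budget (larger lambda penalizes
    rate more, so total rate decreases along the sweep)."""
    for lam in sorted(lams):
        choice = [truncate_block(b, lam) for b in blocks]
        if sum(r for r, _ in choice) <= budget:
            return choice
    # Fallback: cheapest truncation point of every block.
    return [min(b, key=lambda p: p[0]) for b in blocks]
```

Each block is handled independently for a given λ, which is what makes the optimization parallel over code blocks.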
10.6 EXERCISES
10.1 Let X be a random variable that takes its values in {x_k}_{1≤k≤7} with probabilities {0.49, 0.26, 0.12, 0.04, 0.04, 0.03, 0.02}.
(a) Compute the entropy H(X). Construct a binary Huffman code and calculate the average bit rate R_X.
(b) Suppose that the symbols are coded with digits that may take three values (−1, 0, 1) instead of two as in a bit representation. Variable-length ternary prefix codes can be represented with ternary trees. Extend the Huffman algorithm to compute a ternary prefix code for X that has a minimal average length.
10.2 ¹ Let x₁ be the symbol of highest probability p₁ of a random variable X, and l₁ the length of its binary word in a Huffman code. Show that if p₁ > 2/5, then l₁ = 1. Verify that if p₁ < 1/3, then l₁ ≥ 2.
10.3 ¹ Let X be a random variable equal to x₁ or x₂ with probabilities p₁ = 1 − ε and p₂ = ε. Verify that H(X) converges to 0 when ε goes to 0. Show that the Huffman code has an average number of bits that converges to 1 when ε goes to 0.
10.4 ² Prove the Huffman code Theorem 10.2.
10.5 ² Let H₀ = (N − M) log₂(N/(N − M)) + M log₂(N/M) be the entropy of a binary coding of the position of M significant coefficients among N, as in (10.49). Show that H₀ ≥ log₂ C(N, M), where C(N, M) is the binomial coefficient, and compute the difference between the two by using the Stirling formula lim_{n→∞} (2πn)^{−1/2} (n/e)^{−n} n! = 1.
10.6 ² Let X be a random variable with a probability density p(x). Let Q be a quantizer with quantization bins {(y_{k−1}, y_k]}_{1≤k≤K}.
(a) Prove that E{|X − Q(X)|²} is minimum if and only if

Q(x) = x_k = (∫_{y_{k−1}}^{y_k} x p(x) dx) / (∫_{y_{k−1}}^{y_k} p(x) dx)  for x ∈ (y_{k−1}, y_k].

(b) Suppose that p(x) is a Gaussian with variance σ². Find x₀ and x₁ for a "1 bit" quantizer defined by y₀ = −∞, y₁ = 0, and y₂ = +∞.
10.7 ¹ Consider a pulse code modulation that quantizes each sample of a Gaussian random vector F[n] and codes it with an entropy code that uses the same number of bits for each n. If the high-resolution quantization hypothesis is satisfied, prove that the distortion rate is

d(R̄) = (π e / 6) E{‖F‖²} 2^{−2R̄}.

10.8 ³ Let d = ∑_{m=0}^{N−1} d_m be the total distortion of a transform code. We suppose that the distortion rate d_m(r) for coding the mth coefficient is convex. Let R = ∑_{m=0}^{N−1} R_m be the total number of bits.
(a) Prove with the distortion rate Lagrangian that there exists a unique bit allocation that minimizes d(R) for R fixed, and that it satisfies ∂d_m(R_m)/∂r = −λ, where λ is a constant that depends on R.
(b) To impose that each R_m is a positive integer, we use a greedy iterative algorithm that allocates the bits one by one. Let {R_{m,p}}_{0≤m<N} be the bit allocation after p iterations, which means that a total of p bits have been allocated. The next bit is added to R_{k,p} such that

|∂d_k(R_{k,p})/∂r| = max_{0≤m<N} |∂d_m(R_{m,p})/∂r|.

Justify this strategy. Prove that this algorithm gives an optimal solution if all curves d_m(r) are convex and if d_m(n + 1) − d_m(n) ≈ ∂d_m(n)/∂r for all n ∈ ℕ.
10.9 ² Let X[m] be a binary first-order Markov chain, which is specified by the transition probabilities p_{01} = Pr{X[m] = 1 | X[m − 1] = 0}, p_{00} = 1 − p_{01}, p_{10} = Pr{X[m] = 0 | X[m − 1] = 1}, and p_{11} = 1 − p_{10}.
(a) Prove that p₀ = Pr{X[m] = 0} = p_{10}/(p_{10} + p_{01}) and that p₁ = Pr{X[m] = 1} = p_{01}/(p_{10} + p_{01}).
(b) A run-length code records the length Z of successive runs of 0 values of X[m] and the length I of successive runs of 1. Show that if Z and I are entropy coded, the average number of bits per sample of the run-length code, denoted R̄, satisfies

R̄ ≥ R̄_min = p₀ H(Z)/E{Z} + p₁ H(I)/E{I}.

(c) Let H₀ = −p_{01} log₂ p_{01} − (1 − p_{01}) log₂(1 − p_{01}) and H₁ = −p_{10} log₂ p_{10} − (1 − p_{10}) log₂(1 − p_{10}). Prove that

R̄_min = H(X) = p₀ H₀ + p₁ H₁,

which is the average information gained by moving one step ahead in the Markov chain.
(d) Suppose that the binary significance map of the transform code of a signal of size N is a realization of a first-order Markov chain. We denote α = 1/E{Z} + 1/E{I}. Let M be the number of significant coefficients (equal to 1). If M ≪ N, then show that

R̄_min ≈ (M/N) (α log₂(N/M) + β)   (10.72)

with β = α log₂ e − 2α log₂ α − (1 − α) log₂(1 − α).
(e) Implement a run-length code for the binary significance maps of wavelet image coefficients d^l_j[n, m] = ⟨f, ψ^l_{j,n,m}⟩ for j and l fixed. See whether (10.72) approximates the bit rate R̄ calculated numerically as a function of N/M for the Lena and Barbara images. How does α vary depending on the scale 2^j and the orientation l = 1, 2, 3?
10.10 ⁴ Implement a transform code in a block cosine basis with an arithmetic code and with a local cosine transform over blocks of the same size. Compare the compression rates in DCT-I and local cosine bases, as well as the visual image quality, for R̄ ∈ [0.2, 1].
10.11 ⁴ Implement a wavelet transform code for color images. Transform the red, green, and blue channels in the color Karhunen-Loève basis calculated in Exercise 9.2 or with any standard color-coordinate system such as Y, U, V. Perform a transform code in a wavelet basis with the multichannel decomposition (12.157), which uses the same approximation support for all color channels. Use an arithmetic coder to binary encode together the three color coordinates of wavelet coefficients. Compare numerically the distortion rate of the resulting algorithm with the distortion rate of a wavelet transform code applied to each color channel independently.
10.12 ⁴ Develop a video compression algorithm in a three-dimensional wavelet basis [474]. In the time direction, choose a Haar wavelet in order to minimize the coding delay. This yields zero coefficients at locations where there is no movement in the image sequence. Implement a separable three-dimensional wavelet transform and an arithmetic coding of quantized coefficients. Compare the compression result with an MPEG-2 motion-compensated compression in a DCT basis.
CHAPTER 11
Denoising
Removing noise from signals is possible only if some prior information is available. This information is encapsulated in an operator designed to reduce the noise while preserving the signal. Ideally, the joint probability distribution of the signal and the noise is known. Bayesian calculations then derive optimal operators that minimize the average estimation error. However, such probabilistic models are often not available for complex signals such as natural images.

Simpler signal models can be incorporated in the design of a basis or a frame, which takes advantage of known signal properties to build a sparse representation. Efficient nonlinear estimators are then computed by thresholding the resulting coefficients. For one-dimensional signals and images, thresholding estimators are studied in wavelet bases, time-frequency representations, and curvelet frames. Block thresholdings are introduced to regularize these operators, which improves the estimation of audio recordings and images.

The optimality of estimators is analyzed in a minimax framework, where the maximum estimation error is minimized over a predefined set of signals. When signals are not uniformly regular, nonlinear thresholding estimators in wavelet bases are shown to be much more efficient than linear estimators; they nearly reach the minimax risk over different signal classes, such as bounded variation signals and images.
11.1 ESTIMATION WITH ADDITIVE NOISE
Digital acquisition devices, such as cameras or microphones, output noisy measurements of an incoming analog signal f̄(x). These measurements can be modeled by a filtering of f̄(x) with the sensor responses φ̄_n(x), to which is added a noise W[n]:

X[n] = ⟨f̄, φ̄_n⟩ + W[n]  for 0 ≤ n < N.   (11.1)

The noise W incorporates intrinsic physical fluctuations of the incoming signal. For example, an image intensity with low illumination has a random variation depending on the number of photons captured by each sensor. It also includes noises introduced by the measurement device, such as electronic noises or transmission errors. The aggregated noise W is modeled by a random vector with a probability distribution that is supposed to be known a priori and is often Gaussian.
Let us denote the discretized signal by f[n] = ⟨f̄, φ̄_n⟩. This analog-to-digital acquisition is supposed to be stable so that, according to Section 3.1.3, an analog signal approximation of f̄(x) can be recovered from f[n] with a linear projector. One must then optimize an estimation F̃[n] = DX[n] of f[n] calculated from the noisy measurements (11.1), which are rewritten as

X[n] = f[n] + W[n]  for 0 ≤ n < N.

The decision operator D is designed to minimize the estimation error f − F̃, measured by a loss function.

For audio signals or images, the loss function should measure the perceived audio or visual degradation. A mean-square distance is certainly not a perfect model of perceptual degradation, but it is mathematically simple and sufficiently precise in most applications. Throughout this chapter, the loss function is thus chosen to be a square Euclidean norm. The risk of the estimator F̃ of f is the average loss, calculated with respect to the probability distribution of the noise W:

r(D, f) = E{‖f − DX‖²}.   (11.2)
The decision operator D is optimized with the prior information available on the signal. The Bayes framework supposes that signals are realizations of a random vector that has a known probability distribution, and a Bayes estimator minimizes the expected risk. A major difficulty is to acquire enough information to model this prior probability distribution. The minimax framework uses simpler deterministic models, which define signals as elements of a predefined set Θ. The expected risk cannot be computed, but the maximum risk can be minimized over Θ. Section 11.1.2 relates minimax and Bayes estimators through the minimax theorem.
11.1.1 Bayes Estimation
A Bayesian model considers signals f as realizations of a random vector F with a probability distribution π known a priori, called the prior distribution. The noisy data are thus rewritten as

X[n] = F[n] + W[n]  for 0 ≤ n < N.

We suppose that the noise and signal values W[k] and F[n] are independent for any 0 ≤ k, n < N. The joint distribution of F and W is thus the product of the distributions of F and W. It specifies the conditional probability distribution of F given the observed data X, also called the posterior distribution. This posterior distribution is used to optimize the decision operator D that computes an estimation F̃ = DX of F from the data X.

The Bayes risk is the expected risk calculated with respect to the prior probability distribution π of the signal:

r(D, π) = E_π{r(D, F)}.
By inserting (11.2), it can be rewritten as an expected value for the joint probability distribution of the signal and the noise:

r(D, π) = E{‖F − F̃‖²} = ∑_{n=0}^{N−1} E{|F[n] − F̃[n]|²}.

Let O_n be the set of all operators (linear and nonlinear) from C^N to C^N. Optimizing D yields the minimum Bayes risk:

r_n(π) = inf_{D∈O_n} r(D, π).

Theorem 11.1 proves that there exist a Bayes decision operator D and a corresponding Bayes estimator F̃ that achieve this minimum risk.

Theorem 11.1. The Bayes estimator F̃ that yields the minimum Bayes risk r_n(π) is the conditional expectation

F̃[n] = E{F[n] | X[0], X[1], . . . , X[N − 1]}.   (11.3)
Proof. Let π_n(y) be the probability distribution of the value y of F[n]. The minimum risk is obtained by finding F̃[n] = D_n(X) that minimizes r(D_n, π_n) = E{|F[n] − F̃[n]|²} for each 0 ≤ n < N. This risk depends on the conditional distribution P_n(x|y) of the data X = x given F[n] = y:

r(D_n, π_n) = ∫∫ (D_n(x) − y)² dP_n(x|y) dπ_n(y).

Let P(x) = ∫ P_n(x|y) dπ_n(y) be the marginal distribution of X and π_n(y|x) be the posterior distribution of F[n] given X. The Bayes formula gives

r(D_n, π_n) = ∫∫ (D_n(x) − y)² dπ_n(y|x) dP(x).

The double integral is minimized by minimizing the inside integral for each x. This quadratic form is minimum when its derivative vanishes:

∂/∂D_n(x) ∫ (D_n(x) − y)² dπ_n(y|x) = 2 ∫ (D_n(x) − y) dπ_n(y|x) = 0,

which implies that

D_n(x) = ∫ y dπ_n(y|x) = E{F[n] | X = x},

so D_n(X) = E{F[n] | X}.  ■
Linear Estimation
The conditional expectation (11.3) is generally a complicated nonlinear function of the data {X[k]}_{0≤k<N} and is difficult to evaluate. To simplify this problem, we restrict the decision operator D to be linear. Let O_l be the set of all linear operators from C^N to C^N. The linear minimum Bayes risk is

r_l(π) = inf_{D∈O_l} r(D, π).

The linear estimator F̃ = DX that achieves this minimum risk is called the Wiener estimator. Theorem 11.2 gives a necessary and sufficient condition that specifies this estimator. We suppose that E{F[n]} = 0, which can be enforced by subtracting E{F[n]} from X[n] to obtain a zero-mean signal.

Theorem 11.2. A linear estimator F̃ is a Wiener estimator if and only if

E{(F[n] − F̃[n]) X[k]} = 0  for 0 ≤ k, n < N.   (11.4)
(11.4)
Proof. For each 0 ≤ n < N, we must find a linear estimation

F̃[n] = D_n X = ∑_{k=0}^{N−1} h[n, k] X[k],

which minimizes

r(D_n, π_n) = E{ |F[n] − ∑_{k=0}^{N−1} h[n, k] X[k]|² }.   (11.5)

The minimum of this quadratic form is reached if and only if for each 0 ≤ k < N,

∂r(D_n, π_n)/∂h[n, k] = −2 E{ (F[n] − ∑_{l=0}^{N−1} h[n, l] X[l]) X[k] } = 0,

which verifies (11.4).  ■
If F and W are independent Gaussian random vectors, then the optimal linear estimator is also optimal among nonlinear estimators. Indeed, two jointly Gaussian random vectors are independent if they are noncorrelated [53]. Since F[n] − F̃[n] is jointly Gaussian with X[k], the noncorrelation (11.4) implies that F[n] − F̃[n] and X[k] are independent for any 0 ≤ k, n < N. In this case, we can verify that F̃ is the Bayes estimator (11.3): F̃[n] = E{F[n] | X}. Theorem 11.3 computes the Wiener estimator from the covariances R_F and R_W of the signal F and of the noise W. The properties of covariance operators are described in Section A.6 of the Appendix.

Theorem 11.3: Wiener. If the signal F and the noise W are independent random vectors of covariance R_F and R_W, then the linear Wiener estimator F̃ = DX that minimizes E{‖F̃ − F‖²} is

F̃ = R_F (R_F + R_W)^{−1} X.   (11.6)
Proof. Let F̃[n] be a linear estimator of F[n]:

F̃[n] = ∑_{l=0}^{N−1} h[n, l] X[l].   (11.7)

This equation can be rewritten as a matrix multiplication by introducing the N × N matrix H = (h[n, l])_{0≤n,l<N}:

F̃ = H X.   (11.8)

Theorem 11.2 proves that an optimal linear estimator satisfies the noncorrelation condition (11.4), which implies that for 0 ≤ n, k < N,

E{F[n] X[k]} = E{F̃[n] X[k]} = ∑_{l=0}^{N−1} h[n, l] E{X[l] X[k]}.

Since X[k] = F[k] + W[k] and E{F[n] W[k]} = 0, it results that

E{F[n] F[k]} = ∑_{l=0}^{N−1} h[n, l] (E{F[l] F[k]} + E{W[l] W[k]}).   (11.9)

Let R_F and R_W be the covariance matrices of F and W, defined by E{F[n] F[k]} and E{W[n] W[k]}, respectively. Equation (11.9) can be rewritten as a matrix equation:

R_F = H (R_F + R_W).

Inverting this equation gives

H = R_F (R_F + R_W)^{−1}.  ■
The optimal linear estimator (11.6) is simple to compute since it only depends
on second-order covariance moments of the signal and of the noise.
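This can be checked numerically on synthetic data (a sketch; the covariance model below is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
t = np.arange(n)
# Hypothetical smooth signal prior: exponentially decaying covariance.
R_F = np.exp(-0.2 * np.abs(t[:, None] - t[None, :]))
R_W = 0.5 * np.eye(n)                       # white noise of variance 0.5

# Wiener matrix from (11.6).
H = R_F @ np.linalg.inv(R_F + R_W)

# Monte Carlo: sample F ~ N(0, R_F), add noise, compare errors.
L_F = np.linalg.cholesky(R_F)
err_wiener = err_raw = 0.0
for _ in range(200):
    f = L_F @ rng.standard_normal(n)
    x = f + np.sqrt(0.5) * rng.standard_normal(n)
    err_wiener += np.sum((H @ x - f) ** 2)
    err_raw += np.sum((x - f) ** 2)
```

The empirical Wiener error stays well below the raw noise energy and matches the theoretical risk tr(H R_W) of this estimator.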
Estimation in a Karhunen-Loève Basis
Since a covariance operator is symmetric, it is diagonalized in an orthonormal basis that is called a Karhunen-Loève basis or a basis of principal components. If the covariance operators R_F and R_W are diagonal in the same Karhunen-Loève basis B = {g_m}_{0≤m<N}, then Corollary 11.1 derives from Theorem 11.3 that the Wiener estimator is diagonal in this basis. We write

X_B[m] = ⟨X, g_m⟩,  F_B[m] = ⟨F, g_m⟩,  F̃_B[m] = ⟨F̃, g_m⟩,
W_B[m] = ⟨W, g_m⟩  and  σ_B[m]² = E{|⟨W, g_m⟩|²}.

Corollary 11.1. If there exists a Karhunen-Loève basis B = {g_m}_{0≤m<N} that diagonalizes the covariance matrices R_F and R_W of F and W, then the Wiener estimator that minimizes E{‖F̃ − F‖²} is

F̃ = ∑_{m=0}^{N−1} [ E{|F_B[m]|²} / (E{|F_B[m]|²} + σ_B[m]²) ] X_B[m] g_m.   (11.10)
The resulting minimum linear Bayes risk is

r_l(π) = ∑_{m=0}^{N−1} E{|F_B[m]|²} σ_B[m]² / (E{|F_B[m]|²} + σ_B[m]²).   (11.11)

Proof. The diagonal values of R_F and R_W are ⟨R_F g_m, g_m⟩ = E{|F_B[m]|²} and ⟨R_W g_m, g_m⟩ = E{|W_B[m]|²} = σ_B[m]². Since R_F and R_W are diagonal in B, the linear operator R_F (R_F + R_W)^{−1} in (11.6) is also diagonal in B, with diagonal values equal to E{|F_B[m]|²} (E{|F_B[m]|²} + σ_B[m]²)^{−1}. So (11.6) proves that the Wiener estimator is

F̃ = R_F (R_F + R_W)^{−1} X = ∑_{m=0}^{N−1} [ E{|F_B[m]|²} / (E{|F_B[m]|²} + σ_B[m]²) ] X_B[m] g_m,   (11.12)

which proves (11.10).
The resulting risk is

E{‖F − F̃‖²} = ∑_{m=0}^{N−1} E{|F_B[m] − F̃_B[m]|²}.   (11.13)

Inserting (11.12) in (11.13) with X_B[m] = F_B[m] + W_B[m], where F_B[m] and W_B[m] are independent, yields (11.11).  ■
■
This corollary proves that the Wiener estimator is implemented with a diagonal attenuation of each data coefficient $X_B[m]$ by a factor that depends on
the signal-to-noise ratio $E\{|F_B[m]|^2\}/\sigma_B[m]^2$ in the direction of $g_m$. The smaller
the signal-to-noise ratio (SNR), the more attenuation is required. If $F$ and $W$ are
Gaussian processes, then the Wiener estimator is optimal among linear and nonlinear
estimators of $F$.
If $W$ is a white noise, then its coefficients are uncorrelated with the same
variance:
$$E\{W[n]\, W[k]\} = \sigma^2\, \delta[n - k].$$
Its covariance matrix is therefore $R_W = \sigma^2\, \mathrm{Id}$. It is diagonal in all orthonormal
bases and, in particular, in a Karhunen-Loève basis of $F$. Thus, Corollary 11.1 can be
applied with $\sigma_B[m] = \sigma$ for $0 \le m < N$.
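The diagonal rule (11.10) and its risk (11.11) can be sketched in a few lines of numpy, assuming the second-order moments $E\{|F_B[m]|^2\}$ and $\sigma_B[m]^2$ are given as arrays of per-coefficient powers (the function names are illustrative, not from the text):

```python
import numpy as np

def wiener_diagonal(X_B, signal_power, noise_power):
    """Diagonal Wiener attenuation (11.10): multiply each noisy coefficient
    X_B[m] by E{|F_B[m]|^2} / (E{|F_B[m]|^2} + sigma_B[m]^2)."""
    gain = signal_power / (signal_power + noise_power)
    return gain * X_B

def wiener_risk(signal_power, noise_power):
    """Minimum linear Bayes risk (11.11), summed over all coefficients."""
    return np.sum(signal_power * noise_power / (signal_power + noise_power))
```

For a white noise, `noise_power` is the constant array $\sigma^2$; the gain then only depends on the spectrum of $F$.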
Frequency Filtering
Suppose that $F$ and $W$ are zero-mean, wide-sense circular stationary random vectors.
The properties of such processes are reviewed in Section A.6 of the Appendix. Their
covariance satisfies
$$E\{F[n]\, F[k]\} = R_F[n - k], \quad E\{W[n]\, W[k]\} = R_W[n - k],$$
where $R_F[n]$ and $R_W[n]$ are $N$ periodic. These matrices correspond to circular
convolution operators and are therefore diagonal in the discrete Fourier basis
$$\Big\{ g_m[n] = \frac{1}{\sqrt{N}} \exp\Big(\frac{i2\pi mn}{N}\Big) \Big\}_{0 \le m < N}.$$
11.1 Estimation with Additive Noise
The eigenvalues $E\{|F_B[m]|^2\}$ and $\sigma_B[m]^2$ are the discrete Fourier transforms of
$R_F[n]$ and $R_W[n]$, also called power spectra:
$$E\{|F_B[m]|^2\} = \sum_{n=0}^{N-1} R_F[n] \exp\Big(\frac{-i2\pi mn}{N}\Big) = \hat R_F[m],$$
$$\sigma_B[m]^2 = \sum_{n=0}^{N-1} R_W[n] \exp\Big(\frac{-i2\pi mn}{N}\Big) = \hat R_W[m].$$
The Wiener estimator (11.10) is then a diagonal operator in the discrete Fourier
basis, computed with the frequency filter
$$\hat h[m] = \frac{\hat R_F[m]}{\hat R_F[m] + \hat R_W[m]}. \tag{11.14}$$
It is therefore a circular convolution:
$$\tilde F[n] = DX = X \circledast h[n].$$
The resulting risk is calculated with (11.11):
$$r_l(\pi) = E\{\|F - \tilde F\|^2\} = \sum_{m=0}^{N-1} \frac{\hat R_F[m]\, \hat R_W[m]}{\hat R_F[m] + \hat R_W[m]}. \tag{11.15}$$
The numerical value of the risk is often specified by the signal-to-noise ratio, which
is measured in decibels:
$$\mathrm{SNR}_{db} = 10 \log_{10} \frac{E\{\|F\|^2\}}{E\{\|F - \tilde F\|^2\}}. \tag{11.16}$$
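The decibel measure (11.16) translates directly into code; a small helper (the name `snr_db` is an illustrative choice) that replaces the expectations with the empirical energies of one realization:

```python
import numpy as np

def snr_db(f, f_est):
    """SNR in decibels (11.16): 10 log10( ||f||^2 / ||f - f_est||^2 ),
    using empirical energies in place of the expectations."""
    return 10.0 * np.log10(np.sum(np.abs(f) ** 2) / np.sum(np.abs(f - f_est) ** 2))
```

An SNR of 0 db means the error energy equals the signal energy; each additional 10 db divides the relative error energy by 10.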
EXAMPLE 11.1
Figure 11.1(a) shows a realization of a Gaussian process $F$ obtained as a convolution of a
Gaussian white noise $B$ of variance $\beta^2$ with a low-pass filter $g$:
$$F[n] = B \circledast g[n], \quad \text{with} \quad g[n] = C \cos^2\Big(\frac{\pi n}{2K}\Big)\, \mathbf{1}_{[-K,K]}[n].$$
Theorem A.7 proves that
$$\hat R_F[m] = \hat R_B[m]\, |\hat g[m]|^2 = \beta^2\, |\hat g[m]|^2.$$
FIGURE 11.1
(a) Realization of a Gaussian process $F$. (b) Noisy signal obtained by adding a Gaussian white noise (SNR = −0.48 db). (c) Wiener estimation $\tilde F$ (SNR = 15.2 db).
The noisy signal $X$ shown in Figure 11.1(b) is contaminated by a Gaussian white noise $W$ of
variance $\sigma^2$, so $\hat R_W[m] = \sigma^2$. The Wiener estimation $\tilde F$ is calculated with the frequency filter
(11.14):
$$\hat h[m] = \frac{\beta^2\, |\hat g[m]|^2}{\beta^2\, |\hat g[m]|^2 + \sigma^2}.$$
This linear estimator is also an optimal nonlinear estimator because $F$ and $W$ are jointly
Gaussian random vectors.
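Example 11.1 can be reproduced numerically with the FFT, since circular convolution is a multiplication of DFTs. The sketch below uses illustrative parameter values ($N$, $K$, $C = 1$, and unit variances), which are assumptions, not the values used for Figure 11.1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1024, 20
beta2, sigma2 = 1.0, 1.0        # variances of B and W (illustrative values)

# Low-pass filter g[n] = C cos^2(pi n / (2K)) supported on [-K, K], with C = 1
n = np.arange(-K, K + 1)
g = np.zeros(N)
g[n % N] = np.cos(np.pi * n / (2 * K)) ** 2     # circular indexing of [-K, K]

# F = B (*) g via DFT multiplication, then noisy data X = F + W
B = rng.normal(0.0, np.sqrt(beta2), N)
F = np.real(np.fft.ifft(np.fft.fft(B) * np.fft.fft(g)))
X = F + rng.normal(0.0, np.sqrt(sigma2), N)

# Wiener frequency filter (11.14) with R_F^ = beta2 |g^|^2 and R_W^ = sigma2
g_hat2 = np.abs(np.fft.fft(g)) ** 2
h_hat = beta2 * g_hat2 / (beta2 * g_hat2 + sigma2)
F_est = np.real(np.fft.ifft(h_hat * np.fft.fft(X)))
```

Since $|\hat g[m]|^2$ concentrates at low frequencies, the filter $\hat h$ keeps those frequencies nearly unchanged and suppresses the high frequencies, where the data is almost pure noise.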
Piecewise Regular
The limitations of linear estimators appear clearly for processes with realizations
that are piecewise regular signals. A simple example is a random-shift process $F$
constructed by translating randomly a piecewise regular signal $f[n]$ of zero mean,
$\sum_{n=0}^{N-1} f[n] = 0$:
$$F[n] = f[(n - Q) \bmod N]. \tag{11.17}$$
The translation variable $Q$ is an integer random variable with a probability distribution on $[0, N-1]$. It is proved in (9.28) that $F$ is a circular wide-sense stationary
process with a power spectrum calculated in (9.29):
$$\hat R_F[m] = \frac{1}{N}\, |\hat f[m]|^2. \tag{11.18}$$
Figure 11.2 shows an example of a piecewise polynomial signal $f$ of degree $d = 3$
contaminated by a Gaussian white noise $W$ of variance $\sigma^2$. Assuming that we know
$|\hat f[m]|^2$, the Wiener estimator $\tilde F$ is calculated as a circular convolution with the
filter in (11.14). This Wiener filter is a low-pass filter that averages the noisy data to
attenuate the noise in regions where the realization of $F$ is regular, but this averaging
is limited to avoid degrading the discontinuities too much. As a result, some noise
is left in the smooth regions and the discontinuities are averaged a little. The risk
FIGURE 11.2
(a) Piecewise polynomial of degree 3. (b) Noisy signal degraded by a Gaussian white noise (SNR = 21.9 db). (c) Wiener estimation (SNR = 25.9 db).
calculated in (11.15) is normalized by the total noise energy $E\{\|W\|^2\} = N\sigma^2$:
$$\frac{r_l(\pi)}{N\sigma^2} = \frac{1}{N} \sum_{m=0}^{N-1} \frac{|\hat f[m]|^2}{|\hat f[m]|^2 + N\sigma^2}. \tag{11.19}$$
Suppose that $f$ has discontinuities of amplitude on the order of $C \ge \sigma$ and that
the noise energy is not negligible: $N\sigma^2 \ge C^2$. Using the fact that $|\hat f[m]|$ decays
typically like $C\,N\,m^{-1}$, a direct calculation of the risk (11.19) gives
$$\frac{r_l(\pi)}{N\sigma^2} \sim \frac{C}{\sigma\, N^{1/2}}. \tag{11.20}$$
The equivalence $\sim$ means that upper and lower bounds of the left side are obtained
by multiplying the right side by two constants $A, B > 0$ that are independent of $C$,
$\sigma$, and $N$.
The estimation of $F$ can be improved by nonlinear operators, which average
the data $X$ over large domains where $F$ is regular, but do not perform any averaging
where $F$ is discontinuous. Many estimators have been studied [262, 380] to recover
the position of the discontinuities of $f$ in order to adapt the data averaging. These
algorithms have long remained ad hoc implementations of intuitively appealing
ideas. Wavelet thresholding estimators perform such an adaptive smoothing, and
Section 11.5.3 proves that the normalized risk decays like $N^{-1} (\log N)^2$ as opposed
to $N^{-1/2}$ in (11.20).
11.1.2 Minimax Estimation
Although we may have some prior information, it is rare that we know the probability
distribution of complex signals. Presently, there exists no stochastic model that
takes into account the diversity of natural images. However, many images, such as
the one in Figure 2.2, have some form of piecewise regularity, with a bounded total
variation. Models are often defined over the original analog signal $\bar f$ that is measured
with sensors having a response $\bar\phi_n$. The resulting discrete signal $f[n] = \langle \bar f, \bar\phi_n\rangle$ then
belongs to a particular set $\Theta$ in $\mathbb{C}^N$ derived from the analog model. This prior
information defines a signal set $\Theta$, but it does not specify the probability distribution
of signals in $\Theta$. The more prior information, the smaller the set $\Theta$.
Knowing that $f \in \Theta$, we want to estimate this signal from the noisy data
$$X[n] = f[n] + W[n].$$
The risk of an estimation $\tilde F = DX$ is $r(D, f) = E\{\|DX - f\|^2\}$. The expected risk over $\Theta$
cannot be computed because the probability distribution of signals in $\Theta$ is unknown.
To control the risk for any $f \in \Theta$, we thus try to minimize the maximum risk:
$$r(D, \Theta) = \sup_{f \in \Theta} E\{\|DX - f\|^2\}.$$
The minimax risk is the lower bound computed over all linear and nonlinear
operators $D$:
$$r_n(\Theta) = \inf_{D \in O_n} r(D, \Theta).$$
In practice, we must find a decision operator $D$ that is simple to implement and
such that $r(D, \Theta)$ is close to the minimax risk $r_n(\Theta)$.
As a first step, as for Wiener estimators in the Bayes framework, the problem is
simplified by restricting $D$ to be a linear operator. The linear minimax risk over $\Theta$
is the lower bound:
$$r_l(\Theta) = \inf_{D \in O_l} r(D, \Theta).$$
This strategy is efficient only if $r_l(\Theta)$ is of the same order as $r_n(\Theta)$.
Bayes Priors
A Bayes estimator supposes that we know the prior probability distribution of
signals in $\Theta$. If available, this supplement of information can only improve the signal
estimation. The central result of game and decision theory shows that minimax
estimations are Bayes estimations for a "least-favorable" prior distribution.
Let $F$ be the signal random vector with a probability distribution that is given by
the prior $\pi$. For a decision operator $D$, the expected risk is $r(D, \pi) = E_\pi\{r(D, F)\}$.
The minimum Bayes risks for linear and nonlinear operators are defined by:
$$r_l(\pi) = \inf_{D \in O_l} r(D, \pi) \quad \text{and} \quad r_n(\pi) = \inf_{D \in O_n} r(D, \pi).$$
Let $\Theta^*$ be the set of all probability distributions of random vectors with realizations
in $\Theta$. Theorem 11.4 relates a minimax risk and the maximum Bayes
risk calculated for priors in $\Theta^*$.
Theorem 11.4: Minimax. For any subset $\Theta$ of $\mathbb{C}^N$,
$$r_l(\Theta) = \sup_{\pi \in \Theta^*} r_l(\pi) \quad \text{and} \quad r_n(\Theta) = \sup_{\pi \in \Theta^*} r_n(\pi). \tag{11.21}$$
Proof. For any $\pi \in \Theta^*$,
$$r(D, \pi) \le r(D, \Theta) \tag{11.22}$$
because $r(D, \pi)$ is an average risk over realizations of $F$ that are in $\Theta$, whereas $r(D, \Theta)$
is the maximum risk over $\Theta$. Let $O$ be a convex set of operators (either $O_l$ or $O_n$). The
inequality (11.22) implies that
$$\sup_{\pi \in \Theta^*} r(\pi) = \sup_{\pi \in \Theta^*} \inf_{D \in O} r(D, \pi) \le \inf_{D \in O} r(D, \Theta) = r(\Theta). \tag{11.23}$$
The main difficulty is to prove the reverse inequality: $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$. When $\Theta$
is a finite set, the proof gives a geometrical interpretation of the minimum Bayes risk and
the minimax risk. The extension to an infinite set $\Theta$ is sketched.
Suppose that $\Theta = \{f_i\}_{1 \le i \le p}$ is a finite set of signals. We define a risk set:
$$R = \{(y_1, \ldots, y_p) \in \mathbb{R}^p : \exists\, D \in O \ \text{with} \ y_i = r(D, f_i) \ \text{for} \ 1 \le i \le p\}.$$
This set is convex in $\mathbb{R}^p$ because $O$ is convex. We begin by giving geometrical
interpretations to the Bayes risk and the minimax risk.
A prior $\pi \in \Theta^*$ is a vector of discrete probabilities $(\pi_1, \ldots, \pi_p)$ and
$$r(D, \pi) = \sum_{i=1}^{p} \pi_i\, r(D, f_i). \tag{11.24}$$
The equation $\sum_{i=1}^{p} \pi_i y_i = b$ defines a hyperplane $P_b$ in $\mathbb{R}^p$. Computing $r(\pi) = \inf_{D \in O} r(D, \pi)$
is equivalent to finding the infimum $b_0 = r(\pi)$ of all $b$ for which $P_b$ intersects $R$.
The plane $P_{b_0}$ is tangent to $R$ as shown in Figure 11.3.
The minimax risk $r(\Theta)$ has a different geometrical interpretation. Let $Q_c = \{(y_1, \ldots, y_p) \in \mathbb{R}^p : y_i \le c\}$. One can verify that $r(\Theta) = \inf_{D \in O} \sup_{f_i \in \Theta} r(D, f_i)$ is the infimum $c_0 = r(\Theta)$ of all $c$ such that $Q_c$ intersects $R$.
To prove that $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$, we look for a prior distribution $\tau \in \Theta^*$ such that
$r(\tau) = r(\Theta)$. Let $\tilde Q_{c_0}$ be the interior of $Q_{c_0}$. Since $\tilde Q_{c_0} \cap R = \emptyset$ and both $\tilde Q_{c_0}$ and $R$ are
convex sets, the hyperplane separation theorem says that there exists a hyperplane of
equation
$$\sum_{i=1}^{p} \tau_i\, y_i = \tau \cdot y = b, \tag{11.25}$$
with $\tau \cdot y \le b$ for $y \in \tilde Q_{c_0}$ and $\tau \cdot y \ge b$ for $y \in R$. Each $\tau_i \ge 0$, for if $\tau_j < 0$, then for $y \in \tilde Q_{c_0}$ we
obtain a contradiction by taking $y_j$ to $-\infty$ with the other coordinates being fixed. Indeed,
$\tau \cdot y$ goes to $+\infty$ and since $y$ remains in $\tilde Q_{c_0}$, it contradicts the fact that $\tau \cdot y \le b$. We can
normalize $\sum_{i=1}^{p} \tau_i = 1$ by dividing each side of (11.25) by $\sum_{i=1}^{p} \tau_i > 0$. So $\tau$ corresponds
to a probability distribution. By letting $y \in \tilde Q_{c_0}$ converge to the corner point $(c_0, \ldots, c_0)$,
since $\tau \cdot y \le b$, we derive that $c_0 \le b$. Moreover, since $\tau \cdot y \ge b$ for all $y \in R$,
$$r(\tau) = \inf_{D \in O} \sum_{i=1}^{p} \tau_i\, r(D, f_i) \ge b \ge c_0 = r(\Theta).$$
So, $r(\Theta) \le \sup_{\pi \in \Theta^*} r(\pi)$, which, together with (11.23), proves that $r(\Theta) = \sup_{\pi \in \Theta^*} r(\pi)$.
FIGURE 11.3
At the Bayes point, a hyperplane defined by the prior $\pi$ is tangent to the risk set $R$. The
least-favorable prior $\tau$ defines a hyperplane that is tangential to $R$ at the minimax point.
(Axes: $r(D, f_1)$ and $r(D, f_2)$; the corner region is $Q_{c_0}$, with corner at $(c_0, c_0)$.)
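The geometry above can be checked numerically on a toy version of the problem: a single coefficient $X = f + W$ with $W \sim \mathcal{N}(0, \sigma^2)$, the convex class of linear estimators $DX = aX$ with $a \in [0, 1]$, and a two-signal set $\Theta = \{f_1, f_2\}$. The risk is $r(a, f) = f^2(1-a)^2 + \sigma^2 a^2$, and Theorem 11.4 says the minimax risk equals the maximum Bayes risk over priors $(\pi, 1-\pi)$. All numerical values below are illustrative choices:

```python
import numpy as np

sigma = 1.0
f1, f2 = 0.5, 2.0                       # Theta = {f1, f2}, a two-signal set
a = np.linspace(0.0, 1.0, 1001)         # linear estimators DX = a X

def risk(a, f):
    """r(a, f) = f^2 (1 - a)^2 + sigma^2 a^2 for the scalar model X = f + W."""
    return f ** 2 * (1.0 - a) ** 2 + sigma ** 2 * a ** 2

# Minimax risk: minimize over a the worst risk over Theta
r_minimax = np.min(np.maximum(risk(a, f1), risk(a, f2)))

# Bayes risk for each prior (pi, 1 - pi), then maximized over priors
pi = np.linspace(0.0, 1.0, 1001)[:, None]
bayes = np.min(pi * risk(a, f1)[None, :] + (1 - pi) * risk(a, f2)[None, :], axis=1)
r_leastfav = np.max(bayes)
```

In this toy example the least-favorable prior degenerates to a point mass on the larger signal $f_2$, and both quantities equal $f_2^2\sigma^2/(f_2^2 + \sigma^2) = 0.8$, in agreement with (11.21).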
The extension of this result to an infinite set of signals $\Theta$ is done with a compactness
argument. When $O = O_l$ or $O = O_n$, for any prior $\pi \in \Theta^*$, we know from Theorems 11.1
and 11.2 that $\inf_{D \in O} r(D, \pi)$ is reached by some Bayes decision operator $D \in O$. One can
verify that there exists a subset of operators $C$ that includes the Bayes operator for any
prior $\pi \in \Theta^*$, and such that $C$ is compact for an appropriate topology. When $O = O_l$, one
can choose $C$ to be the set of linear operators of norm smaller than 1, which is compact
because it belongs to a finite-dimensional space of linear operators. Moreover, the risk
$r(D, f)$ can be shown to be continuous in this topology with respect to $D \in C$.
Let $c < r(\Theta)$. For any $f \in \Theta$, we consider the set of operators $S_f = \{D \in C : r(D, f) > c\}$.
The continuity of $r$ implies that $S_f$ is an open set. For each $D \in C$ there exists $f \in \Theta$ such
that $D \in S_f$, so $C = \cup_{f \in \Theta} S_f$. Since $C$ is compact, there exists a finite covering $C = \cup_{1 \le i \le p} S_{f_i}$.
The minimax risk over $\Theta_c = \{f_i\}_{1 \le i \le p}$ satisfies
$$r(\Theta_c) = \inf_{D \in O} \sup_{1 \le i \le p} r(D, f_i) \ge c.$$
Since $\Theta_c$ is a finite set, we proved that there exists $\pi_c \in \Theta_c^* \subset \Theta^*$ such that $r(\pi_c) = r(\Theta_c)$.
But $r(\Theta_c) \ge c$, so letting $c$ go to $r(\Theta)$ implies that $\sup_{\pi \in \Theta^*} r(\pi) \ge r(\Theta)$. Together with
(11.23) this shows that $\sup_{\pi \in \Theta^*} r(\pi) = r(\Theta)$. ■
A distribution $\tau \in \Theta^*$ such that $r(\tau) = \sup_{\pi \in \Theta^*} r(\pi)$ is called a least-favorable prior
distribution. The minimax theorem proves that the minimax risk is the minimum
Bayes risk for a least-favorable prior.
In signal processing, minimax calculations are often hidden behind apparently
orthodox Bayes estimations. Let us consider an example involving images. It has
been observed that histograms of the wavelet coefficients of "natural" images can
be modeled with generalized Gaussian distributions [361, 440]. This means that natural images belong to a certain set $\Theta$, but it does not specify a prior distribution over
this set. To compensate for the lack of knowledge about the dependency of wavelet
coefficients spatially and across scales, one may be tempted to create a "simple probabilistic model" where all wavelet coefficients are considered to be independent.
This model is clearly simplistic since images have geometrical structures that create
strong dependencies both spatially and across scales (see Figure 7.24). However,
calculating a Bayes estimator with this inaccurate prior model may give valuable
results when estimating images. Why? Because this "simple" prior is often close to
a least-favorable prior. The resulting estimator and risk are thus good approximations of the minimax optimum. If not chosen carefully, a "simple" prior may yield an
optimistic risk evaluation that is not valid for real signals.
On the other hand, the minimax approach may seem very pessimistic since we
always consider the maximum risk over $\Theta$; this is sometimes the case. However,
when the set $\Theta$ is large, one can often verify that a "typical" signal of $\Theta$ has a risk on
the order of the maximum risk over $\Theta$. A minimax calculation attempts to isolate the
class of signals that are the most difficult to estimate, and one can check whether
these signals are indeed typically encountered in an application. If this is not the
case, then it indicates that the model specified by $\Theta$ is not well adapted to this
application.
11.2 DIAGONAL ESTIMATION IN A BASIS
It is generally not possible to compute the optimal Bayes or minimax estimator
that minimizes the risk among all possible operators. To manage this complexity,
the most classical strategy limits the choice of operators among linear operators.
This comes at a cost, because the minimum risk among linear estimators may be
well above the minimum risk obtained with nonlinear estimators. Figure 11.2 is an
example where the linear Wiener estimation can be considerably improved with a
nonlinear averaging. This section studies a particular class of nonlinear estimators
that are diagonal in a basis B. If the basis B defines a sparse signal representation,
then such diagonal estimators are nearly optimal among all nonlinear estimators.
Section 11.2.1 computes a lower bound for the risk when estimating an arbitrary
signal f with a diagonal operator. Donoho and Johnstone [221] made a fundamental
breakthrough by showing that thresholding estimators have a risk that is close to
this lower bound. The general properties of thresholding estimators are introduced
in Sections 11.2.2 and 11.2.3.
11.2.1 Diagonal Estimation with Oracles
We consider estimators computed with a diagonal operator in an orthonormal basis
$\mathcal{B} = \{g_m\}_{0 \le m < N}$. Lower bounds for the risk are computed with "oracles," which
simplify the estimation by providing information about the signal that is normally
not available. These lower bounds are closely related to errors when approximating
signals from a few vectors selected in $\mathcal{B}$.
The noisy data
$$X[n] = f[n] + W[n] \quad \text{for} \quad 0 \le n < N \tag{11.26}$$
are decomposed in $\mathcal{B}$. We write
$$X_B[m] = \langle X, g_m\rangle, \quad f_B[m] = \langle f, g_m\rangle \quad \text{and} \quad W_B[m] = \langle W, g_m\rangle.$$
The inner product of (11.26) with $g_m$ gives
$$X_B[m] = f_B[m] + W_B[m].$$
We suppose that $W$ is a zero-mean white noise of variance $\sigma^2$, which means
$$E\{W[n]\, W[k]\} = \sigma^2\, \delta[n - k].$$
The noise coefficients
$$W_B[m] = \sum_{n=0}^{N-1} W[n]\, g_m^*[n]$$
also define a white noise of variance $\sigma^2$. Indeed,
$$E\{W_B[m]\, W_B[p]\} = \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} g_m[n]\, g_p[k]\, E\{W[n]\, W[k]\} = \sigma^2\, \langle g_p, g_m\rangle = \sigma^2\, \delta[p - m].$$
Since the noise remains white in all bases, it does not influence the choice of basis.
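This invariance is easy to verify empirically: expand many realizations of a white noise in an arbitrary orthonormal basis and estimate the covariance of the coefficients. The basis below is a random orthogonal matrix, an illustrative stand-in for any $\{g_m\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma, trials = 8, 1.0, 200000

# Any orthonormal basis works: take the Q factor of a random matrix
# (its rows/columns are orthonormal vectors g_m)
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))

W = rng.normal(0.0, sigma, size=(trials, N))   # white noise realizations
WB = W @ Q.T                                   # coefficients W_B[m] = <W, g_m>

cov = WB.T @ WB / trials                       # empirical E{W_B[m] W_B[p]}
```

Up to sampling error, `cov` is $\sigma^2\,\mathrm{Id}$: the coefficients are again uncorrelated with variance $\sigma^2$, whatever orthonormal basis is chosen.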
A diagonal operator independently estimates each $f_B[m]$ by multiplying $X_B[m]$
by a factor $a_m(X_B[m])$. The resulting estimator is
$$\tilde F = DX = \sum_{m=0}^{N-1} a_m(X_B[m])\, X_B[m]\, g_m. \tag{11.27}$$
The operator D is linear when am (XB [m]) is a constant independent of XB [m].
Oracle Attenuation
For a given signal $f$, let us find the constant $a_m = a_m(X_B[m])$ that minimizes the
risk $r(D, f)$ of the estimator (11.27):
$$r(D, f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} E\{|f_B[m] - a_m X_B[m]|^2\}. \tag{11.28}$$
We shall see that $|a_m| \le 1$, which means that the diagonal operator $D$ should
attenuate the noisy coefficients.
Since $X_B = f_B + W_B$ and $E\{|W_B[m]|^2\} = \sigma^2$, and since $a_m$ is a constant that does
not depend on the noise, it follows that
$$E\{|f_B[m] - a_m X_B[m]|^2\} = |f_B[m]|^2\, (1 - a_m)^2 + \sigma^2\, a_m^2. \tag{11.29}$$
This risk is minimum for
$$a_m = \frac{|f_B[m]|^2}{|f_B[m]|^2 + \sigma^2}, \tag{11.30}$$
in which case
$$r_{\inf}(f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} \frac{|f_B[m]|^2\, \sigma^2}{|f_B[m]|^2 + \sigma^2}. \tag{11.31}$$
Observe that the attenuation factor $a_m$ and the resulting risk have the same
structure as the Wiener filter (11.10). However, the Wiener filter is a linear operator
that depends on expected SNRs, which are known constants, whereas this attenuation factor
depends on the unknown original signal-to-noise ratio $|f_B[m]|^2/\sigma^2$. Since $|f_B[m]|$
is unknown, the attenuation factor $a_m$ cannot be computed. It is considered as an
oracle information. The resulting oracle risk $r_{\inf}(f)$ is a lower bound that is normally
not reachable. However, Section 11.2.2 shows that one can get close to $r_{\inf}(f)$ with
a simple thresholding.
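Although the oracle risks cannot be reached by an implementable estimator, they are trivial to evaluate for a known coefficient vector, which is how they are used as benchmarks. A small sketch (the function names are illustrative):

```python
import numpy as np

def oracle_attenuation_risk(fB, sigma):
    """Oracle attenuation risk r_inf(f) of (11.31)."""
    p = np.abs(fB) ** 2
    return np.sum(p * sigma ** 2 / (p + sigma ** 2))

def oracle_projection_risk(fB, sigma):
    """Oracle projection risk r_pr(f) of (11.34): keep only |f_B[m]| >= sigma."""
    return np.sum(np.minimum(np.abs(fB) ** 2, sigma ** 2))
```

For any coefficient vector, the sandwich (11.35) holds: $r_{pr}(f) \ge r_{\inf}(f) \ge r_{pr}(f)/2$, so the two oracles are always within a factor 2 of each other.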
Oracle Projection
The analysis of diagonal estimators can be simplified by restricting $a_m \in \{0, 1\}$. When
$a_m = 1$, the estimator $\tilde F = DX$ selects the coefficient $X_B[m]$, and it removes it if
$a_m = 0$. The operator $D$ is then an orthogonal projector on a selected subset of
vectors of the basis $\mathcal{B}$.
The nonlinear projector that minimizes the risk (11.29) is defined by
$$a_m = \begin{cases} 1 & \text{if } |f_B[m]| \ge \sigma \\ 0 & \text{if } |f_B[m]| < \sigma. \end{cases} \tag{11.32}$$
The resulting oracle projector is
$$DX = \sum_{m \in \Lambda} X_B[m]\, g_m \quad \text{with} \quad \Lambda = \{0 \le m < N : |f_B[m]| \ge \sigma\}. \tag{11.33}$$
It is an orthogonal projection on the set $\{g_m\}_{m \in \Lambda}$ of basis vectors that best
approximate $f$. This "oracle" projector cannot be implemented either, because $a_m$
and $\Lambda$ depend on $f_B[m]$ instead of $X_B[m]$. The resulting risk is computed with
(11.29):
$$r_{pr}(f) = E\{\|f - \tilde F\|^2\} = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2). \tag{11.34}$$
Since for any $(x, y) \in (\mathbb{R}^+)^2$,
$$\min(x, y) \ge \frac{xy}{x + y} \ge \frac{1}{2}\, \min(x, y),$$
the risk of the oracle projector (11.34) is of the same order as the risk of an oracle
attenuation (11.31):
$$r_{pr}(f) \ge r_{\inf}(f) \ge \frac{1}{2}\, r_{pr}(f). \tag{11.35}$$
The risk of an oracle projector can also be related to the approximation error of
$f$ in the basis $\mathcal{B}$. Let $M = |\Lambda|$ be the number of coefficients such that $|f_B[m]| \ge \sigma$.
The best $M$-term approximation of $f$, defined in Section 9.2.1, is the orthogonal
projection on the $M$ vectors $\{g_m\}_{m \in \Lambda}$ that yield the largest-amplitude coefficients:
$$f_M = \sum_{m \in \Lambda} f_B[m]\, g_m.$$
The nonlinear oracle projector risk can be rewritten as
$$r_{pr}(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2) = \sum_{m \notin \Lambda} |f_B[m]|^2 + M \sigma^2 \tag{11.36}$$
$$= \epsilon_n(M, f) + M \sigma^2, \tag{11.37}$$
where
$$\epsilon_n(M, f) = \|f - f_M\|^2 = \sum_{m \notin \Lambda} |f_B[m]|^2$$
is the best $M$-term approximation error. It is the bias produced by setting signal
coefficients to zero, and $M\sigma^2$ is the variance of the remaining noise on the $M$
coefficients that are kept. Theorem 11.5 proves that when $\sigma$ decreases, the decay of
this risk is characterized by the decay of the nonlinear approximation error $\epsilon_n(M, f)$
as $M$ increases.
Theorem 11.5. If $\epsilon_n(M, f) \le C^2 M^{1-2s}$ with $1 \le C/\sigma \le N^s$, then
$$r_{pr}(f) \le 3\, C^{1/s}\, \sigma^{2 - 1/s}. \tag{11.38}$$
Proof. Observe that
$$r_{pr}(f) = \min_{0 \le M \le N} \big(\epsilon_n(M, f) + M \sigma^2\big). \tag{11.39}$$
Let $M_0$ be defined by $2 M_0 \sigma^2 > C^2 M_0^{1-2s} \ge M_0 \sigma^2$. Since $1 \le C/\sigma \le N^s$, necessarily $1 \le M_0 \le N$. Inserting $\epsilon_n(M_0, f) \le C^2 M_0^{1-2s}$ with $s > 1/2$ and $M_0 \le C^{1/s}\, \sigma^{-1/s}$ yields
$$r_{pr}(f) \le \epsilon_n(M_0, f) + M_0 \sigma^2 \le 3 M_0 \sigma^2 \le 3\, C^{1/s}\, \sigma^{2-1/s}, \tag{11.40}$$
which proves (11.38). ■
Linear Projection
Oracle estimators cannot be implemented because $a_m$ is a constant that depends
on the unknown signal $f$. Let us consider linear projectors obtained by setting $a_m$
to be equal to 1 on the first $M$ vectors and 0 otherwise:
$$\tilde F = D_M X = \sum_{m=0}^{M-1} X_B[m]\, g_m. \tag{11.41}$$
The risk (11.28) becomes
$$r(D_M, f) = \sum_{m=M}^{N-1} |f_B[m]|^2 + M \sigma^2 = \epsilon_l(M, f) + M \sigma^2, \tag{11.42}$$
where $\epsilon_l(M, f)$ is the linear approximation error computed in (9.3). The two terms
$\epsilon_l(M, f)$ and $M\sigma^2$ are, respectively, the bias and the variance components of the
estimator. To minimize $r(D_M, f)$, the parameter $M$ is adjusted so that the bias is of
the same order as the variance. When the noise variance $\sigma^2$ decreases, Theorem 11.6
proves that the resulting risk depends on the decay of $\epsilon_l(M, f)$ as $M$ increases.
Theorem 11.6. If $\epsilon_l(M, f) \le C^2 M^{1-2s}$ with $1 \le C/\sigma \le N^s$, then
$$r(D_{M_0}, f) \le 3\, C^{1/s}\, \sigma^{2-1/s} \quad \text{for} \quad (C/(2\sigma))^{1/s} < M_0 \le (C/\sigma)^{1/s}. \tag{11.43}$$
Proof. As in (11.40), we verify that
$$r(D_{M_0}, f) = \epsilon_l(M_0, f) + M_0 \sigma^2 \le 3\, C^{1/s}\, \sigma^{2-1/s},$$
for $(C/(2\sigma))^{1/s} < M_0 \le (C/\sigma)^{1/s}$, which proves (11.43). ■
Theorems 11.5 and 11.6 prove that the performance of oracle projection estimators and of optimized linear projectors depends, respectively, on the precision of
nonlinear and linear approximations in the basis $\mathcal{B}$. An approximation error
that decreases quickly means that one can construct a sparse and precise signal representation with only a few vectors in $\mathcal{B}$. Section 9.2 shows that nonlinear
approximations can be much more precise, in which case the risk of a nonlinear
oracle projection is much smaller than the risk of a linear projection. The next
section shows that thresholding estimators are nonlinear projection estimators that
have a risk close to the oracle projection risk.
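The gap between the two risks is easy to exhibit numerically. Take a coefficient vector that is sparse but whose few large coefficients are scattered across the index range (the positions and amplitudes below are illustrative): the linear projector of (11.41) must pay $M\sigma^2$ for every index it keeps up to each large coefficient, while the oracle projection (11.34) pays only for the large coefficients themselves:

```python
import numpy as np

N, sigma = 4096, 0.05
fB = np.zeros(N)
fB[[7, 300, 1500, 3000]] = [1.0, 0.8, 0.6, 0.5]   # few large, scattered coefficients

# Optimized linear projector risk: min over M of eps_l(M, f) + M sigma^2  (11.42)
tail = np.concatenate([np.cumsum(fB[::-1] ** 2)[::-1], [0.0]])   # eps_l(M, f), M = 0..N
r_linear = np.min(tail + np.arange(N + 1) * sigma ** 2)

# Oracle projection risk (11.34): invariant to where the large coefficients sit
r_oracle = np.sum(np.minimum(fB ** 2, sigma ** 2))
```

Here the oracle risk is $4\sigma^2 = 0.01$, while the best linear projector (keeping the first 8 coefficients) is stuck above 1: a sparse representation only pays off for an estimator that adapts its kept set to the signal.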
11.2.2 Thresholding Estimation
In a basis $\mathcal{B} = \{g_m\}_{0 \le m < N}$, a diagonal estimator of $f$ from $X = f + W$ can be written as
$$\tilde F = DX = \sum_{m=0}^{N-1} a_m(X_B[m])\, X_B[m]\, g_m. \tag{11.44}$$
We suppose that $W$ is a Gaussian white noise of variance $\sigma^2$. When the $a_m$ are thresholding functions, the risk of this estimator is shown to be close to the lower bounds
obtained with oracle estimators.
Hard Thresholding
A hard-thresholding estimator is implemented with
$$a_m(x) = \begin{cases} 1 & \text{if } |x| \ge T \\ 0 & \text{if } |x| < T, \end{cases} \tag{11.45}$$
and can thus be rewritten as
$$\tilde F = DX = \sum_{m \in \tilde\Lambda_T} X_B[m]\, g_m \quad \text{with} \quad \tilde\Lambda_T = \{0 \le m < N : |X_B[m]| \ge T\}. \tag{11.46}$$
It is an orthogonal projection of $X$ on the set of basis vectors $\{g_m\}_{m \in \tilde\Lambda_T}$. This
estimator can also be rewritten with a hard-thresholding function
$$\tilde F = \sum_{m=0}^{N-1} \rho_T(X_B[m])\, g_m \quad \text{with} \quad \rho_T(x) = \begin{cases} x & \text{if } |x| > T \\ 0 & \text{if } |x| \le T. \end{cases} \tag{11.47}$$
The risk of this thresholding is
$$r_{th}(f) = r(D, f) = \sum_{m=0}^{N-1} E\{|f_B[m] - \rho_T(X_B[m])|^2\},$$
with $X_B[m] = f_B[m] + W_B[m]$, and thus
$$|f_B[m] - \rho_T(X_B[m])|^2 = \begin{cases} |W_B[m]|^2 & \text{if } |X_B[m]| > T \\ |f_B[m]|^2 & \text{if } |X_B[m]| \le T. \end{cases}$$
Since a hard thresholding is a nonlinear projector in the basis $\mathcal{B}$, the thresholding
risk is larger than the risk (11.34) of an oracle projector:
$$r_{th}(f) \ge r_{pr}(f) = \sum_{m=0}^{N-1} \min(|f_B[m]|^2, \sigma^2).$$
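A hard-thresholding estimator is a one-liner on the coefficients. The sketch below applies (11.47) to a sparse coefficient vector contaminated by white noise; the signal values, the seed, and the threshold $T = \sigma\sqrt{2\ln N}$ are illustrative choices (this threshold, just above the typical maximum of $N$ Gaussian noise coefficients, is the one analyzed later in the chapter):

```python
import numpy as np

def hard_threshold(x, T):
    """Hard-thresholding function rho_T of (11.47): keep x if |x| > T, else 0."""
    return np.where(np.abs(x) > T, x, 0.0)

rng = np.random.default_rng(2)
N, sigma = 1024, 1.0
fB = np.zeros(N)
fB[:8] = [20, -15, 12, 9, -7, 5, 4, -3]          # sparse signal coefficients in B

WB = rng.normal(0.0, sigma, N)                    # white noise stays white in B
XB = fB + WB
T = sigma * np.sqrt(2.0 * np.log(N))              # threshold above the noise level
fB_est = hard_threshold(XB, T)

err = np.sum((fB_est - fB) ** 2)                  # realized squared error
oracle = np.sum(np.minimum(fB ** 2, sigma ** 2))  # oracle projection risk (11.34)
```

The realized error is far below the total noise energy $N\sigma^2$ and within a modest factor of the oracle risk, which is what the thresholding theorems of this section quantify.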
Soft Thresholding
An oracle attenuation (11.30) yields a risk $r_{\inf}(f)$ that is smaller than the risk $r_{pr}(f)$
of an oracle projection, by slightly decreasing the amplitude of all coefficients in
order to reduce the added noise. A soft attenuation, although nonoptimal, is implemented by
$$0 \le a_m(x) = \max\Big(1 - \frac{T}{|x|},\, 0\Big) \le 1. \tag{11.48}$$
The resulting diagonal estimator $\tilde F$ in (11.44) can be written as in (11.47) with a
soft-thresholding function, which decreases the amplitude of all noisy coefficients
by $T$:
$$\rho_T(x) = \begin{cases} x - T & \text{if } x \ge T \\ x + T & \text{if } x \le -T \\ 0 & \text{if } |x| \le T. \end{cases} \tag{11.49}$$
The threshold $T$ is generally chosen so that there is a high probability that it is just
above the maximum level of the noise coefficients.
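The soft-thresholding function (11.49) is exactly the attenuation (11.48) applied to $x$, since $\max(1 - T/|x|,\, 0)\,x = \mathrm{sign}(x)\max(|x| - T,\, 0)$. A minimal vectorized sketch:

```python
import numpy as np

def soft_threshold(x, T):
    """Soft-thresholding function rho_T of (11.49): shrink |x| toward zero by T,
    equivalently x * max(1 - T/|x|, 0) as in (11.48)."""
    return np.sign(x) * np.maximum(np.abs(x) - T, 0.0)
```

Unlike the hard-thresholding function, it is continuous in $x$, at the cost of biasing every kept coefficient toward zero by $T$.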