Keynote 1: Sparse Time-Frequency Transforms and Applications

Sparse Time-Frequency Transforms and
Applications.
Bruno Torrésani
http://www.cmi.univ-mrs.fr/~torresan
LATP, Université de Provence, Marseille
DAFx, Montreal, September 2006
B. Torrésani (LATP Marseille)
Sparse Time-Frequency Transforms
September 2006
1 / 41
1. Introduction
2. Signal waveform representations
    Bases
    Frames
    Multiple frames
    More realistic time-frequency atoms?
3. Coefficient domain models
    Hybrid random waveform models
    Estimation algorithms based on observed coefficients
    Estimation algorithms based on synthesis coefficients
4. Conclusion
5. References
Introduction
Over the last twenty years (and in fact for much longer), harmonic
analysis has provided many new techniques for expanding signals into
“elementary” waveforms:
Redundant Gabor and wavelet systems (frames)
Wavelet bases
MDCT and Wilson bases
Matching pursuit and cognates
...
Most often, sparsity of the representation was a key issue.
In this talk: we review a number of such approaches, in view of a few
selected applications.
Introduction: What is sparsity ?
A signal representation is sparse when most information is concentrated in
a small amount of data (coefficients). For example, a sine wave is sparsely
represented in the Fourier domain, not in the time domain.
Sparsity is a “vague” concept. Ideally, the volume of data (the number of
nonzero coefficients, for example) would be a good sparsity measure.
In noisy situations, this measure is generally polluted by a large number of
small coefficients, originating from noise.
Other measures may be used (entropies)... but they often do not yield the
same results [Jaillet & BT 2003].
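A minimal numerical sketch of these remarks, assuming an on-grid discrete sine wave and additive white Gaussian noise. The signal length, noise level, relative threshold, and the helper names `count_significant` and `energy_entropy` are illustrative choices, not taken from the talk.

```python
import numpy as np

def count_significant(coeffs, rel_threshold=1e-3):
    """Count coefficients larger than rel_threshold times the largest magnitude."""
    mags = np.abs(coeffs)
    return int(np.sum(mags > rel_threshold * mags.max()))

def energy_entropy(coeffs):
    """Shannon entropy of the normalized energies |c|^2 / sum |c|^2 (lower = sparser)."""
    p = np.abs(coeffs) ** 2
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log(p)))

N = 256
t = np.arange(N)
sine = np.sin(2 * np.pi * 16 * t / N)             # 16 full periods: on-grid frequency
rng = np.random.default_rng(0)
noisy = sine + 0.05 * rng.standard_normal(N)      # assumed white Gaussian noise

spectrum = np.fft.fft(sine) / np.sqrt(N)          # unitary DFT
noisy_spectrum = np.fft.fft(noisy) / np.sqrt(N)

n_time = count_significant(sine)                  # most time samples are significant
n_freq = count_significant(spectrum)              # only the two bins +/-16 survive
n_freq_noisy = count_significant(noisy_spectrum)  # noise adds many small coefficients

print(n_time, n_freq, n_freq_noisy)
print(energy_entropy(sine), energy_entropy(spectrum), energy_entropy(noisy_spectrum))
```

In this toy setting the coefficient count reacts strongly to the noise while the entropy barely moves, which is one way the two kinds of measure can disagree.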
Introduction: sparsity: what for ?
A sparse time-frequency representation concentrates the relevant
information in a small number of coefficients: the pdf of the coefficients is
peaked at 0 and heavy-tailed.
Most popular applications
Signal coding... if the cost of encoding the representation itself is not
too high
Signal modeling: expand signals into components that make sense.
Denoising: most often, noise is not sparse.
Source separation (exploiting dimension reduction).
...
Signal waveform representations
Signal representations
Signal waveform expansion: decompose a signal as a linear combination of
“elementary waveforms” $\psi_\lambda$, often generated using simple rules:
\[ x(t) = \sum_{\lambda} \alpha_\lambda \psi_\lambda(t) , \]
with $\alpha_\lambda$ the coefficients and $\psi_\lambda$ the waveforms.
Examples:
Time-frequency atoms (MDCT or Wilson bases, Gabor atoms,...)
Time-scale atoms (wavelets, multiwavelets,...)
Chirplets,...
Higher dimensional versions
See [Mallat 1998], [Carmona et al. 1998] or [Wickerhauser 1994].
Bases
Signal representations: bases
The mathematically simplest situation: orthonormal bases. The waveform
system $W = \{\psi_\lambda, \lambda \in \Lambda\}$ is an orthonormal basis of the signal space $H$ (inner
product space, or Hilbert space) if:
The atoms are mutually orthogonal and normalized: $\langle \psi_\lambda, \psi_\mu \rangle = \delta_{\lambda\mu}$.
They form a complete set in $H$: if the signal $x \in H$ is such that
$\langle x, \psi_\lambda \rangle = 0$ for all $\lambda \in \Lambda$, then $x = 0$.
Then any signal may be written in a unique way as
\[ x(t) = \sum_{\lambda \in \Lambda} \alpha_\lambda \psi_\lambda(t) , \quad \text{with } \alpha_\lambda = \langle x, \psi_\lambda \rangle . \]
Thus, analysis and synthesis involve the same atoms.
In addition, the “coefficient mapping” $x \to \{\alpha_\lambda, \lambda \in \Lambda\}$ preserves energy
(Parseval’s formula):
\[ \sum_{\lambda \in \Lambda} |\alpha_\lambda|^2 = \|x\|^2 . \]
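The three properties above (orthonormality, reconstruction with the same atoms, Parseval) can be checked numerically in finite dimension. The sketch below assumes a DCT-IV matrix as the orthonormal basis; this choice, the dimension `N`, and the variable names are illustrative and not taken from the talk.

```python
import numpy as np

N = 64
n = np.arange(N)
# Rows of Psi are the atoms psi_k[n] = sqrt(2/N) cos(pi/N (n + 1/2)(k + 1/2))  (DCT-IV).
Psi = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))

rng = np.random.default_rng(1)
x = rng.standard_normal(N)

alpha = Psi @ x          # analysis: alpha_k = <x, psi_k>
x_rec = Psi.T @ alpha    # synthesis with the same atoms

gram_error = np.max(np.abs(Psi @ Psi.T - np.eye(N)))   # <psi_k, psi_l> = delta_kl
parseval_gap = abs(np.sum(alpha ** 2) - np.sum(x ** 2))

print(gram_error, np.max(np.abs(x_rec - x)), parseval_gap)
```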
MDCT basis: smooth windows modulated by a sinusoidal function.
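In the continuous-time setting, the following (infinite) family of functions
forms an orthonormal basis of $L^2(\mathbb{R})$:
\[ u_{kn}(t) = \sqrt{\frac{2}{\ell_k}}\, w_k(t) \cos\left( \frac{\pi}{\ell_k} \left( n + \frac{1}{2} \right) (t - a_k) \right) , \quad k \in \mathbb{Z},\ n = 0, 1, 2, \dots \]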
In bounded intervals, as well as finite dimensional settings, similar bases
may be constructed (Malvar, Suter, ...)
More precisely, the only assumption is that the window functions $w_k$ must
satisfy some symmetry conditions at the boundaries.
In general, windows are taken as regular translates of a single one.
More freedom may be introduced, as long as the symmetry conditions are
fulfilled. For example, some audio coders use systems with both wide and
narrow windows.
Simple implementations are available in the Stanford WaveLab package:
http://www-stat.stanford.edu/~wavelab
The MDCT basis is well adapted to audio signals: the expansion of most
signals is sparse. [Figure: pdf (log scale) of the MDCT coefficients of an
organ recording.]
Besides signal coding/compression, sparsity also helps in several
applications.
Application: denoising. As noise is generally not sparse in the MDCT
basis, simply threshold the MDCT coefficients of the noisy signal before
reconstruction.
Sound examples: organ signal; noisy organ signal; denoised organ signal.
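The thresholding recipe can be sketched as follows. This is not the setup used for the organ example: an orthonormal DCT-IV matrix stands in for the MDCT basis, the test signal is synthetic, the noise level `sigma` is assumed known, and the universal threshold `sigma * sqrt(2 log N)` is just one common choice.

```python
import numpy as np

N = 512
n = np.arange(N)
# Orthonormal DCT-IV matrix (rows = atoms), a simple stand-in for an MDCT basis.
C = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))

rng = np.random.default_rng(1)
alpha = np.zeros(N)
alpha[[12, 40, 41, 97, 200]] = 5.0             # sparse synthetic "tonal" signal
clean = C.T @ alpha

sigma = 0.1                                    # noise level, assumed known
noisy = clean + sigma * rng.standard_normal(N)

coeffs = C @ noisy                             # analysis
threshold = sigma * np.sqrt(2.0 * np.log(N))   # universal threshold (one common choice)
kept = np.where(np.abs(coeffs) > threshold, coeffs, 0.0)   # hard thresholding
denoised = C.T @ kept                          # reconstruction

err_noisy = np.sum((noisy - clean) ** 2)
err_denoised = np.sum((denoised - clean) ** 2)
print(err_noisy, err_denoised)
```

Because the basis is orthonormal, white noise spreads evenly over all coefficients while the signal energy sits in a few of them, so discarding the small coefficients removes mostly noise.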
Application: source separation:
Consider two mixtures (linear combinations): Mix 1;
Mix 2.
Below: scatter plots of the samples of mix 1 against mix 2 (left), and of the
MDCT coefficients of mix 1 against mix 2 (right).
Method: identify the two directions in the MDCT scatter plot, and project.
Sound example: reconstructed organ.
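A toy version of the "identify the two directions" step, under idealized assumptions: the two sources are synthetic sparse coefficient sequences with disjoint supports (standing in for MDCT coefficients), the mixing matrix `A_mix` is hypothetical, and the directions are found with a plain angle histogram. Real mixtures would need a more robust clustering.

```python
import numpy as np

rng = np.random.default_rng(2)
n_coef = 400
# Two sparse sources with disjoint supports (idealized assumption).
s = np.zeros((2, n_coef))
s[0, :40] = rng.standard_normal(40)
s[1, 40:80] = rng.standard_normal(40)

A_mix = np.array([[0.9, 0.3],
                  [0.4, 0.8]])       # hypothetical mixing matrix
mix = A_mix @ s                      # the two mixtures, in the coefficient domain

# Angle of each significant scatter-plot point, folded into [0, pi).
active = np.hypot(mix[0], mix[1]) > 1e-8
angles = np.mod(np.arctan2(mix[1, active], mix[0, active]), np.pi)

# The two most populated histogram bins give the two directions.
hist, edges = np.histogram(angles, bins=90, range=(0.0, np.pi))
directions = []
for b in np.argsort(hist)[-2:]:
    in_bin = (angles >= edges[b]) & (angles <= edges[b + 1])
    theta = angles[in_bin].mean()
    directions.append([np.cos(theta), np.sin(theta)])
A_est = np.array(directions).T       # estimated mixing directions (unit columns)

s_est = np.linalg.solve(A_est, mix)  # sources, up to scaling and ordering
```

The recovered rows of `s_est` match the original sources only up to scale and permutation, which is the usual ambiguity in this kind of separation.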
Advantages:
“Optimal” in terms of redundancy.
There exist bases for which fast algorithms have been developed
(MDCT, Wilson, wavelets,...)
Drawbacks:
Being an orthonormal basis has a price: rigidity. Not every window
function generates a basis. Mathematically speaking, windows are
not as smooth as one would like.
Being a basis also imposes constraints on the sampling in time and
frequency: no “free access” to the time-frequency domain.
Question: can we make the representation sparser by introducing redundancy?
Frames
Signal representations: frames
As an alternative to orthonormal bases, frames (wavelet, or Gabor) offer
more flexibility.
A frame is a (generally overcomplete) system of waveforms
W = {ψλ , λ ∈ Λ} with respect to which signals may be expanded, with
stable synthesis.
In the case of frames, energy conservation (Parseval’s formula) is
generally replaced by an inequality of the form
\[ A \|x\|^2 \le \sum_{\lambda \in \Lambda} |\langle x, \psi_\lambda \rangle|^2 \le B \|x\|^2 , \]
for some constants $0 < A \le B < \infty$ and for all signals $x$.
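In finite dimension the optimal frame bounds are the extreme eigenvalues of the frame operator, which makes the inequality easy to check. The sketch below assumes a random overcomplete system; the sizes `N`, `K` and the random vectors are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 16, 40
F = rng.standard_normal((K, N)) / np.sqrt(N)   # rows: K > N random frame vectors

# Frame operator S = sum_k psi_k psi_k^T; its extreme eigenvalues are A and B.
eig = np.linalg.eigvalsh(F.T @ F)
A, B = eig[0], eig[-1]

x = rng.standard_normal(N)
coeff_energy = np.sum((F @ x) ** 2)            # sum_k |<x, psi_k>|^2
signal_energy = np.sum(x ** 2)                 # ||x||^2

print(A * signal_energy, coeff_energy, B * signal_energy)
```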
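In such cases, one does not have exact reconstruction as before, but an
approximation
\[ \left\| x - \frac{2}{A+B} \sum_{\lambda \in \Lambda} \langle x, \psi_\lambda \rangle \psi_\lambda \right\| \le \frac{B-A}{B+A}\, \|x\| . \]
Good news: there exists a (non-unique) dual waveform system
$\{\tilde\psi_\lambda, \lambda \in \Lambda\}$ such that for all signals $x$:
\[ x = \sum_{\lambda \in \Lambda} \langle x, \psi_\lambda \rangle \tilde\psi_\lambda = \sum_{\lambda \in \Lambda} \langle x, \tilde\psi_\lambda \rangle \psi_\lambda . \]
Hence analysis and synthesis do not involve the same waveforms.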
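A finite-dimensional sketch of both statements, assuming a random frame and the canonical dual (obtained by applying the inverse frame operator to each frame vector). The sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 8, 20
F = rng.standard_normal((K, N))        # rows are the frame vectors psi_k (random)

S = F.T @ F                            # frame operator: S x = sum_k <x, psi_k> psi_k
eig = np.linalg.eigvalsh(S)
A, B = eig[0], eig[-1]                 # optimal frame bounds

F_dual = F @ np.linalg.inv(S)          # rows are the canonical dual vectors S^{-1} psi_k

x = rng.standard_normal(N)
x_rec1 = F_dual.T @ (F @ x)            # sum_k <x, psi_k> dual_k
x_rec2 = F.T @ (F_dual @ x)            # sum_k <x, dual_k> psi_k

x_approx = (2.0 / (A + B)) * (S @ x)   # approximate reconstruction without the dual
rel_error = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
bound = (B - A) / (B + A)

print(rel_error, bound)
```

The closer `A` and `B` are, the smaller `bound` becomes, and the less the dual system differs from a rescaled copy of the original one.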
An example: as an alternative to MDCT bases, the Gabor frames
\[ \psi_{mn}(t) = e^{2i\pi n \nu_0 t}\, \psi(t - m b_0) \]
provide a regular sampling of the time-frequency plane: a regular grid
with mesh sizes $b_0$ and $\nu_0$.
For $b_0 \nu_0$ small enough, these indeed form a frame of the considered signal
space, and the (canonical) dual system is a Gabor frame too: there exists
a dual window $\tilde\psi$ such that the dual atoms are of the form $\tilde\psi_{mn}$.
The smaller $b_0 \nu_0$, the more redundant the system, the closer $A$ and $B$, and
the closer $\psi$ and $\tilde\psi$.
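The effect of redundancy on the frame bounds can be observed on a small discrete, periodic Gabor system. This sketch assumes signals of length `L`, a periodized Gaussian window, a time step `a` (playing the role of b0) and `M` frequency channels (so the frequency step is 1/M cycles per sample). The function name `gabor_frame_bounds` and all numerical values are illustrative; it does not use LTFAT.

```python
import numpy as np

def gabor_frame_bounds(L, a, M, width):
    """Frame bounds (A, B) of a discrete Gabor system on signals of length L."""
    n = np.arange(L)
    centered = (n + L // 2) % L - L // 2          # indices wrapped into [-L/2, L/2)
    g = np.exp(-np.pi * (centered / width) ** 2)  # periodized Gaussian window
    g /= np.linalg.norm(g)
    atoms = []
    for m in range(L // a):                       # time shifts by m * a samples
        shifted = np.roll(g, m * a)
        for k in range(M):                        # modulations at frequency k / M
            atoms.append(np.exp(2j * np.pi * k * n / M) * shifted)
    G = np.array(atoms)                           # one atom per row
    sv = np.linalg.svd(G, compute_uv=False)       # singular values, descending
    return sv[-1] ** 2, sv[0] ** 2

A_low, B_low = gabor_frame_bounds(L=64, a=8, M=16, width=8.0)     # redundancy 2
A_high, B_high = gabor_frame_bounds(L=64, a=4, M=16, width=8.0)   # redundancy 4

print(B_low / A_low, B_high / A_high)
```

Halving the time step doubles the redundancy `M / a`, and in this example the ratio `B / A` moves closer to 1.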
Sampling grids in time-frequency domain have to be adapted to the
time/frequency resolution of the atoms.
Left: good frequency resolution (wide windows); Right: good time
resolution (narrow window). Full circles correspond to “large” coefficients.
A Gaussian Gabor atom (blue), and its duals for low redundancy (left) and
high redundancy (right)
To play with Gabor atoms, dual atoms,...: the Linear Time-Frequency
Analysis Toolbox (P. Söndergaard)
LTFAT: http://www.univie.ac.at/nuhag-php/ltfat
Signal representations: multiple frames
Gabor frames offer more flexibility than MDCT bases in the choice of the
window. However, in the standard construction, the Gabor atoms are of
constant size, which is not always convenient for describing all features of
audio signals.
The time-frequency resolution of the atoms plays a significant role. Wide
windows (from 20 to 40 ms) are well adapted to tonals (partials), while
shorter ones (or wavelets instead of Gabor atoms) are better suited to
transients.
Example: xylophone.
Question: can we get the best of the two frames ?
Answer: yes, provided we can select the right signal expansion (among
infinitely many), and control sparsity.
Multiple frames
Signal representations: multiple frames
Idea: expand signals with respect to a larger system, involving both wide
atoms $W_g = \{g_{mn}, (m,n) \in \Lambda_g\}$ and narrow atoms
$W_h = \{h_{mn}, (m,n) \in \Lambda_h\}$:
\[ D = W_g \cup W_h . \]
$D$ is still a frame, i.e. stable signal expansions on $D$ exist.
For every finite-energy signal $x$, there exist infinitely many expansions
\[ x = x_g + x_h = \sum_{\lambda \in \Lambda_g} \alpha_\lambda g_\lambda + \sum_{\mu \in \Lambda_h} \beta_\mu h_\mu . \]
The coefficients $\alpha$ and $\beta$ provide information on the time-frequency
content of $x$, but some expansions are more meaningful than others.
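The non-uniqueness is easy to see on a small dictionary. The sketch below assumes a union of two orthonormal bases of R^N (spikes and DCT-IV cosines) in place of the wide and narrow Gabor atoms; the signal and all names are illustrative.

```python
import numpy as np

N = 32
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))  # DCT-IV, rows = atoms
D = np.hstack([np.eye(N), C.T])       # columns: N spikes followed by N cosine atoms

c_sparse = np.zeros(2 * N)
c_sparse[7] = 1.0                     # one spike
c_sparse[N + 3] = 2.0                 # one cosine atom
x = D @ c_sparse

c_spikes = np.concatenate([x, np.zeros(N)])   # everything expanded on the spike basis
c_minnorm = np.linalg.pinv(D) @ x             # minimum-energy coefficients

def n_nonzero(c, tol=1e-10):
    """Number of coefficients with magnitude above tol."""
    return int(np.sum(np.abs(c) > tol))

for name, c in [("sparse", c_sparse), ("spikes only", c_spikes), ("min-norm", c_minnorm)]:
    print(name, n_nonzero(c), np.allclose(D @ c, x))
```

All three coefficient vectors synthesize the same signal, but only the first reflects its "one spike plus one cosine" structure; the minimum-energy solution spreads the signal over many atoms.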
How to pick the right time-frequency atoms ?
Quilted frames [Dörfler 2002]: tile the time-frequency plane into
domains corresponding to different time-frequency resolutions
Time-frequency Jigsaw Puzzle [Jaillet & BT 2006]: let the computer
choose the “right” atoms in time-frequency domain, using sparsity
requirement.
Matching Pursuit and Orthogonal Matching
Pursuit [Mallat & Zhang 1993]: recursive search of atoms that
correlate best with the signal.
Basis Pursuit and Basis Pursuit Denoising [Chen et al 1998].
In all cases, partial synthesis from atoms of similar properties (i.e.
time-frequency resolution) is possible
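As an illustration of the greedy approach, here is a bare-bones matching pursuit on a toy dictionary. It assumes unit-norm atoms stored as columns and uses a union of spikes and DCT-IV cosines instead of Gabor atoms; it is a sketch of the basic recursion, not the implementation of [Mallat & Zhang 1993], and the function name and test signal are illustrative.

```python
import numpy as np

def matching_pursuit(x, D, n_iter):
    """Greedy expansion of x on the unit-norm columns of D."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual              # correlations with all atoms
        k = np.argmax(np.abs(corr))        # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove its contribution
    return coeffs, residual

N = 64
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))
D = np.hstack([np.eye(N), C.T])            # spikes (transient-like) + cosines (tonal-like)

x = 3.0 * D[:, 10] + 2.0 * D[:, N + 5]     # one spike plus one cosine atom
coeffs, residual = matching_pursuit(x, D, n_iter=20)

selected = np.argsort(np.abs(coeffs))[-2:]
x_spikes = D[:, :N] @ coeffs[:N]           # partial synthesis: spike layer only
x_cosines = D[:, N:] @ coeffs[N:]          # partial synthesis: cosine layer only
print(sorted(selected.tolist()), np.linalg.norm(residual))
```

The last two lines of the computation show the partial synthesis mentioned above: each layer is rebuilt from the atoms of one type only.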
Example with the TFJP algorithm [Jaillet & BT 2006]
More realistic time-frequency atoms ?
In the previous approaches, sparsity was the only requirement. Can we do
more ?
In addition, whatever the choice of the waveform system, the
time-frequency atoms are generally not realistic as “sound atoms”.
Several possible approaches for improvement
Learn more realistic atoms from sound databases: “dictionary
learning” approach (e.g. [Blumensath & Davies 2004]).
Build “time-frequency molecules” from atoms as compound objects
(see for example [Daudet 2006])
Model dependencies between atoms in the coefficient domain.
In the rest of the lecture, we focus on this last approach, using a pair of
orthonormal bases (following [Daudet & Torrésani 2005]).
Coefficient domain models
Coefficient domain models
“Experimental observation”:
Interesting features in signals are “often” characterized by localized
families of large coefficients, forming structured sets:
Tonals: Horizontal lines in the short time Fourier domain.
Transients: Vertical lines in the short time Fourier domain, or
vertical trees in the wavelet domain.
To encode separately such structures, explicit models may be introduced in
the coefficient domain.
Strategy: characterize the behavior of certain indicators (coefficients, or
others) in the framework of the model, in view of estimation from real
data.
Hybrid random waveform models
Hybrid random waveform models
The generic form of such models (in $N$-dimensional space) is the
following [Kowalski & BT 2006]:
\[ x = \sum_{\lambda \in \Lambda} \alpha_\lambda g_\lambda + \sum_{\delta \in \Delta} \beta_\delta h_\delta + r , \]
where
$W_g = \{g_\lambda, \lambda = 1, \dots, N\}$ and $W_h = \{h_\delta, \delta = 1, \dots, N\}$ are two
orthonormal bases of waveforms.
The coefficients $\alpha_\lambda$ and $\beta_\delta$ are independent Gaussian random variables, with
frequency-dependent variances ($\lambda$ and $\delta$ are time-frequency indices),
and $r$ is a small residual signal, modeled as white noise.
The sets $\Lambda$ and $\Delta$ are sparse random subsets of the index set. The
simplest model is the Bernoulli model: each index is included independently, with membership
probabilities $p$ and $\tilde p$. More complex models (for example Markov
models) introduce dependencies between coefficients.
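One realization of the Bernoulli version of this model can be drawn in a few lines. The sketch makes simplifying assumptions that are not in the talk: a DCT-IV basis and a spike basis stand in for the two waveform bases, and the variances are constant rather than frequency dependent. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 256
n = np.arange(N)
G = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))  # rows g_lambda
H = np.eye(N)                                                          # rows h_delta

p, p_tilde = 0.05, 0.02                       # membership probabilities
sigma, sigma_tilde, sigma0 = 1.0, 1.0, 0.01   # constant variances (simplification)

in_Lambda = rng.random(N) < p                 # Bernoulli significance maps
in_Delta = rng.random(N) < p_tilde
alpha = sigma * rng.standard_normal(N) * in_Lambda
beta = sigma_tilde * rng.standard_normal(N) * in_Delta
r = sigma0 * rng.standard_normal(N)           # white residual

x = G.T @ alpha + H.T @ beta + r              # hybrid signal

a = G @ x                                     # observed coefficients <x, g_n>
b = H @ x                                     # observed coefficients <x, h_m>
print(in_Lambda.sum(), in_Delta.sum())
```

The observed coefficients `a` and `b` differ from the synthesis coefficients `alpha` and `beta` because each layer leaks into the other basis.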
Are such models able to reproduce “experimental observations”? Study
the behavior of the observed coefficients (which differ from the synthesis
coefficients $\alpha_n$ and $\beta_n$)
\[ a_n = \langle x, g_n \rangle , \qquad b_m = \langle x, h_m \rangle . \]
Introduce the membership variables $X_n^\Lambda = 1$ if $n \in \Lambda$ and 0 otherwise, and
similarly for $X_n^\Delta$.
Then
\[ \begin{cases} a_n = \langle x, g_n \rangle = \alpha_n X_n^\Lambda + \sum_{m=1}^{N} \beta_m X_m^\Delta \langle h_m, g_n \rangle \\ b_n = \langle x, h_n \rangle = \beta_n X_n^\Delta + \sum_{m=1}^{N} \alpha_m X_m^\Lambda \langle g_m, h_n \rangle \end{cases} \]
up to the contributions $\langle r, g_n \rangle$ and $\langle r, h_n \rangle$ of the residual.
Estimation algorithms based on observed coefficients
Estimation: observed coefficients
In particular, assuming for simplicity that all coefficients $\alpha$ (resp. $\beta$) have
the same variance $\sigma^2$ (resp. $\tilde\sigma^2$), one has
\[ \mathrm{var}\{a_k\} = \sigma^2 X_k^\Lambda + \tilde\sigma^2 \left( \sum_{\delta \in \Delta} |\langle g_k, h_\delta \rangle|^2 \right) + \sigma_0^2 , \]
where $\sigma_0^2$ accounts for the residual.
If the significance maps are sparse, and if the two bases are sufficiently
different, one recovers the “experimental” observations. This justifies
approximating the signal by keeping only the largest coefficients.
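The variance formula can be checked by simulation for fixed significance maps. The sketch below assumes a DCT-IV basis and a spike basis as the two orthonormal bases, white residual noise of variance `sigma0**2`, and arbitrary small maps; these choices and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 64
n = np.arange(N)
G = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))  # rows g_k
H = np.eye(N)                                                          # rows h_delta

Lambda = np.array([3, 17])            # fixed significance maps (illustrative)
Delta = np.array([5, 40, 41])
sigma, sigma_tilde, sigma0 = 1.0, 2.0, 0.1
n_draws = 20000

alpha = np.zeros((n_draws, N))
alpha[:, Lambda] = sigma * rng.standard_normal((n_draws, Lambda.size))
beta = np.zeros((n_draws, N))
beta[:, Delta] = sigma_tilde * rng.standard_normal((n_draws, Delta.size))
r = sigma0 * rng.standard_normal((n_draws, N))

X = alpha @ G + beta @ H + r          # one realization of x per row
a = X @ G.T                           # a_k = <x, g_k> for every realization
empirical = a.var(axis=0)

X_Lambda = np.zeros(N)
X_Lambda[Lambda] = 1.0
cross = (G @ H.T) ** 2                # |<g_k, h_delta>|^2
predicted = sigma**2 * X_Lambda + sigma_tilde**2 * cross[:, Delta].sum(axis=1) + sigma0**2

print(np.max(np.abs(empirical / predicted - 1.0)))
```

Coefficients indexed by the map carry a variance close to `sigma**2`, while the others only see the small cross terms and the residual, which is what makes keeping the largest coefficients a sensible strategy.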
Bernoulli model:
It may be proved that the observed coefficients follow a mixture
distribution:
A “small variance” Gaussian mixture for coefficients whose
time-frequency index does not belong to the significance map
A “large variance” Gaussian mixture for coefficients whose
time-frequency index does belong to the significance map.
Exploiting such a result numerically yields an algorithm for estimating the
significance maps, which provides a significant dimension reduction and allows
one to estimate the coefficients.
Bernoulli-based estimation algorithm: 3 steps
Goal: decompose the signal into two layers (+ residual)
1. Parameter estimation: membership probabilities and variances of the synthesis
coefficients (EM algorithm).
2. Estimation of the significance maps Λ and ∆ (maximum likelihood, which reduces to
thresholding for the Bernoulli model).
3. Estimation of the layers: orthogonal projection onto the subspace
generated by the selected time-frequency atoms.
The algorithm is in fact more complex, and involves several iterations of
steps 1 and 2.
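The map estimation and projection steps can be sketched on a toy signal. This sketch skips the EM parameter estimation entirely: the value of `threshold` is a hypothetical stand-in for the threshold that would be derived from the estimated parameters. The two bases (DCT-IV and spikes), the synthetic layers, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 128
n = np.arange(N)
G = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))  # "tonal" atoms (rows)
H = np.eye(N)                                                          # "transient" atoms (rows)

Lambda_true, Delta_true = [4, 20, 33], [60, 90]
x_tonal = G[Lambda_true].T @ np.array([3.0, -2.5, 4.0])
x_trans = H[Delta_true].T @ np.array([3.0, -3.0])
x = x_tonal + x_trans + 0.01 * rng.standard_normal(N)

# Map estimation: threshold the observed coefficients in each basis.
threshold = 1.5                            # hypothetical value, not EM-estimated
a, b = G @ x, H @ x
Lambda_hat = np.flatnonzero(np.abs(a) > threshold)
Delta_hat = np.flatnonzero(np.abs(b) > threshold)

# Layer estimation: orthogonal projection onto the span of the selected atoms.
atoms = np.vstack([G[Lambda_hat], H[Delta_hat]]).T       # N x (number of selected atoms)
coef, *_ = np.linalg.lstsq(atoms, x, rcond=None)         # least squares = projection
tonal_hat = atoms[:, :Lambda_hat.size] @ coef[:Lambda_hat.size]
trans_hat = atoms[:, Lambda_hat.size:] @ coef[Lambda_hat.size:]
residual = x - tonal_hat - trans_hat

print(Lambda_hat, Delta_hat, np.linalg.norm(residual))
```

The least-squares solve is what distinguishes the projection from simply keeping the thresholded coefficients: it accounts for the fact that the selected atoms from the two bases are not mutually orthogonal.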
Bernoulli-based estimation algorithm: example
Decomposition of a xylophone signal (top) into transient (bottom left) and
tonal (bottom right) layers.
Structured model-based estimation algorithm
Structured model: implements other a priori information, such as the fact
that significant coefficients tend to form clusters, or lines (horizontal or
vertical).
Several models may be developed, among which
Markov models for the significance maps [Molla & Torrésani 2005]
Two-level Bernoulli models (M. Kowalski)
The estimation procedure has to be modified accordingly.
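Example: Markov model (see [Molla & Torrésani 2005]).
For the tonal layer, the distribution of the indicator random variables $X^\Delta_{kn}$ is
characterized by transition matrices
\[ P_n = \begin{pmatrix} \pi_n & 1 - \pi_n \\ 1 - \pi_n' & \pi_n' \end{pmatrix} , \]
with
\[ \pi_n = \mathbb{P}\left\{ X^\Delta_{k+1,n} = 1 \mid X^\Delta_{k,n} = 1 \right\} , \qquad \pi_n' = \mathbb{P}\left\{ X^\Delta_{k+1,n} = 0 \mid X^\Delta_{k,n} = 0 \right\} . \]
In particular,
\[ \mathbb{P}\{X_{k+1,n} = 1, X_{k,n} = 1\} > \mathbb{P}\{X_{k+1,n} = 1\}\, \mathbb{P}\{X_{k,n} = 1\} , \]
\[ \mathbb{P}\{X_{k+1,n} = 0, X_{k,n} = 0\} > \mathbb{P}\{X_{k+1,n} = 0\}\, \mathbb{P}\{X_{k,n} = 0\} . \]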
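These persistence inequalities can be verified for a given pair of transition probabilities. The sketch assumes a stationary chain and persistent states (the two probabilities sum to more than 1, which is the condition under which the inequalities hold); the numerical values and function names are illustrative.

```python
import numpy as np

def transition_matrix(pi, pi_prime):
    """Transition matrix with state order (1, 0), matching the layout of P_n."""
    return np.array([[pi, 1.0 - pi],
                     [1.0 - pi_prime, pi_prime]])

def stationary(P):
    """Stationary distribution: left eigenvector of P for the eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    vec = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return vec / vec.sum()

pi_n, pi_n_prime = 0.9, 0.95          # illustrative persistence probabilities
P = transition_matrix(pi_n, pi_n_prime)
mu = stationary(P)                    # mu[0] = P{X = 1}, mu[1] = P{X = 0}

joint_11 = mu[0] * P[0, 0]            # P{X_{k+1} = 1, X_k = 1}
joint_00 = mu[1] * P[1, 1]            # P{X_{k+1} = 0, X_k = 0}

print(joint_11, mu[0] ** 2)
print(joint_00, mu[1] ** 2)
```

The joint probabilities exceed the products of the marginals, so significant coefficients tend to follow significant ones along the time index, which produces the horizontal lines expected for tonals.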
The estimation algorithm keeps the same overall structure as before, but
becomes more complex:
The estimation of the significance maps is no longer local (in the coefficient
domain).
Parameter estimation may be performed via EM algorithms.
The estimation of membership probabilities has to be replaced with
the estimation of Markov transition matrices.
The estimation of the maps is done using the Viterbi algorithm.
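To make the last point concrete, here is a generic two-state Viterbi decoder for a single frequency line. It is a simplified sketch, not the algorithm of [Molla & Torrésani 2005]: it assumes zero-mean Gaussian observed coefficients with a small variance outside the map and a large variance inside it, known parameters, and illustrative names and values throughout.

```python
import numpy as np

def viterbi_significance(coeffs, sigma_small, sigma_large,
                         pi_stay_on, pi_stay_off, p_on_init=0.5):
    """Most likely state sequence (0 = outside the map, 1 = inside the map)."""
    coeffs = np.asarray(coeffs, dtype=float)
    sig = np.array([sigma_small, sigma_large])
    # Log emission densities (up to a common constant), shape (T, 2).
    log_em = -0.5 * (coeffs[:, None] / sig) ** 2 - np.log(sig)
    # log_tr[i, j] = log P(next state = j | current state = i).
    log_tr = np.log(np.array([[pi_stay_off, 1.0 - pi_stay_off],
                              [1.0 - pi_stay_on, pi_stay_on]]))
    T = coeffs.size
    score = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    score[0] = np.log([1.0 - p_on_init, p_on_init]) + log_em[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_tr      # cand[i, j]: come from i, go to j
        back[t] = np.argmax(cand, axis=0)          # best predecessor of each state
        score[t] = cand[back[t], [0, 1]] + log_em[t]
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(score[-1])
    for t in range(T - 1, 0, -1):                  # backtracking
        states[t - 1] = back[t, states[t]]
    return states

observed = np.array([0.01, 0.02, 3.0, -2.5, 2.8, 0.01, -0.02])
states = viterbi_significance(observed, sigma_small=0.1, sigma_large=2.0,
                              pi_stay_on=0.9, pi_stay_off=0.9)
print(states)
```

Unlike coefficient-wise thresholding, the decision for each index depends on the whole sequence through the transition probabilities, which is the sense in which the estimation is no longer local.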
Structured model-based algorithm: example
“Blues Brothers” recording: original, tonal, transient and residual.
Estimation algorithms based on synthesis coefficients
Estimation: synthesis coefficients
Alternative: in the framework of such random models, work directly on the
distribution of the synthesis coefficients.
A hierarchical Bayesian model can be
constructed [Févotte et al 2006], implementing
Sparse signal decomposition into a dictionary of two MDCT bases,
with Gaussian random coefficients
Markov significance maps, implementing “structured” sets of
coefficients
Suitable priors for the model coefficients
MAP and MMSE estimates are obtained by MCMC algorithms,
significantly heavier than the previous ones.
Example: S. Raman; noisy version; denoised version (MMSE estimate);
Tonal; Transient.
Conclusion
Conclusions
Hybrid expansions generally provide sparser signal representations.
The introduction of structured significance maps also improves
sparsity, generally at the price of increased computational burden. In
addition, tonal layers turn out to be more difficult to model accurately
(MDCT bases do not offer the same flexibility as harmonic models).
These techniques yield “transient + tonal + residual” signal
decompositions: a sort of elementary (single-sensor) source
separation.
These may be exploited for various tasks (denoising, source
separation, coding...), in situations where the residual signal is not
too important.
Otherwise, the residual will have to be modelled.
To do next: relax the assumption of independence of the tonal and
transient layers.
References
Bibliography
T. Blumensath, M.E. Davies, Unsupervised learning of sparse and shift-invariant
decompositions of polyphonic music, in: Proceedings of ICASSP ’04, vol. 5, 2004, pp.
V:497-V:500.
R. Carmona, W.L. Hwang, and B. Torrésani. Practical Time-Frequency Analysis:
continuous wavelet and Gabor transforms, with an implementation in S, volume 9 of
Wavelet Analysis and its Applications. Academic Press, San Diego, 1998.
S.S. Chen, D.L. Donoho and M.A. Saunders, Atomic Decomposition by Basis Pursuit,
SIAM Journal on Scientific Computing 20:1 (1998), pp. 33-61.
L. Daudet. Sparse and structured decompositions of signals with the molecular matching
pursuit. IEEE Transactions on Acoustics, Speech, and Signal Processing, 2006, to
appear.
L. Daudet and B. Torrésani, Sparse adaptive representations for musical signals,
Technical report, to appear in Signal processing for music transcription, M. Davy and A.
Klapuri Eds. (2005).
M. Dörfler, Gabor Analysis for a Class of Signals called Music, PhD Dissertation, 2002,
Mathematics Department, University of Vienna.
http://www.mat.univie.ac.at/~moni/diss.pdf
C. Févotte, L. Daudet, S.J. Godsill and B. Torrésani, Sparse Regression with Structured
Priors: Application to Audio Denoising. Proceedings of ICASSP 2006, Volume: 3, pp.
III-57 - III-60.
C. Févotte, B. Torrésani, L. Daudet and S. Godsill, Denoising of musical audio using
sparse linear regression and structured priors, submitted.
F. Jaillet and B. Torrésani, Remarques sur l’adaptativité des représentations
temps-fréquence, Proceedings of the GRETSI’03 conference, Vol 1, pp. 145-148.
F. Jaillet and B. Torrésani, Time-Frequency Jigsaw Puzzles, To appear in Int. J. on
Wavelets and Multiresolution Information Processing (2006).
M. Kowalski, and B. Torrésani, A Family of Random Waveform Models for Audio
Coding, Proceedings of ICASSP 2006, Volume: 3, pp III-472 - III-475.
S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE
Transactions on Signal Processing, 41:3397–3415, 1993.
S. Mallat. A wavelet tour of signal processing. Academic Press, 1998.
S. Molla and B. Torrésani. Hybrid Audio Scheme using Hidden Markov Models of
Waveforms, Applied and Computational Harmonic Analysis 18 (2005), pp. 137-166.
M. V. Wickerhauser. Adapted Wavelet Analysis from Theory to Software. AK Peters,
Boston, MA, USA, 1994.