Source-filter models

Sound Modeling:
signal-based approaches
(part 2)
Sound Analysis, Synthesis and
Processing
Paolo Bestagini
Summary
• What does a “signal-based” approach mean?
  • Models that represent the sound without referring to the mechanism used for its generation
• Possible approaches:
  • Time-segment based models
  • Spectral models
  • Source-filter models
    • Subtractive synthesis
    • Speech modeling
  • Non-linear models
    • Non-linear distortion
    • Modulations
Source-filter models
Source-filter Model
A spectrally rich excitation signal is shaped in the spectrum by a linear system (the filter) that acts as a resonator.
• In computer music, source-filter models are traditionally grouped under the label subtractive synthesis.
• They are often used in an analysis-synthesis framework: the source signal and the filter parameters are estimated from a target sound signal, which can subsequently be resynthesized through the identified model.
Digression: z-transform
Z-transform and filters:
• Time domain
• Z domain
Examples:
Source-filter Model
• The filter block is assumed to be linear and time-invariant. Its input x[n] is the excitation signal, and its action is described in the z-domain by a transfer function H(z) (the generic form of a filter).
• The features of the source and of the filter are combined: the spectral fine structure of the excitation signal is multiplied by the spectral envelope of the filter, which has a shaping effect on the source spectrum.
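The slide equations did not survive extraction, so the following is only a reminder under the usual textbook conventions. The coefficient names b_k, a_k and the orders M, N are assumptions, not necessarily the lecturer's notation.

```latex
% z-transform of a sequence x[n]
X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n}

% LTI filter in the time domain (difference equation)
y[n] = \sum_{k=0}^{M} b_k\, x[n-k] \;-\; \sum_{k=1}^{N} a_k\, y[n-k]

% The same filter in the z-domain: the source spectrum is multiplied by H(z)
Y(z) = H(z)\, X(z), \qquad
H(z) = \frac{\sum_{k=0}^{M} b_k\, z^{-k}}{1 + \sum_{k=1}^{N} a_k\, z^{-k}}
```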
Source Signal
• A rich spectrum that extends over a relevant portion of the audible frequency range:
  • Noise signals
  • Non-smooth periodic waveforms, whose spectral energy is concentrated in a (large) set of discrete spectral lines (square waves, sawtooth waves, triangle waves)
Source-filter Model – Square Waveform
Square Wave:
• An ideal square wave alternates periodically and instantaneously between two levels
Source-filter Model – Triangular Waveform
Triangular Wave:
• An ideal triangular wave alternates periodically between a linearly rising portion and a linearly decreasing portion
Source-filter Model – Sawtooth Waveform
Sawtooth Wave:
• An ideal sawtooth wave is a periodic series of linear ramps, conveniently written in terms of the floor function ⌊·⌋
Source-filter Model – Source signals
Impulse Train
• A sequence of unit impulses spaced by the desired fundamental period
Stochastic noise
• Another simple generator for stochastic sources is the random noise generator, which produces flat-spectrum noise (white noise); noise with a sloping spectrum, such as pink noise, can be obtained by filtering it
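The excitation signals listed above can be sketched in a few lines of NumPy. This is a minimal illustration and not the lecturer's code: the function names, the [-1, 1] amplitude range and the use of uniform noise are assumptions, and the waveforms are the ideal (non band-limited) versions, so they alias at high fundamental frequencies.

```python
import numpy as np

def square(f0, fs, n):
    """Ideal square wave alternating between +1 and -1."""
    phase = (f0 * np.arange(n) / fs) % 1.0      # normalized phase in [0, 1)
    return np.where(phase < 0.5, 1.0, -1.0)

def sawtooth(f0, fs, n):
    """Ideal sawtooth: 2 * (t - floor(t + 1/2)), with t measured in periods."""
    t = f0 * np.arange(n) / fs
    return 2.0 * (t - np.floor(t + 0.5))

def triangle(f0, fs, n):
    """Ideal triangle wave obtained by folding the sawtooth."""
    return 2.0 * np.abs(sawtooth(f0, fs, n)) - 1.0

def impulse_train(f0, fs, n):
    """Unit impulses spaced by the fundamental period (rounded to whole samples)."""
    x = np.zeros(n)
    x[:: int(round(fs / f0))] = 1.0
    return x

def white_noise(n, seed=0):
    """Uniform white noise in [-1, 1); the seed is only there for repeatability."""
    return np.random.default_rng(seed).uniform(-1.0, 1.0, n)
```

Any of these can be fed to a resonant filter to obtain a basic subtractive-synthesis patch.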
Source-filter Model – Filters
Resonant Filter
• The simplest resonant filter is a second-order IIR filter, described by the transfer function
$$H(z) = \frac{b_0}{1 - 2r\cos(\omega_c)\, z^{-1} + r^2 z^{-2}}$$
  where r and ±ωc are the magnitude and the phases of the poles, and the condition r < 1 must hold for the filter to be stable
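A two-pole resonator with poles at r·e^{±jωc} can be sketched as a direct recursion. This is an illustrative implementation under the assumption of unit numerator gain (b0 = 1); the function name and the test values (fc = 1 kHz, r = 0.98) are hypothetical choices.

```python
import numpy as np

def resonator(x, fc, r, fs):
    """Two-pole resonator: y[n] = x[n] + 2 r cos(wc) y[n-1] - r^2 y[n-2]."""
    if not 0.0 <= r < 1.0:
        raise ValueError("pole radius r must satisfy 0 <= r < 1 for stability")
    wc = 2.0 * np.pi * fc / fs
    a1, a2 = -2.0 * r * np.cos(wc), r * r       # denominator coefficients
    y = np.zeros(len(x))
    y1 = y2 = 0.0                               # previous two output samples
    for n, xn in enumerate(x):
        yn = xn - a1 * y1 - a2 * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

# Subtractive synthesis in miniature: white noise shaped by one resonance.
fs = 8000
noise = np.random.default_rng(0).standard_normal(4096)
out = resonator(noise, fc=1000.0, r=0.98, fs=fs)
spectrum = np.abs(np.fft.rfft(out * np.hanning(len(out))))
peak_hz = np.argmax(spectrum) * fs / len(out)   # expected to lie near fc
```

Moving r towards 1 narrows the resonance; values of r at or above 1 are rejected because the filter would no longer be stable.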
Source-filter Model – Speech Modeling
• Speech is an acoustic pressure wave created when air is expelled from the lungs through the trachea and vocal tract
• Vocal tract: throat, nose, mouth and lips
• As the acoustic wave passes through the vocal tract, its spectrum is altered by the resonances of the vocal tract (the formants)
Source-filter Model – Speech Modeling
• Voiced sounds (vowels or nasals): result from a quasi-periodic excitation of the vocal tract, caused by the quasi-periodic oscillation of the vocal folds
• Unvoiced sounds: do not involve vocal fold oscillations and are typically associated with turbulent flow generated when air passes through narrow constrictions of the vocal tract
• During voiced sounds:
  • the quasi-periodic nature of the oscillations gives rise to a harmonic signal
  • the frequency of the first harmonic partial is commonly termed the pitch of the voiced signal
Source-filter Model – Speech Modeling
• Concatenative synthesis
  • Connects pre-recorded natural phonetic units
  • Pros: the easiest and most popular way to produce intelligible and natural-sounding synthetic speech
  • Cons: usually limited to one speaker and one voice, and usually requires much memory (not flexible)
• Formant synthesis
  • Based on the source-filter modeling approach. Source: acoustic flow. Filter: vocal tract
  • The transfer function of the vocal tract is typically represented as a series of resonant filters, each accounting for one formant
• Articulatory synthesis
  • Models the human speech production system directly
  • Parameters associated with the vocal folds: glottal aperture, fold tension, lung pressure, etc.
  • Pros: promises high-quality synthesis
  • Cons: computational costs are high, parametric control is arduous
Source-filter Model – Speech Modeling
Formant Synthesis
• Formant synthesis of speech realizes a source-filter model: a broadband source signal undergoes multiple filtering transformations, associated with the action of different elements of the phonatory system
• If s[n] is a voiced speech signal, it can be expressed in the z-domain as
$$S(z) = g_v\, X(z)\, G(z)\, V(z)\, R(z)$$
  • the source signal X(z) is a periodic pulse train whose period coincides with the pitch period of the signal, and g_v is a constant gain term
  • G(z) is a filter associated with the response of the glottis (the vocal folds) to pitch pulses
  • V(z) is the vocal tract filter
  • R(z) simulates the radiation effect of the lips
• If s[n] is an unvoiced speech signal, the turbulence can be modeled as white noise, so X(z) is a white noise sequence
Source-filter Model – Speech Modeling
Formant Synthesis
G(z) shapes the glottal pulses
• Since the input x[n] is a pulse train, each pulse produces at the output a copy of the impulse response g[n] of this filter
• A common model is an IIR low-pass filter
Source-filter Model – Speech Modeling
Formant Synthesis
R(z) is a load that converts the airflow signal at the lips into an outgoing pressure wave
• It can be approximated by a differentiator (high-pass) filter
$$R(z) = 1 - \rho\, z^{-1}$$
  where ρ is a lip radiation coefficient
The vocal tract filter V(z) models the vocal tract formants
• A single formant can be modeled with a two-pole resonator
• The filter associated with the i-th formant is V_i(z), with center frequency f_i and bandwidth B_i
• At least 3 formants are needed (5 for high quality)
Source-filter Model – Speech Modeling
Formant Synthesis
• Two structures are possible, cascade and parallel, and they are often used in combination
• A cascade formant synthesizer consists of band-pass resonators connected in series
• A parallel formant synthesizer consists of resonators connected in parallel, i.e. the same input is applied to each formant filter and the outputs are summed
• A cascade model of the vocal tract is considered to provide good quality in the synthesis of vowels, but it is less flexible than a parallel structure, which allows the bandwidth and gain of each formant to be controlled individually
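A cascade formant synthesizer for a voiced sound could look roughly like the sketch below. Several details are assumptions and are not taken from the slides: the pole radius of each resonator is set from its bandwidth as r = exp(-πB/fs), each resonator is normalized to unit gain at DC, G(z) is approximated by two identical one-pole low-pass sections, and the formant values in the usage note are only illustrative.

```python
import numpy as np

def two_pole(x, f, bw, fs):
    """Formant resonator V_i(z): center frequency f and bandwidth bw, in Hz."""
    r = np.exp(-np.pi * bw / fs)                 # assumed bandwidth-to-radius map
    a1, a2 = -2.0 * r * np.cos(2.0 * np.pi * f / fs), r * r
    gain = 1.0 + a1 + a2                         # unit gain at DC
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = gain * xn - a1 * y1 - a2 * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def one_pole_lowpass(x, pole):
    """y[n] = (1 - pole) x[n] + pole y[n-1], one section of the assumed G(z)."""
    y = np.zeros(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev = (1.0 - pole) * xn + pole * prev
        y[n] = prev
    return y

def synth_vowel(f0, formants, fs=16000, dur=0.25, rho=0.95, gv=1.0):
    """S(z) = gv X(z) G(z) V(z) R(z), with V(z) a cascade of resonators."""
    x = np.zeros(int(fs * dur))
    x[:: int(round(fs / f0))] = 1.0              # X(z): periodic pulse train
    s = one_pole_lowpass(one_pole_lowpass(x, 0.9), 0.9)   # G(z): glottal shaping
    for f, bw in formants:                       # V(z): resonators in series
        s = two_pole(s, f, bw, fs)
    s = np.concatenate(([s[0]], s[1:] - rho * s[:-1]))    # R(z) = 1 - rho z^-1
    return gv * s
```

For example, `synth_vowel(100.0, [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)])` gives a rough /a/-like timbre; changing f0 moves the harmonics while the formant envelope stays where it is.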
Source-filter Model – Speech Modeling
Linear Prediction
• The source-filter model can also be used in an analysis-synthesis framework
• The problem is to extract a spectral envelope from a signal spectrum
• Linear prediction estimates an all-pole filter that matches the spectral content of a sound. When the order of this filter is low, only the formants are captured, hence the spectral envelope
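One common way to estimate the all-pole filter is the autocorrelation method, sketched here by solving the normal equations directly. The slides do not say which estimation method is used, so this choice, the function names and the synthetic test signal are assumptions.

```python
import numpy as np

def lpc(x, order):
    """Coefficients [1, a_1, ..., a_p] of A(z) for the all-pole model 1/A(z)."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags r[0..order]
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def envelope(a, n_fft=512):
    """Spectral envelope |1 / A(e^{jw})| on n_fft // 2 + 1 frequency points."""
    return 1.0 / np.abs(np.fft.rfft(a, n_fft))

# Check on a synthetic signal: white noise through a known two-pole filter.
true_a = np.array([1.0, -2.0 * 0.95 * np.cos(np.pi / 4), 0.95 ** 2])
e = np.random.default_rng(1).standard_normal(4096)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n]
    if n >= 1:
        x[n] -= true_a[1] * x[n - 1]
    if n >= 2:
        x[n] -= true_a[2] * x[n - 2]
est_a = lpc(x, order=2)        # should be close to true_a
```

With real speech the order is typically kept low relative to the frame length, so that `envelope(est_a)` follows the formants rather than the individual harmonics.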
Source-filter Model – Speech Modeling
Example:
•  The frequencies of the source and the frequencies of the filter are independent
•  This is why it is sometimes difficult to understand the vowels of a soprano
singing at the top of her range.
Non linear models
Non linear Models
• The transformations seen until now are linear (a):
  • frequencies do not change
• Using non-linear transformations (b):
  • frequencies can be drastically changed
  • new components are created
• It is possible to vary substantially the nature of the input sound
Non linear Models
Two main effects:
• Spectrum enrichment:
  • due to non-linear distortion
  • allows for controlling the brightness of a sound
  • reproduces the non-linearities and saturations found in real systems, e.g. analog amplifiers, electronic valves
• Spectrum shift:
  • due to multiplication of the signal by a sinusoid (the carrier)
  • moves the spectrum to the vicinity of the carrier, altering the harmonic relationship between the spectral lines of the modulating signal
  • used in electronic music, and a new metaphor for computer musicians
Spectrum shift moves the spectrum to the vicinity of the carrier signal, altering the harmonic relationship between the modulating signal lines. The possibility of shifting the spectrum is very intriguing when applied to music. From simple components, harmonic and inharmonic sounds can be created, and various harmonic relations among the partials can be established. The first effect tries to reproduce the non-linearities and saturations found in real systems, e.g. analog amplifiers and electronic valves. The second one instead derives from abstract mathematical properties of trigonometric functions, as used in modulation theory applied to music signals. Therefore it inherits, in part, the analogic interpretation used in electronic music, and is a new metaphor for computer musicians.
Non linear Models – Non linear distortion
(Waveshaping)
2.6.1 Memoryless non-linear processing
2.6.1.1 Harmonic distortion and waveshaping
• A sinusoidal input x[n] = A cos(ω0 n) which passes through an LTI (Linear Time Invariant) system (a filter) produces an output signal y[n] which is still a sinusoid with the same frequency ω0, and with amplitude and phase modified according to the transfer function values (see Fig. 2.26(a))
• If the signal is processed through a non-linear system, more substantial modifications of the spectrum occur. The output has in general the form
$$y[n] = \sum_{k=0}^{N} A_k \cos(k\omega_0 n), \qquad (2.49)$$
  and therefore the spectrum of y possesses energy at higher harmonics of ω0 (see Fig. 2.26(b))
• This effect, which is characteristic of non-linear systems, is termed harmonic distortion, and can be quantified through the total harmonic distortion (THD) parameter:
$$\mathrm{THD} = \sqrt{\frac{\sum_{k=2}^{N} A_k^2}{\sum_{k=1}^{N} A_k^2}}. \qquad (2.50)$$
• In many cases one wants to minimize harmonic distortion in non-linear processing, but in other cases distortion is exactly what we want in order to enrich an input sound. An example is the effect of valves.
This book is licensed under the CreativeCommons Attribution-NonCommercial-ShareAlike 3.0 license.
Non linear Models – Non linear distortion
(Waveshaping)
• We define the distortion block as a non-linear memoryless system:
$$y[n] = F(x[n])$$
• With a sinusoidal signal as input:
$$y[n] = F(A\cos(\omega_0 n)) = \sum_{k=0}^{N} A_k \cos(k\omega_0 n)$$
  (harmonic distortion)
• If F is a polynomial of degree N (e.g. a truncated Taylor expansion), the output contains only the first N harmonics
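The statement about polynomial distortion curves can be checked numerically. In the sketch below the degree-3 curve F(v) = v + 0.5 v² + 0.25 v³ is a hypothetical example, and the fundamental is placed exactly on an FFT bin so that the harmonic amplitudes A_k can be read off without leakage.

```python
import numpy as np

def harmonic_amplitudes(y, k0, n_harm):
    """Amplitudes A_0..A_n_harm of a signal whose fundamental sits on FFT bin k0."""
    spec = np.fft.rfft(y) / len(y)
    amps = 2.0 * np.abs(spec[[k * k0 for k in range(n_harm + 1)]])
    amps[0] /= 2.0                  # the DC bin is not split into +/- frequencies
    return amps

def thd(amps):
    """Total harmonic distortion: sqrt(sum_{k>=2} A_k^2 / sum_{k>=1} A_k^2)."""
    return np.sqrt(np.sum(amps[2:] ** 2) / np.sum(amps[1:] ** 2))

def distortion_curve(v):
    """Hypothetical memoryless non-linearity F, a polynomial of degree 3."""
    return v + 0.5 * v ** 2 + 0.25 * v ** 3

N, k0 = 1024, 8
x = np.cos(2.0 * np.pi * k0 * np.arange(N) / N)      # A = 1, fundamental on bin k0
A = harmonic_amplitudes(distortion_curve(x), k0, n_harm=5)
# A is approximately [0.25, 1.1875, 0.25, 0.0625, 0, 0]:
# nothing appears above the third harmonic, as expected for a degree-3 curve.
distortion = thd(A)
```

The amplitudes follow from cos² θ = (1 + cos 2θ)/2 and cos³ θ = (3 cos θ + cos 3θ)/4.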
Non linear Models – Non linear distortion
(Waveshaping)
Overdrive and distortion guitar effects
• Analog guitar effects, based either on vacuum tubes (valves such as diodes, triodes, pentodes) or on solid-state devices, provide a good example of non-linear processing
• Overdrive: refers to a nearly linear audio effect device which can be driven into the non-linear region of its distortion curve only by high input levels. The transition from the linear operating region to the non-linear region is smooth
• Distortion: refers to a similar effect, with the difference that the device operates mainly in the non-linear region of the distortion curve
Non linear Models – Non linear distortion
(Waveshaping)
• Symmetric distortion is based on static non-linearities that are odd with respect to the origin and approximately linear for low input values
  • q in the second equation controls the amount of clipping (higher values provide faster saturation)
• Asymmetric overdrive effects are based on distortion curves that clip positive and negative input values in different ways
  • the parameter q scales the range of linear behavior (more negative values increase the linear region of operation) and d controls the smoothness of the transition to clipping (higher values provide stronger distortions)
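The distortion curves themselves were lost from the slides, so the two functions below are stand-ins that merely match the verbal description; they are assumptions and may differ from the equations originally shown. The symmetric curve is an odd exponential soft clipper with parameter q, and the asymmetric one is a tube-like curve with parameters q and d.

```python
import numpy as np

def symmetric_clip(x, q=5.0):
    """Odd soft clipper sign(x) * (1 - exp(-q |x|)); larger q saturates faster."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * (1.0 - np.exp(-q * np.abs(x)))

def asymmetric_clip(x, q=-0.2, d=8.0):
    """Asymmetric curve (x - q) / (1 - exp(-d (x - q))) + q / (1 - exp(d q)).

    Assumes q != 0. The removable singularity at x == q is replaced by its
    limit 1/d, so the curve is continuous there.
    """
    x = np.asarray(x, dtype=float)
    u = x - q
    near = np.abs(u) < 1e-9
    safe = np.where(near, 1.0, u)                       # avoids 0/0 at x == q
    core = np.where(near, 1.0 / d, safe / (1.0 - np.exp(-d * safe)))
    return core + q / (1.0 - np.exp(d * q))
```

With the default parameters the asymmetric curve passes through the origin, stays close to linear for positive inputs and flattens out for negative ones, which is the "different clipping on the two sides" behavior described above.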
Non linear Models – Multiplicative Synthesis
This is the simplest technique for spectrum shift; in the analog domain it is called Ring Modulation (RM)
• Let x1[n] and x2[n] be two input signals. The output is their product
$$s[n] = x_1[n] \cdot x_2[n]$$
• The spectrum is the convolution of the two input spectra
$$S(\omega_d) = [X_1 * X_2](\omega_d)$$
• Carrier signal: c[n] is a sinusoid with frequency ωc
$$x_1[n] = c[n] = \cos(\omega_c n + \phi_c)$$
• Modulating signal: the second signal, m[n], is the input that will be transformed by the ring modulation
$$x_2[n] = m[n]$$
• The spectrum is
$$S(\omega_d) = \frac{1}{2}\left[ M(\omega_d - \omega_c)\, e^{j\phi_c} + M(\omega_d + \omega_c)\, e^{-j\phi_c} \right]$$
  i.e. S(ωd) is composed of two copies of the spectrum M(ωd), symmetric around ωc: a lower sideband (LSB), reversed in frequency, and an upper sideband (USB)
Non linear Models – Multiplicative Synthesis
• Consider a periodic modulating signal m[n] with fundamental frequency ωm and N harmonic partials
• In this case multiplicative synthesis causes every spectral line kωm to be replaced by two spectral lines, one in the LSB and the other in the USB, with frequencies ωc − kωm and ωc + kωm
• The resulting spectrum has partials at frequencies |ωc ± kωm| with k = 1, ..., N
• Spectra of this kind can be characterized through the ratio ωc/ωm
• When this ratio is rational (i.e. ωc/ωm = N1/N2 with N1, N2 ∈ ℕ and mutually prime) the sound is periodic, with fundamental frequency ω0 = ωm/N2
• When this ratio is irrational the sound is inharmonic
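The sideband structure can be verified with a short experiment. The values below (carrier at 1000 Hz, modulating signal with three harmonics of 200 Hz and amplitudes 1/k) are arbitrary illustrative choices; one second of signal at fs = 8000 Hz makes every FFT bin exactly 1 Hz wide.

```python
import numpy as np

def ring_modulate(m, fc, fs, phase=0.0):
    """s[n] = m[n] * cos(wc n + phase): multiplication by a sinusoidal carrier."""
    n = np.arange(len(m))
    return m * np.cos(2.0 * np.pi * fc * n / fs + phase)

fs = N = 8000                                   # 1 s of signal, 1 Hz per bin
n = np.arange(N)
fm_hz, fc = 200.0, 1000.0
m = sum(np.cos(2.0 * np.pi * k * fm_hz * n / fs) / k for k in (1, 2, 3))
s = ring_modulate(m, fc, fs)

spec = 2.0 * np.abs(np.fft.rfft(s)) / N         # amplitude of each 1 Hz bin
partials = np.flatnonzero(spec > 0.01)          # bin index equals frequency in Hz
# partials -> 400, 600, 800, 1200, 1400, 1600: the lines fc - k fm and fc + k fm.
# The carrier line at 1000 Hz is absent, and each sideband has half the
# amplitude of the corresponding partial of m[n].
```

Here ωc/ωm = 5, a rational ratio, so the result is periodic with fundamental 200 Hz.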
Non linear Models – Multiplicative Synthesis – Amplitude Modulation
• Of particular interest is the case of an ωc/ωm ratio approximating a simple rational value, i.e. ωc/ωm = N1/N2 + ε with ε ≪ 1
• In this case the fundamental frequency is still ω0 = ωm/N2, but the partials are shifted from the harmonic series by ±εωm, so that the spectrum becomes slightly inharmonic
Amplitude Modulation
$$s[n] = (1 + \alpha\, m[n])\, c[n]$$
• Here α is the amplitude modulation index. In this case the spectrum S(ωd) also contains the carrier spectral line, in addition to sidebands of the same form as in ring modulation. From the expression for S(ωd) one can see that α controls the amplitude of the sidebands
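A minimal sketch of amplitude modulation, assuming the form s[n] = (1 + α m[n]) c[n] with a unit-amplitude cosine carrier. The test frequencies are arbitrary and chosen to fall on exact FFT bins.

```python
import numpy as np

def amplitude_modulate(m, fc, fs, alpha):
    """s[n] = (1 + alpha * m[n]) * cos(wc n)."""
    n = np.arange(len(m))
    return (1.0 + alpha * m) * np.cos(2.0 * np.pi * fc * n / fs)

fs = N = 8000                                   # 1 Hz per FFT bin
n = np.arange(N)
m = np.cos(2.0 * np.pi * 200.0 * n / fs)        # sinusoidal modulating signal
s = amplitude_modulate(m, fc=1000.0, fs=fs, alpha=0.5)
spec = 2.0 * np.abs(np.fft.rfft(s)) / N
# spec[1000] is the carrier line (amplitude 1); spec[800] and spec[1200]
# are the two sidebands, each with amplitude alpha / 2.
```

Setting alpha to zero leaves only the carrier, which is the difference with respect to ring modulation, where the carrier line is suppressed.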
Non linear Models – Frequency Modulation
Frequency modulation
• FM techniques are not derived from models of sound signals or sound production; they are instead based on abstract mathematical descriptions
• Pros
  • versatile methods for producing many types of sounds
  • great timbral variability
  • very limited number of control parameters
  • low computational costs
• Cons
  • They cannot be used in an analysis-synthesis scheme, in which the parameters of the synthesis model are derived from the analysis of real sounds
  • No intuitive interpretation can be given to the parameter choice
Non linear Models - Modulation
Synthesis by Frequency modulation (FM)
• The definition of synthesis by frequency modulation (FM) encompasses an entire family of techniques in which the instantaneous frequency of a periodic signal (the carrier) is itself a signal that varies at audio rate (the modulating signal)
• The general formulation of FM is
$$s[n] = a[n] \sin\big(\omega_c[n]\, n + \phi[n]\big)$$
  where
  • a[n] is the amplitude signal,
  • ωc[n] is the carrier frequency,
  • φ[n] is the modulating signal
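Non linear Models - Modulation
Basic FM Scheme
• A sinusoidal modulating signal φ[n] is used, with amplitude I[n] (called the modulation index) and frequency ωm[n]
$$\phi[n] = I[n] \sin(\omega_m[n]\, n)$$
  where both I[n] and ωm[n] vary at frame rate
• This modulation produces the signal
$$s[n] = a[n] \sin\big(\omega_c[n]\, n + I[n]\sin(\omega_m[n]\, n)\big) = a[n] \sum_{k=-\infty}^{+\infty} J_k(I[n]) \sin\big((\omega_c[n] + k\,\omega_m[n])\, n\big)$$
  where J_k(I[n]) is the k-th order Bessel function of the first kind, evaluated at I[n]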
Non linear Models - Modulation
Basic FM Scheme
• The resulting spectrum is composed of partials at frequencies |ωc ± kωm|, each with amplitude J_k(I)
• An infinite number of partials is generated, so the signal bandwidth is not limited. In practice, however, only a few low-order Bessel functions take significantly non-null values for small values of I
• As I increases, the number of significantly non-null Bessel functions increases too, so I controls the bandwidth around ωc
• The ratio ωc/ωm controls the degree of inharmonicity
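The relation between partial amplitudes and Bessel functions can be checked with a short sketch. It assumes a sine carrier with constant amplitude, constant index I = 2 and constant frequencies (2000 Hz and 200 Hz, arbitrary values on exact FFT bins). To stay with NumPy only, J_k is evaluated from its integral representation rather than from a special-function library; `bessel_j` is a hypothetical helper name.

```python
import numpy as np

def bessel_j(k, x, m=4096):
    """J_k(x) via (1 / 2pi) * integral over one period of cos(k t - x sin t)."""
    t = 2.0 * np.pi * np.arange(m) / m
    return np.mean(np.cos(k * t - x * np.sin(t)))

def fm(fc, fm_hz, index, fs, n_samples, amp=1.0):
    """Basic FM: s[n] = amp * sin(wc n + I sin(wm n))."""
    n = np.arange(n_samples)
    return amp * np.sin(2.0 * np.pi * fc * n / fs
                        + index * np.sin(2.0 * np.pi * fm_hz * n / fs))

fs = N = 8000                                   # 1 Hz per FFT bin
s = fm(2000.0, 200.0, 2.0, fs, N)
spec = 2.0 * np.abs(np.fft.rfft(s)) / N
measured = [spec[2000 + 200 * k] for k in range(5)]     # partials at fc + k fm
predicted = [abs(bessel_j(k, 2.0)) for k in range(5)]   # |J_k(I)|
```

Raising the index spreads energy to higher-order sidebands, which is the bandwidth control described above; note that for I = 2 the carrier line (k = 0) is already weaker than the first sideband.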
Non linear Models - Modulation
Compound modulation
• If the modulating signal is composed of two sinusoids,
$$\phi[n] = I_1 \sin(\omega_1 n) + I_2 \sin(\omega_2 n)$$
  s[n] possesses the partials with frequencies |ωc ± k1ω1 ± k2ω2|, with amplitudes given by the product J_{k1}(I1) · J_{k2}(I2)
• Simplification: assume ω1 > ω2 and consider first only the sinusoid with frequency ω1. We obtain partials with frequencies |ωc ± k1ω1|. Adding the second sinusoid, each partial of the first one becomes a carrier for the second one
• If ω1 and ω2 are both multiples of a common frequency ωm (their greatest common divisor), the partials lie at |ωc ± kωm| as in the basic case, but with a richer spectrum
• Otherwise inharmonic components are produced
Non linear Models - Modulation
Compound modulation – general case
• If the modulating signal is composed of N sinusoids,
$$\phi[n] = \sum_{i=1}^{N} I_i \sin(\omega_{m,i}\, n)$$
  s[n] possesses all the partials with frequencies |ωc ± k1ωm,1 ± ··· ± kNωm,N|, with amplitudes given by the product of N Bessel functions, J_{k1}(I1) ··· J_{kN}(IN)
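The product-of-Bessel-functions rule can be illustrated for N = 2 modulators. The sketch assumes a sine carrier and modulators of the form I_i sin(ω_{m,i} n); the frequencies (2000, 300 and 70 Hz) and the indices are arbitrary values chosen so that the partials fall on distinct FFT bins. As before, `bessel_j` is a hypothetical NumPy-only helper.

```python
import numpy as np

def bessel_j(k, x, m=4096):
    """J_k(x) via (1 / 2pi) * integral over one period of cos(k t - x sin t)."""
    t = 2.0 * np.pi * np.arange(m) / m
    return np.mean(np.cos(k * t - x * np.sin(t)))

def compound_fm(fc, mods, fs, n_samples):
    """s[n] = sin(wc n + sum_i I_i sin(w_i n)); mods is a list of (f_i, I_i)."""
    n = np.arange(n_samples)
    phi = sum(idx * np.sin(2.0 * np.pi * f * n / fs) for f, idx in mods)
    return np.sin(2.0 * np.pi * fc * n / fs + phi)

fs = N = 8000                                   # 1 Hz per FFT bin
s = compound_fm(2000.0, [(300.0, 1.0), (70.0, 1.0)], fs, N)
spec = 2.0 * np.abs(np.fft.rfft(s)) / N

def predicted(k1, k2):
    """Amplitude of the partial at fc + k1 * 300 + k2 * 70 Hz."""
    return abs(bessel_j(k1, 1.0) * bessel_j(k2, 1.0))
```

For instance, `spec[2370]` (k1 = k2 = 1) can be compared with `predicted(1, 1)`, and `spec[2000]` with `predicted(0, 0)`.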
Non linear Models - Modulation
Nested modulation
• A sinusoidal modulator is itself modulated by a second one
$$\phi[n] = I_1 \sin\big(\omega_{m,1}\, n + I_2 \sin(\omega_{m,2}\, n)\big)$$
• The result can be interpreted as if each partial produced by the modulating frequency ωm,1 were modulated by ωm,2 with modulation index kI2, where k is the order of the partial
• The spectral structure is similar to that produced by two sinusoidal modulators, but with a larger bandwidth
Non linear Models - Modulation
Feedback modulation
• Past values of the output signal are used as the modulating signal
$$s[n] = \sin\big(\omega_c n + \beta\, s[n - n_0]\big)$$
  with n0 = 1; β (called the feedback factor) acts as a scale factor, or feedback modulation index
• For increasing values of β the resulting signal remains periodic with frequency ωc, and changes smoothly from a sinusoid to a sawtooth waveform
• Moreover, one may vary the delay n0 in the feedback, and observe the emergence of chaotic behaviors for suitable combinations of the parameters n0 and β
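Feedback modulation is easy to sketch as a sample-by-sample recursion. The form s[n] = sin(ωc n + β s[n − n0]) with zero initial conditions is assumed here, and the function name and parameter values are illustrative.

```python
import numpy as np

def feedback_fm(fc, fs, n_samples, beta, n0=1):
    """s[n] = sin(wc n + beta * s[n - n0]), with s[n] = 0 for n < 0."""
    w = 2.0 * np.pi * fc / fs
    s = np.zeros(n_samples)
    for n in range(n_samples):
        fb = s[n - n0] if n >= n0 else 0.0      # delayed output used as modulator
        s[n] = np.sin(w * n + beta * fb)
    return s

fs, N = 8000, 800                               # 10 Hz per FFT bin
pure = feedback_fm(100.0, fs, N, beta=0.0)      # beta = 0: plain sinusoid
rich = feedback_fm(100.0, fs, N, beta=0.8)      # larger beta: skewed waveform
spec = 2.0 * np.abs(np.fft.rfft(rich)) / N      # spec[10 * k]: k-th harmonic
```

Comparing `spec[10]`, `spec[20]`, `spec[30]` for growing β shows the harmonics building up as the waveform moves from a sinusoid towards a sawtooth-like shape; larger delays n0 combined with large β are where the irregular behavior mentioned above can be explored.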