Lecture 5: Sinusoidal Modeling

advertisement
ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 5:
Sinusoidal Modeling
1.
2.
3.
4.
Sinusoidal Modeling
Sinusoidal Analysis
Sinusoidal Synthesis & Modification
Noise Residual
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-02-18 - 1 /16
1. Sinusoidal Modeling
• Periodic sounds
each ridge is a
sinusoidal harmonic
.. with smoothly-varying
parameters
Violin.arco.ff.A4
➔ ridges in spectrogram
.. an efficient & flexible
description?
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 2 /16
Sinusoid Modeling
• Analogous to Fourier series
model harmonics explicitly? e.g.
x[n] =
ak [n]cos( k [n])
k
... for pitched signal with fundamental
k [n] = k · 0 [n] · n
0 [n]
• Additional constraints
harmonicity
smoothness of ak [n]
• Arbitrarily accurate given enough sinusoids
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 -
/16
Examples
• Using Michael Klingbeil’s SPEAR
http://www.klingbeil.com/spear/
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 4 /16
Envelope Limitations
• Extracted envelope reflects analysis window
2000
0.3
1500
Frequency
0.4
0.2
0.1
0
1000
500
0
10
20
30
40
50
60
0
70
2000
0.3
1500
Frequency
0.4
0.2
0.1
0
0
0.5
1
Time
1.5
2
0
0.5
1
Time
1.5
2
1000
500
0
10
20
30
40
50
60
70
0
• Sharp window violates assumptions
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 5 /16
2. Sinusoidal Analysis
• Sinusoids = peaks in spectrogram slices
N 1
= DFT frames X[k, m] =
x[n + mL] w[n]e
j 2 Nkn
n=0
• DFT length N
window determines frequency resolution: X(ej )
long enough to “see harmonics”
e.g. 2-3x longest pitch cycle typically 50-100 ms
but: too long → blurs amplitude envelope ak [n]
W (ej )
• Hop advance L
choose N/2 or N/4
.. denser for simpler interpolation along time
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 6 /16
Sinusoidal Peak Picking
• Local maxima in DFT frames
freq / Hz
8000
6000
4000
2000
level / dB
0
0
20
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
time / s
0
-20
-40
-60
0
1000
2000
3000
4000
5000
6000
7000 freq / Hz
20
y
10
ab2/4
0
-10
0
y = ax(x-b)
x
b/2
-20
400
600
E4896 Music Signal Processing (Dan Ellis)
phase / rad
level / dB
• Quadratic fit for sub-bin resolution
800 freq / Hz
-5
-10
400
600
800 freq / Hz
2013-02-18 - 7 /16
Peak Selection
• Don’t want every peak
just “true” sinusoids
threshold?
level / dB
20
0
-20
-40
-60
0
1000
2000
3000
local shape - fits
4000
(
• Look for stability
5000
0)
6000
7000
freq / Hz
W (ej )
of frequency & amplitude in successive time frames
phase derivative in time/freq
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 8 /16
Track Formation
• Connect peaks in adjacent frames to form
sinusoids
freq
can be ambiguous if large frequency changes
birth
existing
tracks
death
new
peaks
time
• Unclaimed peak → create new track
• No continuation of track → termination
hysteresis
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 9 /16
Pitch Tracking
• Extracted sinusoids could be anywhere
6000
freq / Hz
freq / Hz
but often expect them to be in harmonic series
4000
2000
0
0
0.05
0.1 0.15
0.2
time / s
700
650
600
550
0
0.05
0.1 0.15
0.2
factor
• Find pitch by searching for common
[n] = k · [n]
can then “regularize” pitch
E4896 Music Signal Processing (Dan Ellis)
k
time / s
0
2013-02-18 - 10/16
3. Sinusoidal Synthesis
• Each sinusoid track {a [n],
level
drives an oscillator
k [n]}
k
3
3
2
2
a k[n]
1
a k[n]·cos(W k[n]·t)
0
0
700
Hz
/
q
fre
1
-1
600
500 0
Wk[n]
0.05 0.1 0.15 0.2
n
-2
time / s
-3
0
0.05
0.1 0.15
time / s
0.2
can interpolate amplitude, frequency samples
• Faster method synthesizes DFT frames
then overlap-add
trickier to achieve frequency modulation
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 11/16
Sinusoidal Modification
• Sinusoidal description very easy to modify
e.g. changing time base of sample points
freq / Hz
5000
4000
3000
2000
1000
0
0
0.05
0.1
0.15
0.2
0.25
0.3
• Frequency stretch
0.35
0.4
0.45
0.5
time / s
40
level / dB
level / dB
preserve formant envelope?
30
20
10
0
0
1000 2000 3000 4000
freq / Hz
E4896 Music Signal Processing (Dan Ellis)
40
30
20
10
0
0
1000 2000 3000 4000
freq / Hz
2013-02-18 - 12/16
4. Noise Residual
• Some energy is not well fit with sinusoids
e.g. noisy energy
• Can just keep it as residual
or model it some other way
• Leads to “sinusoidal + noise” model
x[n] =
ak [n]cos(
k [n]n)
+ e[n]
k
mag / dB
20
sinusoids
0
original
-20
-40
-60
-80
residual
LPC
0
1000
2000
E4896 Music Signal Processing (Dan Ellis)
3000
4000
5000
6000
7000 freq / Hz
2013-02-18 - 13/16
Sinusoids + Noise Decomposition
• Removing sines reveals noise & transients
Guitar - original
4000
Frequency
3000
2000
1000
0
0
0.2
0.4
0.6
0.8
1
Time
1.2
1.4
1.6
1.8
2
1.4
1.6
1.8
2
1.4
1.6
1.8
2
Guitar - sinusoid reconstruction
4000
Frequency
3000
2000
1000
0
0
0.2
0.4
0.6
0.8
1
Time
1.2
Guitar - residual (original - sines)
4000
Frequency
3000
2000
1000
0
0
0.2
0.4
0.6
0.8
1
Time
1.2
• Different representation approaches...
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 14/16
5. Limitations
• The spectrogram (mag STFT) is not linear
freq / Hz
superpositions suffer from phase effects
abs(stft(s1)) + abs(stft(s2))
1400
1300
1300
1200
1200
1100
1100
1000
1000
900
900
800
0
0.5
1
abs(stft(s1+s2))
1400
800
250
200
150
100
50
0
0.5
time / sec
1
0
• Separating sources
is generally hard...
parameters
tracking
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 15/16
Summary
• Spectrogram shows sinusoid harmonics
in many sounds
• Peak picking in spectrogram
can effectively extract them
• Sinusoidal domain extremely flexible
for modification
• Noise residual can add even more realism
E4896 Music Signal Processing (Dan Ellis)
2013-02-18 - 16/16
Download