Lecture 9: Time & Pitch Scaling 1. Time Scale Modification (TSM)

advertisement
ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 9:
Time & Pitch Scaling
1.
2.
3.
4.
Time Scale Modification (TSM)
Time-Domain Approaches
The Phase Vocoder
Sinusoidal Approach
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-03-25 - 1 /16
1. Time Scale Modification (TSM)
• The basic problem in time scaling:
Modify a sound to make it “quicker/slower”?
to examine “detail” in speech/performance
to synchronize tracks
• Why not just
change the
sampling rate?
0.1
Original
0.05
0
-0.05
“slowing the tape”
e.g. for r times
longer (slower),
xs (t) = x(t/r)
-0.1
2.35
0.1
2.45
2.5
2.55
2.6
2x slower
0.05
0
-0.05
-0.1
2.35
E4896 Music Signal Processing (Dan Ellis)
2.4
2.4
2.45
2.5
2.55
2.6
time/s
2013-03-25 - 2 /16
Time & Pitch
• Changing speed alters time AND pitch
how do we change just one?
• Preserve pitch, change time
preserve pitch = keep local time structure
but changing global time course?
• Preserve timing, change pitch
analogous problem
change time scale, then change sampling rate
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 3 /16
2. Time-Domain TSM
• Pre-digital time/pitch scaling:
Fairbanks et al., 1954
Revolving tape heads
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 4 /16
Simple OLA TSM
• The digital equivalent of revolving tape heads
y [mL + n] = y
m
0.1
1
2
m 1
3
m
L+n
r
[mL + n] + w[n] · x
4
5
6
0
-0.1
2.35
1
2.4
2.45
2.5
4
2
1
1
2.6
time / s
5
6
3
0.1
2.55
2
2
3
3
4
4
5
5
6
4.95
time / s
0
-0.1
4.7
4.75
E4896 Music Signal Processing (Dan Ellis)
4.8
4.85
4.9
2013-03-25 - 5 /16
SOLA / SOLAFS
Roucos & Wilgus 1985
Hejna & Musicus 1991
• Simple OLA leads to
random phase interactions during overlap
so.. search for optimal alignment Km via small offset
1
2
Km maximizes alignment of 1 and 2
y [mL + n] = y
m
m 1
[mL + n] + w[n] · x
m
L + n + Km
r
find best Km with cross-correlation:
Km = arg
max
0 K Ku
Nov
n=0
(y m
E4896 Music Signal Processing (Dan Ellis)
ym
1
[mL + n] · x
1 [mL
+ n])2
(x
m
r
m
r
L+n+K
L + n + K )2
2013-03-25 - 6 /16
The Importance of Time Window
• Preserved “local structure”
= structure within each window
shorter windows → less structure preserved
tw = 25ms
tw = 5ms
0.1
0.1
0
0
-0.1
2.4
2.45
2.5
2.55
2.6
2.65
2.7
2.75
2.8
-0.1
2.4
0.1
0.1
0
0
-0.1
4.8
4.85
4.9
4.95
5
5.05
5.1
E4896 Music Signal Processing (Dan Ellis)
5.15
-0.1
4.8
2.45
2.5
2.55
2.6
2.65
2.7
2.75
4.85
4.9
4.95
5
5.05
5.1
5.15
2.8
2013-03-25 - 7 /16
Source-Filter TSM
• Decompose into excitation + filter
Kawahara 1997
before TSM
filter (e.g. LPC) easy to scale - change frame rate
excitation scaled e.g. by SOLAFS
.. then reapplying filter can reduce artifacts
Filter coefficients
f
Filter
analysis
Resample
in time
Residual
t
E4896 Music Signal Processing (Dan Ellis)
Time scale
modification
Apply
filter
2013-03-25 - 8 /16
3. The Phase Vocoder
• TSM based on the spectrogram
Flanagan & Golden 1966
Dolson 1986
4000
Frequency
3000
2000
1000
0
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Time
4000
Frequency
3000
2000
1000
0
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Time
just stretching it out it time gives “what we want”:
but: STFT magnitude isn’t quite enough
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 9 /16
Magnitude-only reconstruction
need
iterate
1
{Y (ej , t)} ...
y n (t) =
n+1
( , t) =
converges
to a solution,
but SLOWLY
{|Y (ej , t)|} ?
{|Y (ej , t)| n ( , t)}
{ST F T {y n (t)}}
1
ST F T
SNR of reconstruction (dB)
•
How to recover ST F T
Griffin & Lim 1984
Iterative phase estimation
8
7.6 dB
7
6
5
4.7 dB
Magnitude quantization only
Starting from quantized phase
Starting from random phase
4
3.4 dB
3
E4896 Music Signal Processing (Dan Ellis)
0
1
2
3
4
5
6
7
8
Iteration number
2013-03-25 - 10/16
Phase Correction
• Essential idea of the phase vocoder
phase advance will stretch by time scaling factor
... preserve rate of phase change between slices
d
˙
{Y (ej , t)}
= instantaneous frequency ( , t) =
dt
(from differencing adjacent frames, or directly...)
• Makes sense for a single sinusoid...
ΔT
.
θ = Δθ
ΔT
time
.
Δθ' = θ·2ΔT
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 11/16
Phase Vocoder Results
freq / kHz
• Reconstructed phase
8
gives smooth evolution
6
4
2
freq / kHz
0
8
6
4
2
0
0.5
1
1.5
2
2.5
time / s
3
“metallic” extended noise...
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 12/16
Time Window (again)
freq / kHz
• STFT time-frequency tradeoff
8
determines what “scaling” means
6
4
2
freq / kHz
0
8
6
4
2
0
0.5
1
1.5
2
2.5
time / s
3
is pitch structure within DFT window?
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 13/16
4. Sinusoidal TSM
• Phase Vocoder essentially
treats each STFT bin as a sinusoid
with magnitude |Y (ej , t)| and frequency ˙ ( , t)
time scaling simply projects that sinusoid forward
• But: bins around peak don’t behave that way
need phase alignment to achieve interference
• Model as sinusoids explicitly
peak-picking, magnitude & frequency estimation
separate noise signal?
Frequency
8000
6000
4000
2000
0
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
Time
2
2.5
3
2013-03-25 - 14/16
Summary
• Time / Pitch scale modification
Intuitive but difficult to define
• Time domain methods
Preserve local structure
• Spectral methods
Elongate features in spectrogram
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 15/16
References
•
•
•
•
•
•
•
•
M. Covell, M.Withgott, M., Slaney, “Mach1: Nonuniform Time-Scale Modification
of Speech,” Proc. ICASSP, vol 1, pp. 349-352,1998.
Mark Dolson, “The phase vocoder: A tutorial,” Computer Music Journal, vol. 10,
no. 4, pp. 14-27, 1986.
G. Fairbanks, W. Everitt, and R. Jaeger, “Method for time of frequency
compression-expansion of speech,” Transactions of the IRE Professional Group on
Audio, vol.2, no.1, pp. 7-12, Jan 1954.
J. L. Flanagan, R. M. Golden, “Phase Vocoder,” Bell System Technical Journal, pp.
1493-1509, November 1966.
D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier
transform,”
IEEE Tr. Acous,, Speech, and Sig. Proc., vol. 32, pp. 236–242, 1984.
Don Hejna and Bruce R. Musicus, “The SOLAFS Time-Scale Modification
Algorithm,” BBN Technical Report, July 1991.
Hideki Kawahara, “Speech Representation and Transformation using Adaptive
Interpolation of Weighted Spectrum:VOCODER Revisited,'' Proc. ICASSP, vol.2,
pp.1303-1306, 1997.
Jean Laroche, “Time and Pitch Scale Modification of Audio Signals.” chapter 7 in
Applications of Digitial Signal Processing to Audio and Acoustics, ed. M. Kahrs & K.
Brandenburg, Kluwer, 1998.
E4896 Music Signal Processing (Dan Ellis)
2013-03-25 - 16/16
Download