Lecture 8: Pitch Tracking 1. Pitch Tracking 2. Spectral Approaches

advertisement
ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 8:
Pitch Tracking
1.
2.
3.
4.
Pitch Tracking
Spectral Approaches
Time Domain
Example algorithms
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-03-11 - 1 /18
1. Pitch Tracking
• Pitch is a big part
of hearing
orthogonal to
formants (vowels)
in speech
follows fundamental
frequency (f0) of sounds
because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
for coding / representation (telephony)
for extracting information
• .. but can be tricky
sounds without clear f0
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 2 /18
Challenges to Pitch Extraction
freq / kHz
• Pitch extraction challenges include...
“Oh, I’m just about to lose my mind...”
4
noise / multiple f0s
3
2
1
0
0.5
1
1.5
2
2.5
40
20
20
0
0
dB
40
-20
ampl.
Talkin, 1995
0
1000
2000
3000
-20
4000
time / s
unclear f0
1000
2000
3000
4000
freq / Hz
3.3
3.31
3.32
3.33
3.34
time / s
0.2
0
0
note segmentation
0.1
-0.1
-0.1
-0.2
0.75
3.5
0
0.2
0.1
3
Voice Activity
Detection
-0.2
0.76
0.77
0.78
0.79
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 3 /18
Multiple f0s
• Common in music (“polyphony”)
Frequency
4000
3000
2000
1000
0
0.5
1
1.5
2
Time
2.5
3
3.5
0.5
1
1.5
2
Time
2.5
3
3.5
Frequency
4000
3000
2000
1000
0
cross-terms...
identify-remove-repeat?
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 4 /18
2. Spectral Approaches
freq / kHz
• Just pick the most intense spectral peak?
4
3
2
freq / kHz
1
0
4
3
2
1
0
0.5
1
1.5
2
2.5
3
3.5
time / s
fundamental may not be strongest...
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 5 /18
Spectral Templates
• “Filter” log-freq
spectrum
with template
*
-60
-40
-20
0
matched
filter ...
0
20
40
60
80
100
120
140
160
0
20
40
60
80
100
120
140
160
=
8000
4000
2000
1000
500
250
8000
4000
2000
1000
500
250
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
2
2.5
3
3.5
time / s
2013-03-11 - 6 /18
Cepstrum
• Spectrum
shows periodicity
at f0
reveal via Fourier
transform of log|Y|
= cepstrum
40
40
20
20
0
0
-20
freq / kHz
0
1000
2000
3000
4000
-20
3000
3000
2000
2000
1000
1000
0
4
Noll, 1967
0
200
400
600
800
1000
0
0
0
1000
200
2000
400
600
3000 freq / Hz
quefrency
3
2
quefrency
1
0
100
80
60
40
20
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
2
2.5
3
3.5
time / s
2013-03-11 - 7 /18
“Specmurt”
Sagayama et al., 2004
• Log-spectrum is like convolution of
pitch peaks and harmonics profile
esp. for multiple notes by one instrument (piano)
• Separate out by deconvolution
inverse Fourier transform of log-f spectrum V(y)
divide out harmonic template H(y)
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 8 /18
Sinusoid Tracks
Maher & Beauchamp 1994
• Do sinusoid tracking before pitch tracking
freq / kHz
apply templates / sieve on tracks
consider each strong sinusoid as a candidate
limits to clearly periodic energy (normalized?)
4
3
2
freq / kHz
1
0
4
3
2
1
0
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
2
2.5
3
3.5
time / s
2013-03-11 - 9 /18
3. Time Domain Approaches
Gold & Rabiner 1969
ampl.
• Periodicity is primarily time-domain
0.2
0.2
0
0
0.1
0.1
-0.1
-0.1
-0.2
-0.2
0.75
0.76
0.77
0.78
0.79
3.3
3.31
time-domain processing avoids FFT
3.32
3.33
3.34
time / s
• Early approaches: Look for common gap
between wave features
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 10/18
Autocorrelation
• Directly measures
0.2
0.1
0
period of self-similarity
(autoco is inverse FT of
magnitude spectrum)
-0.1
-0.2
0.75
20
0.76
0.77
0.78
3.3
3.31
3.32
3.33 time / s
0.01
0.02
0.03
0
0.01
0.02
0.03
10
0
-10
freq / kHz
Sondhi 1967
4
0
lag / s
3
2
1
lag / ms
0
6
4
2
lag / ms
0
6
4
2
0
0.5
E4896 Music Signal Processing (Dan Ellis)
1
1.5
2
2.5
3
3.5
time / s
2013-03-11 - 11/18
Post-processing
• Can assume & exploit pitch smoothness
• Median filtering
only knows best point
• Dynamic programming (Viterbi path)
Talkin, 1995
lag / ms
can look at all underlying values
raw
median31
viterbi
6
4
2
0
0
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
2
2.5
3
3.5 time / s
2013-03-11 - 12/18
4. Examples:YIN
de Cheveigné & Kawahara, 2002
• Like autocorrelation, but
difference function - robust to changes in amplitude
progressive normalization by cumulative average
- eliminates lag = 0 advantage
outperforms others without post processing
Oct. re: 440 Hz
0
-1
-2
-3
sqrt(apty)
-4
0.5
sqrt(power)
0
0
0.5
1
E4896 Music Signal Processing (Dan Ellis)
1.5
2
2.5
3
3.5 time / s
2013-03-11 - 13/18
Examples: sigmund~
Puckette et al. 1998
• Miller Puckette’s Pd pitch tracker
Ŷ (k · f )
• Sinusoids + sieve L(f ) =
+k
k
• Also performs
note segmentation
and polyphony
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 14/18
Examples: Weighted Sieve Polyphonic
• Based on more complex multi-f
• Chooses f via weighted sieve
0
system
Klapuri 2006
0
spectrum is whitened before measuring peaks
weights are optimized for training data
• Multiple f s are found via cancel+repeat, or
0
joint optimization
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 15/18
Example: SAcC
Lee & Ellis 2012
• Subband Autocorrelation Classification
multiple subbands
autocorrelation + PCA
trained MLP classifier to find pitch
la
de
yl
short-time
autocorrelation
ine
MLP Classifier
Cochlea
filterbank
frequency channels
Sound
HMM
Pitch
Tracker
freq
lag
Pitch
Correlogram
slice
time
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 16/18
• Pitch
Summary
Important perceptually, for speech & music
• Spectrum
Combine multiple harmonics to f0
• Time
Autocorrelation directly measures
periodicity
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 17/18
References
•
•
•
•
•
•
•
•
•
•
A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for
Speech and Music,” J. Acoust. Soc. Am. 111, 1917–1930, 2002.
B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch
Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442-448, 1969.
A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic
Amplitudes,” Proc. Int. Symp. Music IR, 216-221, 2006.
B. S. Lee & D. Ellis, “Noise Robust Pitch Tracking by Subband Autocorrelation
Classification,” Proc. Interspeech-12, P3b.05, 2012.
R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals
using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254-2263,1994.
A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293-309, 1967.
M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,”
Proc. Int. Comp. Music Conf., 109-112, 1998.
S. Sagayama, K. Takahashi, H. Kameoka and T. Nishimoto, “Specmurt Anasylis: A
Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of LogFrequency Spectrum,” Proc. SAPA-2004, 2004.
M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and
Electroacoustics AU-16(2), 262-266, 1968.
D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and
Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495-518, Elsevier, 1995.
E4896 Music Signal Processing (Dan Ellis)
2013-03-11 - 18/18
Download