ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 8: Pitch Tracking

1. Pitch Tracking
2. Spectral Approaches
3. Time Domain
4. Example algorithms

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis) http://www.ee.columbia.edu/~dpwe/e4896/ 2013-03-11

1. Pitch Tracking
• Pitch is a big part of hearing
    orthogonal to formants (vowels) in speech
    follows fundamental frequency (f0) of sounds
    because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
    for coding / representation (telephony)
    for extracting information
• ... but can be tricky
    sounds without clear f0

Challenges to Pitch Extraction (Talkin, 1995)
• Pitch extraction challenges include...
    noise / multiple f0s
    unclear f0
    note segmentation
    Voice Activity Detection
[Figure: spectrogram (freq / kHz vs. time / s) of “Oh, I’m just about to lose my mind...”, with spectrum slices (dB vs. freq / Hz) and waveform excerpts (ampl. vs. time / s)]

Multiple f0s
• Common in music (“polyphony”)
    cross-terms...
    identify-remove-repeat?
[Figure: two spectrograms, Frequency (0 to 4000) vs. Time]

2. Spectral Approaches
• Just pick the most intense spectral peak?
    fundamental may not be strongest...
[Figure: two spectrograms, freq / kHz vs. time / s]

Spectral Templates
• “Filter” log-freq spectrum with template
    matched filter ...
[Figure: log-frequency spectrum (dB) convolved (*) with a harmonic template]
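A minimal sketch of the template idea, under stated assumptions. On a log-frequency axis the harmonics of any f0 sit at fixed offsets log2(k), so matching a fixed template there amounts to summing spectral magnitude at k·f0 for each candidate f0. The code below evaluates that sum directly on the linear-frequency FFT instead of resampling to log frequency. The function name `harmonic_template_f0`, the equal harmonic weights, the five-harmonic template, the 1 Hz candidate grid, and the synthetic test tone are all illustrative choices, not taken from the lecture.

```python
import numpy as np

def harmonic_template_f0(frame, sr, fmin=60.0, fmax=800.0, step=1.0, n_harm=5):
    """Score each candidate f0 by summing spectral magnitude at its harmonics.

    Returns (best candidate f0, frequency of the single strongest FFT bin).
    Equal harmonic weights are an assumption made for this sketch.
    """
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    candidates = np.arange(fmin, fmax, step)
    harmonics = np.arange(1, n_harm + 1)
    # Probe frequencies k * f0: one row per candidate, one column per harmonic
    probe = candidates[:, None] * harmonics[None, :]
    # Linearly interpolate the magnitude spectrum; count nothing above Nyquist
    values = np.interp(probe.ravel(), freqs, mag, right=0.0).reshape(probe.shape)
    score = values.sum(axis=1)
    return candidates[np.argmax(score)], freqs[np.argmax(mag)]

# Synthetic 220 Hz tone whose 2nd harmonic (440 Hz) is the strongest partial
sr = 8000
t = np.arange(4096) / sr
amps = [0.3, 1.0, 0.7, 0.5]
x = sum(a * np.sin(2 * np.pi * 220.0 * (k + 1) * t) for k, a in enumerate(amps))

f0_est, strongest_peak = harmonic_template_f0(x, sr)
print(f0_est, strongest_peak)
```

In this synthetic tone, plain peak picking lands near 440 Hz (an octave high), while the harmonic sum favors 220 Hz. With other weightings or template lengths, octave and sub-octave errors can reappear, so the settings here should be read as one workable choice rather than a recommendation.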
Cepstrum (Noll, 1967)
• Spectrum shows periodicity at f0
    reveal via Fourier transform of log|Y| = cepstrum
[Figure: log spectra (dB vs. freq / Hz) and their cepstra (vs. quefrency); spectrogram (freq / kHz) and cepstrogram (quefrency) vs. time / s]

“Specmurt” (Sagayama et al., 2004)
• Log-frequency spectrum is like a convolution of pitch peaks with a harmonic profile
    esp. for multiple notes by one instrument (piano)
• Separate out by deconvolution
    inverse Fourier transform of log-f spectrum V(y)
    divide out harmonic template H(y)

Sinusoid Tracks (Maher & Beauchamp, 1994)
• Do sinusoid tracking before pitch tracking
    apply templates / sieve on tracks
    consider each strong sinusoid as a candidate
    limits to clearly periodic energy (normalized?)
[Figure: spectrogram and sinusoid tracks, freq / kHz vs. time / s]

3. Time Domain Approaches (Gold & Rabiner, 1969)
• Periodicity is primarily time-domain
    time-domain processing avoids FFT
[Figure: two waveform excerpts, ampl. vs. time / s]
• Early approaches: look for common gap between wave features

Autocorrelation (Sondhi, 1968)
• Directly measures period of self-similarity
    (autocorrelation is the inverse FT of the power spectrum, i.e. the squared magnitude spectrum)
[Figure: waveform excerpts (vs. time / s) and their autocorrelations (vs. lag / s); spectrogram (freq / kHz) and autocorrelograms (lag / ms) vs. time / s]

Post-processing (Talkin, 1995)
• Can assume & exploit pitch smoothness
• Median filtering only knows best point
• Dynamic programming (Viterbi path)
    can look at all underlying values
[Figure: pitch tracks labeled raw, median31, viterbi; lag / ms vs. time / s]

4. Examples: YIN (de Cheveigné & Kawahara, 2002)
• Like autocorrelation, but
    difference function - robust to changes in amplitude
    progressive normalization by cumulative average - eliminates lag = 0 advantage
    outperforms others without post-processing
[Figure: YIN output vs. time / s: pitch (Oct. re: 440 Hz), sqrt(apty), sqrt(power)]

Examples: sigmund~ (Puckette et al., 1998)
• Miller Puckette’s Pd pitch tracker
• Sinusoids + sieve
    $L(f) = \sum_k \hat{Y}(k \cdot f) / (\ldots + k)$
• Also performs note segmentation and polyphony

Examples: Weighted Sieve Polyphonic (Klapuri, 2006)
• Based on more complex multi-f0 system
• Chooses f0 via weighted sieve
    spectrum is whitened before measuring peaks
    weights are optimized for training data
• Multiple f0s are found via cancel+repeat, or joint optimization

Example: SAcC (Lee & Ellis, 2012)
• Subband Autocorrelation Classification
    multiple subbands
    autocorrelation + PCA
    trained MLP classifier to find pitch
[Figure: block diagram: Sound → Cochlea filterbank (frequency channels) → delay line / short-time autocorrelation → Correlogram slice (freq, lag, time) → MLP Classifier → HMM Pitch Tracker → Pitch]

Summary
• Pitch
    Important perceptually, for speech & music
• Spectrum
    Combine multiple harmonics to f0
• Time
    Autocorrelation directly measures periodicity

References
• A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for Speech and Music,” J. Acoust. Soc. Am. 111, 1917-1930, 2002.
• B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442-448, 1969.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” Proc. Int. Symp. Music IR, 216-221, 2006.
• B. S. Lee & D. Ellis, “Noise Robust Pitch Tracking by Subband Autocorrelation Classification,” Proc. Interspeech-12, P3b.05, 2012.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254-2263, 1994.
• A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293-309, 1967.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., 109-112, 1998.
• S. Sagayama, K. Takahashi, H. Kameoka & T. Nishimoto, “Specmurt Analysis: A Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of Log-Frequency Spectrum,” Proc. SAPA-2004, 2004.
• M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and Electroacoustics AU-16(2), 262-266, 1968.
• D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495-518, Elsevier, 1995.
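
Supplementary sketch for the YIN slide in section 4: the two ingredients named there, a squared-difference function in place of autocorrelation and a normalization by the cumulative average that removes the lag = 0 advantage, can be written in a few lines. This is a simplified illustration, not the published algorithm in full (no parabolic interpolation or best-local-estimate step). The function name `yin_f0`, the 0.1 threshold, the search range, and the test signal are assumptions made for this example.

```python
import numpy as np

def yin_f0(frame, sr, fmin=60.0, fmax=500.0, threshold=0.1):
    """Simplified YIN-style f0 estimate for a single frame (illustrative sketch)."""
    max_lag = int(sr / fmin)
    min_lag = int(sr / fmax)
    w = len(frame) - max_lag  # integration window shared by all lags
    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(max_lag + 1)])
    # Cumulative-mean normalization: d'(0) = 1, d'(tau) = d(tau) / mean(d[1..tau])
    cmnd = np.ones_like(d)
    running_sum = np.maximum(np.cumsum(d[1:]), 1e-12)
    cmnd[1:] = d[1:] * np.arange(1, max_lag + 1) / running_sum
    # Take the first lag that dips under the threshold, then slide to the dip's minimum
    below = np.where(cmnd[min_lag:] < threshold)[0]
    if len(below) == 0:
        tau = min_lag + int(np.argmin(cmnd[min_lag:]))  # fallback: global minimum
    else:
        tau = min_lag + int(below[0])
        while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    return sr / tau, cmnd

# 200 Hz tone with a second harmonic and a slowly rising amplitude
sr = 8000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 400.0 * t + 0.3)
x = (1.0 + 0.5 * np.arange(1024) / 1024) * tone

f0_est, cmnd = yin_f0(x, sr)
print(f0_est)
```

Because the normalized function starts at 1 for lag 0 and only drops where the signal really repeats, no lower lag bound is needed to avoid the trivial zero-lag match; the `fmax` limit here only narrows the search.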