Overview of Real-Time Pitch Tracking Approaches
Music information retrieval seminar
McGill University
Francois Thibault

Presentation Goals
- Describe the requirements of real-time (RT) pitch tracking algorithms for musical applications
- Briefly introduce key developments in RT pitch tracking algorithms
- Provide insight on which techniques might be more suitable for a given application

Pitch tracking requirements in musical context
- Must often function in real time
- Minimal output latency
- Accuracy in the presence of noise
- Frequency resolution
- Flexibility and adaptability to various musical requirements:
  - Pitch range
  - Dynamic range
  - …

Overview of techniques
- Time-domain methods
  - Autocorrelation Function (Rabiner 77)
  - Average Magnitude Difference Function (AMDF)
  - Fundamental Period Measurement (Kuhn 90)
- Frequency-domain methods
  - Cepstrum (Noll 66)
  - Harmonic Product Spectrum (Schroeder 68)
  - Constant-Q transform (Brown 92)
  - Least-Squares fitting (Choi 97)
  - Maximum Likelihood (McAulay 86, Puckette 98)
- Other approaches…

Autocorrelation method
- Based on the fact that a periodic signal will correlate strongly with itself when offset by the fundamental period
- Measures to what extent a signal correlates with a time-shifted version of itself
- The time shifts that display peaks in the ACF correspond to likely period estimates

  \[ \phi(t) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n+t) \]

Autocorrelation Pros/Cons
- Simple implementation (good for hardware)
- Can handle poor-quality signals (phase insensitive)
- Often requires preprocessing (spectral flattening)
- Poor resolution for high frequencies
- Analysis parameters hard to tune
- Uncertainty between peaks generated by formants and peaks generated by the periodicity of the sound can lead to wrong estimates

AMDF
- Again based on the idea that a periodic signal will be similar to itself when shifted by the fundamental period
- Similar in concept to the ACF, but looks at the difference from the time-shifted version of the signal
- The time shifts that display valleys correspond to likely period estimates

  \[ \psi(t) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n+t) \right| \]

AMDF Pros/Cons
- Poor frequency resolution
- Even simpler implementation than ACF (good for hardware)
- Less computationally expensive than ACF
- Combination of AMDF and ACF yields a result more robust to noise (Kobayashi 95)

  \[ f(t) = \frac{\phi(t)}{\psi(t) + k} \]

Fundamental Period Measurement approach
- Signal is first run through a bank of half-octave bandpass filters
- If the filters are sharp enough, the output of one filter should display the input waveform freed of its upper partials (nearly sinusoidal)
- It is up to a decision algorithm to decide which filter output corresponds to the fundamental frequency
- Time between zero crossings of that filter output determines the period

FPM Pros/Cons
- Easy implementation (hardware and software)
- Efficiency of computation
- Decision algorithm highly dependent on thresholds
- But automatic threshold setting is provided for most situations

Cepstrum approach
- Tool often used in speech processing
- Cepstrum is defined as the power spectrum of the logarithm of the power spectrum
- Clearly separates the contributions of vocal tract and excitation
- A strong peak is displayed in the excitation part (high cepstral region) at the quefrency of the fundamental period
- Use a peak picker on the cepstrum and translate quefrency into fundamental frequency

Cepstrum Pros/Cons
- Less confusion between candidates than in ACF
- Proven method, especially suitable for signals easily characterized by source-filter models (e.g. voice)
- Relatively computationally intensive (2 FFTs)

Harmonic Product Spectrum approach
- Measures the maximum coincidence of harmonics for each spectral frame
- Resulting periodic correlation array is searched for its maximum, which should correspond to the fundamental frequency
- Algorithm run for octave correction

  \[ Y(\omega) = \prod_{r=1}^{R} \left| X(\omega r) \right|, \qquad \hat{Y} = \max_{\omega_i} Y(\omega_i) \]

HPS Pros/Cons
- Simple to implement
- Does well under a wide variety of conditions
- Poor low-frequency resolution
- Computing complexity increased by the zero padding required for interpolation of low frequencies
- Requires post-processing for error correction

Constant-Q transform approach
- First computes the constant-Q transform to obtain a constant pattern in the log-frequency domain (Q = fc/bw)
- Compute the cross-correlation with a fixed comb pattern (ideal partial positions for a given fundamental frequency)
- Peak-pick the result to obtain the fundamental frequency

Constant-Q Pros/Cons
- Complexity of constant-Q reduced but still… (Brown and Puckette 91)
- Sensitive to octave errors
- Other peaks could be candidates

Least-Squares fitting approach
- Perform least-squares spectral analysis: minimize the error by fitting sinusoids to the signal segment
- Strong sinusoidal components are identified as sharp valleys in the least-squares error signal
- Relatively few evaluations of the error signal are required to identify a valley
- Fundamental frequency is obtained as the average of the partial frequencies, each divided by its partial number
- Uses rectangular windowing to provide faster response

LS fitting Pros/Cons
- Operates on shorter frame segments
- Best option for real-time applications with minimum latency requirements
- Efficient evaluation scheme allows reasonable computational complexity

Maximum Likelihood
- Maximum likelihood algorithm searches through a set of possible ideal spectra and chooses the closest match (Noll 69)
- Was adapted to sinusoidal modeling theory by finding the best fit of harmonic partial sets to the measured model (McAulay 86)
- Enhances discrimination by suppressing partials of small amplitude values

ML Pros/Cons
- Inherits high computational requirements from sinusoidal modeling
- Very robust estimation
- Allows a guess of the fundamental frequency even with several partials missing

Other approaches
- Neural Nets (Barnar 91)
- Hidden Markov Models (Doval 91)
- Parallel processing approaches (Rabiner 69)
- Fourier of Fourier transforms (Marchand 2001)
- Two-way mismatch model (Cano 98)
- Subharmonic to harmonic ratio (Sun 2000)

Conclusions
- Lot of research still…
- Motivated by speech telecommunication
- Abundant literature since 1950
- Complete and objective performance overviews seem to be missing
- Combination of techniques in parallel processing seems foreseeable with today's fast computers
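
Illustration: ACF, AMDF and their combination on a synthetic tone

To make the time-domain slides concrete, here is a minimal sketch of the ACF (peak picking), the AMDF (valley picking) and the combined function from the Kobayashi 95 slide, read here as the ACF divided by (AMDF + k). That reading of the combined formula, the function names, the three-harmonic test tone, the pitch search range and k = 1 are all assumptions made for illustration; they do not come from the presentation. The sketch uses only the Python standard library and no preprocessing, so it should not be taken as a robust tracker.

```python
import math


def acf(x, lag, n_terms):
    """phi(lag) = (1/N) * sum_{n=0}^{N-1} x(n) * x(n + lag)."""
    return sum(x[n] * x[n + lag] for n in range(n_terms)) / n_terms


def amdf(x, lag, n_terms):
    """psi(lag) = (1/N) * sum_{n=0}^{N-1} |x(n) - x(n + lag)|."""
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms)) / n_terms


def estimate_pitch(x, sample_rate, f_min, f_max, n_terms, k=1.0):
    """Return (f0_acf, f0_amdf, f0_weighted) in Hz for one frame.

    Lags are searched only inside the assumed pitch range [f_min, f_max],
    which keeps multiples of the true period out of the search when the
    range spans less than one octave above and below the true pitch.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if len(x) < n_terms + max_lag:
        raise ValueError("frame too short for the requested pitch range")

    lags = range(min_lag, max_lag + 1)
    phi = {t: acf(x, t, n_terms) for t in lags}
    psi = {t: amdf(x, t, n_terms) for t in lags}

    lag_acf = max(lags, key=lambda t: phi[t])    # ACF: highest peak
    lag_amdf = min(lags, key=lambda t: psi[t])   # AMDF: deepest valley
    # Assumed form of the combined function: f(t) = phi(t) / (psi(t) + k)
    lag_weighted = max(lags, key=lambda t: phi[t] / (psi[t] + k))

    return tuple(sample_rate / lag for lag in (lag_acf, lag_amdf, lag_weighted))


def make_test_tone(f0, sample_rate, n_samples):
    """Hypothetical test signal: fundamental plus two weaker harmonics."""
    amplitudes = (1.0, 0.5, 0.25)
    return [
        sum(a * math.sin(2 * math.pi * (h + 1) * f0 * n / sample_rate)
            for h, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    fs = 8000
    tone = make_test_tone(200.0, fs, 600)
    print(estimate_pitch(tone, fs, f_min=120.0, f_max=350.0, n_terms=400))
```

At 8000 Hz a 200 Hz tone has a period of exactly 40 samples, so all three estimates land on that lag for this clean signal. The resolution issue noted on the Pros/Cons slides shows up directly in the lag-to-frequency conversion: neighbouring lags 39 and 41 correspond to roughly 205 Hz and 195 Hz, and the step between adjacent lags grows as the pitch rises.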