RTpitchtrack - McGill University

Overview of Real-Time Pitch Tracking Approaches
Music information retrieval seminar
McGill University
Francois Thibault
Presentation Goals



Describe the requirements of an RT pitch
tracking algorithm for musical applications
Briefly introduce key developments in RT
pitch tracking algorithms
Provide insight on what techniques might
be more suitable for a given application
Pitch tracking requirements
in musical context





Must often function in real-time
Minimal output latency
Accuracy in the presence of noise
Frequency resolution
Flexibility and adaptability to various musical
requirements:



Pitch range
Dynamic range
…
Overview of techniques

Time-domain methods
    Autocorrelation Function (Rabiner 77)
    Average Magnitude Difference Function (AMDF)
    Fundamental Period Measurement (Kuhn 90)
Frequency-domain methods
    Cepstrum (Noll 66)
    Harmonic Product Spectrum (Schroeder 68)
    Constant-Q transform (Brown 92)
    Least-Squares fitting (Choi 97)
    Maximum Likelihood (McAulay 86, Puckette 98)
Other approaches…
Autocorrelation method



Based on the fact that a periodic signal
correlates strongly with itself when offset by
the fundamental period
Measures the extent to which a signal correlates
with a time-shifted version of itself
The time shifts at which the ACF displays peaks
correspond to likely period estimates
$$\phi(t) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n+t)$$
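The ACF definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from Rabiner 77: the function names, the search limits `fmin`/`fmax`, and the synthetic test signal are all assumptions made for the example, and N is taken as the frame length minus the largest lag so that `n + t` stays inside the frame.

```python
import math

def acf(x, max_lag):
    """phi(t) = (1/N) * sum_{n=0}^{N-1} x[n] * x[n + t] for t = 0..max_lag."""
    n = len(x) - max_lag  # N, chosen so that n + t never runs past the frame
    return [sum(x[i] * x[i + t] for i in range(n)) / n for t in range(max_lag + 1)]

def acf_pitch(x, fs, fmin, fmax):
    """Return fs / t for the lag t with the highest ACF value in the search range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    phi = acf(x, hi)
    best = max(range(lo, hi + 1), key=lambda t: phi[t])
    return fs / best

# Hypothetical test signal: 200 Hz fundamental plus one harmonic, sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * i / fs) + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
     for i in range(1024)]
f0 = acf_pitch(x, fs, fmin=120, fmax=500)
```

The search range is deliberately narrower than one octave below the true pitch: the ACF peaks again at every multiple of the period, so a wider range needs an extra rule to choose between them.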
Autocorrelation Pros/Cons






Simple implementation (good for hardware)
Can handle poor quality signals (phase
insensitive)
Often requires preprocessing (spectral
flattening)
Poor resolution for high frequencies
Analysis parameters hard to tune
Uncertainty between peaks generated by
formants and periodicity of sound can lead to
wrong estimation
AMDF



Again based on the idea that a periodic
signal will be similar to itself when shifted
by the fundamental period
Similar in concept to ACF, but looks at the
difference with a time-shifted version of itself
The time shifts which display valleys
correspond to likely period estimates
$$\psi(t) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n+t) \right|$$
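The AMDF can be sketched the same way as the ACF, swapping the product for an absolute difference and the peak search for a valley search. The function names, search limits and test signal below are assumptions for illustration only.

```python
import math

def amdf(x, max_lag):
    """psi(t) = (1/N) * sum_{n=0}^{N-1} |x[n] - x[n + t]| for t = 0..max_lag."""
    n = len(x) - max_lag  # N, chosen so that n + t never runs past the frame
    return [sum(abs(x[i] - x[i + t]) for i in range(n)) / n for t in range(max_lag + 1)]

def amdf_pitch(x, fs, fmin, fmax):
    """Return fs / t for the lag t with the deepest AMDF valley in the search range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    psi = amdf(x, hi)
    best = min(range(lo, hi + 1), key=lambda t: psi[t])
    return fs / best

# Hypothetical test signal: 200 Hz fundamental plus one harmonic, sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * i / fs) + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
     for i in range(1024)]
f0 = amdf_pitch(x, fs, fmin=120, fmax=500)
```

Only subtractions and absolute values are needed inside the loop, which is what makes the method attractive for hardware.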
AMDF Pros/Cons




Poor frequency resolution
Even simpler implementation than ACF
(good for hardware)
Less computationally expensive than ACF
Combination of AMDF and ACF yields a
result more robust to noise (Kobayashi 95)
$$f(t) = \frac{\phi(t)}{\psi(t) + k}$$
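A rough sketch of the combined function: dividing the ACF by the AMDF sharpens the peak at the true period, because the numerator is large and the denominator small at that lag. The value `k = 1.0` is an assumption chosen for the example (the slide does not give one), as are the function names and the test signal.

```python
import math

def acf(x, max_lag):
    """phi(t) = (1/N) * sum x[n] * x[n + t]."""
    n = len(x) - max_lag
    return [sum(x[i] * x[i + t] for i in range(n)) / n for t in range(max_lag + 1)]

def amdf(x, max_lag):
    """psi(t) = (1/N) * sum |x[n] - x[n + t]|."""
    n = len(x) - max_lag
    return [sum(abs(x[i] - x[i + t]) for i in range(n)) / n for t in range(max_lag + 1)]

def weighted_acf_pitch(x, fs, fmin, fmax, k=1.0):
    """Peak-pick f(t) = phi(t) / (psi(t) + k); k keeps the denominator positive."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    phi, psi = acf(x, hi), amdf(x, hi)
    f = [phi[t] / (psi[t] + k) for t in range(hi + 1)]
    best = max(range(lo, hi + 1), key=lambda t: f[t])
    return fs / best

# Hypothetical test signal: 200 Hz fundamental plus one harmonic, sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * i / fs) + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
     for i in range(1024)]
f0 = weighted_acf_pitch(x, fs, fmin=120, fmax=500)
```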
Fundamental Period
Measurement approach




Signal is first run through a bank of half-octave
bandpass filters
If filters are sharp enough, the output of one filter
should display the input waveform freed of its
upper partials (nearly sinusoidal)
It is up to a decision algorithm to decide which
filter output corresponds to fundamental
frequency
Time between zero crossings of that filter output
determines period
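The last step, timing the zero crossings of the selected filter output, might look like the sketch below. The filter bank and the decision algorithm are not shown; the input `y` is assumed to be an already nearly sinusoidal filter output, and the linear interpolation between samples is an addition for this example rather than something stated on the slide.

```python
import math

def zero_crossing_period(y, fs):
    """Average time in seconds between successive positive-going zero crossings."""
    crossings = []
    for i in range(1, len(y)):
        if y[i - 1] < 0 <= y[i]:
            # Linear interpolation of the crossing instant between samples i-1 and i.
            frac = -y[i - 1] / (y[i] - y[i - 1])
            crossings.append((i - 1 + frac) / fs)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

# Hypothetical filter output: a clean 220 Hz sinusoid with an arbitrary phase.
fs = 8000
y = [math.sin(2 * math.pi * 220 * i / fs + 1.0) for i in range(800)]
period = zero_crossing_period(y, fs)
f0 = 1.0 / period
```

Averaging over all crossings in the frame trades latency for stability; a real-time tracker could instead report the interval between the two most recent crossings.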
FPM Pros/Cons




Easy implementation (hardware and
software)
Efficiency of computation
Decision algorithm highly dependent on
thresholds
But, automatic threshold setting provided
for most situations
Cepstrum approach





Tool often used in speech processing
Cepstrum is defined as the power spectrum of
the logarithm of the power spectrum
Clearly separates the contributions of the vocal
tract and the excitation
A strong peak is displayed in the excitation part
(high cepstral region) at the fundamental
period
Use a peak picker on the cepstrum and translate
the quefrency into a fundamental frequency
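The cepstral steps can be sketched with a small recursive FFT from the standard library. The sketch assumes a frame whose length is a power of two, uses a small `eps` floor to avoid taking the log of zero, and is tested on a synthetic harmonic signal that fits the frame exactly; none of these choices come from the slides.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def power_cepstrum(x, eps=1e-12):
    """Power spectrum of the log power spectrum; eps avoids log(0)."""
    log_power = [math.log(abs(v) ** 2 + eps) for v in fft(x)]
    return [abs(v) ** 2 for v in fft(log_power)]

def cepstrum_pitch(x, fs, fmin, fmax):
    """Peak-pick the cepstrum between the quefrencies 1/fmax and 1/fmin."""
    ceps = power_cepstrum(x)
    lo, hi = int(fs / fmax), int(fs / fmin)
    q = max(range(lo, hi + 1), key=lambda i: ceps[i])
    return fs / q  # quefrency index q corresponds to a period of q samples

# Hypothetical test signal: ten harmonics of 125 Hz (period of 64 samples at 8 kHz).
fs = 8000
x = [sum(math.sin(2 * math.pi * h * 125 * i / fs) / h for h in range(1, 11))
     for i in range(1024)]
f0 = cepstrum_pitch(x, fs, fmin=80, fmax=400)
```

The search starts at the quefrency `fs / fmax` so that the low-quefrency region, which carries the spectral envelope, is skipped.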
Cepstrum Pros/Cons



Less confusion between candidates than
in ACF
Proven method, especially suitable for
signal easily characterized by source-filter
models (e.g. voice)
Relatively computationally intensive (2
FFTs)
Harmonic Product Spectrum
approach



Measures the maximum coincidence of
harmonics for each spectral frame
Resulting periodic correlation array is
searched for maximum which should
correspond to fundamental frequency
A further algorithm is then run for octave correction
$$Y(\omega) = \prod_{r=1}^{R} |X(\omega r)|$$
$$\hat{Y} = \max_{\omega_i} Y(\omega_i)$$
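A minimal sketch of the product and the maximum search, working directly on FFT bins. The number of harmonics `n_harmonics` (R in the formula), the use of magnitudes, and the test signal are assumptions for the example; zero padding, interpolation and the octave-correction step are left out.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def hps_pitch(x, fs, n_harmonics=4):
    """Y[k] = prod_{r=1}^{R} |X[r*k]|; return the frequency of the largest Y[k]."""
    mag = [abs(v) for v in fft(x)]
    n = len(x)
    k_max = (n // 2) // n_harmonics  # keeps r*k below the Nyquist bin
    hps = [0.0] * k_max
    for k in range(1, k_max):
        y = 1.0
        for r in range(1, n_harmonics + 1):
            y *= mag[r * k]  # sample the spectrum at the r-th multiple of bin k
        hps[k] = y
    best = max(range(1, k_max), key=lambda k: hps[k])
    return best * fs / n

# Hypothetical test signal: ten harmonics of 125 Hz, which falls exactly on bin 16.
fs = 8000
x = [sum(math.sin(2 * math.pi * h * 125 * i / fs) / h for h in range(1, 11))
     for i in range(1024)]
f0 = hps_pitch(x, fs)
```

The result is quantized to the bin spacing `fs / n` (about 7.8 Hz here), which illustrates why the method has poor resolution at low frequencies without zero padding.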
HPS Pros/Cons





Simple to implement
Does well under wide variety of conditions
Poor low frequency resolution
Computational cost increased by the zero
padding required for interpolation of low
frequencies
Requires post-processing for error
correction
Constant-Q transform
approach



First computes the constant-Q transform
to obtain a constant harmonic pattern in the
log-frequency domain (Q = fc/bw)
Compute the cross-correlation with a fixed
comb pattern (ideal partial positions for
given fundamental frequency)
Peak-pick the result to obtain fundamental
frequency
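The three steps can be sketched as a direct (slow) constant-Q transform followed by a cross-correlation with a binary comb. This is an illustration under assumptions, not Brown's implementation: the Hamming window, 24 bins per octave, a four-tooth comb with equal weights, and all names and default values are choices made for the example.

```python
import cmath
import math

def constant_q(x, fs, fmin, bins_per_octave, n_bins):
    """Magnitude constant-Q transform with a Hamming window, Q = fc / bw."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    out = []
    for k in range(n_bins):
        fk = fmin * 2 ** (k / bins_per_octave)
        nk = int(math.ceil(q * fs / fk))  # window length shrinks as fk rises
        w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (nk - 1)) for n in range(nk)]
        acc = sum(w[n] * x[n] * cmath.exp(-2j * math.pi * fk * n / fs) for n in range(nk))
        out.append(abs(acc) / sum(w))
    return out

def cq_pitch(x, fs, fmin=110.0, bins_per_octave=24, n_bins=96, n_harmonics=4):
    """Cross-correlate the constant-Q spectrum with a fixed comb and peak-pick."""
    spec = constant_q(x, fs, fmin, bins_per_octave, n_bins)
    # In log frequency, harmonic h always sits log2(h) octaves above the fundamental.
    comb = [round(bins_per_octave * math.log2(h)) for h in range(1, n_harmonics + 1)]
    n_cand = n_bins - comb[-1]
    score = [sum(spec[k + off] for off in comb) for k in range(n_cand)]
    best = max(range(n_cand), key=lambda k: score[k])
    return fmin * 2 ** (best / bins_per_octave)

# Hypothetical test signal: four equal-amplitude harmonics of 220 Hz at 8 kHz.
fs = 8000
x = [sum(math.sin(2 * math.pi * h * 220 * i / fs) for h in range(1, 5))
     for i in range(4096)]
f0 = cq_pitch(x, fs)
```

Because the comb offsets do not depend on the candidate pitch, the same pattern slides along the whole spectrum, which is the property the constant-Q transform is used for here.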
Constant-Q Pros/Cons



Complexity of constant-Q reduced but
still… (Brown and Puckette 91)
Sensitive to octave errors
Other peaks could be candidates
Least-Squares fitting
approach





Perform least-squares spectral analysis:
minimize the error by fitting sinusoids to the signal
segment
Strong sinusoidal components are identified as
sharp valleys in the least-squares error signal
Relatively few evaluations of the error signal are
required to identify a valley
Fundamental frequency is obtained as the average
of the partial frequencies, each divided by its partial number
Uses rectangular windowing to provide faster
response
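The idea can be illustrated with a brute-force version: fit one sinusoid per trial frequency, record the residual energy, and look for valleys. This is a sketch under assumptions and not the efficient evaluation scheme of Choi 97; the frequency grid, the `min_fraction` valley threshold, and the rule that the lowest valley is partial number 1 are all invented for the example.

```python
import math

def ls_error(x, f, fs):
    """Residual energy after the least-squares fit of a*cos + b*sin at frequency f."""
    c = [math.cos(2 * math.pi * f * n / fs) for n in range(len(x))]
    s = [math.sin(2 * math.pi * f * n / fs) for n in range(len(x))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(u * v for u, v in zip(x, c))
    xs = sum(u * v for u, v in zip(x, s))
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det  # solution of the 2x2 normal equations
    b = (xs * cc - xc * cs) / det
    return sum((xn - a * cn - b * sn) ** 2 for xn, cn, sn in zip(x, c, s))

def ls_pitch(x, fs, grid, min_fraction=0.1):
    """Average of valley frequencies, each divided by its (assumed) partial number."""
    total = sum(v * v for v in x)
    err = [ls_error(x, f, fs) for f in grid]
    # A valley is a local minimum that removes at least min_fraction of the energy.
    partials = [grid[i] for i in range(1, len(grid) - 1)
                if err[i] < err[i - 1] and err[i] < err[i + 1]
                and err[i] < (1 - min_fraction) * total]
    lowest = partials[0]  # assumption: the lowest valley is the fundamental
    return sum(f / round(f / lowest) for f in partials) / len(partials)

# Hypothetical 30 ms frame (rectangular window): three partials of 200 Hz at 8 kHz.
fs = 8000
amps = [1.0, 0.8, 0.6]
x = [sum(a * math.sin(2 * math.pi * (h + 1) * 200 * n / fs) for h, a in enumerate(amps))
     for n in range(240)]
grid = [100 + 10 * i for i in range(61)]  # 100..700 Hz in 10 Hz steps
f0 = ls_pitch(x, fs, grid)
```

A full grid scan like this evaluates the error far more often than necessary; it is used here only because it makes the valleys easy to see.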
LS fitting Pros/Cons



Operates on shorter frame segments
Best option for real-time applications with
minimum latency requirements
Efficient evaluation scheme allows
reasonable computation complexity
Maximum Likelihood



Maximum likelihood algorithm searches
through a set of possible ideal spectra and
chooses the closest match (Noll 69)
Was adapted to sinusoidal modeling
theory, by finding the best fit of sets of
harmonic partials to the measured model
(McAulay 86)
Enhance discrimination by suppressing
partials of small amplitude values
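A much simplified sketch of the "closest ideal spectrum" search: each candidate pitch defines an ideal harmonic comb, and the candidate whose comb is the best least-squares match to the measured peaks wins (a least-squares match is the maximum likelihood choice under additive Gaussian noise). This is not the likelihood function of McAulay 86 or Puckette 98; the Gaussian tolerance `sigma`, the amplitude `floor`, the peak list and all names are assumptions for illustration.

```python
import math

def harmonic_match(peaks, f0, fmax, sigma=5.0):
    """Squared correlation between the measured peaks and an ideal comb at f0,
    normalised by the number of teeth in the comb (least-squares closest match)."""
    n_teeth = int(fmax / f0)
    hit = 0.0
    for h in range(1, n_teeth + 1):
        for f, a in peaks:
            # A peak counts fully when it sits on the tooth, less when it is off.
            hit += a * math.exp(-0.5 * ((f - h * f0) / sigma) ** 2)
    return hit * hit / n_teeth

def ml_pitch(peaks, candidates, fmax, floor=0.1):
    """Return the candidate f0 whose ideal comb best matches the measured peaks."""
    # Suppress partials whose amplitude is small relative to the strongest one.
    strongest = max(a for _, a in peaks)
    kept = [(f, a) for f, a in peaks if a >= floor * strongest]
    return max(candidates, key=lambda f0: harmonic_match(kept, f0, fmax))

# Hypothetical measured peaks (Hz, amplitude): the 200 Hz fundamental is missing.
peaks = [(400.0, 1.0), (530.0, 0.05), (600.0, 1.0), (800.0, 1.0), (1000.0, 1.0)]
f0 = ml_pitch(peaks, candidates=range(80, 501), fmax=1100.0)
```

Dividing by the number of teeth penalizes candidates such as 100 Hz, whose comb also covers every measured peak but predicts many partials that are not there.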
ML Pros/Cons



Inherits high computational requirement
from sinusoidal modeling
Very robust estimation
Allows guess of fundamental frequency
even with several partials missing.
Other approaches






Neural Nets (Barnar 91)
Hidden Markov Models (Doval 91)
Parallel processing approaches (Rabiner
69)
Fourier of Fourier transforms (Marchand
2001)
Two-way mismatch model (Cano 98)
Subharmonic to harmonic ratio (Sun 2000)
Conclusions




Lot of research still… Motivated by speech
telecommunication
Abundant literature since 1950
Complete and objective performance
overviews seem to be missing
Combination of techniques in parallel
processing seems foreseeable with
today’s fast computers