An Overview of Pitch
Detection Algorithms
Alexandre Savard
MUMT611: Music Information Acquisition,
Preservation, and Retrieval
February 2006
Content
• Introduction
– Classification
– Applications
– Problems and Constraints
• Time Domain Algorithms
• Frequency Domain Algorithms
• Alternative Techniques
• Conclusion
Introduction
Prior Definitions
– Pitch: The perceived highness or lowness of a sound.
It is related to the periodicity of the sound.
– Frequency: A physical attribute of a sound or any
other type of signal. It describes the number of times
a repeating event occurs per unit of time.
– Fundamental Frequency: In a complex sound or
signal, the frequency of the lowest partial.
Introduction
Applications of Pitch Tracking
– Automatic music transcription from audio signals to
common music notation or to MIDI note numbers
– Score following
– Musical queries by singing or humming
– Acoustic feature for human-computer interaction
– Sound-editing programs, e.g. pitch-shifting and time-scaling operations
Introduction
Non-Exclusive Classification
– Voice (Speech, Singing)
– Instrumental
– Monophonic
– Polyphonic
– Time-Based Algorithm
– Spectral-Based Algorithm
– Alternative
Introduction
Generally Encountered Problems
– Noise
– Reverberation
– Other Sounds from the environment
– Shortness of the sustained part of certain sounds
– Sounds need to be analyzed right after the attack
transient, when they are not yet fully stable
– Detuning during the sustained part of a sound
– Minimal output delay required for real-time use
Introduction
Music-Specific Difficulties
– Large frequency range of musical instruments
– Many instrumental sounds have inharmonic partials
– Expressiveness factors (glissando, vibrato, trill)
– Fast algorithms needed for real-time processing
– Multiphonics
Time Domain
• Zero-Crossing Detection
• Autocorrelation Function
• Average Magnitude Difference
Function
Time Domain
Zero-Crossing Detection
– Based on a direct application of the definition of
periodicity
– Counts the number of times that the signal crosses a
reference level
– Computationally inexpensive
– Weak against noise
– Performs poorly on signals with significant
energy at high frequencies
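The counting idea above can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the function name `zero_crossing_pitch`, the use of the signal mean as the reference level, and the 220 Hz test tone are assumptions made for the example, not part of the original slides.

```python
import numpy as np

def zero_crossing_pitch(signal, sample_rate):
    """Estimate f0 in Hz from the spacing of rising zero crossings."""
    x = np.asarray(signal, dtype=float)
    x = x - np.mean(x)  # assumption: the mean serves as the reference level
    # Indices where the signal goes from non-positive to positive.
    rising = np.flatnonzero((x[:-1] <= 0) & (x[1:] > 0))
    if len(rising) < 2:
        return 0.0  # not enough crossings to measure a period
    # Average distance between successive rising crossings, in samples.
    period = (rising[-1] - rising[0]) / (len(rising) - 1)
    return sample_rate / period

# Illustrative test tone: one second of a clean 220 Hz sine at 8 kHz.
sr_zc = 8000
t_zc = np.arange(sr_zc) / sr_zc
f0_zc = zero_crossing_pitch(np.sin(2 * np.pi * 220.0 * t_zc + 0.3), sr_zc)
```

On a clean sine this lands very close to 220 Hz. Added noise or strong upper partials introduce extra crossings per period, which is the weakness listed above.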
Time Domain
Zero-Crossing Detection
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm#_ftn5
Time Domain
Autocorrelation Technique
– Cross-correlation is an operation that measures
the similarity between two signals.
– The corresponding samples of one signal and a
time-shifted version of the other are multiplied and
added together.
– The cross-correlation function then has a peak at
the offset value that corresponds to the maximum
similarity.
Time Domain
Autocorrelation Technique
– Autocorrelation is the cross-correlation of a signal with
itself.
– The maximum similarity occurs at a time shift of
zero.
– In theory, another maximum should occur when the
time shift of the signal corresponds to the
fundamental period.
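The peak-picking idea described above can be sketched as follows. This is a minimal Python example assuming NumPy; the function name `autocorrelation_pitch`, the search limits `fmin` and `fmax`, and the synthetic test tone are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def autocorrelation_pitch(signal, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate f0 in Hz from the strongest autocorrelation peak."""
    x = np.asarray(signal, dtype=float)
    x = x - np.mean(x)
    # r[k] = sum over n of x[n] * x[n + k], kept for lags k >= 0 only.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Skip the trivial maximum at lag 0 by searching a plausible lag range.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(r) - 1)
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return sample_rate / lag

# Illustrative tone: 200 Hz fundamental with two harmonics, 8 kHz sampling.
sr_ac = 8000
t_ac = np.arange(2048) / sr_ac
tone_ac = (np.sin(2 * np.pi * 200 * t_ac)
           + 0.5 * np.sin(2 * np.pi * 400 * t_ac)
           + 0.25 * np.sin(2 * np.pi * 600 * t_ac))
f0_ac = autocorrelation_pitch(tone_ac, sr_ac)
```

The fundamental period here is exactly 40 samples, so the selected lag is 40. Because the sum at larger lags has fewer terms, the peak at one period is slightly higher than the peaks at its multiples.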
Time Domain
Autocorrelation Technique
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
Time Domain
Autocorrelation Technique
– Not very efficient for high fundamental frequencies.
– Computing it directly, as a convolution-like sum, is very
expensive.
– Computational efficiency can be improved by using the FFT
algorithm instead of direct convolution. This reduces the
cost from order N² to order N log N.
– Most variations of this technique differ in the
mathematical definition of the autocorrelation used, the
way the maxima are located, and how errors in
identifying the maximum are reduced.
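The FFT shortcut mentioned above relies on the fact that the autocorrelation is the inverse transform of the power spectrum. The following Python sketch, assuming NumPy, shows one way to do this; the function name `autocorrelation_fft` and the zero-padding strategy are illustrative choices, not taken from the slides.

```python
import numpy as np

def autocorrelation_fft(signal):
    """Autocorrelation for lags 0..N-1 computed through the FFT."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular correlation
    # produced by the FFT does not wrap around onto the wanted lags.
    nfft = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, nfft)
    power = spectrum.real ** 2 + spectrum.imag ** 2
    return np.fft.irfft(power, nfft)[:n]

# Compare against the direct O(N^2) sum on an arbitrary test vector.
rng_acf = np.random.default_rng(0)
x_acf = rng_acf.standard_normal(256)
direct_acf = np.correlate(x_acf, x_acf, mode="full")[len(x_acf) - 1:]
fast_acf = autocorrelation_fft(x_acf)
```

Both results agree up to floating-point rounding, so the FFT version can replace the direct sum in an autocorrelation pitch detector.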
Time Domain
Average Magnitude Difference Function
– It is an alternative to the autocorrelation function.
– It computes the difference between the signal and a
time-shifted version of itself.
– While the autocorrelation has peaks at maximum
similarity, the average magnitude difference
function has valleys there.
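A minimal Python sketch of the average magnitude difference function, assuming NumPy. The names `amdf` and `amdf_pitch`, the search limits and the test tone are assumptions made for illustration.

```python
import numpy as np

def amdf(signal, max_lag):
    """Average magnitude difference for lags 0..max_lag."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    d = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        # Mean absolute difference between the signal and its shifted copy.
        d[lag] = np.mean(np.abs(x[lag:] - x[:n - lag]))
    return d

def amdf_pitch(signal, sample_rate, fmin, fmax):
    """Estimate f0 in Hz from the deepest AMDF valley in the lag range."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    d = amdf(signal, lag_max)
    lag = lag_min + np.argmin(d[lag_min:lag_max + 1])
    return sample_rate / lag

# Illustrative tone with a 40-sample period (200 Hz at 8 kHz).
sr_amdf = 8000
t_amdf = np.arange(2048) / sr_amdf
tone_amdf = np.sin(2 * np.pi * 200 * t_amdf) + 0.5 * np.sin(2 * np.pi * 400 * t_amdf)
f0_amdf = amdf_pitch(tone_amdf, sr_amdf, fmin=120.0, fmax=1000.0)
```

The search range in this example is deliberately limited to less than two periods. For a periodic signal the valleys at multiples of the period are equally deep, so a wider range can return a sub-octave of the true pitch.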
Time Domain
Other Temporal Algorithms
– Waveform Maximum Detection
– Sum Magnitude Difference Squared Function
– Average Squared Difference Function
– Cumulative Mean Normalized Difference Function
– Circular Average Magnitude Difference Function
– Adaptive Filter
Time Domain
Other Temporal Algorithms
– Adaptive Filter
– Super Resolution Pitch Determination
Frequency Domain
• Harmonic Product Spectrum
• Cepstrum
Frequency Domain
Harmonic Product Spectrum
– The FFT is used to convert the temporal representation of
a sound into its spectral representation
– Assumes that the signal is made of harmonic partials
– The spectrum is compressed by integer factors corresponding
to the harmonic numbers
– Multiplying the compressed spectra with the
original one amplifies the fundamental
frequency
Frequency Domain
Harmonic Product Spectrum
– The highest peak most likely corresponds to the
fundamental frequency
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm#_ftn5
Frequency Domain
Harmonic Product Spectrum
– Shows a high degree of robustness in noisy
environments
– Less effective for sounds that are not made of
harmonic components
– Computationally inexpensive
– Octave errors can occur
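The compress-and-multiply procedure of the harmonic product spectrum can be sketched in Python as follows, assuming NumPy. The function name `harmonic_product_spectrum`, the Hann window, the number of harmonics and the test note are illustrative assumptions.

```python
import numpy as np

def harmonic_product_spectrum(signal, sample_rate, num_harmonics=4):
    """Estimate f0 in Hz as the highest peak of the harmonic product spectrum."""
    x = np.asarray(signal, dtype=float)
    magnitude = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    # Only bins present in every compressed copy can be multiplied.
    n_bins = len(magnitude) // num_harmonics
    hps = magnitude[:n_bins].copy()
    for h in range(2, num_harmonics + 1):
        # Keeping every h-th bin compresses the frequency axis by h,
        # so the h-th harmonic lands on the bin of the fundamental.
        hps *= magnitude[::h][:n_bins]
    hps[0] = 0.0  # ignore the DC bin
    peak_bin = np.argmax(hps)
    return peak_bin * sample_rate / len(x)

# Illustrative note: 250 Hz with four harmonics, 4096 samples at 8 kHz.
sr_hps = 8000
t_hps = np.arange(4096) / sr_hps
note_hps = sum(a * np.sin(2 * np.pi * 250.0 * k * t_hps)
               for k, a in enumerate([1.0, 0.8, 0.6, 0.4], start=1))
f0_hps = harmonic_product_spectrum(note_hps, sr_hps)
```

In this example each harmonic falls exactly on an FFT bin, so the product peaks at 250 Hz. The frequency resolution is limited to one bin (about 2 Hz here), and a signal with a weak or missing harmonic can shift the peak by an octave.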
Frequency Domain
Cepstrum
– The cepstrum is defined as the inverse Fourier transform
of the logarithm of the power spectrum of a signal
– The cepstrum extracts periodicity from the spectrum
– It can be informally written as:
$c(\tau) = \mathcal{F}^{-1}\{\log |\mathcal{F}\{s(t)\}|^2\}$
– The result has a peak that corresponds to the fundamental
period
Frequency Domain
Calculation of Cepstrum for Voice
– In the source-filter model, voiced speech s(t) can be
considered as the convolution of a pulse train p(t) with
the impulse response of the vocal tract h(t).
– In the spectrum we get:
$S(f) = P(f) \cdot H(f)$
– Taking the logarithm of both sides we then obtain:
$\log |S(f)| = \log |P(f)| + \log |H(f)|$
Frequency Domain
Cepstrum
– The logarithm operation flattens the spectrum, which
makes the method more robust to formants
– However, this same operation raises the noise level
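The cepstrum definition given earlier can be sketched in Python as follows, assuming NumPy. The function name `cepstrum_pitch`, the Hamming window, the small constant inside the logarithm, the search limits and the synthetic pulse train (standing in for the glottal excitation of the source-filter model) are all assumptions made for this example.

```python
import numpy as np

def cepstrum_pitch(signal, sample_rate, fmin, fmax):
    """Estimate f0 in Hz from the highest cepstral peak in a quefrency range."""
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x * np.hamming(len(x)))) ** 2
    # Inverse transform of the log power spectrum. The small constant
    # avoids log(0) in empty bins (an implementation choice).
    cepstrum = np.fft.irfft(np.log(power + 1e-12))
    q_min = int(sample_rate / fmax)
    q_max = int(sample_rate / fmin)
    quefrency = q_min + np.argmax(cepstrum[q_min:q_max + 1])
    return sample_rate / quefrency

# Illustrative excitation: one pulse every 64 samples (125 Hz at 8 kHz).
sr_cep = 8000
pulses_cep = np.zeros(4096)
pulses_cep[::64] = 1.0
f0_cep = cepstrum_pitch(pulses_cep, sr_cep, fmin=80.0, fmax=500.0)
```

The search range here covers quefrencies from 16 to 100 samples, so it contains the peak at the 64-sample period but excludes the repeated peak at twice the period, which can be almost as strong.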
Frequency Domain
Other Frequency Domain Algorithms
– Maximum Likelihood
– Linear Prediction Coding
– Spectral Autocorrelation
Alternative Technique
Teager Energy Function
– Referring again to the source-filter model, voice
can be represented by a pulse train filtered by the
vocal tract.
– The pulse train is produced by the successive opening
and closing of the glottis.
– The production of speech is closely related to the
release of energy through the glottis.
– Each opening/closing of the glottis results in a peak of
energy in the signal
Alternative Technique
Teager Energy Function
– The Teager energy function is a non-linear operator
that defines the instantaneous energy as (discrete-time form):
$\Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1)$
– It is derived from the total energy of an oscillatory
spring-mass system.
– Estimating the periodicity of the energy peaks of the
signal leads to an approximation of the fundamental
frequency.
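A minimal Python sketch of the Teager energy operator, assuming NumPy and the commonly used discrete-time form x(n)² − x(n−1)·x(n+1). The function name `teager_energy` and the test sinusoid are illustrative assumptions. Only the operator itself is shown, not the later step of tracking the periodicity of its peaks.

```python
import numpy as np

def teager_energy(signal):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(signal, dtype=float)
    psi = np.zeros_like(x)
    # The first and last samples have no two-sided neighbours; leave them at 0.
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

# For a pure sinusoid A*sin(omega*n + phi) the operator is constant and
# equal to A^2 * sin(omega)^2, which reflects both amplitude and frequency.
amp_te = 2.0
omega_te = 2 * np.pi * 100.0 / 8000.0
n_te = np.arange(512)
psi_te = teager_energy(amp_te * np.sin(omega_te * n_te + 0.5))
expected_te = amp_te ** 2 * np.sin(omega_te) ** 2
```

Since the operator needs only three neighbouring samples, it reacts almost instantly to the bursts of energy produced at each glottal pulse.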
Alternative Technique
Miscellaneous Techniques
– Wavelet Transform
– Bayesian Statistical Model
– Hidden Markov Model
– Graphical Probabilistic Models
– Perceptual Pitch Detector
Conclusion