Daniel Han

ISE599
Feb 5, 2004
Summary review of “Pitch Histograms in Audio and Symbolic Music Information Retrieval”
Tzanetakis, George
This paper describes a method for classifying a piece of music into a musical genre.
This is done by building pitch histograms, which count how many times each pitch
occurs. The input may be MIDI, where the pitch is explicit; audio synthesized from MIDI;
or ordinary audio. Moreover, these pitch histograms can readily be applied to music
information retrieval systems. The authors’ basis for genre classification is that certain
genres have distinctive pitch patterns in terms of how often each note occurs. Of course,
this is done from a purely statistical viewpoint.
Essentially, a pitch histogram shows the relative number of times each note occurs in
a musical piece. The histograms are presented in two forms: folded and unfolded. The
unfolded histogram has a separate bin for every pitch of the chromatic scale, so it
preserves the full pitch height. The folded histogram considers only the chroma of each
note and maps all notes into a single octave, which I thought was a very clever idea. He
notes that the folded version makes tonal relationships such as perfect fifths easy to see.
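A minimal Python sketch of the two histogram types, assuming the input is already a list of MIDI note numbers, as it would be for a MIDI file (60 is middle C). Folding uses `note % 12` as the chroma. The `fifths_order` helper is an illustrative assumption, one common way to make fifth relationships visible by placing pitch classes a perfect fifth apart in neighbouring bins; the review does not say exactly how the paper lays out the folded bins. All function names are hypothetical.

```python
def unfolded_histogram(notes):
    """One bin per MIDI note number (0-127), so pitch height is preserved."""
    hist = [0] * 128
    for note in notes:
        hist[note] += 1
    return hist


def folded_histogram(notes):
    """One bin per pitch class (chroma): every octave is collapsed into 12 bins."""
    hist = [0] * 12
    for note in notes:
        hist[note % 12] += 1
    return hist


def fifths_order(folded):
    """Hypothetical helper: reorder pitch-class bins so adjacent bins are a fifth apart."""
    reordered = [0] * 12
    for pitch_class, count in enumerate(folded):
        reordered[(7 * pitch_class) % 12] = count  # 7 semitones = a perfect fifth
    return reordered


def normalize(hist):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(hist)
    return [count / total for count in hist] if total else list(hist)


# Usage with a hypothetical note list: C4, E4, G4, C5, G3.
notes = [60, 64, 67, 72, 55]
print(folded_histogram(notes))                # C and G twice, E once
print(fifths_order(folded_histogram(notes)))  # C and G land in adjacent bins
```

Because the two C's and the two G's fall into the same folded bins regardless of octave, the folded histogram highlights pitch-class content, while the unfolded one still distinguishes C4 from C5.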
Determining the actual pitches requires a different method for each type of input. For
MIDI files the task is trivial, because the pitch is stored explicitly in the file. For audio,
he uses an existing multiple-pitch detection algorithm. This algorithm is fairly involved:
it splits the input into high- and low-frequency components and relies on the discrete
Fourier transform, details that are luckily left to electrical engineers. The paper then
moves on to classifying the input.
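The audio pitch-detection step is only described loosely above, so here is a heavily simplified Python sketch of the general idea, not the algorithm used in the paper. It splits a signal into low and high bands by zeroing DFT bins around a hypothetical 1 kHz cutoff, half-wave rectifies the high band, computes each band's autocorrelation through the DFT, sums them, and takes the strongest lag as the pitch period. It finds only one dominant pitch rather than several; the cutoff, lag range, and exponent `k` (k = 2 is ordinary autocorrelation, k < 2 a compressed "generalized" variant) are assumptions. `hz_to_midi` uses the standard A4 = 440 Hz = note 69 mapping, which is how a detected pitch would be assigned to a histogram bin.

```python
import numpy as np


def autocorr_dft(x, k=2.0):
    """Autocorrelation computed through the DFT as IDFT(|DFT(x)|**k).

    With k=2 and zero-padding to 2n this equals the ordinary (linear)
    autocorrelation; k < 2 compresses the magnitude spectrum first.
    """
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)  # zero-pad so lags do not wrap around
    return np.fft.irfft(np.abs(spec) ** k, 2 * n)[:n]


def split_bands(x, sr, cutoff_hz=1000.0):
    """Split x into low/high parts by zeroing DFT bins (hypothetical 1 kHz cutoff)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    low = np.fft.irfft(np.where(freqs < cutoff_hz, spec, 0), len(x))
    high = np.fft.irfft(np.where(freqs >= cutoff_hz, spec, 0), len(x))
    return low, high


def estimate_f0(x, sr, fmin=50.0, fmax=1000.0, k=2.0):
    """Estimate a single dominant pitch (Hz) from the summed band autocorrelations."""
    low, high = split_bands(x, sr)
    env = np.maximum(high, 0.0)  # half-wave rectify the high band
    env -= env.mean()            # drop its DC offset so it cannot dominate the sum
    summary = autocorr_dft(low, k) + autocorr_dft(env, k)
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(x) - 1)
    lag = min_lag + int(np.argmax(summary[min_lag:max_lag + 1]))
    return sr / lag


def hz_to_midi(f_hz):
    """Map a frequency to a fractional MIDI note number (A4 = 440 Hz = note 69)."""
    return 69 + 12 * np.log2(f_hz / 440.0)


# Usage: a synthetic 440 Hz test tone.
sr = 22050
t = np.arange(4096) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
f0 = estimate_f0(tone, sr)
print(f0, round(float(hz_to_midi(f0))))  # should be close to 440 Hz and note 69
```

Since lags are whole samples, the estimate is quantized (here 22050 / 50 = 441 Hz rather than exactly 440 Hz), which is harmless once the result is rounded to the nearest note for a histogram.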
The author used what he calls statistical pattern recognition classifiers for this task. He
was somewhat vague about exactly what these are, but I believe he compiled a training set
mapping precomputed pitch histograms to their correct genres, which gave his system a
basis for classifying new input.
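One simple example of a statistical pattern recognition classifier that works the way the review guesses, by comparing a new histogram against labelled training histograms, is k-nearest neighbours. The Python sketch below uses that technique as a stand-in; it is not necessarily what the paper used. Treating normalized histograms directly as feature vectors is an assumption, and the training vectors and labels "A"/"B" are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `training` is a list of (feature_vector, genre_label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda pair: math.dist(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 3-bin "histograms" with made-up labels, for illustration only.
training = [
    ([0.40, 0.10, 0.50], "A"),
    ([0.50, 0.10, 0.40], "A"),
    ([0.10, 0.80, 0.10], "B"),
    ([0.20, 0.70, 0.10], "B"),
]
print(knn_classify([0.45, 0.10, 0.45], training))  # nearest neighbours are mostly "A"
```

k-NN needs no training beyond storing the labelled examples, which matches the idea of a precompiled set of histograms serving as the basis for classifying new input.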
His results were fairly predictable: MIDI input gave the most accurate genre
classification, followed by audio synthesized from MIDI and, finally, regular audio. I
found it odd that he kept comparing his results to random guessing. I would hope that his
method does significantly better than random guessing, since otherwise there would be no
point in documenting it. I suppose this reflects the lack of prior research in this area,
which leaves the author without a good way to compare his work quantitatively. However, I
was fairly impressed by how quickly his program can determine genre, on the order of a
few seconds. I can imagine how difficult it would be to determine genre from only a few
seconds of an audio sample. It seems that much of this work is still in its nascent stages,
as I found nothing especially dramatic about the authors’ research.
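For context on the random-guessing comparison: if each piece belongs to one of K genres and the guess is uniform, the expected accuracy is 1/K, so a classifier's accuracy is usually read against that baseline. A small Python sketch of both numbers, using made-up labels rather than any data from the paper:

```python
def accuracy(predicted, actual):
    """Fraction of pieces whose predicted genre matches the true genre."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def chance_accuracy(num_genres):
    """Expected accuracy of guessing uniformly at random among num_genres genres."""
    return 1.0 / num_genres


# Hypothetical labels for four pieces, for illustration only.
predicted = ["jazz", "rock", "jazz", "classical"]
actual = ["jazz", "rock", "classical", "classical"]
acc = accuracy(predicted, actual)   # 3 of 4 correct
baseline = chance_accuracy(3)       # three genres in this toy example
print(f"accuracy {acc:.2f} vs. chance {baseline:.2f}")
```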