Daniel Han
ISE599
Feb 5, 2004

Summary review of Pitch Histograms in Audio and Symbolic Music Information Retrieval
Tzanetakis, George

This paper simply describes a method that tries to classify a piece of audio into a musical genre. This is done by creating pitch histograms, basically graphing the number of times a certain pitch occurs. The input may be MIDI, where the pitch is explicit, audio synthesized from MIDI, or ordinary recorded audio. Moreover, these pitch histograms can be readily applied to music information retrieval systems.

The basis for genre classification is that certain genres will have distinct pitch patterns in terms of how many times each note occurs. Of course, this is done from a purely statistical viewpoint. Essentially, a pitch histogram shows the relative number of times a note occurs in a musical piece. These histograms are presented in two ways: folded and unfolded. The unfolded version keeps every pitch distinct across all octaves, so it shows the entire pitch height. The folded version takes only the chroma of the sound into account and maps all the notes into a single octave, which I thought was a very clever thing to do. He mentions that tonal relations such as perfect fifths are easy to see in the folded version.

Determining the actual pitch requires different methods for the different types of input. For MIDI, this task is trivial, as the pitch is explicit in the file. For audio, he uses an already established multiple-pitch detection algorithm. This algorithm is quite complicated, as it breaks the input into high and low frequency components using the discrete Fourier transform, which luckily is a problem left for electrical engineers to solve.

From here, the task of classifying the input is discussed. The author used what are called statistical pattern recognition classifiers for this task.
He was somewhat vague about what exactly these classifiers were, but I believe he compiled a data set mapping precomputed pitch histograms to their proper music genres. His system thus had a basis for classifying future input. His results were fairly predictable: MIDI was the most accurate in determining the proper genre, followed by audio-from-MIDI and finally regular audio.

I found it odd that he kept comparing his statistics to those of random guessing. I would hope that his work would be significantly better than mere random guessing, as there would otherwise be no point in documenting such a failure. I suppose it has to do with the lack of research done in this area, so the author really doesn't have a good way to compare his work quantitatively.

However, I was fairly impressed with the relative speed at which his program can determine genre, which is on the order of a few seconds. I can imagine how difficult it would be to determine such a thing if given only a few seconds of an audio sample. It seems that much of this work is still in its nascent stages, as I found nothing overly dramatic about the research.
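
To make the folded and unfolded histograms discussed in this review more concrete, here is a minimal sketch of my own in Python. It is not the paper's implementation. It assumes the pitches are already available as MIDI note numbers (0 to 127), which is the easy symbolic case, and the function names are hypothetical ones chosen for illustration.

```python
from collections import Counter

def unfolded_histogram(midi_notes):
    """Count occurrences of each MIDI note number (0-127), octaves kept apart."""
    counts = Counter(midi_notes)
    return [counts.get(n, 0) for n in range(128)]

def folded_histogram(midi_notes):
    """Map every note into a single octave (pitch class = note mod 12) and count."""
    counts = Counter(n % 12 for n in midi_notes)
    return [counts.get(c, 0) for c in range(12)]

def normalize(hist):
    """Turn raw counts into relative frequencies; an all-zero histogram stays zero."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)

# Illustrative input (assumed, not from the paper):
# a C major arpeggio over two octaves, C4 E4 G4 C5 E5 G5.
notes = [60, 64, 67, 72, 76, 79]
unfolded = unfolded_histogram(notes)
folded = folded_histogram(notes)
relative = normalize(folded)
```

In this example the unfolded histogram has six separate bins with one count each, while the folded histogram collapses them into three pitch classes (C, E, and G) with two counts each, which is the octave-independent view the review refers to as chroma.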