
Cedar Wingate
MUMT 621
Professor Ichiro Fujinaga
28 October 2009
jSymbolic Summary
The field of automatic music classification attempts to measure similarity between
pieces of music computationally. To measure similarity, classification systems must
reduce each piece to a set of features by which the pieces can be compared. In their
paper, “jSymbolic: A Feature Extractor for MIDI Files” (2006), Cory McKay and Ichiro
Fujinaga describe how these features are broadly categorized. There are three main
types: low-level features, derived from audio signal analysis; high-level features,
derived from symbolic representations such as MIDI files; and cultural features, derived
from Internet data mining (McKay and Fujinaga 2006, 1).
High-level features, in particular, have several properties that make them desirable for
automatic music classification. First, they represent musically meaningful data that
musicologists and music theorists can interpret directly: instrumentation, contour,
tempo, dynamics, harmony, and melody, to name a few. Second, a great deal of symbolic
data is already available through MIDI files and Humdrum, and much more could become
available as optical music recognition technology advances (McKay and Fujinaga 2006).
jSymbolic is an application developed to extract high-level features from MIDI
files for use in automatic music classification. Because such features cannot yet be
reliably extracted from audio signals, MIDI or some other symbolic representation is
necessary. The jSymbolic package is open source and designed to be easily extended
(McKay and Fujinaga 2006).
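To give a rough sense of the kind of extraction jSymbolic performs (jSymbolic itself is written in Java; its actual feature definitions differ), the following Python sketch computes two simplified stand-ins for high-level features, pitch range and note density, from a MIDI file using the third-party mido library:

# Illustrative sketch only: simplified stand-ins for high-level features,
# not jSymbolic's actual feature definitions.
import mido

def simple_features(path):
    midi = mido.MidiFile(path)
    pitches = []
    note_count = 0
    for msg in midi:  # iterating a MidiFile yields messages in playback order
        if msg.type == 'note_on' and msg.velocity > 0:
            pitches.append(msg.note)
            note_count += 1
    duration = midi.length  # total playback length of the file in seconds
    return {
        'pitch_range': max(pitches) - min(pitches) if pitches else 0,
        'note_density': note_count / duration if duration > 0 else 0.0,
    }

print(simple_features('example.mid'))  # 'example.mid' is a placeholder path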
In designing the application, a few goals and issues were taken into consideration. The
main goal was to create a system for classifying a wide variety of musics that would not
need to be tweaked for each new style. Designing the features for extraction was key to
accomplishing this. The developers wanted to avoid the “curse of dimensionality”
(McKay and Fujinaga 2006) and did not want to tie their features to the analytical
system of one particular music-theoretical approach (McKay 2004a, 57). They decided to
use a “large catalogue of features, with an emphasis on general features, and to give
users the option of selecting which ones they want to extract” (McKay and Fujinaga
2006, 2).
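One way to picture this design, purely as a hedged sketch (the feature names and functions below are illustrative and do not correspond to jSymbolic's actual catalogue or interfaces), is a dictionary of named feature functions from which the user selects a subset to extract:

# Hypothetical sketch of a user-selectable feature catalogue; the names and
# functions are illustrative, not jSymbolic's actual feature set.
def pitch_range(notes):
    return max(notes) - min(notes) if notes else 0

def average_pitch(notes):
    return sum(notes) / len(notes) if notes else 0.0

FEATURE_CATALOGUE = {
    'Pitch Range': pitch_range,
    'Average Pitch': average_pitch,
}

def extract(notes, selected):
    # Only the features the user selected are computed, which helps keep
    # the dimensionality of the resulting feature vector under control.
    return {name: FEATURE_CATALOGUE[name](notes) for name in selected}

notes = [60, 64, 67, 72]  # MIDI pitch numbers for an illustrative passage
print(extract(notes, ['Pitch Range']))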
The features fall into two basic types: one-dimensional and multi-dimensional.
One-dimensional features can be represented by a single value, such as a mean, a
standard deviation, or a true/false value. Multi-dimensional features must be
represented by more than one value, such as histograms.
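As a hedged illustration of the difference (assuming a piece reduced to a plain list of MIDI pitch numbers; this is not jSymbolic's own code), a one-dimensional feature yields a single number while a histogram feature yields a vector:

# Illustrative only: a one-dimensional feature (one value) versus a
# multi-dimensional histogram feature (a 12-element vector).
def mean_pitch(notes):
    # One-dimensional: the whole feature is a single number.
    return sum(notes) / len(notes)

def pitch_class_histogram(notes):
    # Multi-dimensional: one bin per pitch class (C, C#, ..., B),
    # normalized so the bins sum to 1.
    counts = [0] * 12
    for n in notes:
        counts[n % 12] += 1
    total = sum(counts)
    return [c / total for c in counts]

notes = [60, 62, 64, 65, 67, 69, 71, 72]  # a C major scale in MIDI numbers
print(mean_pitch(notes))
print(pitch_class_histogram(notes))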
Histograms are a particularly useful feature of jSymbolic. In their article,
“Style-independent computer-assisted exploratory analysis of large music collections”
(2007), McKay and Fujinaga use several diagrams to show the difference between a
Ramones song and a Thelonious Monk piece. Using a beat histogram, they show that the
Ramones song is visibly looser on the beat and emphasizes multiple tempos across its
range, compared with the very tight rhythm of the Monk piece (McKay and Fujinaga 2007,
70).
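A rough sense of what such a histogram captures can be conveyed with a simplified sketch (this tabulates inter-onset intervals directly and is not jSymbolic's actual beat-histogram algorithm): a loose performance spreads its mass over many bins, while a tight one concentrates it in a few.

# Simplified illustration: a histogram of inter-onset intervals, binned to
# the nearest 10 ms. Not jSymbolic's actual beat-histogram computation.
from collections import Counter

def ioi_histogram(onset_times):
    # onset_times: note onset times in seconds, assumed sorted.
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    return Counter(round(i, 2) for i in intervals)

tight = [0.0, 0.5, 1.0, 1.5, 2.0]       # evenly spaced onsets
loose = [0.0, 0.47, 1.02, 1.55, 1.98]   # slightly uneven onsets
print(ioi_histogram(tight))  # mass concentrated in a single bin
print(ioi_histogram(loose))  # mass spread across several bins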
The features they chose were drawn from research in multiple fields, including
music theory, ethnomusicology, music cognition, and popular musicology (McKay and
Fujinaga 2006). A total of 160 features were chosen, 111 of which have been implemented
(McKay 2004a). These features are divided into seven groups: instrumentation, texture,
rhythm, dynamics, pitch statistics, melody, and chords. Only the last group, chords, has
not been implemented at all.
In another paper, McKay and Fujinaga (2005) investigated which features performed
best when classifying music into a set of genres. “It was found that features based on
instrumentation (i.e., timbre) were by far the most important type of features” (McKay
and Fujinaga 2005, 9). This study also validated their work on high-level feature
extraction by showing that classification with high-level features scored better than
classification based on features extracted from audio signals. They also used a larger
number of classes (genres) in this study, suggesting that automatic music classification
systems may be able to handle larger class sets and perhaps someday approach the number
of classes a human expert can distinguish.
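As a hedged sketch of how extracted feature vectors might feed a genre classifier (the feature values, genre labels, and choice of classifier here are purely illustrative; McKay and Fujinaga's own experiments use their own learning framework), one could train a simple nearest-neighbour model on labelled feature vectors:

# Illustrative only: a toy genre classifier over extracted feature vectors.
# The feature values and genre labels below are made up for the example.
from sklearn.neighbors import KNeighborsClassifier

# Each row is a feature vector, e.g. [pitch_range, note_density, mean_pitch].
X_train = [
    [24, 3.1, 62.0],
    [22, 3.4, 61.5],
    [36, 6.8, 70.2],
    [34, 7.1, 69.8],
]
y_train = ['punk', 'punk', 'jazz', 'jazz']

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

print(model.predict([[35, 6.5, 71.0]]))  # expected: ['jazz']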
McKay, C. 2004a. Automatic genre classification of MIDI recordings. M.A. Thesis.
McGill University, Canada.
McKay, C. 2004b. Automatic genre classification as a study of the viability of high-level
features for music classification. Proceedings of the International Computer Music
Conference. 367–70.
McKay, C., and I. Fujinaga. 2005. Automatic music classification and the importance of
instrument identification. Proceedings of the Conference on Interdisciplinary
Musicology.
McKay, C., and I. Fujinaga. 2006. jSymbolic: A feature extractor for MIDI files.
Proceedings of the International Computer Music Conference. 302–5.
McKay, C., and I. Fujinaga. 2007. Style-independent computer-assisted exploratory
analysis of large music collections. Journal of Interdisciplinary Music Studies 1 (1): 63–
85.