Automatic Classification of Music Genre
Cory McKay

Introduction
• Genre classifications can be of great utility to music information retrieval systems
• Genre is a natural way of classifying music
• Genre is intrinsically built on the similarities between pieces of the same genre and the differences between pieces of different genres

Introduction
• There is currently no widely accepted automatic genre identification system
• Most genre annotation is done by hand
• An automated genre recognition system would make it possible to classify and search large electronic music libraries

Feature Extraction
• Genre is characterized by common features of the pieces belonging to it:
  – Instrumentation
  – Texture
  – Dynamics
  – Rhythmic characteristics
  – Melodic gestures
  – Harmonic content

Feature Extraction
• It is not always clear which features are the most relevant
• Features can be difficult to extract
• The first challenge of genre classification is to overcome these problems

Complementary Research
• Large existing body of research on speech recognition and classification systems
• Can use techniques relating to the extraction of timbral texture features
• Can also make use of existing systems that can distinguish between musical, speech and environmental signals

Complementary Research
• Existing beat-tracking systems can prove useful
• Many beat-trackers provide only an estimate of the main beat and its strength
• More detailed information is needed for genre classification:
  – Overall meter
  – Syncopation
  – Rubato
  – Recurring rhythmic gestures
  – Relative strengths of beats and sub-beats

Pattern Recognition
• Once features have been extracted, classification must be performed
• Existing general-purpose machine-learning and heuristic-based techniques can be adapted

Defining a Taxonomy
• How do we define the taxonomy that pieces will be classified into?
  – Different people may classify the same piece differently
  – Selections can be made from entirely different genre domains
  – Different people emphasize different features
  – There is often overlap between genres
  – How are different genres related?
• The lack of universally agreed upon definitions of genres makes it difficult to find appropriate heuristics for defining genre

Pachet and Cazaly (2000)
• Observe that the taxonomies currently used by the music industry are inconsistent
• They are therefore inappropriate for the purposes of developing a global music database

Pachet and Cazaly (2000)
• Retailers use a four-level hierarchy:
  – Global music categories (e.g. classical, jazz, rock)
  – Sub-categories (e.g. operas, Dixieland, heavy metal)
  – Artists
  – Albums
• Different levels represent different dimensions
• Different retailers use different sets of genres and sometimes classify the same recording differently

Pachet and Cazaly (2000)
• Copyright companies base taxonomies on commercial configurations and audience demographics rather than on characteristics of the music itself
• Internet companies such as Amazon.com tend to build tree-like classification systems:
  – very broad categories near the root
  – very specialized categories at the leaves

Pachet and Cazaly (2000)
  – Companies differ greatly on how deep the sub-categories go for different global styles of music
  – Many of these genres are poorly defined and are interpreted differently by different companies
  – There is a lack of consistency in the relation between a parent and a child:
    • Sometimes genealogical (e.g. rock -> hard rock)
    • Sometimes geographical (e.g. Africa -> Algeria)
    • Sometimes based on historical periods (e.g. Baroque -> Baroque Violin Concertos)

Pachet and Cazaly (2000)
• These inconsistencies are not significant for people manually browsing through catalogues
• They are, however, problematic for automatic classification systems

Pachet and Cazaly (2000)
• Suggest building an entirely new classification system
• Goals of the taxonomy:
  – Objective
  – Consistent
  – Independent from other metadatabase descriptors
  – Supports searches by similarity

Pachet and Cazaly (2000)
• Suggest a tree-based system
• Only the leaves contain musical pieces
• Each node records the genealogical parent of the genre and the differences between that node and its parent

My Evaluation
• Valid concerns about existing taxonomies
• Problems with the proposed solution:
  – How can an objective classification system be achieved?
  – Hard to get people to agree to a standard
  – New genres are constantly emerging
  – Does not solve the problem of fuzzy boundaries between genres
  – Does not deal with the problem of multiple parents, which can compromise the tree structure

Implementations
• Actual implementations have sidestepped this issue by limiting their testing to only a few simple classifications
• This is an acceptable approach in the early stages of development
• The problem of taxonomy structure will need to be carefully considered for systems that hope to scale to real-world applications

Tzanetakis, Essl & Cook (2001)
• Cite a study indicating that humans can often classify genre after hearing only 250 ms of a signal
• It should therefore be possible to build a classification system that does not consider musical form or structure
• Implies that real-time analysis of genre could be easier to implement than might be thought

Tzanetakis, Essl & Cook (2001)
• Developed two GUI-based systems
  – GenreGram
    • Developed for real-time radio broadcasts
    • Displays bouncing cylinders
  – GenreSpace
    • Provides a 3-D representation of genre space
    • Maps each recording to a point based on its three most distinguishing features
    • Meant to be used for comparing large collections of recordings

Tzanetakis & Cook (2002)
• Further develops these ideas
• Most influential implementation to date
• Proposes using three classes of features:
  – Timbral texture
  – Rhythmic content
  – Pitch content

Tzanetakis & Cook (2002)
• Timbral texture features:
  – Means and variances of the spectral centroid, rolloff, flux and zero-crossings over the texture window
  – Low energy
  – Means and variances of the first five mel-frequency cepstral coefficients (MFCCs)
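
As a rough illustration of how these timbral texture features can be computed, here is a minimal sketch in Python, assuming the librosa and numpy libraries are available. The function name and all parameter values (FFT size, hop size, number of MFCCs) are illustrative choices rather than the exact settings used by Tzanetakis and Cook, and the statistics are taken over the whole recording instead of the shorter texture windows described in the paper.

```python
# Sketch: timbral texture features in the spirit of Tzanetakis & Cook (2002).
# librosa and numpy are assumed; frame and window sizes are illustrative only.
import numpy as np
import librosa

def timbral_texture_features(path, n_fft=512, hop=256, n_mfcc=5):
    y, sr = librosa.load(path, sr=22050, mono=True)

    # Frame-level spectral descriptors
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=n_fft, hop_length=hop)[0]
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=n_fft, hop_length=hop)[0]
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop)

    # Spectral flux: squared difference between successive magnitude spectra
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    flux = np.sum(np.diff(S, axis=1) ** 2, axis=0)

    # "Low energy": fraction of frames with less RMS energy than the average frame
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)[0]
    low_energy = np.mean(rms < rms.mean())

    # Means and variances over the analysis frames (here: the whole file)
    feats = []
    for f in (centroid, rolloff, flux, zcr):
        feats += [f.mean(), f.var()]
    feats.append(low_energy)
    feats += list(mfcc.mean(axis=1)) + list(mfcc.var(axis=1))
    return np.array(feats)
```
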
Tzanetakis & Cook (2002)
• Rhythmic content features:
  – A "beat histogram"
  – Each bin of the histogram corresponds to a beats-per-minute level
  – Makes it possible to see the relative strengths of different beats and sub-beats

Tzanetakis & Cook (2002)
• Pitch content features:
  – Used 3 pitch histograms, where each bin corresponds to a given pitch:
    • 1) a bin for each MIDI pitch
    • 2) pitches of the same chroma folded into a single bin
    • 3) the chroma bins reordered so that adjacent bins are separated by a fifth rather than a semitone
  – The histograms were used to extract features for comparing traits such as pitch variation, strength of the tonic-dominant relationship, range and harmonic complexity
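
The two kinds of histogram just described can be sketched as follows, again in Python with numpy and librosa assumed. The beat histogram here is approximated by autocorrelating an onset-strength envelope, a stand-in for the wavelet-based beat analysis actually used by Tzanetakis and Cook, and the pitch histograms are built from a list of MIDI pitch estimates that is assumed to come from a separate pitch-detection step; the function names, bin ranges and other parameters are illustrative.

```python
# Sketch: beat and pitch histograms in the spirit of Tzanetakis & Cook (2002).
# The beat histogram autocorrelates an onset-strength envelope as a stand-in
# for the paper's wavelet-based analysis; MIDI pitch estimates are assumed to
# come from a separate pitch-detection step.
import numpy as np
import librosa

def beat_histogram(y, sr, hop=512, bpm_range=(40, 200)):
    """Histogram indexed by beats per minute, built from autocorrelation lags."""
    onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)
    ac = librosa.autocorrelate(onset_env)
    hist = np.zeros(bpm_range[1] - bpm_range[0])
    for lag in range(1, len(ac)):
        bpm = 60.0 * sr / (hop * lag)            # convert lag (in frames) to BPM
        if bpm_range[0] <= bpm < bpm_range[1]:
            hist[int(bpm) - bpm_range[0]] += ac[lag]
    return hist / (hist.sum() + 1e-12)           # relative strengths of beats/sub-beats

def pitch_histograms(midi_pitches):
    """The three pitch histograms: unfolded, folded to chroma, and folded with
    bins reordered so that neighbouring bins are a fifth apart."""
    unfolded = np.zeros(128)                      # one bin per MIDI pitch
    for p in midi_pitches:
        unfolded[int(p)] += 1

    folded = np.zeros(12)                         # same chroma -> same bin
    for p in midi_pitches:
        folded[int(p) % 12] += 1

    # Reorder chroma bins along the circle of fifths: adjacent bins are now
    # separated by a perfect fifth (7 semitones) rather than a semitone.
    fifths = np.array([folded[(7 * i) % 12] for i in range(12)])
    return unfolded, folded, fifths
```

Summary statistics of these histograms, such as the locations and relative amplitudes of their strongest peaks, would then serve as the actual rhythmic and pitch content feature values.
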
Tzanetakis & Cook (2002)
• The histogram approach provided a great deal of useful data
• Disadvantage: all information relating to the order in which musical events occurred is lost
• The presence of recurring phrases, for example, can only be stored in a diluted form by histograms

Tzanetakis & Cook (2002)
• Used a variety of statistical pattern recognition (SPR) classifiers to process the features
• SPR classifiers attempt to estimate the probability density function of the feature vectors of each genre (a sketch of this idea appears after the final slide)
• Classifiers were trained to distinguish between 20 musical genres and 3 speech genres

Tzanetakis & Cook (2002)
• Used real audio recordings
• Correctly distinguished between 10 genres 61% of the time
• Comparable to human success rates
• Collecting more features and including a larger number of specialized genres could improve performance

Koshina (2002)
• Constructed a somewhat similar system (MUGRAT)
• Achieved a success rate of 82%
• Only attempted to distinguish between metal, dance and classical music
• Provides an excellent overview of background information

Grimaldi et al. (2003)
• Used the discrete wavelet transform to extract time and frequency features
• 64 time features and 79 frequency features
• This is more than Tzanetakis used, but details are not specified
• Used an ensemble of binary classifiers to perform the classification
• Each classifier was trained on a pair of genres

Grimaldi et al. (2003)
• The final classification is arrived at through a vote of the classifiers
• Tzanetakis, in contrast, used single classifiers that processed all features for all genres
• Success rate of 82%
• Only four categories were used

Aucouturier and Pachet (2003)
• Define 3 categories of genre classification
• Manual approach:
  – Manual entry is unfeasible because of the huge number of titles that need to be entered
  – Taxonomies should be based on artist rather than title, because title-based taxonomies involve many more entries and result in categories that are overly narrow and have contrived boundaries

Aucouturier and Pachet (2003)
• Prescriptive approach:
  – An automatic, two-step process:
    • frame-based feature extraction
    • machine learning / classification
  – Tzanetakis and Cook's system is prescriptive
  – Assumes an existing adequate taxonomy, which tends to be contrived and non-scalable
  – Difficult to find truly representative training samples

Aucouturier and Pachet (2003)
• Emergent approach:
  – Rather than using existing taxonomies, as the prescriptive approach does, attempts to let classifications emerge from some measure of similarity
  – Can use similarity measurements based on audio signals
  – Can also use cultural similarity gleaned from applying data mining techniques to text documents:
    • Use collaborative filtering to search for similarities in the taste profiles of different individuals
    • Use co-occurrence analysis on the playlists of different radio programs and CD compilation albums

My Evaluation
• Valid concerns regarding prescriptive systems
• The emergent approach as described has yet to be successfully applied to music
• It remains to be seen which approach is best
• Exploiting information such as text documents to generate genre profiles is interesting

THE END
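
To make the statistical pattern recognition step mentioned in the Tzanetakis & Cook slides concrete, here is a minimal sketch of one such classifier: a Gaussian mixture model is fitted to the training feature vectors of each genre, and a new recording is assigned to the genre whose model gives its feature vector the highest likelihood. scikit-learn is assumed, the class name is hypothetical, and the number of mixture components is an illustrative choice rather than the value used in the paper.

```python
# Sketch: an SPR-style classifier of the kind described by Tzanetakis & Cook:
# one probability density estimate (here a Gaussian mixture model) per genre,
# with classification by maximum likelihood. scikit-learn is assumed; the
# number of mixture components is illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

class GenreGMMClassifier:
    def __init__(self, n_components=3):
        self.n_components = n_components
        self.models = {}                          # genre label -> fitted GMM

    def fit(self, feature_vectors, labels):
        """feature_vectors: (n_samples, n_features); labels: one genre per sample."""
        X = np.asarray(feature_vectors)
        y = np.asarray(labels)
        for genre in np.unique(y):
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag", random_state=0)
            gmm.fit(X[y == genre])                # estimate the genre's density
            self.models[genre] = gmm
        return self

    def predict(self, feature_vectors):
        X = np.asarray(feature_vectors)
        genres = list(self.models)
        # Log-likelihood of every sample under every genre model
        scores = np.stack([self.models[g].score_samples(X) for g in genres])
        return [genres[i] for i in np.argmax(scores, axis=0)]
```

An ensemble of pairwise binary classifiers combined by voting, as used by Grimaldi et al., could replace this classification stage without changing the feature extraction.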