Automatic Genre Classification of Music Content: A Survey
Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek
IEEE Signal Processing Magazine, March 2006
By Yi-Tang Wang

Outline
• Introduction
• Feature extraction techniques
• Genre classification paradigms
• Classification results
• Future directions & conclusion

Introduction
• EMD (electronic music distribution)
  – Restoration of analog archives
  – New content
  – Music catalogues become huge
• What do you want to listen to?
  – 1 million tracks online
  – Efficient ways to browse and organize them are needed

Introduction (cont.)
• Music genres
  – Categories used to characterize similarities between pieces
  – Boundaries between them are fuzzy
• Automatic classification
  – Finding a taxonomy
  – Hierarchical set of categories
  – Nontrivial task

Critical issues
• Artists, albums, or titles?
  – One song to one genre(?)
  – Albums contain heterogeneous material
  – Artists release several albums
  – Same titles?
• Lack of agreement on taxonomies
  – Allmusic, Amazon, Mp3 [2]

[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000.

Critical issues (cont.)
• Ill-defined genre labels
  – Varied criteria (geography, time period, etc.)
  – Culture-dependent
• Scalability of genre taxonomies
  – New genres appear frequently
  – Genres get merged or split
  – Calls for an automatic system

Feature extraction techniques
• High-level formats
  – Event-like format (MIDI)
  – Symbolic format (MusicXML)
  – Rarely available
• Low-level format
  – Audio samples
  – Low-level information with low information density
• Feature extraction is therefore needed
  – Timbre, melody, harmony, rhythm

Timbre
• Sounds with the same pitch and loudness can still sound different
• Features to characterize timbre
  – Temporal features
  – Energy features
  – Spectral shape features
  – Perceptual features
  – Some have been standardized in MPEG-7

Timbre (cont.)
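A minimal NumPy sketch of a few frame-level descriptors from the timbre feature families listed above: RMS energy (an energy feature), zero-crossing rate (a temporal feature), and spectral centroid and roll-off (spectral shape features). The function names, the 1024-sample frames with a 512-sample hop, the Hann window and the 85% roll-off threshold are illustrative assumptions, not choices taken from the survey; perceptual features and the exact MPEG-7 definitions are not reproduced here.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a mono signal into overlapping frames, one frame per row (hypothetical helper)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def timbre_features(x: np.ndarray, sr: int, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Return an (n_frames, 4) array: RMS, zero-crossing rate, centroid (Hz), roll-off (Hz).

    Frame size, hop, window and the 85% roll-off threshold are assumed values.
    """
    frames = frame_signal(x, frame_len, hop)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))  # magnitude spectra
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)                     # bin center frequencies
    eps = 1e-12                                                        # avoids division by zero on silent frames

    # Energy feature: root-mean-square amplitude of each frame
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Temporal feature: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(frames).astype(np.int8)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    # Spectral shape: magnitude-weighted mean frequency (spectral centroid)
    centroid = (mag @ freqs) / (mag.sum(axis=1) + eps)
    # Spectral shape: lowest frequency below which 85% of the magnitude lies
    cumulative = np.cumsum(mag, axis=1)
    rolloff = freqs[np.argmax(cumulative >= 0.85 * cumulative[:, -1:], axis=1)]

    return np.column_stack([rms, zcr, centroid, rolloff])
```

For example, a 1 s, 440 Hz sine sampled at 22,050 Hz yields 42 frames whose centroid and roll-off both sit near 440 Hz; the per-frame rows are what the texture-window step then summarizes.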
• Transformations
  – Create new features or increase dimensionality
  – Transforming features into a logarithmic (decibel) scale has been suggested
• Texture window
  – Longer window spanning many analysis frames
  – Reduces computation
  – Increases classification accuracy
  – E.g., 1 s
  – Variable sizes and positions

Timbre (cont.)
• Texture model
  – Model of the features over a texture window:
    • 1) Simple modeling with low-order statistics
    • 2) Modeling with an autoregressive model
    • 3) Modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)

Melody & Harmony
• Melody
  – Succession of pitched events
  – Horizontal element
• Harmony
  – Pitch simultaneity, chords
  – Vertical element

Melody & Harmony (cont.)
• Pitch function
  – Characterizes the pitch distribution
  – Amplitude, position of the main peak, …
  – Unfolded
    • Contains the pitch content and information about its range
  – Folded
    • Mapped onto a single octave
    • Captures harmonic content

Rhythm
• No precise definition
• Generically, all temporal aspects of music
• Periodicity function
  – Low-level approach, analogous to the pitch function
    • 1) Tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
    • 2) Musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
  – Gouyon et al. derive MFCC-like descriptors

Extracting from segments
• A small segment may contain sufficient information
• Reduces the required computation
• Typically a 30 s segment
  – Starting 30 s after the beginning of the track
• Artist classification
  – Voice is easier to identify than purely instrumental music

Local conclusion
• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art
• Hence the focus on timbre modeling
• Timbre alone may carry sufficient information
  – 250 ms: 53%, 3 s: 72% correct
  – Among 10 genres

Local conclusion (cont.)
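The simplest texture model from the timbre slides, low-order statistics over a texture window of about 1 s, can be sketched as follows: frame-level features are grouped into windows and each window is summarized by its per-feature mean and standard deviation. The helper names, the default of non-overlapping windows and the choice of mean plus standard deviation are assumptions for illustration; the autoregressive and GMM-based texture models are not shown.

```python
from typing import Optional

import numpy as np


def frames_for_seconds(seconds: float, sr: int, hop: int) -> int:
    """Number of analysis frames (hop samples apart) spanning `seconds` of audio."""
    return max(1, int(round(seconds * sr / hop)))


def texture_window_stats(features: np.ndarray, frames_per_window: int,
                         window_hop: Optional[int] = None) -> np.ndarray:
    """Summarize (n_frames, d) frame features over texture windows.

    Returns an (n_windows, 2 * d) array: per-window means, then per-window standard deviations.
    """
    if window_hop is None:
        window_hop = frames_per_window  # non-overlapping texture windows by default
    n_frames = features.shape[0]
    if n_frames < frames_per_window:
        raise ValueError("fewer frames than one texture window")
    rows = []
    for start in range(0, n_frames - frames_per_window + 1, window_hop):
        block = features[start:start + frames_per_window]
        rows.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.vstack(rows)
```

With 512-sample hops at 22,050 Hz, `frames_for_seconds(1.0, 22050, 512)` gives 43, so a 1 s texture window summarizes about 43 frame vectors and the classifier sees one 2d-dimensional vector per window instead of one per frame.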
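The tempo part of the rhythm periodicity function (periodicities of roughly 0.3–1.5 s, i.e., 200–40 bpm) can be illustrated with a plain autocorrelation: the strongest self-similarity lag inside that range is converted to beats per minute. This is a minimal sketch, not Gouyon et al.'s method; the function name is hypothetical, and it assumes an onset-strength envelope sampled at `env_rate` Hz is already available.

```python
import numpy as np


def estimate_tempo_bpm(envelope: np.ndarray, env_rate: float,
                       min_period: float = 0.3, max_period: float = 1.5) -> float:
    """Tempo from the autocorrelation peak of an onset-strength envelope.

    Only lags between min_period and max_period seconds are searched
    (0.3 to 1.5 s corresponds to 200 to 40 bpm).
    """
    env = envelope - envelope.mean()                         # remove DC so the peak reflects periodicity
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # autocorrelation for lags 0..N-1
    lo = int(round(min_period * env_rate))
    hi = min(int(round(max_period * env_rate)), len(ac) - 1)
    if lo < 1 or lo > hi:
        raise ValueError("envelope too short or period range invalid")
    lag = lo + int(np.argmax(ac[lo:hi + 1]))                 # best lag, in envelope samples
    return 60.0 * env_rate / lag
```

For instance, an envelope at 100 Hz with a pulse every 0.5 s peaks at a lag of 50 samples, i.e., 120 bpm. Restricting the lag range keeps the estimate in the tempo band; the 2–6 s musical-pattern band could be searched the same way with different bounds.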
• Another (pessimistic) point of view
  – Timbre similarity measure evaluated on 20,000 titles distributed over 18 genres
  – Little correlation with genre
  – May not scale
  – Cultural features should be taken into account

Genre classification
• Expert systems
• Unsupervised approach
  – Clustering
• Supervised approach
  – Machine learning algorithms

Expert systems
• A knowledge-based system made up of a set of rules
• No genre classification system built on this approach so far
• Expensive to implement and maintain
• Rules may interact in unexpected ways

Expert systems (cont.)
• Pachet and Cazaly's work
  – Describe differences between genres in natural language, e.g., in terms of instrumentation

Unsupervised approach
• Clustering with similarity measures
• Similarity measures
  – If the features are time invariant
    • Euclidean distance or cosine distance
  – Otherwise
    • Build a statistical model (Gaussian or GMM) and compare models with
      – Kullback-Leibler divergence (relative entropy)
      – Sampling, Earth mover's distance, asymptotic likelihood approximation
    • Shao et al. use HMMs

Unsupervised approach (cont.)
• Clustering algorithms
  – K-means
  – Shao et al.'s work
    • Agglomerative hierarchical clustering
  – SOM (self-organizing map)
    • Artificial neural network
    • Maps high-dimensional data onto a lower-dimensional space
  – GHSOM (growing hierarchical SOM)
    • Rauber et al.

Supervised approach
• A taxonomy of genres is given
• Vs. expert systems
  – No rules (or genre descriptions) are needed
• Supervised machine learning algorithms
  – KNN (k-nearest neighbors)
  – GMMs (Gaussian mixture models)
  – HMMs (hidden Markov models)
  – LDA (linear discriminant analysis)
  – SVMs (support vector machines)
  – ANNs (artificial neural networks)

Classification results
• MIREX genre classification contest
  – 1,005 / 510 songs over ten genres
  – 940 / 447 songs over six genres

Future directions
• Classification into perceptual categories
  – Moods, emotions
• Novelty detection
  – New or unknown data that do not belong to any class
• Classification with multiple labels
  – Probably closer to human experience
• From taxonomies to folksonomies
  – Does the taxonomy fit the users?

Conclusion
• Definitions of music genres are convoluted
• Features → classification → result
• Research is evolving from purely objective machine calculations to techniques
• Machine learning plays a fundamental role in classification

Thank You
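The similarity-based items in the classification slides (a Gaussian model per track compared with the Kullback-Leibler divergence, and KNN among the supervised algorithms) can be combined into a small sketch. It assumes each track is modeled by a single full-covariance Gaussian over its frame features (a simplification of the GMMs mentioned above), uses the symmetrized divergence KL(p‖q) + KL(q‖p) as a distance, and labels a query by majority vote among its k nearest training tracks. All names are hypothetical; this is not the survey's or any MIREX entry's implementation.

```python
from collections import Counter
from typing import Sequence, Tuple

import numpy as np

Gaussian = Tuple[np.ndarray, np.ndarray]  # (mean vector, covariance matrix)


def fit_gaussian(frames: np.ndarray, reg: float = 1e-6) -> Gaussian:
    """Fit one full-covariance Gaussian to a track's (n_frames, d) frame features."""
    mu = frames.mean(axis=0)
    # a small ridge keeps the covariance invertible; atleast_2d handles d == 1
    cov = np.atleast_2d(np.cov(frames, rowvar=False)) + reg * np.eye(frames.shape[1])
    return mu, cov


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """KL(p||q) + KL(q||p) in closed form; the log-determinant terms cancel.

    = 0.5 * [tr(Sq^-1 Sp) + tr(Sp^-1 Sq) + (mp - mq)^T (Sp^-1 + Sq^-1) (mp - mq) - 2d]
    """
    mu_p, cov_p = p
    mu_q, cov_q = q
    inv_p, inv_q = np.linalg.inv(cov_p), np.linalg.inv(cov_q)
    diff = mu_p - mu_q
    d = mu_p.shape[0]
    return float(0.5 * (np.trace(inv_q @ cov_p) + np.trace(inv_p @ cov_q)
                        + diff @ (inv_p + inv_q) @ diff - 2 * d))


def knn_genre(query: Gaussian, train: Sequence[Gaussian], labels: Sequence[str], k: int = 3) -> str:
    """Majority genre among the k training tracks closest to `query`."""
    dists = np.array([symmetric_kl(query, model) for model in train])
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The same pairwise distances could instead feed a clustering algorithm for the unsupervised route; only the final labeling step differs.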