20060907

Automatic Genre Classification of Music Content [A survey]
Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek
IEEE Signal Processing Magazine, March 2006
By Yi-Tang Wang

Outline
• Introduction
• Feature extraction techniques
• Genre classification paradigms
• Classification results
• Future directions & Conclusion

Introduction
• EMD (electronic music distribution)
  – Restoration of analog archives
  – New content
  – Music catalogues become huge
• What do you want to listen to?
  – 1 million tracks online
  – Efficient ways to browse & organize

Introduction (cont.)
• Music genres
  – Categories to characterize similarities
  – Boundaries are fuzzy
• Automatic classification
  – Finding a taxonomy
  – Hierarchical set of categories
  – Nontrivial task

Critical issues
• Artists, albums, or titles
  – One song to one genre(?)
  – Albums: heterogeneous material
  – Artists: several albums
  – Same titles?
• Nonagreement on taxonomies
  – Allmusic, Amazon, Mp3
[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000

Critical issues (cont.)
• Ill-defined genre labels
  – Varied criteria (geography, time period, etc.)
  – Dependent on culture
• Scalability of genre taxonomies
  – New genres appear frequently
  – Merging or splitting
  – Automatic system

Feature extraction techniques
• High-level model
  – Event-like format (MIDI)
  – Symbolic format (MusicXML)
  – Rarely available
• Low-level
  – Audio samples
  – Low level and low density of info
• Do feature extraction
  – Timbre, melody, harmony, rhythm

Timbre
• Same pitch and loudness but sound different
• Features to characterize timbre
  – Temporal features
  – Energy features
  – Spectral shape features
  – Perceptual features
  – Some have been standardized in MPEG-7

Timbre (cont.)
• Transformations
  – New feature or increase dimensionality
  – Suggested transforming into logarithmic decibel scale
• Texture window
  – Larger window
  – Reduce computation
  – Increase classification accuracy
  – 1 s
  – Variant size and positions

Timbre (cont.)
• Texture model
  – Model of features over texture window:
    • 1) simple modeling with low-order statistics
    • 2) modeling with autoregressive model
    • 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)

Melody & Harmony
• Melody
  – Succession of pitched events
  – Horizontal element
• Harmony
  – Pitch simultaneity, chords
  – Vertical element

Melody & Harmony (cont.)
• Pitch function
  – Characterizing pitch distribution
  – Amplitude, position of main peak, …
  – Unfolded
    • Contains pitch content and info of its range
  – Folded
    • Mapped to a single octave
    • Harmonic content

Rhythm
• No precise definition
• Generically, all of the temporal aspects
• Periodicity function
  – Low-level approach, as with the pitch function
    • 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
    • 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
  – Gouyon et al. get MFCC-like descriptors

Extracting from segments
• A small segment may contain sufficient information
• Reduces required computation
• Typically a 30 s segment
  – Starting 30 s after the beginning
• Artist classification
  – Voice is easier to identify than music only

Local conclusion
• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art
• Focus on timbre modeling
• Timbre may contain sufficient info
  – 250 ms: 53%, 3 s: 72%
  – Among 10 genres

Local conclusion (cont.)
• Another point of view (pessimistic)
  – Timbre similarity measure & 20,000 titles distributed over 18 genres
  – Little correlation
  – May not be scalable
  – Take cultural features into account

Genre classification
• Expert systems
• Unsupervised approach
  – Clustering
• Supervised approach
  – Machine learning algorithms

Expert systems
• A knowledge-based system made up of a set of rules
• No model based on it so far
• Expensive to implement and maintain
• May yield unexpected interactions

Expert systems (cont.)
• Pachet and Cazaly’s work
  – State differences between genres with language-based descriptors, e.g. instrumentation

Unsupervised approach
• Clustering with similarity measures
• Similarity measures
  – If time invariant
    • Euclidean distance or cosine distance
  – Otherwise
    • Build statistical model (Gaussian or GMMs)
      – Kullback-Leibler divergence (relative entropy)
      – Sampling, earth mover’s distance, asymptotic likelihood approximation
    • Shao et al. use HMMs

Unsupervised approach (cont.)
• Clustering algorithms
  – K-means
  – Shao et al.’s work
    • Agglomerative hierarchical clustering
  – SOM (self-organizing map)
    • Artificial neural network
    • High dim onto lower dim
    • GHSOM (growing hierarchical SOM)
      – Rauber et al.

Supervised approach
• A taxonomy of genres is given
• VS.
  Expert System
  – No rules (or description to genre)
• Supervised machine learning algorithms
  – KNN (K-Nearest Neighbor)
  – GMMs (Gaussian Mixture Models)
  – HMMs (Hidden Markov Models)
  – LDA (Linear Discriminant Analysis)
  – SVMs (Support Vector Machines)
  – ANNs (Artificial Neural Networks)

Classification results
• MIREX genre classification contest
  – 1,005 / 510 songs over ten genres
  – 940 / 447 songs over six genres

Future directions
• Classification into perceptual categories
  – Moods, emotions
• Novelty detection
  – New or unknown data (not belonging to any class)
• Classification with multiple labels
  – Probably closer to human experience
• From taxonomies to folksonomies
  – Does the taxonomy fit the users?

Conclusion
• Definitions of music genres are convoluted
• Features → classification → result
• Research is evolving from purely objective machine calculations to techniques
• Machine learning plays a fundamental role in classification domains

Thank You
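
Backup: texture-window statistics (sketch)
The texture-model slide lists "simple modeling with low-order statistics" as the first option. A minimal sketch of that idea, assuming frame-level feature vectors have already been extracted: each window of frames is summarized by its per-dimension mean and variance. The function name, the parameters window_len and hop, and the toy numbers are illustrative assumptions, not taken from the survey.

```python
from statistics import fmean, pvariance


def texture_window_stats(frames, window_len, hop):
    """Summarize frame-level feature vectors over sliding texture windows.

    frames: list of equal-length feature vectors, one per analysis frame.
    window_len, hop: window size and step, both counted in frames
    (hypothetical parameters; the slides only mention windows of about 1 s).
    Returns one vector per window: per-dimension means, then variances.
    """
    if window_len <= 0 or hop <= 0:
        raise ValueError("window_len and hop must be positive")
    summaries = []
    for start in range(0, len(frames) - window_len + 1, hop):
        window = frames[start:start + window_len]
        columns = list(zip(*window))  # one tuple per feature dimension
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        summaries.append(means + variances)
    return summaries


# Toy input: 6 frames of 2 made-up features each (not real audio data).
toy_frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0],
              [6.0, 3.0], [8.0, 3.0], [10.0, 3.0]]
toy_summary = texture_window_stats(toy_frames, window_len=3, hop=3)
```

With non-overlapping windows (hop equal to window_len) the classifier sees far fewer vectors than frames, which is one way to read the "reduce computation" bullet; overlapping windows would use a smaller hop.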

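Backup: unfolded vs. folded pitch histogram (sketch)
The pitch-function slide distinguishes an unfolded histogram (keeps pitch range) from a folded one (mapped to a single octave). A minimal sketch of the difference, assuming the pitches are already available as MIDI note numbers; in a real system they would have to be estimated from audio first. The function names and the toy note list are illustrative assumptions.

```python
from collections import Counter


def unfolded_histogram(midi_notes):
    """Count occurrences of each MIDI note number (keeps octave/range info)."""
    return Counter(midi_notes)


def folded_histogram(midi_notes):
    """Map every note onto one octave (pitch class = note mod 12) and count."""
    counts = [0] * 12
    for note in midi_notes:
        counts[note % 12] += 1
    return counts


# Toy input: C4, E4, G4, C5, C3 as MIDI note numbers (made up for illustration).
toy_notes = [60, 64, 67, 72, 48]
toy_unfolded = unfolded_histogram(toy_notes)
toy_folded = folded_histogram(toy_notes)

# Position of the main peak in the folded histogram (0 = C, 1 = C#, ...).
main_pitch_class = toy_folded.index(max(toy_folded))
```

In the toy input the three C notes fall into separate bins of the unfolded histogram but into one bin of the folded histogram, which is why the folded form is the one associated with harmonic content.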