Classifying music audio with timbral and chroma features
Dan Ellis • Columbia University • dpwe@ee.columbia.edu
Laboratory for the Recognition and Organization of Speech and Audio

Summary: In addition to conventional cepstra, we model the covariance of chroma (melodic/harmonic) features and gain a small improvement in 20-way pop-music artist identification.

Introduction
• “Classic” approaches to music audio classification model the covariance of spectral features (e.g. MFCCs), either as distributions [1] or discriminatively [2]. These appear to work by reflecting the instrumentation, which correlates well with genre or band [3].
• Chroma features [4] attempt to represent the pitch content (melody and harmony) while minimizing the influence of instrumentation.
• We investigate the covariance of chroma features as a basis for music classification by artist. Although weak on their own, chroma features can improve classification when combined with spectral features.
• This suggests that artists have particular harmonic combinations or motifs that can be automatically recognized.

System Block Diagram
Music audio is first beat-tracked, then two per-beat feature streams are extracted:
• Mel-frequency spectral analysis → MFCCs → per-beat averaging;
• Instantaneous frequency → chroma features → per-beat averaging → key normalization → multi-time-frame feature stacking (“the unusual part”).
Both streams feed per-artist Gaussian (mixture) models for classification.

Key Normalization
• Similar melodic/harmonic gestures occur ‘relative’ to the different keys of individual songs.
• Key normalization attempts to transpose (rotate) the chroma features to a canonical key prior to modeling.
• We do this by iterating (see the sketch after this list):
  - build a global chroma covariance model from all songs;
  - transpose each song to maximize its likelihood under the global model;
  - re-estimate the global model from the transposed songs, and repeat.

[Figure: key-aligned chroma patterns for eight Beatles songs (Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, Good Day Sunshine, And Your Bird Can Sing, She Said She Said, Yellow Submarine), each plotted on its aligned chroma axis.]
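For concreteness, a minimal MATLAB sketch of this key-normalization loop follows. It is an illustration, not the released code; the function name, the cell-array layout of songs (each entry a 12 x N matrix of beat-chroma columns), and the regularization constant are all assumptions.

    function songs = keynormalize(songs, niter)
    % Iteratively rotate each song's chroma to best fit a global Gaussian.
    for it = 1:niter
      % (re-)estimate the global model from all currently-rotated songs
      X  = cell2mat(songs);                  % 12 x (total number of beats)
      mu = mean(X, 2);
      C  = cov(X') + 1e-6 * eye(12);         % small ridge keeps C invertible
      Ci = inv(C);
      for s = 1:numel(songs)
        % Score all 12 transpositions; the log-det and constant terms are
        % identical for every rotation, so only the Mahalanobis sum matters.
        ll = zeros(1, 12);
        for r = 0:11
          Xr = circshift(songs{s}, r);       % rotate chroma bins by r semitones
          D  = Xr - repmat(mu, 1, size(Xr, 2));
          ll(r+1) = -0.5 * sum(sum(D .* (Ci * D)));
        end
        [~, best] = max(ll);
        songs{s} = circshift(songs{s}, best - 1);  % keep the best transposition
      end
    end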
Results

Feature                 Model     T_win   Acc    Exec time
MFCC20                  FullCov   1       56%    127 s
MFCC20                  64GMM     1       56%    563 s
Chroma                  FullCov   1       14%    21 s
Chroma                  FullCov   4       20%    57 s
Chroma                  64GMM     1       25%    337 s
Chroma                  64GMM     4       29%    1060 s
ChromaKN                FullCov   1       23%    70 s
ChromaKN                FullCov   4       28%    197 s
ChromaKN                64GMM     1       32%    516 s
ChromaKN                64GMM     4       33%    1238 s
MFCC + Chroma fusion    -         -       59%    -

• MFCC-based classification is much more accurate (56%) than classification with chroma features (33% at best), but combining the two gives further gains (to 59% correct; McNemar p < .001).
• A single full-covariance Gaussian is adequate to model the MFCC data, but the chroma data need a 64-mixture GMM.
• Key normalization (ChromaKN) is important for chroma features.
• Stacking chroma features from up to 4 adjacent beat-times (T_win = 4) helps a little.

[Figure: 20-artist confusion matrices (true vs. recognized) for Chroma alone (28.73%) and MFCC+Chroma fusion (57.40%), plus the per-artist difference in counts between the fused and MFCC-only systems.]

Discussion and Conclusions
• The covariance of single beat-chroma vectors carries some information about the artist, beyond that captured by MFCCs. This could relate to e.g. an artist's preferred chords.
• Used alone, chroma features do quite well on a few artists (tori_amos, metallica) but learn almost nothing about others (madonna, beatles).
• Our best performance came from a simple weighted sum of likelihoods from separate MFCC and Chroma models. The differences were small, but the gains were on different artists than those classified best by Chroma alone (beatles, led_zeppelin). (Sketches of the classification and fusion steps follow the references.)

Data - the “artist20” dataset
• 1413 tracks from 120 albums (20 artists x 6 albums) [5]
• Contemporary pop music drawn from uspop2002 [6] and others
• Studio albums, chosen for chronological & stylistic consistency
• MFCCs, chroma features, etc. available for download at http://labrosa.ee.columbia.edu/projects/artistid/

References
[1] J.-J. Aucouturier & F. Pachet, “Music similarity measures: What's the use?” ISMIR, Paris, 2002.
[2] M. I. Mandel & D. P. W. Ellis, “Song-level features and support vector machines for music classification,” ISMIR, London, 2005.
[3] J. H. Jensen, M. G. Christensen, & S. H. Jensen, “A framework for analysis of music similarity measures,” EUSIPCO, 2007.
[4] M. A. Bartsch & G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” WASPAA, Mohonk, 2001.
[5] D. P. W. Ellis, “Music artist identification: artist20 baseline system in Matlab,” web resource, 2007. http://labrosa.ee.columbia.edu/projects/artistid/
[6] D. P. W. Ellis, A. Berenzweig, & B. Whitman, “The uspop2002 Pop Music data set,” web resource, 2003. http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html

MATLAB code to run this system is available at http://labrosa.ee.columbia.edu/projects/timbrechroma/
Poster for ISMIR'07 • 2007-09-15, revised/corrected 2007-09-25 • dpwe@ee.columbia.edu
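As an illustration of the per-artist Gaussian models in the block diagram, here is a minimal MATLAB sketch of the single full-covariance baseline (the MFCC20/FullCov row of the table). It is not the released code; the function name, the cell-array layout of trainFeats, and scoring a track by summing per-beat log-likelihoods are assumptions.

    function a_hat = classify_track(trainFeats, testFeats)
    % trainFeats{a}: d x N matrix of per-beat features for artist a
    % testFeats:     d x M matrix of per-beat features for one test track
    [d, M]   = size(testFeats);
    nartists = numel(trainFeats);
    ll = zeros(1, nartists);
    for a = 1:nartists
      mu = mean(trainFeats{a}, 2);
      C  = cov(trainFeats{a}') + 1e-6 * eye(d);   % ridge for invertibility
      R  = chol(C);                               % C = R' * R
      logdetC = 2 * sum(log(diag(R)));            % stable log-determinant
      D  = testFeats - repmat(mu, 1, M);
      % total log-likelihood of the track's beats under artist a's Gaussian
      ll(a) = -0.5 * (M * (d * log(2*pi) + logdetC) + sum(sum(D .* (C \ D))));
    end
    [~, a_hat] = max(ll);                          % most likely artist index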
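The fusion in the last table row is described as a simple weighted sum of likelihoods. A minimal sketch, assuming per-track log-likelihood matrices llMfcc and llChroma (nTracks x nArtists) from the two separately trained models; the weight value below is a placeholder, not the poster's, and whether the poster fuses in the likelihood or log-likelihood domain is not specified here.

    alpha   = 0.7;                           % hypothetical fusion weight
    llFused = alpha * llMfcc + (1 - alpha) * llChroma;
    [~, pred] = max(llFused, [], 2);         % best-scoring artist per track
    acc = mean(pred == trueArtist);          % trueArtist: nTracks x 1 labels
    fprintf('fused accuracy: %.2f%%\n', 100 * acc);

In practice the single weight alpha would be chosen on held-out data, since the two models' likelihoods have very different dynamic ranges.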