Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu 1. 2. 3. 4. http://labrosa.ee.columbia.edu/ Music Audio Representations The Million Song Dataset Finding Similar Items (Cover Songs) Current Work LabROSA Overview - Dan Ellis 2012-02-09 1 /15 The Problem Query track One Million Songs 4000 Frequency 3000 2000 1000 0 0 1 2 3 4 5 Time 6 7 8 9 10 “Similar” tracks 4000 Frequency 3000 2000 4000 1000 4000 0 3000 1 2 3 4 5 Time 6 7 8 9 10 Frequency 0 Frequency 3000 2000 2000 1000 1000 0 0 LabROSA Overview - Dan Ellis 0 1 2 3 4 5 Time 6 7 8 9 0 1 2 3 4 5 Time 6 7 8 9 10 10 2012-02-09 2 /15 1. Music Audio Representations • We need a description of the audio that contains “suitable” detail • Speech Recognition uses Mel-Frequency Cepstral Coefficients (MFCCs) Let It Be (LIB-1) - log-freq specgram freq / Hz 5915 1396 329 Coefficient MFCCs 12 10 8 6 4 2 Noise excited MFCC resynthesis (LIB-2) freq / Hz 5915 1396 329 0 LabROSA Overview - Dan Ellis 5 10 15 20 25 time / sec 2012-02-09 3 /15 Warren et al. 2003 Chroma Features • We’d like to preserve the notes at least within one octave Let It Be - log-freq specgram (LIB-1) freq / Hz 6000 1400 chroma bin 300 Chroma features B A G E D C Shepard tone resynthesis of chroma (LIB-3) freq / Hz 6000 1400 300 MFCC-filtered shepard tones (LIB-4) freq / Hz 6000 1400 300 0 LabROSA Overview - Dan Ellis 5 10 15 20 25 time / sec 2012-02-09 4 /15 Beat-Synchronous Chroma • Perform Beat Tracking .. and record one chroma vector per beat compact representation of harmonies Let It Be - log-freq specgram (LIB-1) freq / Hz 6000 1400 300 chroma bin Onset envelope + beat times Beat-synchronous chroma B A G E D C Beat-synchronous chroma + Shepard resynthesis (LIB-6) freq / Hz 6000 1400 300 0 LabROSA Overview - Dan Ellis 5 10 15 20 25 time / sec 2012-02-09 5 /15 2. Million Song Dataset (MSD) Thierry Bertin-Mahieux • Commercial-scale dataset available to MIR researchers 1M pop songs 250 GB of features (6 years of listening) • Many facets: Features, Lyrics, Tags, Covers, Listeners ... http://labrosa.ee.columbia.edu/millionsong LabROSA Overview - Dan Ellis 2012-02-09 6 /15 MSD Audio Features • Use Echo Nest “Analyze” features segment audio into variable-length “events” supports a crude resynthesis: freq / Hz 2416 Origina 761 240 B A G EN Featur E D C freq / Hz represent by 12 chroma + 12 “timbre” TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit 2416 Resyn 761 240 0 LabROSA Overview - Dan Ellis 2 4 6 8 10 12 time / sec 14 2012-02-09 7 /15 MSD Metadata EN Metadata Last.fm Tags artist: release: title: id: key: mode: loudness: tempo: time_signature: duration: sample_rate: audio_md5: 7digitalid: familiarity: year: 100.0 – cover 57.0 – covers 43.0 – female vocalists 42.0 – piano 34.0 – alternative 14.0 – singer-songwriter 11.0 – acoustic 8.0 – tori amos 7.0 – beautiful 6.0 – rock 6.0 – pop 6.0 – Nirvana 6.0 – female vocalist 6.0 – 90s 5.0 – out of genre covers 'Tori Amos' 'LIVE AT MONTREUX' 'Smells Like Teen Spirit' 'TRKUYPW128F92E1FC0' 5 0 -16.6780 87.2330 4 216.4502 22050 '8' 5764727 0.8500 1992 SHS Covers %5489,4468, Smells TRTUOVJ128E078EE10 TRFZJOZ128F4263BE3 TRJHCKN12903CDD274 TRELTOJ128F42748B7 TRJKBXL128F92F994D TRIHLAW128F429BBF8 TRKUYPW128F92E1FC0 5.0 4.0 4.0 4.0 4.0 3.0 3.0 3.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 – – – – – – – – – – – – – – – cover songs soft rock nirvana cover Mellow alternative rock chick rock Ballad Awesome Covers melancholic k00l ch1x indie female vocalistist female cover song american MxM Lyric Bag-of-Words Like Teen Spirit Nirvana Weird Al Yankovic Pleasure Beach The Flying Pickets Rhythms Del Mundo feat. Shanade The Bad Plus Tori Amos LabROSA Overview - Dan Ellis 12 11 10 9 7 6 6 6 hello i a and it are we now 6 6 6 4 4 4 3 3 here us entertain the feel yeah to my 3 3 3 3 3 3 3 3 is with oh out an light less danger 2012-02-09 8 /15 3. Finding Similar Items Avery Wang ’03 • If it’s really the same, we can use freq / Hz fingerprinting (e.g. Shazam): Query audio 4000 3500 3000 2500 find local “landmarks” quantize by pairs inverted index 2000 1500 1000 500 0 Match: 05−Full Circle at 0.032 sec 4000 3500 3000 2500 2000 1500 1000 500 0 LabROSA Overview - Dan Ellis 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 2012-02-09 time / sec 9 /15 Dynamic Programming Alignment Serra, Gomez et al. ’08 • We can match the chroma representations by DP alignment robust to timing changes quite efficient best performer in MIREX Cover Song evaluations LabROSA Overview - Dan Ellis 2012-02-09 10/15 Large-Scale Cover Matching Bertin-Mahieux & Ellis ’11 • How can we find covers in 1M songs? @ 1 sec / comparison, one search = 11.5 CPU-days full N2 mining = 16,000 CPU-years • Hashing? landmarks from chroma patches (like Shazam) represent item by distribution of contour fragments one stage in a filtering hierarchy... LabROSA Overview - Dan Ellis 2012-02-09 11/15 Euclidean Cover Matching Ellis & Poliner ’08 Bertin-Mahieux & Ellis ’12 • Reduce cover matching to nearest-neighbor search in fixed-size beat-chroma patches • Right “distance”? 10 8 principal components learned weightings 6 4 2 120 Alignment/ segmentation? music segmentation multiple probes framing-insensitive representation LabROSA Overview - Dan Ellis 130 140 150 160 170 Between the Bars − Glenn Phillips − pwr .25 − −17 beats − trsp 2 12 10 8 6 4 2 120 chroma bin • Between the Bars − Elliot Smith − pwr .25 12 130 140 150 160 170 160 170 pointwise product (sum(12x300) = 19.34) 12 10 8 6 4 2 120 130 140 150 time / beats 2012-02-09 12/15 4. Other Work: Pattern Mining • Cluster beat-synchronous chroma patches #32273 - 13 instances chroma bins G F D C A G F D C A G F D C A G F D C A 5 LabROSA Overview - Dan Ellis a20-top10x5-cp4-4p0 #51917 - 13 instances #65512 - 10 instances #9667 - 9 instances #10929 - 7 instances #61202 - 6 instances #55881 - 5 instances #68445 - 5 instances 10 15 20 5 10 15 time / beats 2012-02-09 13/15 Other Work: MSD Challenge with Brian McFee • Using “Taste Profile” data User-Song-playcount b80344d063b5ccb3212f76538f3d9e43d87dca9e b80344d063b5ccb3212f76538f3d9e43d87dca9e b80344d063b5ccb3212f76538f3d9e43d87dca9e b80344d063b5ccb3212f76538f3d9e43d87dca9e b80344d063b5ccb3212f76538f3d9e43d87dca9e b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAKIMP12A8C130995 SOAPDEY12A81C210A9 SOBBMDR12A8C13253B SOCNMUH12A6D4F6E6D SODACBL12A8C13C273 SODDNQT12A6D4F5F7E 1 1 2 1 1 5 • Challenge: Rank 1M songs to complete half a playlist using audio, metadata, lyrics, whatever • Kaggle.com based contest self-service evaluation, leader board • Results at MIREX, AdMIRe LabROSA Overview - Dan Ellis 2012-02-09 14/15 Summary • Finding Musical Similarity at Large Scale • Beat Chroma Representation • Million Song Dataset • MSD Challenge LabROSA Overview - Dan Ellis 2012-02-09 15/15