Lab Cover Song ID with Beat-Synchronous Chroma Features ROSA Dan Ellis • Columbia University • dpwe@ee.columbia.edu Summary: Beat-synchronous chroma features capture melodic-harmonic content of music audio, and successfully detect cover versions of songs. We describe our system, including beat tracking by dynamic-programming. MIREX-06 Cover Song Identification: Evaluation Results & Analysis Tempo Estimation by Onset Autocorrelation • Cover song system requires beat tracking (see bottom right panel) - our beat tracker first estimates tempo - did ~OK in MIREX06 tempo eval. freq / Mel mirex06train/train2.wav (Bragg) • Music Audio is converted to an Onset Strength envelope by: - Mel-freq. log-mag. spectrogram - Half-wave rectify (negative values → zero) - Sum across all channels - Highpass at ~ 0.5 Hz - Smooth at ~ 20 Hz • Task: Identify Cover Songs (alternate versions of the same musical piece) within a database of music audio recordings 40 20 0 • Data: 30 songs with 11 different versions of each (330 songs), from mixed genres (“classical, jazz, gospel, rock, folk-rock, etc.”) half-wave rectified log Mel magnitude envelopes • Metric: How many of the 10 other versions of each query song are among the 10 most similar found by each algorithm? (3300 max. total) (also used Mean Reciprocal Rank, not reported here) onset strength envelope (hwr envelopes summed, smoothed, high-pass) 0 10 10.5 11 11.5 12 12.5 13 13.5 14 time / sec global autocorrelation, weighted @ 120 bpm with 1 octave log-t gaussian • Autocorrelate out to lag of ~ 4 s 0 • Weight autocorrelation by Gaussian on log-time axis (centered on 120 bpm for human tapping tempo, or 240 bpm for chroma features) 0 0.5 1 1.5 2 2.5 MIREX 06 Cover Song Results: # Covers retrieved per song per system 3 song-set (each row is one query song) C UNIVERSITY OLUMBIA Laboratory for the Recognition and Organization of Speech and Audio 3.5 lag / s 4 • Pick highest peak as main period & 2'dry peak from [0.33, 0.5, 2, 3] × main • Results: 4 cover song systems and 4 music similarity systems tested - Our system (DE) is best with 761 covers found (23% recall) - Music similarity systems score a little above random (≈ 100 covers) → Cover song task is very different from current similarity approaches • Analysis: Many song-sets difficult for all systems (e.g. 5, 11, 18, 21, 30) - most hits from a few song sets (2, 14, 19 for DE; 2, 17, 26 for KL) - different systems have different strengths Beat Tracking by Dynamic Programming 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 8 6 4 2 0 correct matches retrieved CS DE KL1 KL2 KWL KWT LR TP cover song systems similarity systems • Beat tracker takes global tempo period tB (from above) as input • Beat tracking score is calculated recursively for every time sample as: SB(t) = (1 – α)O(t) + α max W(t – τp) SB(τp) 30 from above (local match) W(t) is a log-time Gaussian window between tB/2 and 2tB centered on tB (width determines tempo tolerance) best predecessor τp is recorded for each time t largest SB near end of track is traced back for entire beat time sequence 20 10 - smoothed onset envelope + picked beats 10 5 0 McKinney & Moelants smoothed ground truth 4 - 2 0 0 - 5 10 time / s 15 • Beat tracker came 2nd of 5 systems compared to human ground truth in MIREX06 beat tracking evaluation for MIREX@ISMIR'06 • 2006-10-11 dpwe@ee.columbia.edu • Cover versions may differ in tempo, instrumentation, style - devise features to normalize these variations • Chroma features map whole spectrum to one octave - preserve essence of melody/harmony without detail - high-resolution instantaneous frequency finds peaks • Beat-synchronous features factor out tempo variations - ... provided beat tracker finds matching metrical levels - average chroma bin energy over whole beat • Cross-correlation of entire beats × semitones matrix has sharp maxima when large subsections match - repeat at all 12 chroma rotations - high-pass filter to emhasize peaks chroma bins 40 F E D B A chroma bins O(t) is onset strength envelope freq / Mel - Fleetwood Mac - Gold Dust Woman - Beat Sync Chroma Ftrs mirex06train/train2.wav (Bragg) 100 200 300 400 500 600 700 800 900 1000 1100 +6 chroma shift τp 0 -5 Cross-correlation at chroma shift +2, high-pass @ w = 0.1 rad/s 0 -600 -400 -200 0 200 400 600 relative timing / beats Download Matlab code from: http://labrosa.ee.columbia.edu/projects/coversongs/