Beat-Synchronous Chroma Representations for Music Analysis Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu 1. 2. 3. 4. http://labrosa.ee.columbia.edu/ Chroma Features Beat Tracking Matching Cover Songs Artist Identification Beat-Chroma Representations - Ellis 2007-05-23 - 1 /23 Beyond MFCCs... • MFCCs have been useful in Audio Music IR “timbral similarity” artist ID, segmentation, thumbnailing, singing ... • Separate tradition of Symbolic MIR melody matching, chord detection, meter analysis • It’s time to bring them together Let It Be / Beatles / verse 1 4 freq / kHz freq / kHz ... with robust audio mid-level representations ... that capture tonal (melodic-harmonic) content 3 2 1 0 Let It Be / Nick Cave / verse 1 4 3 2 1 2 4 6 8 10 time / sec 0 2 4 6 8 10 time / sec = beat-synchronous chroma features Beat-Chroma Representations - Ellis 2007-05-23 - 2 /23 1. Chroma Features • Chroma features map spectral energy into one canonical octave i.e. 12 semitone bins chroma freq / kHz IF chroma 3 2 G F D C 1 A 0 2 4 6 8 10 100 time / sec 200 300 400 500 • Can resynthesize as “Shepard Tones” 600 700 time / frames all octaves at once 0 -10 -20 -30 -40 -50 -60 12 Shepard tone spectra freq / kHz level / dB Piano scale Piano chromatic scale 4 Shepard tone resynth 4 3 2 1 0 500 1000 1500 2000 2500 freq / Hz Beat-Chroma Representations - Ellis 0 2 4 6 8 10 time / sec 2007-05-23 - 3 /23 Calculating Chroma Features G F D C A 4 chroma freq / kHz chroma 1: Map every STFT bin • Method blurs non-tonal energy 3 G F 2 D 1 C 0 A 2: Map only STFT peaks • Method still blurry at low frequencies 2 G F D C A 4 6 8 10 time / sec 4 50 chroma freq / kHz chroma 50 100 150 200 fft bin 3 150 200 250 300 time / frame G F 2 D 1 C 0 100 A 3: Instantaneous Frequency /t • Method escapes frequency resolution limit ( G F D C A 0 2000 4000 4 6 8 10 time / sec ) 4 3 2 1 0 2 4 6 8 Beat-Chroma Representations - Ellis 10 time / sec chroma 2 freq / kHz chroma 50 100 150 200 fft bin → 50 100 150 200 250 300 time / frame 50 100 150 200 250 300 time / frame G F D C A 2007-05-23 - 4 /23 2. Beat Tracking (1) • Goal: One feature vector per ‘beat’ (tatum) for tempo normalization, efficiency • “Onset Strength Envelope” freq / mel sumf(max(0, difft(log |X(t, f)|))) 40 30 20 10 0 0 5 10 time / sec 15 • Autocorr. + window → global tempo estimate 0 168.5100BPM 200 0 300 400 Beat-Chroma Representations - Ellis 500 600 700 800 900 1000 lag / 4 ms samples 2007-05-23 - 5 /23 Beat Tracking (2) • Dynamic Programming finds beat times {t } i optimizes i O(ti) + i W((ti+1 – ti – p)/) where O(t) is onset strength envelope (local score) W(t) is a log-Gaussian window (transition cost) p is the default beat period per measured tempo incrementally find best predecessor at every time backtrace from largest final score to get beats C*(t) O(t) τ t C*(t) = γ O(t) + (1–γ)max{W((τ – τp)/β)C*(τ)} τ P(t) = argmax{W((τ – τp)/β)C*(τ)} τ Beat-Chroma Representations - Ellis 2007-05-23 - 6 /23 Beat Tracking Results • DP will bridge gaps (non-causal) there is always a best path ... freq / Bark band Alanis Morissette - All I Want - gap + beats 40 30 20 10 • 2nd place in MIREX 2006 Beat Tracking 182 184 186 188 190 192 time / sec compared to McKinney & Moelants human data freq / Bark band Subject # test 2 (Bragg) - McKinney + Moelants Subject data 40 40 30 20 10 20 0 0 5 Beat-Chroma Representations - Ellis 10 time / s 2007-05-23 - 7 /23 15 Beat-Synchronous Chroma Features • Beat + chroma features / 30ms frames → average chroma within each beat compact; sufficient? 34,5-.-6,7 &# %# $# 89/,)-/)4,9:); "# # 0;48+2-1*9/ "$ "# ( ' & $ # ! "# )*+,-.-/,0 "! 0;48+2-1*9/ "$ "# ( ' & $ ! "# "! Beat-Chroma Representations - Ellis $# $! %# %! )*+,-.-1,2)/ 2007-05-23 - 8 /23 3. Cover Song Detection with Graham Poliner • “Cover Songs” = reinterpretation of a piece different instrumentation, character no match with “timbral” features Let It Be - Nick Cave Let It Be / Beatles / verse 1 4 freq / kHz freq / kHz Let It Be - The Beatles 3 2 1 0 Let It Be / Nick Cave / verse 1 4 3 2 1 0 • Need a different representation! 2 4 6 8 10 time / sec 2 4 6 8 10 time / sec beat-synchronous chroma features Beat-sync chroma features G F chroma chroma Beat-sync chroma features D C A G F D C A 5 10 15 20 25 beats Beat-Chroma Representations - Ellis 5 10 15 20 25 beats 2007-05-23 - 9 /23 Matching (1): Little Fragments • Cover versions may change song structure multiple local matches at different alignments • Match query and target as many small pieces? how big are the pieces? G E D C A 100 200 extract 300 400 beats 500 how do we combine individual scores? do we have all day? cross-correlate Candidate chroma bins chroma bins Query G E D C A 100 200 300 Beat-Chroma Representations - Ellis 400 500 beats 2007-05-23 - 10/23 Matching (2): Global Correlation • Cross-correlate entire beat-chroma matrices ... at all possible transpositions implicit combination of match quality and duration chroma bins Elliott Smith - Between the Bars G E D C A skew / semitones chroma bins 100 200 300 400 Glen Phillips - Between the Bars 500 beats @281 BPM G E D C A Cross-correlation +5 0 -5 -500 -400 -300 -200 -100 0 100 200 300 400 skew / beats • One good matching fragment is sufficient...? Beat-Chroma Representations - Ellis 2007-05-23 - 11/23 Filtered Cross-Correlation • Raw correlation not as important as precise local match skew / semitones looking for large contrast at ±1 beat skew i.e. high-pass filter Cross-correlation +5 0 -5 -500 -400 -300 -200 -100 0 100 200 Cross-correlation @ skew = +2 semitones 0.6 300 400 skew / beats 300 400 skew / beats raw 0.4 0.2 filtered 0 -500 -400 -300 -200 -100 Beat-Chroma Representations - Ellis 0 100 200 2007-05-23 - 12/23 Results (1): Ellis 23 set • 23 pairs of cover songs from uspop2002 +... one correct match per query Cover Songs - dpwe23 - 12/23 correct Take_Me_To_The_River/annie_lennox Let_It_Be/nick_cave I_Love_You/faith_hill I_Can_t_Get_No_Satisfaction/rolling_stones Hush/milli_vanilli Grand_Illusion/styx Gold_Dust_Woman/sheryl_crow God_Only_Knows/brian_wilson Query Faith/limp_bizkit Enjoy_The_Silence/tori_amos Day_Tripper/cheap_trick Come_Together/beatles Cocaine/nazareth Claudette/roy_orbison Cecilia/simon_and_garfunkel Caroline_No/brian_wilson Blue_Collar_Man/styx Between_The_Bars/glen_phillips Before_You_Accuse_Me/eric_clapton America/simon_and_garfunkel All_Along_The_Watchtower/dave_matthews_band Addicted_To_Love/tina_turner Abracadabra/sugar_ray Ab Ad Al Am Be Be Bl Ca Ce Cl Co Co Da En Fa Go Go Gr Hu I_ I_ Le Ta Test Beat-Chroma Representations - Ellis 2007-05-23 - 13/23 Results (2): MIREX 06 • Cover song contest • Found 761/3300 = 23% recall next best: 11% guess: 3% Beat-Chroma Representations - Ellis song-set (each row is one query song) 30 songs x 11 versions of each (!) (data has not been disclosed) # true covers in top 10 8 systems compared (4 cover song + 4 similarity) MIREX 06 Cover Song Results: # Covers retrieved per song per system 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 8 6 4 2 0 correct matches retrieved CS DE KL1 KL2 KWL KWT LR TP cover song systems similarity systems 2007-05-23 - 14/23 Where are the matches? • Look inside global cross-correlation to find matching fragments... xcorr = t f (C1(t, f)⋅C2(t, f)) - view along time chroma Let It Be / Beatles (beats 11-441) G F D C chroma A 50 100 150 200 250 Let It Be / Nick Cave (beats 13-443) 50 100 150 200 50 100 150 200 300 350 400 time / beats 250 300 350 400 time / beats 250 300 350 400 time / beats G F D C A 0.4 0.2 0 -0.2 0 Beat-Chroma Representations - Ellis 2007-05-23 - 15/23 What are the mistakes? • False reject - missed true match cover version is too different, beat tracking wrong ... • False alarm - invalid match “Cocaine” (Clapton) vs. “Satisfaction” (Stones) chroma Eric Clapton - Cocaine - beats 17:1027 G F D C A 100 200 300 400 500 600 700 800 900 1000 chroma Rolling Stones - Satisfaction - beats 1:1011 G F D C A 100 200 300 400 500 600 700 800 900 1000 100 200 300 400 500 600 700 800 900 1000 2 1 0 -1 -2 0 Beat-Chroma Representations - Ellis 2007-05-23 - 16/23 4. Artist Identification (AID) • Baseline system: “Bag of (timbral) frames” MFCC frames, model as Gaussian or GMM distance by likelihood or KL • Dataset: [Mandel et al. 2006] 18 artists x 5 or 6 albums each 18x3 albums for training, 18x2 for test, 10x1 dev u2 tina_turner roxette rolling_stones queen pink_floyd metallica madonna green_day genesis garth_brooks fleetwood_mac depeche_mode dave_matthews_band ence_clearwater_revival bryan_adams beatles aerosmith 15 10 true 5 0 -5 -10 ae be br crdade fl gage gr mamepi qu ro ro t track Beat-Chroma Representations - Ellis u2 ti ro ro qu pi me ma gr ge ga fl de da cr br be ae 20 15 10 5 aebe br cr dade fl gage grmamepi qu ro ro t recog 2007-05-23 - 17/23 0 Beat Chroma Features for AID? • Artists may use tonality in particular ways... density, variety particular chords (influence of instruments on chroma features) Northern Lad (1998) @ 1:35 (tatum=238 BPM) Cars and Guitars (2005) @ 1:05 (tatum=333 BPM) 12 12 10 10 8 8 6 6 4 4 2 2 20 40 60 80 20 40 60 80 • Try bag-of-frames on beat-chroma rep’n use several consecutive beats? key-normalization of each piece? Beat-Chroma Representations - Ellis 2007-05-23 - 18/23 Key Normalization • Could try matching at all possible rotations.. • .. or just transpose every piece initially Taxman aligned chroma single Gaussian model of one piece find ML rotation of other pieces model all transposed pieces iterate until convergence I'm Only Sleeping G F G F G F D D D C C C A A A A Love You To C D F G A Aligned Global model G F G F D D D C C C A A A C D She Said She Said C D F G Good Day Sunshine A G F G F D D D C C C A C D F G C D F G And Your Bird Can Sing G F A F G A A F G C D Yellow Submarine G F A Beat-Chroma Representations - Ellis Eleanor Rigby A A C D F G 2007-05-23 - 19/23 A C D F G aligned chroma fitting a tures to he likeclass is rotation ew sinrotated ated for changes atch the s likeliy larger, e Gaus- Timbre+Chroma AID • Preliminary Mandel18 Artist ID accuracy: Feature Model T win MFCC20 FullCov 1 MFCC20 64 GMM 1 Chroma FullCov 1 Chroma FullCov 4 Chroma 64GMM 1 Chroma 64GMM 4 ChromaKN FullCov 1 ChromaKN FullCov 4 ChromaKN 64GMM 1 ChromaKN 64GMM 4 MFCC + Chroma fusion Beat-Chroma - Ellis Table 1. Representations Artist identification figurations. Acc 48% 33% 15% 14% 24% 15% 17% 14% 25% 16% 52% Exec. time 212 s 1952 s 46 s 117 s 850 s 2242 s 110 s 258 s 2533 s 5803 s - 20/23 results for 2007-05-23 various con“T win” is the size of temporal context Artist Fragments • Idea: Find the most discriminant with Courtenay Cotton beat-chroma fragments per artist k-means cluster 16 beat fragments within piece ! !"#$% &$! "#$%&'! ()'*+$),! %(! '! -)'*.),! ,%/01! 23*#! *#)! /3/)! 4567)'*! keep fragments largest ratio ($'0&)/*,!)8*$'9*):!7;!9.+,*)$3/0!#30#.30#*):<! ! (avg. similarity to same artist)/(avg. sim. to others) ! classify test pieces by ID of best-scoring fragment '$%()*+,*-./0% ! '$&$%1232453% % Beat-Chroma Representations - Ellis 2',! *),*):! %/! '! ,+7,)*! 2007-05-23 =#)! &)*#%:! :),9$37):! #)$)! %(! *#)! - 21/23 +,>%>?@@?! :'*',)*! ABC<! ! D+)! *%! *3&)! 9%/,*$'3/*,! '/:! *#)! !"#$% 6$! P8'&>.)! 2!$+3! Artist Fragment Results • Preliminary, 5 way artist ID, ~32% correct • • • ! ! !"#$% /$! "*+',%&*+! <$)-&@! '*-! )71! K! $-)&%)%;! (*<9&+&+0! )1%)! $+3!/$#&3$)&*+!-1%,#)%5! ! ! Beat-Chroma Representations - Ellis <&%(#$%%&'&13;! $+3! )71&-! '-$0<1+)%! $-1! +*)! 1$%&#.! ,%13! )*! <&%(#$%%&'.! *)71-! $-)&%)%! 1&)71-5! ! 47&%! &%! $! 0**3! 1@$<:#1! *'! need to search more fragments way to choose phrase beginnings? a basis set for all tonal content? 2007-05-23 - 22/23 Conclusions and Future Work • Beat-synchronous chroma features are successful for matching cover songs captures melody-harmony, not instruments • Further uses: Beat-chroma fragments as musical building blocks e.g. VQ over large body of music find recurrent motifs artist identification? • Code available! Google “matlab chroma features” Beat-Chroma Representations - Ellis 2007-05-23 - 23/23