ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 10: Beat Tracking

1. Rhythm Perception
2. Onset Extraction
3. Beat Tracking
4. Dynamic Programming

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis), 2013-04-01


1. Rhythm Perception

• What is rhythm?
    aspects, origin?


Rhythm Perception Experiments (McKinney & Moelants 2006)

• Tapping experiment
    ambiguous; hierarchy

[Figure: mirex06/train10 (Harnoncourt); upper panel with count axis 0 to 40, lower panel with axis "Frequency" (240 to 7671); x-axis: time / sec, 0 to 30]


Rhythm Tracking Systems

• Two main components
    front end: extract ‘events’ from audio
    back end: find plausible beat sequence to match

[Diagram: Audio → Onset / Event detection → Beat marking → Beat times etc.; "Musical knowledge" is a further input]

• Other outputs
    tempo
    time signature
    metrical level(s)


2. Onset detection (Bello et al. 2005)

• Simplest thing is energy envelope

    $$e(n_0) = \sum_{n=-W/2}^{W/2} w[n]\,|x(n + n_0)|^2$$

    emphasis on high frequencies?

[Figure: Harnoncourt and Maracatu examples; spectrograms (freq / kHz, 0 to 8) above the envelopes $\sum_f |X(f,t)|$ and $\sum_f f \cdot |X(f,t)|$ (level / dB, 0 to 60); x-axis: time / sec, 0 to 10]


Multiband Derivative (Puckette et al. 1998: bonk~)

• Sometimes energy just “shifts”
    calculate & sum onset in multiple bands
    use ratio instead of difference - normalize energy

    $$o(t) = \sum_f W(f)\,\max\!\left(0,\ \frac{|X(f,t)|}{|X(f,t-1)|} - 1\right)$$

[Figure: two columns of panels; freq / kHz (0 to 8) and level / dB (0 to 60) against time / sec (0 to 10)]


Phase Deviation (Bello et al. 2005)

• When amplitudes don’t change much, phase discontinuity may signal new note

[Figure: waveform detail, time 8.96 to 9.06 sec]

• Can detect by comparing actual phase with extrapolation from past

    $$\hat{X}(f, t_{n+1}) = X(f, t_n)\,\frac{X(f, t_n)}{X(f, t_{n-1})}$$

    combine with amplitude?

[Diagram: complex plane (Re{X}, Im{X}) showing $X(f,t_{n-1})$, $X(f,t_n)$, the prediction $\hat{X}(f,t_{n+1})$ obtained by repeating the phase advance ∆Θ, the actual $X(f,t_{n+1})$, and the complex deviation between prediction and actual]


3. Rhythm Tracking (Desain & Honing 1999; Longuet-Higgins and Lee 1982)

• Earliest systems were rule based
    based on musicology
    inspired by linguistic grammars - Chomsky
    input: event sequence (MIDI)
    output: quarter notes, downbeats


Resonators (Scheirer 1998)

• How to address:
    build-up of rhythmic evidence
    “ghost events” (audio input)
• Seems more like a comb filter...
    resonant filterbank of

    $$y(t) = \alpha\,y(t - T) + (1 - \alpha)\,x(t)$$

    for all possible T


Multi-Hypothesis Systems (Goto & Muraoka 1994; Goto 2001; Dixon 2001)

• Beat is ambiguous
    → develop several alternatives

[Diagram: block diagram with stages labelled Compact disc, Musical audio signals, A/D conversion, Frequency analysis, Frequency spectrum, Onset-time finders, Onset-time vectorizers, Drum-sound finder, Agents, Higher-level checkers, Manager, Beat prediction, Beat information transmission, Beat information]

    inputs: music audio
    outputs: beat times, downbeats, BD/SD patterns...


4. Dynamic Programming (Ellis 2007)

• Re-cast beat tracking as optimization:
    Find beat times $\{t_i\}$ to maximize

    $$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$

    $O(t)$ is onset strength function
    $F(\Delta t, \tau)$ is tempo consistency score, e.g.

    $$F(\Delta t, \tau) = -\left(\log\frac{\Delta t}{\tau}\right)^2$$

• Looks like an exponential search over all $\{t_i\}$
    ... but Dynamic Programming saves us


Dynamic Programming (DP)

• DP is a general algorithm for optimizing “optimal substructure” problems
    i.e. where optimal total solution can be built from optimal partial solutions
• e.g. best path through cost matrix

    $$D(i,j) = d(i,j) + \min\left\{\begin{array}{l} D(i-1,j) + T_{10} \\ D(i,j-1) + T_{01} \\ D(i-1,j-1) + T_{11} \end{array}\right\}$$

    $D(i,j)$: lowest cost to (i,j)
    $d(i,j)$: local match cost
    min term: best predecessor (including transition cost)

[Diagram: grid of input frames $f_i$ against reference frames $r_j$; cell (i,j) is reached from D(i-1,j), D(i,j-1) or D(i-1,j-1) with transition costs T10, T01, T11]

    path after (i,j) is independent of how we got there


Tempo Estimation

• Algorithm needs global tempo period τ
    otherwise problem is not “optimal substructure”
• Pick peak in onset envelope autocorrelation
    after applying “human preference” window
    check for subbeat

[Figure: three panels. Onset Strength Envelope (part), time / s from 8 to 12; Raw Autocorrelation; Windowed Autocorrelation with Primary Tempo Period and Secondary Tempo Period marked, lag / s from 0 to 4]


Beat Tracking by DP

• To optimize

    $$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$

    define $C^*(t)$ as best score up to time t
    then build up recursively (with traceback $P(t)$)

    $$C^*(t) = O(t) + \max_{\tau}\{\alpha F(t - \tau, \tau_p) + C^*(\tau)\}$$

    $$P(t) = \arg\max_{\tau}\{\alpha F(t - \tau, \tau_p) + C^*(\tau)\}$$

    (here $\tau$ ranges over candidate times of the previous beat, and $\tau_p$ is the tempo period)
    final beat sequence $\{t_i\}$ is best $C^*$ + back-trace

[Diagram: $O(t)$ and $C^*(t)$ plotted against t, with a candidate predecessor τ marked]


beatsimple

• Beat tracking in 15 lines of Matlab

    function beats = beat_simple(onset, osr, tempo, alpha)
    % beats = beat_simple(onset, osr, tempo, alpha)
    %   Core of the DP-based beat tracker
    %   <onset> is the onset strength envelope at frame rate <osr>
    %   <tempo> is the target tempo (in BPM)
    %   <alpha> is weight applied to transition cost
    %   <beats> returns the chosen beat sample times (in sec).
    % 2007-06-19 Dan Ellis dpwe@ee.columbia.edu

    if nargin < 4; alpha = 100; end

    % backlink(time) is best predecessor for this point
    % cumscore(time) is total cumulated score to this point
    localscore = onset;
    backlink = -ones(1,length(localscore));
    cumscore = zeros(1,length(localscore));

    % convert bpm to samples
    period = (60/tempo)*osr;

    % Search range for previous beat
    prange = round(-2*period):-round(period/2);

    % Log-gaussian window over that range
    txwt = (-alpha*abs((log(prange/-period)).^2));

    for i = max(-prange + 1):length(localscore)
      timerange = i + prange;
      % Search over all possible predecessors
      % and apply transition weighting
      scorecands = txwt + cumscore(timerange);
      % Find best predecessor beat
      [vv,xx] = max(scorecands);
      % Add on local score
      cumscore(i) = vv + localscore(i);
      % Store backtrace
      backlink(i) = timerange(xx);
    end

    % Start backtrace from best cumulated score
    [vv,beats] = max(cumscore);
    % .. then find all its predecessors
    while backlink(beats(1)) > 0
      beats = [backlink(beats(1)),beats];
    end

    % convert to seconds
    beats = (beats-1)/osr;


Results

• Verify
    vary tempo estimate τ
    vary tradeoff weight α
    accuracy against human tapping data

[Figure: panels "bonus6 (Jazz): BPM target vs. BPM out" and "bonus1 (Choral): BPM target vs. BPM out", each with curves for α = 400 and α = 100; y-axes: mean output BPM, Beat accuracy (0 to 1), σ/μ of inter-beat-intervals (0 to 0.2); x-axis: target BPM]


Downbeat Detection

• Downbeat = start of “bar”
    one level up in metrical hierarchy
• Approaches
    Goto ’94 BTS: Pop music SD/BD template
    Jehan ’05: Trained classifier


Summary

• Rhythm perception
    Innate and strong, hierarchic
• Beat tracking models
    Need to account for buildup & persistence
• Dynamic Programming
    Neat way to maintain multiple hypotheses


References

• J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A Tutorial on Onset Detection in Music Signals,” IEEE Tr. Speech and Audio Proc., vol. 13, no. 5, pp. 1035-1047, September 2005.
• P. Desain & H. Honing, “Computational models of beat induction: The rule-based approach,” J. New Music Research, vol. 28, no. 1, pp. 29-42, 1999.
• Simon Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. New Music Research, vol. 30, no. 1, pp. 39-58, 2001.
• D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, vol. 36, no. 1, pp. 51-60, March 2007.
• Tristan Jehan, “Creating Music By Listening,” Ph.D. thesis, MIT Media Lab, 2005.
• Masataka Goto & Yoichi Muraoka, “A Beat Tracking System for Acoustic Signals of Music,” ACM Multimedia, pp. 365-372, October 1994.
• Masataka Goto, “An Audio-based Real-time Beat Tracking System for Music With or Without Drumsounds,” J. New Music Research, vol. 30, no. 2, pp. 159-171, June 2001.
• M. F. McKinney and D. Moelants, “Ambiguity in tempo perception: What draws listeners to different metrical levels?” Music Perception, vol. 24, no. 2, pp. 155-166, December 2006.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., Ann Arbor, pp. 109-112, October 1998.
• Eric D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am., vol. 103, pp. 588-601, 1998.
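
The peak-picking step described under Tempo Estimation (autocorrelate the onset envelope, apply a “human preference” window, pick the peak) could be sketched in Matlab roughly as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: the function name tempo_sketch, the 4 s lag limit, and the window parameters (centre period 0.5 s, width 1.4 octaves on a log-lag axis) are illustrative choices, the subbeat check is left out, and the global maximum of the windowed autocorrelation is taken rather than a search over local peaks.

```matlab
function tempo = tempo_sketch(onset, osr)
% tempo = tempo_sketch(onset, osr)
%   Hypothetical helper (not part of the lecture code): estimate one
%   global tempo from an onset strength envelope.
%   <onset> is the onset strength envelope at frame rate <osr> (frames/sec)
%   <tempo> is the estimated tempo in BPM
%
% Example with a synthetic envelope (impulse every 0.5 s at osr = 100):
%   onset = zeros(1,3000); onset(1:50:end) = 1;
%   tempo = tempo_sketch(onset, 100)   % 120 for this input

% Assumed window parameters (illustrative, not taken from the slides)
tau0 = 0.5;     % preferred beat period in seconds (120 BPM)
sigma = 1.4;    % window width in octaves

% Search lags from 1 frame up to 4 s (or the signal length, if shorter)
maxlag = min(round(4*osr), length(onset)-1);

% Remove the mean so the autocorrelation reflects periodicity, not level
x = onset(:)' - mean(onset);
n = length(x);

% Raw autocorrelation at lags 1..maxlag (in frames)
rawac = zeros(1, maxlag);
for lag = 1:maxlag
  rawac(lag) = sum(x(1:n-lag) .* x(1+lag:n));
end

% "Human preference" window: Gaussian on a log-lag axis centred at tau0
lagsec = (1:maxlag)/osr;
win = exp(-0.5*(log2(lagsec/tau0)/sigma).^2);

% Pick the largest value of the windowed autocorrelation
[vv, bestlag] = max(rawac .* win);

% Convert the winning lag (in frames) to BPM
tempo = 60*osr/bestlag;
```

The returned value is in BPM, so it can be passed directly as the <tempo> argument of beat_simple.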