ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 10: Beat Tracking
1. Rhythm Perception
2. Onset Extraction
3. Beat Tracking
4. Dynamic Programming
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-04-01 - 1 /19
1. Rhythm Perception
• What is rhythm?
aspects, origin?
Rhythm Perception Experiments
McKinney & Moelants 2006
• Tapping experiment
ambiguous; hierarchy
[Figure: listener tapping data for mirex06/train10 (Harnoncourt); axes: Frequency against time / sec, 0 to 30 sec]
Rhythm Tracking Systems
• Two main components
front end: extract ‘events’ from audio
back end: find plausible beat sequence to match
[Diagram: Audio → Onset/Event detection → Beat marking → Beat times etc., with Musical knowledge as a further input]
• Other outputs
tempo
time signature
metrical level(s)
2. Onset detection
Bello et al. 2005
• Simplest thing is energy envelope
$$e(n_0) = \sum_{n=-W/2}^{W/2} w[n]\,|x(n + n_0)|^2$$
emphasis on high frequencies?
compare $\sum_f |X(f,t)|$ with $\sum_f f \cdot |X(f,t)|$
[Figure: Harnoncourt and Maracatu examples; spectrograms (freq / kHz) above the two envelopes (level / dB), all against time / sec]
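The energy-envelope idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's code: the Hann window, the window length and hop, the half-wave rectified difference used as onset strength, and the synthetic test tone are all choices made for this example.

```python
import math

def energy_envelope(x, win_len=64, hop=32):
    """Windowed local energy e(n0) = sum_n w[n] * |x[n + n0]|^2."""
    half = win_len // 2
    # Hann-shaped w[n] for n = -half .. half (the window choice is an assumption)
    w = [0.5 + 0.5 * math.cos(math.pi * n / half) for n in range(-half, half + 1)]
    env = []
    for n0 in range(half, len(x) - half, hop):
        env.append(sum(w[k] * x[n0 - half + k] ** 2 for k in range(len(w))))
    return env

def onset_strength(env):
    """Half-wave rectified first difference: keeps only increases in energy."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(env, env[1:])]

# Synthetic signal: 1000 samples of silence, then a 440 Hz tone (8 kHz sample rate).
sr = 8000
x = [0.0] * 1000 + [math.sin(2 * math.pi * 440 * n / sr) for n in range(1000)]

env = energy_envelope(x)
ons = onset_strength(env)
peak_frame = max(range(len(ons)), key=ons.__getitem__)
peak_sample = 32 + 32 * peak_frame  # frame centre: half window + hop * frame index
print("strongest onset near sample", peak_sample)
```

The strongest value of the rectified difference falls on a frame whose window straddles the start of the tone, which is the behaviour the envelope method relies on.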
Multiband Derivative
Puckette et al. 1998
• Sometimes energy just “shifts”
calculate & sum onset in multiple bands
use ratio instead of difference - normalize energy
$$o(t) = \sum_f W(f)\,\max\left(0,\ \frac{|X(f,t)|}{|X(f,t-1)|} - 1\right)$$
(as in bonk~)
[Figure: spectrograms (freq / kHz) above the multiband onset function (level / dB), against time / sec]
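A minimal Python sketch of the ratio-based onset function follows. It makes several assumptions that are not from the slides: one weight per DFT bin rather than bonk~'s actual band layout, a naive DFT in place of an FFT, and a small magnitude floor (the hypothetical `floor` parameter) to keep the ratio finite. The test signal switches from 500 Hz to 1500 Hz at constant amplitude, the case where energy only "shifts".

```python
import cmath
import math

def stft_mag(x, n_fft=64, hop=32):
    """Magnitude spectrogram from a naive DFT of Hann-windowed frames."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [win[n] * x[start + n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames

def ratio_onset(mags, weights=None, floor=1e-3):
    """o(t) = sum_f W(f) * max(0, |X(f,t)| / |X(f,t-1)| - 1)."""
    weights = weights or [1.0] * len(mags[0])
    out = [0.0]  # no previous frame for t = 0
    for prev, cur in zip(mags, mags[1:]):
        out.append(sum(
            w * max(0.0, c / max(p, floor) - 1.0)  # floor avoids division by ~0
            for w, p, c in zip(weights, prev, cur)))
    return out

# Constant-amplitude tone that jumps from 500 Hz to 1500 Hz at sample 1024.
sr = 8000
x = ([math.sin(2 * math.pi * 500 * n / sr) for n in range(1024)] +
     [math.sin(2 * math.pi * 1500 * n / sr) for n in range(1024)])

o = ratio_onset(stft_mag(x))
peak = max(range(len(o)), key=o.__getitem__)
print("onset in frame", peak, "starting at sample", 32 * peak)
```

Because each bin is normalized by its own previous magnitude, the newly excited bins dominate the sum even though the total energy barely changes.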
Phase Deviation
Bello et al. 2005
• When amplitudes don’t change much,
phase discontinuity may signal new note
[Figure: waveform excerpt, about 8.96 to 9.06 sec]
• Can detect by comparing
actual phase with
extrapolation from past
$$\hat{X}(f, t_{n+1}) = X(f, t_n)\,\frac{X(f, t_n)}{X(f, t_{n-1})}$$
combine with amplitude?
[Figure: Complex deviation in the (Re{X}, Im{X}) plane: the prediction X̂(f, tn+1) advances X(f, tn) by the same phase step ∆Θ as from X(f, tn-1) to X(f, tn), and is compared with the actual X(f, tn+1)]
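The prediction and the two deviation measures can be sketched for a single frequency bin as below. This is an illustrative sketch, assuming the three arguments are the complex STFT values of one bin in consecutive frames; the function names and the handling of a zero previous value are assumptions of this example.

```python
import cmath
import math

def predict_next(x_prev, x_cur):
    """X_hat(t_{n+1}) = X(t_n) * X(t_n) / X(t_{n-1})."""
    if x_prev == 0:
        return x_cur  # no phase history: predict "no change" (arbitrary choice)
    return x_cur * x_cur / x_prev

def phase_deviation(x_prev, x_cur, x_next):
    """Actual phase minus the phase extrapolated from the two past frames."""
    expected = 2 * cmath.phase(x_cur) - cmath.phase(x_prev)
    diff = cmath.phase(x_next) - expected
    return (diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)

def complex_deviation(x_prev, x_cur, x_next):
    """Distance in the complex plane between actual and predicted values."""
    return abs(x_next - predict_next(x_prev, x_cur))

# Steady sinusoid: phase advances by 0.3 rad per frame, so the prediction is exact.
steady = [cmath.rect(1.0, 0.3 * n) for n in range(3)]
# Same history, but the third frame arrives 1 rad later than extrapolated.
jumped = steady[:2] + [cmath.rect(1.0, 0.6 + 1.0)]

print(phase_deviation(*steady), complex_deviation(*steady))
print(phase_deviation(*jumped), complex_deviation(*jumped))
```

The complex deviation responds to changes in both phase and magnitude, which is one way to "combine with amplitude".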
3. Rhythm Tracking
Desain & Honing 1999
• Earliest systems were rule-based
based on musicology (Longuet-Higgins and Lee, 1982)
inspired by linguistic grammars (Chomsky)
input: event sequence (MIDI)
output: quarter notes, downbeats
Resonators
Scheirer 1998
• How to address:
build-up of rhythmic evidence
“ghost events”
(audio input)
• Seems more like a comb filter...
resonant filterbank of
$$y(t) = \alpha\, y(t - T) + (1 - \alpha)\, x(t)$$
for all possible T
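A bank of such comb filters can be sketched in Python as follows. This is a toy version under stated assumptions: the input is an idealized impulse train rather than band envelopes from real audio, the value of α and the range of candidate periods are arbitrary, and output energy is used as the measure of resonance.

```python
def comb_resonator(x, period, alpha=0.9):
    """y[t] = alpha * y[t - period] + (1 - alpha) * x[t], with y = 0 before t = 0."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        past = y[t - period] if t >= period else 0.0
        y[t] = alpha * past + (1.0 - alpha) * x[t]
    return y

def resonator_energies(x, periods, alpha=0.9):
    """Output energy of one comb filter per candidate period T (in samples)."""
    return {T: sum(v * v for v in comb_resonator(x, T, alpha)) for T in periods}

# Idealized onset train: one impulse every 50 samples.
x = [1.0 if t % 50 == 0 else 0.0 for t in range(2000)]

energies = resonator_energies(x, range(20, 101))
best_period = max(energies, key=energies.get)
print("most resonant period:", best_period)
```

The filter whose delay matches the pulse period accumulates evidence from beat to beat, and it keeps ringing through a missing pulse, which is how the resonator handles "ghost events".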
Multi-Hypothesis Systems
Goto & Muraoka 1994; Goto 2001; Dixon 2001
• Beat is ambiguous
→ develop several alternatives
[Diagram: beat tracking system; blocks: Compact disc, Musical audio signals, A/D conversion, Frequency analysis (Frequency spectrum), Onset-time finders, Onset-time vectorizers, Drum-sound finder, Agents, Higher-level checkers, Manager, Beat prediction, Beat information transmission, Beat information]
inputs: music audio
outputs: beat times, downbeats, BD/SD patterns...
4. Dynamic Programming
Ellis 2007
• Re-cast beat tracking as optimization:
Find beat times {ti} to maximize
$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$
O(t) is onset strength function
F(∆t, τ) is tempo consistency score e.g.
$$F(\Delta t, \tau) = -\left(\log\frac{\Delta t}{\tau}\right)^2$$
• Looks like an exponential search over all {ti}
... but Dynamic Programming saves us
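To make the objective concrete, the sketch below evaluates C for two candidate beat sequences. It assumes beat times are integer frame indices into the onset envelope, and the toy envelope and the function names are invented for the example.

```python
import math

def tempo_consistency(dt, tau):
    """F(dt, tau) = -(log(dt / tau))^2: zero when dt equals tau, negative otherwise."""
    return -(math.log(dt / tau) ** 2)

def beat_score(onset, beats, tau_p, alpha=100.0):
    """C({t_i}) = sum_i O(t_i) + alpha * sum_{i>=2} F(t_i - t_{i-1}, tau_p)."""
    local = sum(onset[t] for t in beats)
    transitions = sum(tempo_consistency(b - a, tau_p)
                      for a, b in zip(beats, beats[1:]))
    return local + alpha * transitions

# Toy onset envelope with unit peaks every 20 frames.
onset = [0.0] * 100
for t in (10, 30, 50, 70, 90):
    onset[t] = 1.0

on_grid = beat_score(onset, [10, 30, 50, 70, 90], tau_p=20)
off_grid = beat_score(onset, [10, 30, 52, 70, 90], tau_p=20)
print(on_grid, off_grid)
```

Moving one beat off its onset loses that onset's local score and also incurs two non-zero transition penalties, so the on-grid sequence scores higher.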
Dynamic Programming (DP)
• DP is a general algorithm for optimizing
“optimal substructure” problems
i.e. where optimal total solution can be built from
optimal partial solutions
• e.g. best path through cost matrix
$$D(i,j) = d(i,j) + \min\left\{\begin{array}{l} D(i-1,j) + T_{10} \\ D(i,j-1) + T_{01} \\ D(i-1,j-1) + T_{11} \end{array}\right\}$$
D(i,j): lowest cost to (i,j)
d(i,j): local match cost
min{...}: best predecessor (including transition cost)
[Diagram: cost matrix with input frames fi on one axis and reference frames rj on the other]
path after (i,j) is independent of how we got there
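The cost-matrix recursion can be written out directly. The sketch below is a generic illustration with assumed transition costs and a made-up pair of sequences; `best_path` and its parameter names are hypothetical.

```python
def best_path(d, t10=1.0, t01=1.0, t11=0.0):
    """Fill D(i,j) = d(i,j) + min over three predecessors, then trace back."""
    n, m = len(d), len(d[0])
    D = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    D[0][0] = d[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((D[i - 1][j] + t10, (i - 1, j)))
            if j > 0:
                cands.append((D[i][j - 1] + t01, (i, j - 1)))
            if i > 0 and j > 0:
                cands.append((D[i - 1][j - 1] + t11, (i - 1, j - 1)))
            cost, back[i][j] = min(cands)  # best predecessor incl. transition cost
            D[i][j] = d[i][j] + cost
    # Trace back from the final cell to (0, 0)
    path = [(n - 1, m - 1)]
    while back[path[0][0]][path[0][1]] is not None:
        path.insert(0, back[path[0][0]][path[0][1]])
    return D[n - 1][m - 1], path

# Local match cost between two short sequences (input a, reference b).
a = [1, 2, 3, 3, 4]
b = [1, 2, 2, 3, 4]
d = [[abs(x - y) for y in b] for x in a]

cost, path = best_path(d, t10=0.25, t01=0.25, t11=0.0)
print(cost, path)
```

Each cell is filled once from at most three neighbours, so the work grows with the size of the matrix rather than with the number of possible paths.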
Tempo Estimation
• Algorithm needs global tempo period τ
otherwise problem is not “optimal substructure”
• Pick peak in onset envelope autocorrelation
after applying “human preference” window
check for subbeat
[Figure: three panels: Onset Strength Envelope (part) against time / s; Raw Autocorrelation and Windowed Autocorrelation against lag / s, with Primary Tempo Period and Secondary Tempo Period marked]
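A rough Python sketch of this tempo estimate is given below. Several things are assumptions of the example, not statements from the slides: the "human preference" window is modelled as a Gaussian on a log-lag axis with an assumed centre of 0.5 s and an assumed width of 1.4 octaves, the global maximum of the windowed autocorrelation stands in for proper peak picking, and the subbeat check is omitted.

```python
import math

def autocorrelation(x, max_lag):
    """Raw autocorrelation of the mean-removed envelope for lags 0..max_lag."""
    mean = sum(x) / len(x)
    z = [v - mean for v in x]
    return [sum(z[t] * z[t + lag] for t in range(len(z) - lag))
            for lag in range(max_lag + 1)]

def preference_window(lag_sec, center=0.5, width_octaves=1.4):
    """Log-Gaussian weight over lag; centre and width are assumed values."""
    return math.exp(-0.5 * (math.log2(lag_sec / center) / width_octaves) ** 2)

def estimate_tempo(onset, frame_rate, max_lag_sec=4.0):
    """Return tempo in BPM from the largest windowed autocorrelation value."""
    max_lag = min(int(max_lag_sec * frame_rate), len(onset) - 1)
    raw = autocorrelation(onset, max_lag)
    weighted = [0.0] + [raw[lag] * preference_window(lag / frame_rate)
                        for lag in range(1, max_lag + 1)]
    best_lag = max(range(1, max_lag + 1), key=weighted.__getitem__)
    return 60.0 * frame_rate / best_lag

# Toy envelope at 100 frames/s with an onset every 0.5 s (120 BPM).
frame_rate = 100
onset = [1.0 if t % 50 == 0 else 0.0 for t in range(1000)]

tempo = estimate_tempo(onset, frame_rate)
print("estimated tempo:", tempo, "BPM")
```

The window matters on real music, where the autocorrelation has comparable peaks at several multiples of the beat period and the weighting decides which metrical level is reported.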
Beat Tracking by DP
• To optimize
$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$
define C*(t) as best score up to time t
then build up recursively (with traceback P(t))
[Figure: O(t) and C*(t) against t, with a candidate predecessor time τ]
$$C^*(t) = O(t) + \max_{\tau}\{\alpha F(t - \tau, \tau_p) + C^*(\tau)\}$$
$$P(t) = \arg\max_{\tau}\{\alpha F(t - \tau, \tau_p) + C^*(\tau)\}$$
final beat sequence {ti} is best C* + back-trace
beatsimple
• Beat tracking in 15 lines of Matlab
function beats = beat_simple(onset, osr, tempo, alpha)
% beats = beat_simple(onset, osr, tempo, alpha)
%   Core of the DP-based beat tracker
%   <onset> is the onset strength envelope at frame rate <osr>
%   <tempo> is the target tempo (in BPM)
%   <alpha> is weight applied to transition cost
%   <beats> returns the chosen beat sample times (in sec).
% 2007-06-19 Dan Ellis dpwe@ee.columbia.edu

if nargin < 4; alpha = 100; end

% backlink(time) is best predecessor for this point
% cumscore(time) is total cumulated score to this point
localscore = onset;
backlink = -ones(1,length(localscore));
cumscore = zeros(1,length(localscore));

% convert bpm to samples
period = (60/tempo)*osr;

% Search range for previous beat
prange = round(-2*period):-round(period/2);

% Log-gaussian window over that range
txwt = (-alpha*abs((log(prange/-period)).^2));

for i = max(-prange + 1):length(localscore)
  timerange = i + prange;
  % Search over all possible predecessors
  % and apply transition weighting
  scorecands = txwt + cumscore(timerange);
  % Find best predecessor beat
  [vv,xx] = max(scorecands);
  % Add on local score
  cumscore(i) = vv + localscore(i);
  % Store backtrace
  backlink(i) = timerange(xx);
end

% Start backtrace from best cumulated score
[vv,beats] = max(cumscore);
% .. then find all its predecessors
while backlink(beats(1)) > 0
  beats = [backlink(beats(1)),beats];
end

% convert to seconds
beats = (beats-1)/osr;
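For readers without Matlab, here is a Python transcription of the same recursion, run on a synthetic envelope. It is intended as a line-by-line port with 0-based indexing, not as a reference implementation. Note that Python's `round()` rounds halves to even, so the search range can differ by one frame from Matlab's when the period is not a whole number of frames. The toy envelope is an assumption of the example.

```python
import math

def beat_simple(onset, osr, tempo, alpha=100.0):
    """DP beat tracker; returns beat times in seconds (0-based port of the Matlab)."""
    n = len(onset)
    period = (60.0 / tempo) * osr  # beat period in frames
    # Offsets to candidate predecessors: two periods back .. half a period back
    prange = list(range(round(-2 * period), -round(period / 2) + 1))
    # Log-Gaussian transition weight, largest (zero) one period back
    txwt = [-alpha * math.log(p / -period) ** 2 for p in prange]
    cumscore = [0.0] * n
    backlink = [-1] * n
    for i in range(-prange[0], n):
        # Best predecessor after transition weighting (first maximum on ties)
        k = max(range(len(prange)),
                key=lambda k: txwt[k] + cumscore[i + prange[k]])
        cumscore[i] = txwt[k] + cumscore[i + prange[k]] + onset[i]
        backlink[i] = i + prange[k]
    # Backtrace from the best cumulative score
    beats = [max(range(n), key=cumscore.__getitem__)]
    while backlink[beats[0]] >= 0:
        beats.insert(0, backlink[beats[0]])
    return [b / osr for b in beats]

# Toy envelope at 100 frames/s: unit onsets every 0.5 s, starting at 0.2 s.
onset = [1.0 if t % 50 == 20 else 0.0 for t in range(1000)]

beats = beat_simple(onset, osr=100, tempo=120)
print(beats)
```

On this input the tracker locks onto the 0.5 s grid. The very first onset at 0.2 s is not reported, because frames earlier than two periods are never scored and the backtrace stops when it reaches them.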
Results
• Verify accuracy against human tapping data
vary tempo estimate τ
vary tradeoff weight α
[Figure: two examples, bonus6 (Jazz) and bonus1 (Choral), "BPM target vs. BPM out"; panels show mean output BPM, Beat accuracy, and σ/μ of inter-beat-intervals against target BPM, for α = 400 and α = 100]
Downbeat Detection
• Downbeat = start of “bar”
one level up in metrical hierarchy
• Approaches
Goto’94 BTS:
Pop music
SD/BD template
Jehan’05:
Trained classifier
Summary
• Rhythm perception
Innate and strong, hierarchic
• Beat tracking models
Need to account for buildup & persistence
• Dynamic Programming
Neat way to maintain multiple hypotheses
References
• J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A Tutorial on Onset Detection in Music Signals,” IEEE Tr. Speech and Audio Proc., vol. 13, no. 5, pp. 1035-1047, September 2005.
• P. Desain & H. Honing, “Computational models of beat induction: The rule-based approach,” J. New Music Research, vol. 28, no. 1, pp. 29-42, 1999.
• Simon Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. New Music Research, vol. 30, no. 1, pp. 39-58, 2001.
• D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, vol. 36, no. 1, pp. 51-60, March 2007.
• Tristan Jehan, “Creating Music By Listening,” Ph.D. Thesis, MIT Media Lab, 2005.
• Masataka Goto & Yoichi Muraoka, “A Beat Tracking System for Acoustic Signals of Music,” ACM Multimedia, pp. 365-372, October 1994.
• Masataka Goto, “An Audio-based Real-time Beat Tracking System for Music With or Without Drumsounds,” J. New Music Research, vol. 30, no. 2, pp. 159-171, June 2001.
• M. F. McKinney and D. Moelants, “Ambiguity in tempo perception: What draws listeners to different metrical levels?” Music Perception, vol. 24, no. 2, pp. 155-166, December 2006.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., Ann Arbor, pp. 109-112, October 1998.
• Eric D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am., vol. 103, pp. 588-601, 1998.