Cover Song ID with Beat-Synchronous Chroma Features

advertisement
Lab
Cover Song ID with Beat-Synchronous Chroma Features
ROSA
Dan Ellis • Columbia University • dpwe@ee.columbia.edu
Summary: Beat-synchronous chroma features capture melodic-harmonic content of music audio, and successfully
detect cover versions of songs. We describe our system, including beat tracking by dynamic-programming.
MIREX-06 Cover Song Identification:
Evaluation Results & Analysis
Tempo Estimation by Onset Autocorrelation
• Cover song system requires beat tracking (see bottom right panel)
- our beat tracker first estimates tempo
- did ~OK in MIREX06 tempo eval.
freq / Mel
mirex06train/train2.wav (Bragg)
• Music Audio is converted to an
Onset Strength envelope by:
- Mel-freq. log-mag. spectrogram
- Half-wave rectify (negative
values → zero)
- Sum across all channels
- Highpass at ~ 0.5 Hz
- Smooth at ~ 20 Hz
• Task: Identify Cover Songs (alternate versions of the same musical piece)
within a database of music audio recordings
40
20
0
• Data: 30 songs with 11 different versions of each (330 songs),
from mixed genres (“classical, jazz, gospel, rock, folk-rock, etc.”)
half-wave rectified log Mel magnitude envelopes
• Metric: How many of the 10 other versions of each query song are among
the 10 most similar found by each algorithm? (3300 max. total)
(also used Mean Reciprocal Rank, not reported here)
onset strength envelope (hwr envelopes summed, smoothed, high-pass)
0
10
10.5
11
11.5
12
12.5
13
13.5
14
time / sec
global autocorrelation, weighted @ 120 bpm with 1 octave log-t gaussian
• Autocorrelate out to lag of ~ 4 s
0
• Weight autocorrelation by Gaussian
on log-time axis (centered on 120
bpm for human tapping tempo, or 240 bpm for chroma features)
0
0.5
1
1.5
2
2.5
MIREX 06 Cover Song Results:
# Covers retrieved per song per system
3
song-set (each row is one query song)
C
UNIVERSITY
OLUMBIA
Laboratory for the Recognition and
Organization of Speech and Audio
3.5
lag / s
4
• Pick highest peak as main period & 2'dry peak from [0.33, 0.5, 2, 3] × main
• Results: 4 cover song systems and 4 music similarity systems tested
- Our system (DE) is best with 761 covers found (23% recall)
- Music similarity systems score a little above random (≈ 100 covers)
→ Cover song task is very different from current similarity approaches
• Analysis: Many song-sets difficult for all systems (e.g. 5, 11, 18, 21, 30)
- most hits from a few song sets (2, 14, 19 for DE; 2, 17, 26 for KL)
- different systems have different strengths
Beat Tracking by Dynamic Programming
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
8
6
4
2
0
correct
matches
retrieved
CS DE KL1 KL2 KWL KWT LR TP
cover song systems
similarity systems
• Beat tracker takes global tempo period tB (from above) as input
• Beat tracking score is calculated recursively for every time sample as:
SB(t) = (1 – α)O(t) + α max W(t – τp) SB(τp)
30
from above (local match)
W(t) is a log-time Gaussian
window between tB/2 and 2tB
centered on tB (width
determines tempo tolerance)
best predecessor τp is
recorded for each time t
largest SB near end of track is traced back for entire beat time sequence
20
10
-
smoothed onset envelope + picked beats
10
5
0
McKinney & Moelants smoothed ground truth
4
-
2
0
0
-
5
10
time / s 15
• Beat tracker came 2nd of 5 systems compared to human ground truth in
MIREX06 beat tracking evaluation
for MIREX@ISMIR'06 • 2006-10-11 dpwe@ee.columbia.edu
• Cover versions may differ in tempo, instrumentation, style
- devise features to normalize these variations
• Chroma features map whole spectrum to one octave
- preserve essence of melody/harmony without detail
- high-resolution instantaneous frequency finds peaks
• Beat-synchronous features factor out tempo variations
- ... provided beat tracker finds matching metrical levels
- average chroma bin energy over whole beat
• Cross-correlation of entire beats × semitones matrix
has sharp maxima when large subsections match
- repeat at all 12 chroma rotations
- high-pass filter to emhasize peaks
chroma bins
40
F
E
D
B
A
chroma bins
O(t) is onset strength envelope
freq / Mel
-
Fleetwood Mac - Gold Dust Woman - Beat Sync Chroma Ftrs
mirex06train/train2.wav (Bragg)
100
200
300
400
500
600
700
800
900
1000
1100
+6
chroma shift
τp
0
-5
Cross-correlation at chroma shift +2, high-pass @ w = 0.1 rad/s
0
-600
-400
-200
0
200
400
600
relative timing / beats
Download Matlab code from: http://labrosa.ee.columbia.edu/projects/coversongs/
Download