Beat-Synchronous Chroma Representations for Music Analysis 1.

advertisement
Beat-Synchronous
Chroma Representations
for Music Analysis
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
1.
2.
3.
4.
http://labrosa.ee.columbia.edu/
Chroma Features
Beat Tracking
Matching Cover Songs
Artist Identification
Beat-Chroma Representations - Ellis
2007-05-23 - 1 /23
Beyond MFCCs...
• MFCCs have been useful in Audio Music IR
“timbral similarity”
artist ID, segmentation, thumbnailing, singing ...
• Separate tradition of Symbolic MIR
melody matching, chord detection, meter analysis
• It’s time to bring them together
Let It Be / Beatles / verse 1
4
freq / kHz
freq / kHz
... with robust audio mid-level representations
... that capture tonal (melodic-harmonic) content
3
2
1
0
Let It Be / Nick Cave / verse 1
4
3
2
1
2
4
6
8
10 time / sec
0
2
4
6
8
10
time / sec
= beat-synchronous chroma features
Beat-Chroma Representations - Ellis
2007-05-23 - 2 /23
1. Chroma Features
• Chroma features map spectral energy
into one canonical octave
i.e. 12 semitone bins
chroma
freq / kHz
IF chroma
3
2
G
F
D
C
1
A
0
2
4
6
8
10
100
time / sec
200
300
400
500
• Can resynthesize as “Shepard Tones”
600
700
time / frames
all octaves at once
0
-10
-20
-30
-40
-50
-60
12 Shepard tone spectra
freq / kHz
level / dB
Piano
scale
Piano chromatic scale
4
Shepard tone resynth
4
3
2
1
0
500
1000
1500
2000
2500 freq / Hz
Beat-Chroma Representations - Ellis
0
2
4
6
8
10 time / sec
2007-05-23 - 3 /23
Calculating Chroma Features
G
F
D
C
A
4
chroma
freq / kHz
chroma
1: Map every STFT bin
• Method
blurs non-tonal energy
3
G
F
2
D
1
C
0
A
2: Map only STFT peaks
• Method
still blurry at low frequencies
2
G
F
D
C
A
4
6
8
10 time / sec
4
50
chroma
freq / kHz
chroma
50 100 150 200 fft bin
3
150
200
250
300
time / frame
G
F
2
D
1
C
0
100
A
3: Instantaneous Frequency /t
• Method
escapes frequency resolution limit
(
G
F
D
C
A
0
2000
4000
4
6
8
10 time / sec
)
4
3
2
1
0
2
4
6
8
Beat-Chroma Representations - Ellis
10 time / sec
chroma
2
freq / kHz
chroma
50 100 150 200 fft bin
→
50
100
150
200
250
300
time / frame
50
100
150
200
250
300
time / frame
G
F
D
C
A
2007-05-23 - 4 /23
2. Beat Tracking (1)
• Goal: One feature vector per ‘beat’ (tatum)
for tempo normalization, efficiency
• “Onset Strength Envelope”
freq / mel
sumf(max(0, difft(log |X(t, f)|)))
40
30
20
10
0
0
5
10
time / sec 15
• Autocorr. + window → global tempo estimate
0
168.5100BPM 200
0
300
400
Beat-Chroma Representations - Ellis
500
600
700
800
900
1000
lag / 4 ms samples
2007-05-23 - 5 /23
Beat Tracking (2)
• Dynamic Programming finds beat times {t }
i
optimizes i O(ti) +  i W((ti+1 – ti – p)/)
where O(t) is onset strength envelope (local score)
W(t) is a log-Gaussian window (transition cost)
p is the default beat period per measured tempo
incrementally find best predecessor at every time
backtrace from largest final score to get beats
C*(t)
O(t)
τ
t
C*(t) = γ O(t) + (1–γ)max{W((τ – τp)/β)C*(τ)}
τ
P(t) = argmax{W((τ – τp)/β)C*(τ)}
τ
Beat-Chroma Representations - Ellis
2007-05-23 - 6 /23
Beat Tracking Results
• DP will bridge gaps (non-causal)
there is always a best path ...
freq / Bark band
Alanis Morissette - All I Want - gap + beats
40
30
20
10
• 2nd place in MIREX 2006 Beat Tracking
182
184
186
188
190
192
time / sec
compared to McKinney & Moelants human data
freq / Bark band
Subject #
test 2 (Bragg) - McKinney + Moelants Subject data
40
40
30
20
10
20
0
0
5
Beat-Chroma Representations - Ellis
10
time / s
2007-05-23 - 7 /23
15
Beat-Synchronous Chroma Features
• Beat + chroma features / 30ms frames
→ average chroma within each beat
compact; sufficient?
34,5-.-6,7
&#
%#
$#
89/,)-/)4,9:);
"#
#
0;48+2-1*9/
"$
"#
(
'
&
$
#
!
"#
)*+,-.-/,0 "!
0;48+2-1*9/
"$
"#
(
'
&
$
!
"#
"!
Beat-Chroma Representations - Ellis
$#
$!
%#
%!
)*+,-.-1,2)/
2007-05-23 - 8 /23
3. Cover Song Detection
with Graham Poliner
• “Cover Songs” = reinterpretation of a piece
different instrumentation, character
no match with “timbral” features
Let It Be - Nick Cave
Let It Be / Beatles / verse 1
4
freq / kHz
freq / kHz
Let It Be - The Beatles
3
2
1
0
Let It Be / Nick Cave / verse 1
4
3
2
1
0
• Need a different representation!
2
4
6
8
10 time / sec
2
4
6
8
10
time / sec
beat-synchronous chroma features
Beat-sync chroma features
G
F
chroma
chroma
Beat-sync chroma features
D
C
A
G
F
D
C
A
5
10
15
20
25
beats
Beat-Chroma Representations - Ellis
5
10
15
20
25
beats
2007-05-23 - 9 /23
Matching (1): Little Fragments
• Cover versions may change song structure
multiple local matches at different alignments
• Match query and target as many small pieces?
how big are the
pieces?
G
E
D
C
A
100
200
extract
300
400
beats
500
how do we
combine individual
scores?
do we have all day?
cross-correlate
Candidate
chroma bins
chroma bins
Query
G
E
D
C
A
100
200
300
Beat-Chroma Representations - Ellis
400
500
beats
2007-05-23 - 10/23
Matching (2): Global Correlation
• Cross-correlate entire beat-chroma matrices
... at all possible transpositions
implicit combination of match quality and duration
chroma bins
Elliott Smith - Between the Bars
G
E
D
C
A
skew / semitones
chroma bins
100
200
300
400
Glen Phillips - Between the Bars
500
beats @281 BPM
G
E
D
C
A
Cross-correlation
+5
0
-5
-500
-400
-300
-200
-100
0
100
200
300
400 skew / beats
• One good matching fragment is sufficient...?
Beat-Chroma Representations - Ellis
2007-05-23 - 11/23
Filtered Cross-Correlation
• Raw correlation not as important as precise
local match
skew / semitones
looking for large contrast at ±1 beat skew
i.e. high-pass filter
Cross-correlation
+5
0
-5
-500
-400
-300
-200
-100
0
100
200
Cross-correlation @ skew = +2 semitones
0.6
300
400 skew / beats
300
400 skew / beats
raw
0.4
0.2
filtered
0
-500
-400
-300
-200
-100
Beat-Chroma Representations - Ellis
0
100
200
2007-05-23 - 12/23
Results (1): Ellis 23 set
• 23 pairs of cover songs from uspop2002 +...
one correct match per query
Cover Songs - dpwe23 - 12/23 correct
Take_Me_To_The_River/annie_lennox
Let_It_Be/nick_cave
I_Love_You/faith_hill
I_Can_t_Get_No_Satisfaction/rolling_stones
Hush/milli_vanilli
Grand_Illusion/styx
Gold_Dust_Woman/sheryl_crow
God_Only_Knows/brian_wilson
Query
Faith/limp_bizkit
Enjoy_The_Silence/tori_amos
Day_Tripper/cheap_trick
Come_Together/beatles
Cocaine/nazareth
Claudette/roy_orbison
Cecilia/simon_and_garfunkel
Caroline_No/brian_wilson
Blue_Collar_Man/styx
Between_The_Bars/glen_phillips
Before_You_Accuse_Me/eric_clapton
America/simon_and_garfunkel
All_Along_The_Watchtower/dave_matthews_band
Addicted_To_Love/tina_turner
Abracadabra/sugar_ray
Ab Ad Al Am Be Be Bl Ca Ce Cl Co Co Da En Fa Go Go Gr Hu I_
I_ Le Ta
Test
Beat-Chroma Representations - Ellis
2007-05-23 - 13/23
Results (2): MIREX 06
• Cover song contest
• Found 761/3300
= 23% recall
next best: 11%
guess: 3%
Beat-Chroma Representations - Ellis
song-set (each row is one query song)
30 songs x 11
versions of each (!)
(data has not been
disclosed)
# true covers in top 10
8 systems compared
(4 cover song
+ 4 similarity)
MIREX 06 Cover Song Results:
# Covers retrieved per song per system
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
8
6
4
2
0
correct
matches
retrieved
CS DE KL1 KL2 KWL KWT LR TP
cover song systems
similarity systems
2007-05-23 - 14/23
Where are the matches?
• Look inside global cross-correlation to find
matching fragments...
xcorr = t f (C1(t, f)⋅C2(t, f)) - view along time
chroma
Let It Be / Beatles (beats 11-441)
G
F
D
C
chroma
A
50
100
150
200
250
Let It Be / Nick Cave (beats 13-443)
50
100
150
200
50
100
150
200
300
350
400 time / beats
250
300
350
400 time / beats
250
300
350
400 time / beats
G
F
D
C
A
0.4
0.2
0
-0.2
0
Beat-Chroma Representations - Ellis
2007-05-23 - 15/23
What are the mistakes?
• False reject - missed true match
cover version is too different, beat tracking wrong ...
• False alarm - invalid match
“Cocaine” (Clapton) vs. “Satisfaction” (Stones)
chroma
Eric Clapton - Cocaine - beats 17:1027
G
F
D
C
A
100
200
300
400
500
600
700
800
900
1000
chroma
Rolling Stones - Satisfaction - beats 1:1011
G
F
D
C
A
100
200
300
400
500
600
700
800
900
1000
100
200
300
400
500
600
700
800
900
1000
2
1
0
-1
-2
0
Beat-Chroma Representations - Ellis
2007-05-23 - 16/23
4. Artist Identification (AID)
• Baseline system: “Bag of (timbral) frames”
MFCC frames, model as Gaussian or GMM
distance by likelihood or KL
• Dataset: [Mandel et al. 2006]
18 artists x 5 or 6 albums each
18x3 albums for training, 18x2 for test, 10x1 dev
u2
tina_turner
roxette
rolling_stones
queen
pink_floyd
metallica
madonna
green_day
genesis
garth_brooks
fleetwood_mac
depeche_mode
dave_matthews_band
ence_clearwater_revival
bryan_adams
beatles
aerosmith
15
10
true
5
0
-5
-10
ae be br crdade fl gage gr mamepi qu ro ro t
track
Beat-Chroma Representations - Ellis
u2
ti
ro
ro
qu
pi
me
ma
gr
ge
ga
fl
de
da
cr
br
be
ae
20
15
10
5
aebe br cr dade fl gage grmamepi qu ro ro t
recog
2007-05-23 - 17/23
0
Beat Chroma Features for AID?
• Artists may use tonality in particular ways...
density, variety
particular chords
(influence of instruments on chroma features)
Northern Lad (1998) @ 1:35 (tatum=238 BPM)
Cars and Guitars (2005) @ 1:05 (tatum=333 BPM)
12
12
10
10
8
8
6
6
4
4
2
2
20
40
60
80
20
40
60
80
• Try bag-of-frames on beat-chroma rep’n
use several consecutive beats?
key-normalization of each piece?
Beat-Chroma Representations - Ellis
2007-05-23 - 18/23
Key Normalization
• Could try matching at all possible rotations..
• .. or just transpose every piece initially
Taxman
aligned chroma
single Gaussian
model of one piece
find ML rotation
of other pieces
model all
transposed pieces
iterate until
convergence
I'm Only Sleeping
G
F
G
F
G
F
D
D
D
C
C
C
A
A
A
A
Love You To
C D
F G
A
Aligned Global model
G
F
G
F
D
D
D
C
C
C
A
A
A
C D
She Said She Said
C D
F G
Good Day Sunshine
A
G
F
G
F
D
D
D
C
C
C
A
C D
F G
C D
F G
And Your Bird Can Sing
G
F
A
F G
A
A
F G
C D
Yellow Submarine
G
F
A
Beat-Chroma Representations - Ellis
Eleanor Rigby
A
A
C D
F G
2007-05-23 - 19/23
A
C D
F G
aligned chroma
fitting a
tures to
he likeclass is
rotation
ew sinrotated
ated for
changes
atch the
s likeliy larger,
e Gaus-
Timbre+Chroma AID
• Preliminary Mandel18 Artist ID accuracy:
Feature
Model
T win
MFCC20
FullCov
1
MFCC20
64 GMM
1
Chroma
FullCov
1
Chroma
FullCov
4
Chroma
64GMM
1
Chroma
64GMM
4
ChromaKN FullCov
1
ChromaKN FullCov
4
ChromaKN 64GMM
1
ChromaKN 64GMM
4
MFCC + Chroma fusion
Beat-Chroma
- Ellis
Table 1. Representations
Artist identification
figurations.
Acc
48%
33%
15%
14%
24%
15%
17%
14%
25%
16%
52%
Exec. time
212 s
1952 s
46 s
117 s
850 s
2242 s
110 s
258 s
2533 s
5803 s
- 20/23
results for 2007-05-23
various con“T win” is the size of temporal context
Artist Fragments
• Idea: Find the most discriminant
with Courtenay Cotton
beat-chroma fragments per artist
k-means cluster 16 beat fragments within piece
!
!"#$% &$! "#$%&'! ()'*+$),! %(! '! -)'*.),! ,%/01! 23*#! *#)! /3/)! 4567)'*!
keep
fragments largest ratio
($'0&)/*,!)8*$'9*):!7;!9.+,*)$3/0!#30#.30#*):<!
!
(avg.
similarity to same artist)/(avg. sim. to others)
!
classify test pieces
by ID of best-scoring fragment
'$%()*+,*-./0%
!
'$&$%1232453%
%
Beat-Chroma
Representations
- Ellis 2',! *),*):! %/! '! ,+7,)*!
2007-05-23
=#)! &)*#%:!
:),9$37):! #)$)!
%(! *#)! - 21/23
+,>%>?@@?! :'*',)*! ABC<! ! D+)! *%! *3&)! 9%/,*$'3/*,! '/:! *#)! !"#$% 6$! P8'&>.)!
2!$+3!
Artist Fragment Results
• Preliminary, 5 way artist ID, ~32% correct
•
•
•
!
!
!"#$% /$! "*+',%&*+! <$)-&@! '*-! )71! K! $-)&%)%;! (*<9&+&+0! )1%)!
$+3!/$#&3$)&*+!-1%,#)%5!
!
!
Beat-Chroma Representations - Ellis
<&%(#$%%&'&13;! $+3! )71&-! '-$0<1+)%! $-1! +*)! 1$%&#.! ,%13! )*!
<&%(#$%%&'.! *)71-! $-)&%)%! 1&)71-5! ! 47&%! &%! $! 0**3! 1@$<:#1! *'!
need to search
more fragments
way to choose phrase
beginnings?
a basis set for all
tonal content?
2007-05-23 - 22/23
Conclusions and Future Work
• Beat-synchronous chroma features
are successful for matching cover songs
captures melody-harmony, not instruments
• Further uses:
Beat-chroma fragments
as musical building blocks
e.g. VQ over large body of music
find recurrent motifs
artist identification?
• Code available!
Google “matlab chroma features”
Beat-Chroma Representations - Ellis
2007-05-23 - 23/23
Download