Lecture 12: Alignment and Matching 1. Music Alignment 2. Cover Song Detection

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 12:
Alignment and Matching
1. Music Alignment
2. Cover Song Detection
3. Echo Nest Analyze
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-04-15 - 1 /22
1. Music Alignment
Kurth et al., 2007
• Often have versions of the same music with unmatched time axes
different performances
performance vs. score
• Various applications for aligning them
synchronizing different tracks (with TSM)
synchronized score display
ground truth transcriptions
The Similarity Matrix
Foote 1999
• Point-to-point comparison of sequences
e.g. Euclidean distance
$d_{euc}(i, j) = \sum_k |x_i(k) - y_j(k)|^2$
or normalized inner product (cosine distance)
$d_{cos}(i, j) = 1 - \frac{\sum_k x_i(k)\, y_j(k)}{\sqrt{\sum_k |x_i(k)|^2 \, \sum_k |y_j(k)|^2}}$
[Figure: Euclidean distance matrix d_ij for Let It Be - The Beatles vs. Let It Be - Nick Cave; chroma (C to B) shown for each song; axes i, j in time / beats]
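The two point-to-point distances can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each sequence is stored as a matrix with one feature vector (for example a 12-bin chroma vector) per column; the function names and the toy data are illustrative, not part of the lecture.

```python
import numpy as np

def euclidean_distance_matrix(X, Y):
    """d_euc(i, j) = sum_k |x_i(k) - y_j(k)|^2 for columns x_i of X, y_j of Y."""
    # Broadcast (bins, n_x, 1) against (bins, 1, n_y), then sum over bins k.
    diff = X[:, :, None] - Y[:, None, :]
    return np.sum(diff ** 2, axis=0)

def cosine_distance_matrix(X, Y, eps=1e-12):
    """d_cos(i, j) = 1 - <x_i, y_j> / (||x_i|| ||y_j||)."""
    # eps guards against division by zero for silent (all-zero) frames.
    Xn = X / np.maximum(np.linalg.norm(X, axis=0, keepdims=True), eps)
    Yn = Y / np.maximum(np.linalg.norm(Y, axis=0, keepdims=True), eps)
    return 1.0 - Xn.T @ Yn

# Toy data standing in for two beat-chroma sequences (assumption).
rng = np.random.default_rng(0)
X = rng.random((12, 48))
Y = rng.random((12, 40))
D_euc = euclidean_distance_matrix(X, Y)
D_cos = cosine_distance_matrix(X, Y)
print(D_euc.shape, D_cos.shape)
```

The cosine form ignores the overall level of each frame, which may matter when the two versions differ in loudness.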
Dynamic Programming
Bellman 1957
• Find best path combining local + transitions
works for any kind of similarity matrix
Finds path {i_k, j_k} to minimize cost ...
$C_{i_{max}, j_{max}} = \min \sum_k d(i_k, j_k) + T(i_k - i_{k-1}, j_k - j_{k-1})$
... recursively
$C_{i,j} = \min_{(x,y) \in \{(1,1),(1,0),(0,1)\}} d(i, j) + T(x, y) + C_{i-x, j-y}$
Allowable transitions {i_{k-1}, j_{k-1}} → {i_k, j_k}: T(1,0) = 0.1, T(1,1) = 0.0, T(0,1) = 0.1
[Figure: local costs d_ij, cumulative costs C*, and best paths on an 8 x 8 grid (axes i, j)]
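The recursion above can be sketched as a small dynamic-programming routine. This is a minimal version, assuming the path must start at (0, 0) and end at the last cell, and using the transition costs from the slide as defaults; the function name `dp_align` is illustrative.

```python
import numpy as np

def dp_align(D, T_diag=0.0, T_step=0.1):
    """Minimum-cost path through a local-cost matrix D (rows i, columns j)."""
    n, m = D.shape
    # Allowed steps (x, y) and their transition costs T(x, y).
    steps = [((1, 1), T_diag), ((1, 0), T_step), ((0, 1), T_step)]
    C = np.full((n, m), np.inf)          # cumulative cost C[i, j]
    back = np.zeros((n, m), dtype=int)   # index of the step that reached (i, j)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            for s, ((x, y), t) in enumerate(steps):
                pi, pj = i - x, j - y
                if pi >= 0 and pj >= 0:
                    cost = D[i, j] + t + C[pi, pj]
                    if cost < C[i, j]:
                        C[i, j] = cost
                        back[i, j] = s
    # Trace the best path back from the final cell to the origin.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        (x, y), _ = steps[back[i, j]]
        i, j = i - x, j - y
        path.append((i, j))
    return C, path[::-1]

# Toy local costs: zero on the diagonal, one elsewhere (assumption).
D = np.ones((4, 4)) - np.eye(4)
C, path = dp_align(D)
print(path)
```

The double loop is quadratic in the sequence lengths, which is the cost that motivates the cheaper cross-correlation approach later in the lecture.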
Audio-to-Audio Alignment
• Dynamic programming to get time mapping
+ phase vocoder time scaling
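A DP path is a list of frame pairs, so before it can drive a time-scaling stage it has to become a function from time in one recording to time in the other. The sketch below assumes the path is a list of (i, j) frame indices and that the hop times `hop_x` and `hop_y` of the two feature sequences are known; these names are illustrative. The phase vocoder itself is not shown.

```python
import numpy as np

def path_to_time_map(path, hop_x, hop_y):
    """Turn a DP path of (i, j) frame pairs into a function t_x -> t_y."""
    p = np.asarray(path, dtype=float)
    t_x = p[:, 0] * hop_x
    t_y = p[:, 1] * hop_y
    # np.interp needs increasing x values, so collapse repeated t_x
    # (vertical path segments) to the mean of their t_y values.
    uniq, inverse = np.unique(t_x, return_inverse=True)
    t_y_mean = np.bincount(inverse, weights=t_y) / np.bincount(inverse)
    return lambda t: np.interp(t, uniq, t_y_mean)

# Toy path with one horizontal step (assumption), 0.5 s hop in both signals.
time_map = path_to_time_map([(0, 0), (1, 1), (2, 1), (3, 2)], 0.5, 0.5)
print(time_map(0.75), time_map(1.25))
```

The local slope of this mapping gives the time-scale factor a time-scale modification stage would need at each instant.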
Audio-Score Alignment
• Aligning a score representation (e.g. MIDI)
is a proxy for polyphonic transcription
[Figure: Let It Be + aligned MIDI labels; freq / Hz (0 to 1000) vs. time / sec (0 to 22)]
Peak Structure Distance
Orio & Schwarz 2001
• How do we match spectra to score notes?
synthesize audio from MIDI & compare audio?
“Peak Structure distance”: is energy where we expect?
$d_{psd} = 1 - \frac{\sum_k M[k]\,|X[k]|}{\sum_k |X[k]|}$
Predicted spectrum = mask M[k]
“Peak Structure” = energy below mask
[Figure: MIDI “Piano roll” (note, C2 to C6, vs. time / sec); Synthesized audio (freq / kHz vs. time / sec); predicted-spectrum mask and peak structure (freq / bins vs. time / frames)]
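The peak structure distance is easy to compute once a mask is available. The sketch below builds a binary mask from the harmonics of the expected notes, which is one plausible way to obtain M[k] and is an assumption here, not necessarily how the original system predicts the spectrum. The function names, the number of harmonics, and the mask width are illustrative.

```python
import numpy as np

def harmonic_mask(f0s_hz, n_bins, sr, n_harmonics=8, half_width=1):
    """Binary mask M[k]: 1 near the harmonics of each expected note, else 0."""
    n_fft = 2 * (n_bins - 1)             # n_bins = n_fft / 2 + 1 for a real FFT
    M = np.zeros(n_bins)
    for f0 in f0s_hz:
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 * n_fft / sr))
            if k < n_bins:
                M[max(k - half_width, 0):k + half_width + 1] = 1.0
    return M

def peak_structure_distance(X_mag, M):
    """d_psd = 1 - sum_k M[k]|X[k]| / sum_k |X[k]|."""
    total = np.sum(X_mag)
    if total == 0:
        return 1.0                       # silent frame: nothing lies under the mask
    return 1.0 - np.sum(M * X_mag) / total

# Toy frame: four harmonics of a 220 Hz tone (assumption).
sr, n_fft = 8000, 1024
t = np.arange(n_fft) / sr
x = sum(np.sin(2 * np.pi * h * 220.0 * t) / h for h in range(1, 5))
X_mag = np.abs(np.fft.rfft(x * np.hanning(n_fft)))
d_match = peak_structure_distance(X_mag, harmonic_mask([220.0], len(X_mag), sr))
d_wrong = peak_structure_distance(X_mag, harmonic_mask([311.13], len(X_mag), sr))
print(d_match, d_wrong)
```

The distance is small when most of the spectral magnitude falls under the mask and approaches 1 when the expected note does not explain the frame.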
2. Cover Song Detection
• Musicians are fond of ‘cover versions’
usually alter melody, harmony, instrumentation,
rhythm, style
can be hard to spot even for a human!
• Can try to match via alignment
.. with some threshold on best alignment cost?
Smith-Waterman Local Alignment
• Cover version may have different form
different number, ordering of verse/chorus/bridge
want to find any large aligned regions
• “Local alignment” measure
$S_{i,j} = \max_{x,y} \max\{0,\ s(i, j) - P(x, y) + S_{i-x, j-y}\}$
want largest score S*
similarity s(i, j) must exceed penalty P(x,y) on avg.
(e.g. 0.96 for diagonal, 1.2 for off-diagonal)
[Figure: Beatles vs. Carol Woods − cosine dist, and Smith−Waterman cd/2,.96.1.2; axes: time / beats]
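The local alignment recursion can be sketched as follows. This is a minimal version, assuming the same three step types as the global DP and the penalties quoted on the slide; the scaling of the toy similarity matrix is an assumption chosen so that matching cells exceed the penalty.

```python
import numpy as np

def smith_waterman(s, P_diag=0.96, P_off=1.2):
    """S[i,j] = max over steps of max{0, s(i,j) - P(x,y) + S[i-x, j-y]}."""
    n, m = s.shape
    S = np.zeros((n + 1, m + 1))         # extra zero row/column as boundary
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sij = s[i - 1, j - 1]
            S[i, j] = max(
                0.0,                              # restart: alignment may begin anywhere
                sij - P_diag + S[i - 1, j - 1],   # diagonal step
                sij - P_off + S[i - 1, j],        # off-diagonal steps
                sij - P_off + S[i, j - 1],
            )
    end = np.unravel_index(np.argmax(S), S.shape)
    return S[1:, 1:], float(S.max()), (int(end[0]) - 1, int(end[1]) - 1)

# Toy similarity: one 6-beat matching fragment on an offset diagonal (assumption).
s = np.zeros((20, 20))
for k in range(6):
    s[5 + k, 8 + k] = 2.0
S, best_score, end = smith_waterman(s)
print(best_score, end)
```

Because the score is clipped at zero, poorly matching regions before or after a good fragment do not drag the best score S* down, which is what allows different song forms to match.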
Local Alignment Cover Detection
Serrà & Gómez, 2008
• Smith-Waterman needs predictable values
use binary similarity based on best transposition
[Figure: similarity matrices; panel labels: Euclidean, Binary, Non-cover]
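One way to read "binary similarity based on best transposition" is sketched below: find the circular chroma shift that best matches the two songs overall, then mark a frame pair as similar only when its own best shift agrees with that global one. This is a simplified interpretation written for illustration, not a reimplementation of the published system; the function names and the match and mismatch values are assumptions.

```python
import numpy as np

def best_shift(a, b):
    """Circular shift of b (in chroma bins) that maximizes its dot product with a."""
    scores = [np.dot(a, np.roll(b, r)) for r in range(len(b))]
    return int(np.argmax(scores))

def binary_similarity(X, Y, match=1.0, mismatch=-0.9):
    """Binary similarity matrix for chroma sequences X, Y (bins x frames)."""
    n_bins = X.shape[0]
    # Global transposition estimated from the time-averaged chroma profiles.
    g = best_shift(X.mean(axis=1), Y.mean(axis=1))
    # corr[r, i, j] = <x_i, y_j circularly shifted by r bins>
    corr = np.stack([X.T @ np.roll(Y, r, axis=0) for r in range(n_bins)])
    local = np.argmax(corr, axis=0)      # best shift for each frame pair
    return np.where(local == g, match, mismatch), g

# Toy data: Y is X transposed down by 3 semitones (assumption).
rng = np.random.default_rng(0)
X = rng.random((12, 30))
Y = np.roll(X, -3, axis=0)
B, g = binary_similarity(X, Y)
print(g, B.shape)
```

A matrix with only two possible values gives the Smith-Waterman recursion the predictable score range it needs, whatever the dynamic range of the underlying chroma features.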
Cross-correlation Covers System
• DP is good for time-warping, but expensive
beat-timing is tempo independent (if it works)
simply cross-correlate beat-chroma patches?
how big are the pieces?
how do we combine individual scores?
also expensive
[Figure: extract a patch from the Query beat-chroma (chroma bins vs. beats) and cross-correlate it with the Candidate beat-chroma (chroma bins vs. beats)]
Global Cross-Correlation
Ellis & Poliner, 2007
• Cross-correlate entire beat-chroma matrices
... at all possible transpositions (circular)
implicit combination of match quality and duration
• One good matching fragment is sufficient...?
[Figure: beat-chroma (chroma bins vs. beats @281 BPM) for Elliott Smith - Between the Bars and Glen Phillips - Between the Bars; Cross-correlation, skew / semitones vs. skew / beats]
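Cross-correlating whole beat-chroma matrices at every circular transposition can be sketched as below. This is a minimal illustration, assuming both matrices are stored as 12 chroma bins by beats; it omits any normalization, and the function name and toy data are illustrative.

```python
import numpy as np

def chroma_xcorr(A, B):
    """Cross-correlate beat-chroma matrices A, B (bins x beats) at every
    circular transposition (rows of the result) and beat skew (columns)."""
    n_bins = A.shape[0]
    n_a, n_b = A.shape[1], B.shape[1]
    # np.correlate(..., mode="full") covers skews -(n_b - 1) .. n_a - 1.
    lags = np.arange(-(n_b - 1), n_a)
    xc = np.zeros((n_bins, len(lags)))
    for r in range(n_bins):
        Br = np.roll(B, r, axis=0)       # transpose B by r semitones
        for f in range(n_bins):
            xc[r] += np.correlate(A[f], Br[f], mode="full")
    return xc, lags

# Toy data: B is a 30-beat fragment of A, transposed down 2 semitones (assumption).
rng = np.random.default_rng(0)
notes = rng.integers(0, 12, size=60)
A = np.eye(12)[:, notes]                 # one active chroma bin per beat
B = np.roll(A[:, 10:40], -2, axis=0)
xc, lags = chroma_xcorr(A, B)
r, idx = np.unravel_index(np.argmax(xc), xc.shape)
print("best transposition:", r, "best skew:", lags[idx])
```

Since the score is a sum over all overlapping beats, a longer matching region and a better match both raise the peak, which is the implicit combination of quality and duration mentioned above.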
Filtered Cross-Correlation
• Raw correlation not as important as precise local match
looking for large contrast at ±1 beat skew
i.e. high-pass filter
[Figure: Cross-correlation, skew / semitones vs. skew / beats; Cross-correlation @ skew = +2 semitones, raw and filtered, vs. skew / beats]
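The effect of high-pass filtering along the skew axis can be sketched with a very short kernel. The 3-tap kernel [-0.5, 1, -0.5] used here is an assumption chosen because it measures contrast against the values at ±1 beat skew; the filter used in the actual system may differ.

```python
import numpy as np

def highpass_along_skew(xc):
    """Apply the 3-tap kernel [-0.5, 1, -0.5] along the beat-skew axis."""
    kernel = np.array([-0.5, 1.0, -0.5])
    xc = np.atleast_2d(xc)               # rows: transpositions, columns: skews
    return np.stack([np.convolve(row, kernel, mode="same") for row in xc])

# Toy correlation: a broad hump plus a small, sharp peak at index 30 (assumption).
raw = 0.4 * np.hanning(101)
raw[30] += 0.1
filtered = highpass_along_skew(raw)[0]
print(np.argmax(raw), np.argmax(filtered))
```

In this toy case the broad hump dominates the raw correlation, while the filtered version is dominated by the sharp peak, which is the kind of precise local match the slide is after.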
Cover Song Results
• 23 Covers found in 8700 song ‘uspop2002’
Cover Songs - dpwe23 - 12/23 correct
[Figure: Query vs. Test match matrix; Query rows listed below; Test columns abbreviated Ab Ad Al Am Be Be Bl Ca Ce Cl Co Co Da En Fa Go Go Gr Hu I_ I_ Le Ta]
Take_Me_To_The_River/annie_lennox
Let_It_Be/nick_cave
I_Love_You/faith_hill
I_Can_t_Get_No_Satisfaction/rolling_stones
Hush/milli_vanilli
Grand_Illusion/styx
Gold_Dust_Woman/sheryl_crow
God_Only_Knows/brian_wilson
Faith/limp_bizkit
Enjoy_The_Silence/tori_amos
Day_Tripper/cheap_trick
Come_Together/beatles
Cocaine/nazareth
Claudette/roy_orbison
Cecilia/simon_and_garfunkel
Caroline_No/brian_wilson
Blue_Collar_Man/styx
Between_The_Bars/glen_phillips
Before_You_Accuse_Me/eric_clapton
America/simon_and_garfunkel
All_Along_The_Watchtower/dave_matthews_band
Addicted_To_Love/tina_turner
Abracadabra/sugar_ray
popular ‘decoys’ – normalization issues
Analyzing Cover Song Correlation
• Look inside global cross-correlation to find
matching fragments...
xcorr = t f (C1(t, f)⋅C2(t, f)) - view along time
chroma
Let It Be / Beatles (beats 11-441)
G
F
D
C
chroma
A
50
100
150
200
250
Let It Be / Nick Cave (beats 13-443)
50
100
150
200
50
100
150
200
300
350
400 time / beats
250
300
350
400 time / beats
250
300
350
400 time / beats
G
F
D
C
A
0.4
0.2
0
-0.2
0
E4896 Music Signal Processing (Dan Ellis)
2013-04-15 - 15/22
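Viewing the correlation along time amounts to keeping the sum over chroma bins but not the sum over beats. A minimal sketch, assuming both beat-chroma matrices have already been trimmed to the same aligned beat range and are stored as chroma bins by beats; the function name is illustrative.

```python
import numpy as np

def per_beat_contribution(C1, C2):
    """Per-beat terms sum_f C1(t, f) * C2(t, f) for two aligned, equal-length
    beat-chroma matrices stored as (chroma bins x beats)."""
    if C1.shape != C2.shape:
        raise ValueError("trim both matrices to the same aligned beat range first")
    return np.sum(C1 * C2, axis=0)

# Toy data with a planted matching fragment at beats 40-59 (assumption).
rng = np.random.default_rng(1)
C1 = rng.random((12, 100))
C2 = rng.random((12, 100))
C2[:, 40:60] = C1[:, 40:60]
contrib = per_beat_contribution(C1, C2)
xcorr_value = contrib.sum()              # the single xcorr value at this skew
print(contrib.shape, xcorr_value)
```

Plotting `contrib` against beat index shows which stretches of the two songs account for the correlation peak.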
Cover Song False Alarm
• Correlation can be weak
“Cocaine” (Clapton) vs. “Satisfaction” (Stones)
[Figure: chroma vs. beats for Eric Clapton - Cocaine - beats 17:1027 and Rolling Stones - Satisfaction - beats 1:1011; a third panel vs. beats with values from -2 to 2]
3. Echo Nest Analyze
• Web service to provide beat, chroma, ... analysis (and much more)
register for free API key
http://developer.echonest.com/account/register/
upload MP3, get back XML with analysis data
[Figure: TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit; panels: Original (freq / Hz), EN Features (chroma, C to B), Resynth (freq / Hz), vs. time / sec]
EN Analyze Usage
• Matlab wrapper function
Million Song Dataset (MSD)
Thierry Bertin-Mahieux
• Commercial-scale dataset
available to MIR researchers
1M pop songs
250 GB of features
(6 years of listening)
• EN Analyze features +...
Lyrics,
Tags,
Covers,
Listeners ...
http://labrosa.ee.columbia.edu/millionsong
MSD Metadata
EN Metadata
artist: 'Tori Amos'
release: 'LIVE AT MONTREUX'
title: 'Smells Like Teen Spirit'
id: 'TRKUYPW128F92E1FC0'
key: 5
mode: 0
loudness: -16.6780
tempo: 87.2330
time_signature: 4
duration: 216.4502
sample_rate: 22050
audio_md5: '8'
7digitalid: 5764727
familiarity: 0.8500
year: 1992
Last.fm Tags
100.0 – cover
57.0 – covers
43.0 – female vocalists
42.0 – piano
34.0 – alternative
14.0 – singer-songwriter
11.0 – acoustic
8.0 – tori amos
7.0 – beautiful
6.0 – rock
6.0 – pop
6.0 – Nirvana
6.0 – female vocalist
6.0 – 90s
5.0 – out of genre covers
5.0 – cover songs
4.0 – soft rock
4.0 – nirvana cover
4.0 – Mellow
4.0 – alternative rock
3.0 – chick rock
3.0 – Ballad
3.0 – Awesome Covers
2.0 – melancholic
2.0 – k00l ch1x
2.0 – indie
2.0 – female vocalistist
2.0 – female
2.0 – cover song
2.0 – american
SHS Covers
%5489,4468, Smells Like Teen Spirit
TRTUOVJ128E078EE10 Nirvana
TRFZJOZ128F4263BE3 Weird Al Yankovic
TRJHCKN12903CDD274 Pleasure Beach
TRELTOJ128F42748B7 The Flying Pickets
TRJKBXL128F92F994D Rhythms Del Mundo feat. Shanade
TRIHLAW128F429BBF8 The Bad Plus
TRKUYPW128F92E1FC0 Tori Amos
MxM Lyric Bag-of-Words
12 hello, 11 i, 10 a, 9 and, 7 it, 6 are, 6 we, 6 now
6 here, 6 us, 6 entertain, 4 the, 4 feel, 4 yeah, 3 to, 3 my
3 is, 3 with, 3 oh, 3 out, 3 an, 3 light, 3 less, 3 danger
Summary
• Music Alignment
Dynamic Programming finds correspondence
• Cover Songs
DP, or cross-correlation for efficiency
• EN Analyze
Web service to analyze audio
References
• R. Bellman, Dynamic Programming, Princeton University Press, 1957.
• D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP-07, Hawai'i, pp. IV-1429-1432, 2007.
• J. Foote, “Visualizing Music and Audio using Self-Similarity,” Proc. ACM Multimedia, Orlando, pp. 77-80, 1999.
• F. Kurth, M. Müller, C. Fremerey, Y. Chang, and M. Clausen, “Automated synchronization of scanned sheet music with audio recordings,” Proc. 8th International Conference on Music Information Retrieval (ISMIR), Vienna, pp. 261-266, 2007.
• N. Orio and D. Schwarz, “Alignment of monophonic and polyphonic music to a score,” Proc. Int. Comp. Music Conf., Havana, pp. 155-158, 2001.
• J. Serrà, E. Gómez, P. Herrera, and X. Serra, “Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification,” IEEE Trans. on Audio, Speech and Lang. Proc., 16(6), pp. 1138-1151, 2008.