Searching for Similar Phrases in Music Audio Dan Ellis

advertisement
Searching for
Similar Phrases
in Music Audio
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
1.
2.
3.
4.
Motivation: Similar Phrases
Phrase Matching System
Experiments
Conclusions & Future
Similar Phrases in Music - Ellis
2007-12-18
p. 1 /12
1. Motivation: Similar Phrases
• Idea: Music is a sequence of reused pieces
e.g. melodic runs,
chord sequences, ...
• Can we identify them in
large music databases?
... which we have
i.e. machine learning
• Applications
classification and matching of pieces
compressed representation
data-driven musicology
Similar Phrases in Music - Ellis
2007-12-18
p. 2 /12
Common Phrase Discovery
Beat
tracking
Music
audio
Chroma
features
Key
normalization
Landmark
identification
Locality
Sensitive
Hash Table
• Chop up music into short descriptions of
musical content
24-beat beat-chroma matrices?
• Choose a few that appear to be “starts”
• Put into LSH table (similar items fall in same
bin)
• Find the bins with most entries
Similar Phrases in Music - Ellis
2007-12-18
p. 3 /12
2. Phrase Matching: Beat Tracking
Ellis ’06,’07
• Goal: One feature vector per ‘beat’ (tatum)
for tempo normalization, efficiency
• “Onset Strength Envelope”
freq / mel
sumf(max(0, difft(log |X(t, f)|)))
40
30
20
10
0
0
5
10
time / sec 15
• Autocorr. + window → global tempo estimate
0
168.5100BPM 200
0
300
Similar Phrases in Music - Ellis
400
500
600
700
800
900
1000
lag / 4 ms samples
2007-12-18
p. 4 /12
Chroma Features
• Chroma features convert spectral energy
into musical weights in a canonical octave
i.e. 12 semitone bins
IF chroma
chroma
3
2
G
F
D
C
1
A
0
2
4
6
8
10
100
time / sec
200
300
400
500
600
700
time / frames
• Can resynthesize as “Shepard Tones”
0
-10
-20
-30
-40
-50
-60
all octaves at once
12 Shepard tone spectra
freq / kHz
level / dB
Piano
scale
freq / kHz
Piano chromatic scale
4
Shepard tone resynth
4
3
2
1
0
500
1000
1500
2000
2500 freq / Hz
Similar Phrases in Music - Ellis
0
2
4
6
2007-12-18
8
10 time / sec
p. 5 /12
Key Estimation
• Covariance of chroma reflects key
• Normalize by transposing for best fit
single Gaussian
model of one piece
find ML rotation
of other pieces
model all
transposed pieces
iterate until
convergence
aligned chroma
Taxman
Eleanor Rigby
I'm Only Sleeping
G
F
G
F
G
F
D
D
D
C
C
C
A
A
A
A
Love You To
C D
F G
A
Aligned Global model
G
F
G
F
D
D
D
C
C
C
A
A
A
C D
She Said She Said
C D
F G
A
Good Day Sunshine
G
F
G
F
D
D
D
C
C
C
A
C D
F G
C D
F G
And Your Bird Can Sing
G
F
A
F G
A
A
F G
C D
Yellow Submarine
G
F
A
Similar Phrases in Music - Ellis
Ellis ICASSP ’07
A
A
C D
F G
2007-12-18
A
p. 6 /12
C D
F G
aligned chroma
Landmark Location
• Looking for “beginnings” of phrases
e.g. abrupt change in harmony, instruments, etc.
use likelihood ratio test:
weighted windows either side of boundary vs. all
• Choose top 10
freq / kHz
Come Together - Spectrogram, Beat-sync chromogram, and top 10 segment points
4
3
locally-normalized
peaks
2
1
0
12
chroma bin
.. to control
data size
5
10
8
6
4
2
0
Similar Phrases in Music - Ellis
50
100
2007-12-18
150
p. 7 /12
200
time / sec
250
Locality Sensitive Hashes
• Goal: Quantize high-dimensional data so
‘similar’ items fall into same bin
.. for fast and scalable nearest-neighbor search
• Idea: Multiple random scalar projections
each one will tend to keep neighbors nearby
items close
together in all
projections
are probably
neighbors
from Slaney & Casey ‘08
Similar Phrases in Music - Ellis
2007-12-18
p. 8 /12
• Data
3. Experiments
“artist20” - 20 artist x 6 albums = 1413 tracks
(up to) 10 landmarks/track = 14,078 patches
each patch = 12 chroma bins x 24 beats (288 dims)
• Performance
90
80
70
60
count
feature calculation:
~ 60 min
LSH 14k NNs:
~ 30 sec
51 patches have
>10 NNs
within r = 2.0
# neighbors within 2.0 - 14078 a20 patches
100
50
40
30
20
10
0
Similar Phrases in Music - Ellis
0
5
10
15
20
# near neighbors
2007-12-18
25
30
p. 9 /12
35
40
Results - artist20
green day 02-Blood Sex And Booze 122.3-129.8s
12
12
10
10
8
8
chroma bin
chroma bin
radiohead 01-You 192.7-198.5s
6
6
4
4
2
2
radiohead 11-Genchildren hidden 30.9-35.0s
12
12
10
10
8
8
chroma bin
chroma bin
radiohead 07-Ripcord 177.4-182.5s
6
6
4
4
2
2
5
10
beat
15
20
5
10
beat
15
20
mainly sustained notes
Similar Phrases in Music - Ellis
2007-12-18
p. 10/12
Results - Beatles
• Only the 86 Beatles tracks
• All beat offsets = 41,705 patches
LSH takes 300 sec - approx NlogN in patches?
• High-pass
Song filter
remove hits
in same
track
12
10
10
8
8
chroma bin
12
6
6
4
4
2
2
09-Martha My Dear 90.9-98.6s
chroma bin
•
to avoid
sustained
notes
05-Here There And Everywhere 12.1-20.5s
12-Piggies 22.0-29.6s
12
12
10
10
8
8
chroma bin
along time
chroma bin
02-I Should Have Known Better 92.4-97.7s
6
6
4
4
2
2
Similar Phrases in Music - Ellis
5
10
beat
15
20
5
2007-12-18
10
beat
15
p. 11/12
20
Summary / Conclusions
Beat
tracking
Music
audio
Chroma
features
Key
normalization
Locality
Sensitive
Hash Table
Landmark
identification
of data
• Lots
find motifs by counting near neighbors
patterns
• Common
e.g. melodic/harmonic-beat sequences
• Future
different features and/or pre-emphasis
better landmark points
complete dictionary
Similar Phrases in Music - Ellis
2007-12-18
p. 12/12
Download