Mining Large-Scale Music Data Sets 1. 2.

advertisement
Mining Large-Scale
Music Data Sets
Dan Ellis & Thierry Bertin-Mahieux
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
{dpwe,thierry}@ee.columbia.edu
1.
2.
3.
4.
http://labrosa.ee.columbia.edu/
Music Audio Representations
The Million Song Dataset
Finding Similar Items (Cover Songs)
Current Work
LabROSA Overview - Dan Ellis
2012-02-09
1 /15
The Problem
Query track
One Million Songs
4000
Frequency
3000
2000
1000
0
0
1
2
3
4
5
Time
6
7
8
9
10
“Similar” tracks
4000
Frequency
3000
2000
4000
1000
4000
0
3000
1
2
3
4
5
Time
6
7
8
9
10
Frequency
0
Frequency
3000
2000
2000
1000
1000
0
0
LabROSA Overview - Dan Ellis
0
1
2
3
4
5
Time
6
7
8
9
0
1
2
3
4
5
Time
6
7
8
9
10
10
2012-02-09
2 /15
1. Music Audio Representations
• We need a description of the audio that
contains “suitable” detail
• Speech Recognition uses Mel-Frequency
Cepstral Coefficients (MFCCs)
Let It Be (LIB-1) - log-freq specgram
freq / Hz
5915
1396
329
Coefficient
MFCCs
12
10
8
6
4
2
Noise excited MFCC resynthesis (LIB-2)
freq / Hz
5915
1396
329
0
LabROSA Overview - Dan Ellis
5
10
15
20
25
time / sec
2012-02-09
3 /15
Warren et al. 2003
Chroma Features
• We’d like to preserve the notes
at least within one octave
Let It Be - log-freq specgram (LIB-1)
freq / Hz
6000
1400
chroma bin
300
Chroma features
B
A
G
E
D
C
Shepard tone resynthesis of chroma (LIB-3)
freq / Hz
6000
1400
300
MFCC-filtered shepard tones (LIB-4)
freq / Hz
6000
1400
300
0
LabROSA Overview - Dan Ellis
5
10
15
20
25
time / sec
2012-02-09
4 /15
Beat-Synchronous Chroma
• Perform Beat Tracking
.. and record one chroma vector per beat
compact representation of harmonies
Let It Be - log-freq specgram (LIB-1)
freq / Hz
6000
1400
300
chroma bin
Onset envelope + beat times
Beat-synchronous chroma
B
A
G
E
D
C
Beat-synchronous chroma + Shepard resynthesis (LIB-6)
freq / Hz
6000
1400
300
0
LabROSA Overview - Dan Ellis
5
10
15
20
25
time / sec
2012-02-09
5 /15
2. Million Song Dataset (MSD)
Thierry Bertin-Mahieux
• Commercial-scale dataset
available to MIR researchers
1M pop songs
250 GB of features
(6 years of listening)
• Many facets:
Features,
Lyrics,
Tags,
Covers,
Listeners ...
http://labrosa.ee.columbia.edu/millionsong
LabROSA Overview - Dan Ellis
2012-02-09
6 /15
MSD Audio Features
• Use Echo Nest “Analyze” features
segment audio into variable-length “events”
supports
a crude
resynthesis:
freq / Hz
2416
Origina
761
240
B
A
G
EN
Featur
E
D
C
freq / Hz
represent by
12 chroma +
12 “timbre”
TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit
2416
Resyn
761
240
0
LabROSA Overview - Dan Ellis
2
4
6
8
10
12 time / sec 14
2012-02-09
7 /15
MSD Metadata
EN Metadata
Last.fm Tags
artist:
release:
title:
id:
key:
mode:
loudness:
tempo:
time_signature:
duration:
sample_rate:
audio_md5:
7digitalid:
familiarity:
year:
100.0 – cover
57.0 – covers
43.0 – female vocalists
42.0 – piano
34.0 – alternative
14.0 – singer-songwriter
11.0 – acoustic
8.0 – tori amos
7.0 – beautiful
6.0 – rock
6.0 – pop
6.0 – Nirvana
6.0 – female vocalist
6.0 – 90s
5.0 – out of genre covers
'Tori Amos'
'LIVE AT MONTREUX'
'Smells Like Teen Spirit'
'TRKUYPW128F92E1FC0'
5
0
-16.6780
87.2330
4
216.4502
22050
'8'
5764727
0.8500
1992
SHS Covers
%5489,4468, Smells
TRTUOVJ128E078EE10
TRFZJOZ128F4263BE3
TRJHCKN12903CDD274
TRELTOJ128F42748B7
TRJKBXL128F92F994D
TRIHLAW128F429BBF8
TRKUYPW128F92E1FC0
5.0
4.0
4.0
4.0
4.0
3.0
3.0
3.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
cover songs
soft rock
nirvana cover
Mellow
alternative rock
chick rock
Ballad
Awesome Covers
melancholic
k00l ch1x
indie
female vocalistist
female
cover song
american
MxM Lyric Bag-of-Words
Like Teen Spirit
Nirvana
Weird Al Yankovic
Pleasure Beach
The Flying Pickets
Rhythms Del Mundo feat. Shanade
The Bad Plus
Tori Amos
LabROSA Overview - Dan Ellis
12
11
10
9
7
6
6
6
hello
i
a
and
it
are
we
now
6
6
6
4
4
4
3
3
here
us
entertain
the
feel
yeah
to
my
3
3
3
3
3
3
3
3
is
with
oh
out
an
light
less
danger
2012-02-09
8 /15
3. Finding Similar Items
Avery Wang ’03
• If it’s really the same, we can use
freq / Hz
fingerprinting (e.g. Shazam):
Query audio
4000
3500
3000
2500
find local “landmarks”
quantize by pairs
inverted index
2000
1500
1000
500
0
Match: 05−Full Circle at 0.032 sec
4000
3500
3000
2500
2000
1500
1000
500
0
LabROSA Overview - Dan Ellis
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
2012-02-09
time / sec
9 /15
Dynamic Programming Alignment
Serra, Gomez et al. ’08
• We can match the chroma representations
by DP alignment
robust to timing changes
quite efficient
best performer in MIREX Cover Song evaluations
LabROSA Overview - Dan Ellis
2012-02-09 10/15
Large-Scale Cover Matching
Bertin-Mahieux & Ellis ’11
• How can we find covers in 1M songs?
@ 1 sec / comparison, one search = 11.5 CPU-days
full N2 mining = 16,000 CPU-years
• Hashing?
landmarks from chroma patches (like Shazam)
represent item by distribution of contour fragments
one stage in a filtering hierarchy...
LabROSA Overview - Dan Ellis
2012-02-09 11/15
Euclidean Cover Matching
Ellis & Poliner ’08
Bertin-Mahieux & Ellis ’12
• Reduce cover matching to nearest-neighbor
search in fixed-size beat-chroma patches
• Right “distance”?
10
8
principal components
learned weightings
6
4
2
120
Alignment/
segmentation?
music segmentation
multiple probes
framing-insensitive
representation
LabROSA Overview - Dan Ellis
130
140
150
160
170
Between the Bars − Glenn Phillips − pwr .25 − −17 beats − trsp 2
12
10
8
6
4
2
120
chroma bin
•
Between the Bars − Elliot Smith − pwr .25
12
130
140
150
160
170
160
170
pointwise product (sum(12x300) = 19.34)
12
10
8
6
4
2
120
130
140
150
time / beats
2012-02-09 12/15
4. Other Work: Pattern Mining
• Cluster beat-synchronous chroma patches
#32273 - 13 instances
chroma bins
G
F
D
C
A
G
F
D
C
A
G
F
D
C
A
G
F
D
C
A
5
LabROSA Overview - Dan Ellis
a20-top10x5-cp4-4p0
#51917 - 13 instances
#65512 - 10 instances
#9667 - 9 instances
#10929 - 7 instances
#61202 - 6 instances
#55881 - 5 instances
#68445 - 5 instances
10
15
20
5
10
15
time / beats
2012-02-09 13/15
Other Work: MSD Challenge
with Brian McFee
• Using “Taste Profile” data
User-Song-playcount
b80344d063b5ccb3212f76538f3d9e43d87dca9e
b80344d063b5ccb3212f76538f3d9e43d87dca9e
b80344d063b5ccb3212f76538f3d9e43d87dca9e
b80344d063b5ccb3212f76538f3d9e43d87dca9e
b80344d063b5ccb3212f76538f3d9e43d87dca9e
b80344d063b5ccb3212f76538f3d9e43d87dca9e
SOAKIMP12A8C130995
SOAPDEY12A81C210A9
SOBBMDR12A8C13253B
SOCNMUH12A6D4F6E6D
SODACBL12A8C13C273
SODDNQT12A6D4F5F7E
1
1
2
1
1
5
• Challenge:
Rank 1M songs to complete half a playlist
using audio, metadata, lyrics, whatever
• Kaggle.com based contest
self-service evaluation, leader board
• Results at MIREX, AdMIRe
LabROSA Overview - Dan Ellis
2012-02-09 14/15
Summary
•
Finding Musical Similarity at Large Scale
•
Beat Chroma Representation
•
Million Song Dataset
•
MSD Challenge
LabROSA Overview - Dan Ellis
2012-02-09 15/15
Download