Music Research
at LabROSA
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
1. Motivation: Music Collections
2. Music Information
3. Music Similarity
4. Music Structure Discovery
LabROSA Overview
[Diagram: LabROSA research areas (information extraction, recognition, separation, retrieval) applied to speech, music, and environmental audio, built on signal processing and machine learning.]
The Challenges of Music Audio
• A lot of music data available
e.g. 60G of MP3 ≈ 1000 hr of audio, 15k tracks
• Challenges
can computers help manage?
can we learn something?
• Application scenarios
personal music collection
discovering new music
“music placement”
• ‘Data-driven musicology’?
Transcription as Classification
• Exchange signal models for data
Graham Poliner
treat transcription as a pure classification problem:
feature representation → feature vector → classification posteriors → HMM smoothing
Training data and features:
• MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 mins of training audio)
• Normalized magnitude STFT
Classification:
• N binary SVMs (one for each note)
• Independent frame-level classification on a 10 ms grid
• Distance to class boundary as posterior
Temporal Smoothing:
• Two-state (on/off) independent HMM for each note; parameters learned from training data
• Find Viterbi sequence for each note
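A minimal sketch of this recipe follows, assuming scikit-learn and toy stand-in data: the sizes, the SVM C value, the sigmoid squashing of boundary distance, and the 0.99 self-loop probability are illustrative choices, not the trained values of the actual system.

```python
# Sketch: per-note binary SVMs + two-state HMM smoothing (toy data).
import numpy as np
from sklearn.svm import LinearSVC

N_NOTES, N_FRAMES, N_BINS = 4, 200, 64            # toy sizes; real grid is 10 ms

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, N_BINS))          # stand-in for normalized STFT
Y_train = rng.integers(0, 2, size=(500, N_NOTES)) # per-frame note-on labels

# One independent binary SVM per note.
svms = [LinearSVC(C=1.0).fit(X_train, Y_train[:, n]) for n in range(N_NOTES)]

# Distance to the class boundary, squashed to a pseudo-posterior per frame.
X_test = rng.normal(size=(N_FRAMES, N_BINS))
post = np.stack([1.0 / (1.0 + np.exp(-svm.decision_function(X_test)))
                 for svm in svms])                # (N_NOTES, N_FRAMES)

def viterbi_two_state(p_on, p_stay=0.99):
    """Most likely on/off sequence for one note under a 2-state HMM."""
    trans = np.log([[p_stay, 1 - p_stay],         # off -> off/on
                    [1 - p_stay, p_stay]])        # on  -> off/on
    obs = np.log(np.stack([1 - p_on, p_on]) + 1e-12)
    delta, back = obs[:, 0].copy(), np.zeros((2, len(p_on)), dtype=int)
    for t in range(1, len(p_on)):
        scores = delta[:, None] + trans           # scores[prev, cur]
        back[:, t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + obs[:, t]
    path = [int(delta.argmax())]
    for t in range(len(p_on) - 1, 0, -1):
        path.append(back[path[-1], t])
    return np.array(path[::-1])                   # 1 = note on

piano_roll = np.stack([viterbi_two_state(post[n]) for n in range(N_NOTES)])
```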
Singing Voice Modeling & Alignment
• How are phonemes sung?
e.g. “vowel modification” in classical voice
Christine Smit
Johanna Devaney
• Collect the data
.. by identifying solos
.. by aligning libretto to recordings
e.g. align Karaoke MIDI files to original recordings
• Lyric Transcription?
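The alignment step can be sketched as dynamic time warping between features of a reference (e.g. audio synthesized from the karaoke MIDI) and the original recording; this is a generic DTW under that assumption, not the lab's actual alignment code.

```python
# Sketch: DTW alignment between two feature sequences (e.g. chroma of
# MIDI-synthesized audio vs. the original recording).
import numpy as np

def dtw_path(A, B):
    """A: (T1, d), B: (T2, d); returns list of aligned (i, j) frame pairs."""
    An = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-9)
    Bn = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-9)
    D = 1.0 - An @ Bn.T                       # cosine distance, all frame pairs
    T1, T2 = len(A), len(B)
    C = np.full((T1 + 1, T2 + 1), np.inf)     # accumulated cost
    C[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j - 1],
                                            C[i - 1, j], C[i, j - 1])
    path, i, j = [], T1, T2                   # backtrace from the end
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([C[i - 1, j - 1], C[i - 1, j], C[i, j - 1]]))
        if step == 0:   i, j = i - 1, j - 1
        elif step == 1: i -= 1
        else:           j -= 1
    return path[::-1]
```

Mapping each MIDI note or lyric onset through the returned (reference frame, recording frame) pairs transfers its timing onto the original recording.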
MajorMiner: Semantic Tags
Mike Mandel
• Describe segments in human-relevant terms
e.g. anchor space, but more so
• Need ground truth...
what words do people use?
• MajorMiner game:
400 users
7,500 unique tags
70,000 taggings
2,200 10-sec clips used
• Train classifiers...
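"Train classifiers" can be sketched as one independent binary model per frequently-used tag over clip-level features; the tag list, feature choice, and label construction below are illustrative assumptions, not the system's actual configuration.

```python
# Sketch: independent per-tag classifiers on clip-level features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_clips, n_feats = 2200, 40               # e.g. mean+std of 20 MFCCs per clip
tags = ["drums", "guitar", "male", "electronic"]   # hypothetical subset

X = rng.normal(size=(n_clips, n_feats))            # stand-in features
Y = rng.integers(0, 2, size=(n_clips, len(tags)))  # 1 = players applied tag

models = {tag: LogisticRegression(max_iter=1000).fit(X, Y[:, i])
          for i, tag in enumerate(tags)}

# Rank clips for a tag by classifier confidence.
scores = models["drums"].predict_proba(X)[:, 1]
top_clips = np.argsort(scores)[::-1][:10]
```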
Cover Song Matching: Correlation
• Cross-correlate entire song beat-chroma matrices
... at all possible transpositions
implicit combination of match quality and duration
[Figure: beat-synchronous chroma matrices (chroma bins vs. beats @281 BPM) for Elliott Smith's and Glen Phillips's versions of "Between the Bars", and their cross-correlation as a function of beat skew and semitone transposition (+5 to -5).]
• One good matching fragment is sufficient...?
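A numpy sketch of this matcher: roll one song's chroma through all 12 transpositions and cross-correlate along the beat axis via FFT, keeping the single largest peak. Normalizing by whole-matrix energy is what folds match duration into the score; the high-pass filtering of the correlation used in the real system is omitted here as a simplifying assumption.

```python
# Sketch: whole-song beat-chroma cross-correlation over all transpositions.
import numpy as np

def xcorr_score(A, B):
    """A, B: (12, n_beats) beat-synchronous chroma; higher = better match."""
    A = A / (np.linalg.norm(A) + 1e-9)       # whole-matrix normalization:
    B = B / (np.linalg.norm(B) + 1e-9)       # long strong matches score higher
    n = A.shape[1] + B.shape[1] - 1          # all possible beat skews
    fa = np.fft.rfft(A, n, axis=1)
    best = -np.inf
    for k in range(12):                      # all semitone transpositions
        Bk = np.roll(B, k, axis=0)
        fb = np.fft.rfft(Bk[:, ::-1], n, axis=1)  # reversed time -> correlation
        xc = np.fft.irfft(fa * fb, n, axis=1).sum(axis=0)
        best = max(best, float(xc.max()))
    return best
```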
Cross-Correlation Similarity
Courtenay Cotton
Mike Mandel
• Use correlation to find similarity?
e.g. a similar note/instrumentation sequence may sound similar to judges
• Variant (5): combined score evaluated only at the single skew aligning a boundary of the reference
• Evaluate by subjective tests
modeled after the MIREX similarity evaluation

Paper excerpt (variant (5) anchor method):
"... assigns a single time anchor or boundary within each segment, then calculates the correlation only at the single skew that aligns the time anchors. We use the BIC method [8] to find the boundary time within the feature matrix that maximizes the likelihood advantage of fitting separate Gaussians to the features on each side of the boundary, compared to fitting the entire sequence with a single Gaussian, i.e. the time that divides the feature array into maximally dissimilar parts. While almost certain to miss some of the matching segments, an approach of this simplicity may be the only viable option when searching in databases consisting of millions of tracks."

Paper excerpt (evaluation):
"A major challenge in developing music similarity systems is performing any kind of quantitative analysis. As noted above, the genre and artist classification tasks that have been used as proxies in the past most likely fall short of accounting for subjective similarity, particularly in the case of a system such as ours which aims to match structural detail instead of overall statistics. Thus, we conducted a small subjective listening test of our own, modeled after the MIREX music similarity evaluation ... section 2.1. To these, we added three additional hits from a more conventional feature statistics system using (6) MFCC ..."

Table 1. Results of the subjective similarity evaluation. Counts are the number of times the best hit returned by each algorithm was rated as similar by a human rater. Each algorithm provided one return for each of 30 queries and was judged by 6 raters, hence the counts are out of a maximum possible of 180.

Algorithm                      Similar count
(1) Xcorr, chroma              48/180 = 27%
(2) Xcorr, MFCC                48/180 = 27%
(3) Xcorr, combo               55/180 = 31%
(4) Xcorr, combo + tempo       34/180 = 19%
(5) Xcorr, combo at boundary   49/180 = 27%
(6) Baseline, MFCC             81/180 = 45%
(7) Baseline, rhythmic         49/180 = 27%
(8) Baseline, combo            88/180 = 49%
Random choice 1                22/180 = 12%
Random choice 2                28/180 = 16%
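The BIC-style anchor in variant (5) can be sketched as picking the frame where two per-side Gaussians beat a single Gaussian by the largest likelihood margin; this is a generic reconstruction of that idea, not the code of [8].

```python
# Sketch: pick the time that splits a feature matrix into maximally
# dissimilar halves, via the Gaussian likelihood-gain (BIC-style) test.
import numpy as np

def gaussian_loglik(X):
    """Max log-likelihood of frames X (T, d) under one full-cov Gaussian."""
    T, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True) + 1e-6 * np.eye(d)  # ML cov + ridge
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * T * (logdet + d * (1.0 + np.log(2.0 * np.pi)))

def bic_boundary(X, margin=10):
    """Index maximizing the gain of two Gaussians over one. The model-
    complexity penalty of full BIC is constant here, so it drops out
    of the argmax."""
    full = gaussian_loglik(X)
    gains = [gaussian_loglik(X[:t]) + gaussian_loglik(X[t:]) - full
             for t in range(margin, len(X) - margin)]
    return margin + int(np.argmax(gains))
```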
Beat Chroma Fragment Clustering
• Idea: Build a dictionary of harmonic/melodic
fragments by clustering a large corpus
Music audio → Beat tracking → Chroma features → Key normalization → High-pass filter along time → Landmark identification → Locality Sensitive Hash Table
• 86 Beatles tracks ⇒ 41,705 patches (12x24)
[Figure: two beat-chroma patches (chroma bin vs. beat): "12-Piggies" 22.0-29.6s and "09-Martha My Dear" 90.9-98.6s; LSH takes ~300 sec.]
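The hashing stage might look like random-projection LSH over flattened patches, so near-duplicate fragments collide in the same bucket; the 12x24 patch size follows the slide, while the hash family, bit count, and the mean-subtraction stand-in for high-pass filtering are assumptions.

```python
# Sketch: random-projection LSH over 12x24 beat-chroma patches.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
N_BITS = 16
planes = rng.normal(size=(N_BITS, 12 * 24))    # random hyperplanes

def lsh_key(patch):
    """patch: (12, 24) key-normalized beat-chroma fragment -> bucket key."""
    v = patch.ravel() - patch.mean()           # crude stand-in for high-pass
    return ((planes @ v) > 0).tobytes()        # sign pattern as hash key

table = defaultdict(list)
patches = rng.normal(size=(1000, 12, 24))      # stand-in for 41,705 patches
for i, p in enumerate(patches):                # index the corpus
    table[lsh_key(p)].append(i)

candidates = table[lsh_key(patches[0])]        # likely near-neighbors
```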
MEAPsoft
• Music Engineering Art Projects
a collaboration between EE and the Computer Music Center,
with Douglas Repetto, Ron Weiss, and the rest of the MEAP team
• MEAPsoft combines music IR analysis
with wacky resequencing algorithms
also some neat visualizations...
Summary
• Lots of data
+ noisy transcription
+ weak clustering
⇒ musical insights?
[Diagram: Music audio feeds low-level features (melody and notes, key and chords, tempo and beat), which support classification and similarity (browsing, discovery, production) and music structure discovery (modeling, generation, curiosity).]