Music Research at LabROSA
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. of Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/

1. Motivation: Music Collections
2. Music Information
3. Music Similarity
4. Music Structure Discovery

LabROSA Overview
• Information extraction from speech, music, and environmental audio, drawing on recognition, separation, retrieval, machine learning, and signal processing

The Challenges of Music Audio
• A lot of music data is available: e.g. 60 GB of MP3 ≈ 1000 hr of audio, 15k tracks
• Challenges: can computers help manage it? can we learn something from it?
• Application scenarios: personal music collections, discovering new music, "music placement"
• 'Data-driven musicology'?

Transcription as Classification (Graham Poliner)
• Exchange signal models for data: treat transcription as a pure classification problem (feature representation → feature vectors → classification posteriors → HMM smoothing)
• Training data and features: MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 minutes of training audio); normalized magnitude STFT
• Classification: N binary SVMs (one per note); independent frame-level classification on a 10 ms grid; distance to the class boundary used as a posterior
• Temporal smoothing: a two-state (on/off) independent HMM for each note, with parameters learned from the training data; find the Viterbi sequence for each note

Singing Voice Modeling & Alignment (Christine Smit, Johanna Devaney)
• How are phonemes sung? e.g. "vowel modification" in classical voice
• Collect the data by identifying solos and by aligning libretti to recordings, e.g. aligning karaoke MIDI files to the original recordings
• Lyric transcription?

MajorMiner: Semantic Tags (Mike Mandel)
• Describe a segment in human-relevant terms, e.g. anchor space, but more so
• Need ground truth: what words do people actually use?
• MajorMiner game: 400 users, 7,500 unique tags, 70,000 taggings, 2,200 10-sec clips used
• Train classifiers...

Cover Song Matching: Correlation
• Cross-correlate entire-song beat-chroma matrices at all possible transpositions; this gives an implicit combination of match quality and duration
• [Figure: beat-chroma matrices (chroma bin vs. beat, @ 281 BPM) for the Elliott Smith and Glen Phillips recordings of "Between the Bars", and their cross-correlation as a function of skew in beats and transposition in semitones]
• One good matching fragment is sufficient...?
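To make the beat-chroma cross-correlation step concrete, here is a minimal numpy sketch; the function name, the global normalization, and the bin-by-bin correlation loop are illustrative assumptions, not the exact code behind the figure.

```python
import numpy as np

def chroma_xcorr(A, B):
    """Cross-correlate two beat-synchronous chroma matrices at every
    relative transposition and beat skew.

    A, B: 12 x nA and 12 x nB arrays (rows = chroma bins, cols = beats).
    Returns a (12, nA + nB - 1) score array: one row per semitone rotation
    of B, one column per beat skew. (Illustrative sketch only.)
    """
    # Normalize so overall level/length does not dominate the raw correlation
    A = A / (np.linalg.norm(A) + 1e-9)
    B = B / (np.linalg.norm(B) + 1e-9)
    scores = np.zeros((12, A.shape[1] + B.shape[1] - 1))
    for rot in range(12):
        B_rot = np.roll(B, rot, axis=0)      # transpose B by `rot` semitones
        for c in range(12):                  # correlate bin-by-bin, sum over bins
            scores[rot] += np.correlate(A[c], B_rot[c], mode="full")
    return scores

# The peak of `scores` marks the best transposition (row) and beat offset
# (column); its height mixes match quality with overlap duration.
```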
Cross-Correlation Similarity (Courtenay Cotton, Mike Mandel)
• Use correlation to find similarity? e.g. a similar note/instrumentation sequence may sound similar to judges
• Single-anchor speed-up: assign a single time anchor (boundary) within each segment and calculate the correlation only at the single skew that aligns the anchors. The boundary is found with the BIC method [8] as the time within the feature matrix that maximizes the likelihood advantage of fitting separate Gaussians to the features on each side of the boundary, compared to fitting the entire sequence with a single Gaussian, i.e. the time that divides the feature array into maximally dissimilar parts. While almost certain to miss some of the matching fragments, an approach of this simplicity may be the only feasible option when searching databases of millions of tracks.
• Evaluate by subjective tests: genre and artist classification, used as proxies in the past, most likely fall short of accounting for subjective similarity, particularly for a system such as this one, which aims to match structural detail rather than overall statistics. A small subjective listening test was therefore conducted, modeled after the MIREX music similarity evaluation. Row (5) evaluates the combined score only at the chosen boundary; rows (6)-(8) are additional hits from a more conventional feature-statistics system.

Table 1. Results of the subjective similarity evaluation. Counts are the number of times the best hit returned by each algorithm was rated as similar by a human rater. Each algorithm provided one return for each of 30 queries and was judged by 6 raters, so counts are out of a maximum possible 180.

  Algorithm                        Similar count
  (1) Xcorr, chroma                48/180 = 27%
  (2) Xcorr, MFCC                  48/180 = 27%
  (3) Xcorr, combo                 55/180 = 31%
  (4) Xcorr, combo + tempo         34/180 = 19%
  (5) Xcorr, combo at boundary     49/180 = 27%
  (6) Baseline, MFCC               81/180 = 45%
  (7) Baseline, rhythmic           49/180 = 27%
  (8) Baseline, combo              88/180 = 49%
  Random choice 1                  22/180 = 12%
  Random choice 2                  28/180 = 16%

Beat Chroma Fragment Clustering
• Idea: build a dictionary of harmonic/melodic fragments by clustering a large corpus
• Pipeline: music audio → beat tracking → chroma features → key normalization → high-pass filter along time → landmark identification → locality-sensitive hash table
• 86 Beatles tracks ⇒ 41,705 patches (12 x 24); LSH takes ~300 sec
• [Figure: matched beat-chroma patches (chroma bin vs. beat) from "12 - Piggies" 22.0-29.6 s and "09 - Martha My Dear" 90.9-98.6 s]

MEAPsoft
• Music Engineering Art Projects: a collaboration between EE and the Computer Music Center, with Douglas Repetto, Ron Weiss, and the rest of the MEAP team
• MEAPsoft combines music-IR analysis with wacky resequencing algorithms, plus some neat visualizations...

Summary
• Lots of data + noisy transcription + weak clustering → musical insights?
• Low-level features extracted from music audio (tempo and beat, key and chords, melody and notes) feed classification and similarity (browsing, discovery, production) and music structure discovery (modeling, generation, curiosity)
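As a concrete footnote to the summary's feature diagram, the following is a minimal sketch of the beat-synchronous chroma front end shared by several of the systems above, written with the present-day librosa library as a stand-in for the original feature-extraction code; the function name and the median aggregation are assumptions.

```python
import numpy as np
import librosa  # modern stand-in, not the toolchain used in the original work

def beat_chroma(path):
    """Return the estimated tempo and a 12 x n_beats beat-synchronous chroma
    matrix for an audio file -- the kind of low-level representation feeding
    the cover-song, clustering, and similarity systems above (illustrative)."""
    y, sr = librosa.load(path, mono=True)               # decode audio
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)  # tempo + beat frames
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # 12 x n_frames chroma
    # Collapse the frames inside each beat into a single column per beat
    return tempo, librosa.util.sync(chroma, beats, aggregate=np.median)

# Usage with a hypothetical file: tempo, C = beat_chroma("track.mp3")
# C.shape == (12, n_beats)
```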