Extracting Information
from Music Audio
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
1. Learning from Music
2. Melody Extraction
3. Music Similarity
Music Info Extraction - Ellis
2005-09-20
p. 1 /25
1. Learning from Music
• A lot of music data available
e.g. 60G of MP3
≈ 1000 hr of audio, 15k tracks
• What can we do with it?
implicit definition of ‘music’
• Quality vs. quantity
Speech recognition lesson:
10x data, 1/10th annotation, twice as useful
• Motivating Applications
music similarity / classification
computer (assisted) music generation
insight into music
Ground Truth Data
• A lot of unlabeled music data available
manual annotation is much rarer
[figure: spectrogram of /Users/dpwe/projects/aclass/aimee.wav, 0–7000 Hz over 0:02–0:18, hand-annotated with segment labels (mus, vox)]
• Unsupervised structure discovery possible
.. but labels help to indicate what you want
• Weak annotation sources
mus
artist-level descriptions
symbol sequences without timing (MIDI)
errorful transcripts
• Evaluation requires ground truth
limiting factor in Music IR evaluations?
Talk Roadmap
[diagram: Music audio feeds four numbered threads — (1) Anchor models → Semantic bases → (4) Similarity/recommend'n; (2) Melody extraction → Fragment clustering; (3) Drums extraction / Event extraction → Eigenrhythms → Synthesis/generation (?)]
2. Melody Transcription
with Graham Poliner
• Audio → Score very desirable
for data compression, searching, learning
• Full solution is elusive
signal separation of overlapping voices
music constructed to frustrate!
• Simplified problem:
“Dominant Melody” at each time frame
[spectrogram: frequency 0–4000 Hz vs. time 0–5 s, with the dominant melody to be tracked]
Conventional Transcription
• Pitched notes have harmonic spectra
→ transcribe by searching for harmonics
e.g. sinusoid modeling + grouping:
group by onset & common harmonicity
- find sets of sinusoid tracks that start around the same time
- + stable harmonic pattern
- break tracks at single frequencies: need to detect each new ‘onset’
- pass on to constraint-based filtering...
• Explicit expert-derived knowledge
[figure: sinusoid tracks over a 0–3000 Hz spectrogram, 0–4 s, grouped by common onset]
Transcription as Classification
• Signal models typically used for transcription
harmonic spectrum, superposition
• But ... trade domain knowledge for data
transcription as pure classification problem:
Audio
Trained
classifier
p("C0"|Audio)
p("C#0"|Audio)
p("D0"|Audio)
p("D#0"|Audio)
p("E0"|Audio)
p("F0"|Audio)
single N-way discrimination for “melody”
per-note classifiers for polyphonic transcription
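A minimal sketch of the idea, with synthetic data standing in for real audio: a trained classifier maps raw spectral frames to note labels with no explicit harmonic model. The toy frame generator, note names, and all parameters here are illustrative assumptions, not the actual system.

```python
# Sketch: melody transcription as frame-level classification.
# Synthetic "spectrogram" frames are generated for two notes; a classifier
# learns to label frames with no hand-coded harmonic-search rules.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_bins = 64

def harmonic_frame(f0_bin):
    """Toy spectral frame: energy at f0 and its harmonics, plus noise."""
    frame = rng.normal(0, 0.05, n_bins)
    for h in (1, 2, 3):
        if h * f0_bin < n_bins:
            frame[h * f0_bin] += 1.0 / h
    return frame

# Label 0 = "C", label 1 = "E" (arbitrary note names for the sketch)
X = np.array([harmonic_frame(5) for _ in range(100)] +
             [harmonic_frame(8) for _ in range(100)])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(probability=True).fit(X, y)
print(clf.predict([harmonic_frame(8)])[0])  # classify an unseen frame
```

With `probability=True`, `clf.predict_proba` gives the per-note posteriors p(note|audio) pictured above; per-note one-vs-rest classifiers would extend this toward polyphony.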
Melody Transcription Features
• Short-time Fourier Transform Magnitude
(Spectrogram)
• Standardize over a 50-point frequency window
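The two steps can be sketched as follows; the frame size, hop, window width, and test signal are illustrative assumptions, not the exact settings used here:

```python
# Sketch of the feature pipeline: STFT magnitude, then local standardization
# across a 50-bin frequency window so the classifier sees relative spectral
# shape rather than absolute level.
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude spectrogram via windowed FFT frames."""
    win = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(win * x[i:i + n_fft]))
              for i in range(0, len(x) - n_fft, hop)]
    return np.array(frames).T          # (freq_bins, time_frames)

def standardize_freq(S, width=50):
    """Zero-mean, unit-variance over a sliding window along frequency."""
    out = np.empty_like(S)
    n = S.shape[0]
    for k in range(n):
        lo, hi = max(0, k - width // 2), min(n, k + width // 2)
        mu, sd = S[lo:hi].mean(axis=0), S[lo:hi].std(axis=0) + 1e-9
        out[k] = (S[k] - mu) / sd
    return out

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
S = standardize_freq(stft_mag(x))
print(S.shape)
```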

Training Data
• Need {data, label} pairs for classifier training
• Sources:
pre-mixing multitrack recordings + hand-labeling?
synthetic music (MIDI) + forced-alignment?
[figure: spectrograms (freq / kHz vs. time / sec) of isolated and mixed tracks with aligned melody labels]
Melody Transcription Results
• Trained on 17 examples
.. plus transpositions out to +/- 6 semitones
SMO SVM (Weka)
• Tested on ISMIR MIREX 2005 set
includes foreground/background detection
Example...
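One way to picture the transposition augmentation above: on a log-frequency (semitone-axis) spectrogram, transposing by k semitones is just a row shift, so each labeled excerpt yields 13 training examples. A toy sketch — the array shapes and `transpose` helper are assumptions, not the actual training code:

```python
# Transposition as data augmentation: shift spectrogram rows and note
# labels together by k semitones, k = -6 .. +6.
import numpy as np

def transpose(S, labels, k):
    """Shift a semitone-axis spectrogram and its note labels by k semitones."""
    S_k = np.roll(S, k, axis=0)
    if k > 0:
        S_k[:k] = 0          # zero the rows that wrapped around
    elif k < 0:
        S_k[k:] = 0
    return S_k, labels + k

S = np.random.rand(84, 100)            # 84 semitone bins x 100 frames
labels = np.full(100, 60)              # MIDI note 60 at every frame
augmented = [transpose(S, labels, k) for k in range(-6, 7)]
print(len(augmented))
```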
Melody Clustering
• Goal: Find ‘fragments’ that recur in melodies
.. across large music database
.. trade data for model sophistication
Training data → Melody extraction → 5-second fragments → VQ clustering → Top clusters
• Data sources
pitch tracker, or MIDI training data
• Melody fragment representation
DCT(1:20) - removes average, smoothes detail
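A sketch of the fragment pipeline under stated assumptions (synthetic rising/falling contours stand in for extracted melodies; scipy's DCT and scikit-learn's KMeans serve as the VQ step):

```python
# 5-second pitch contours -> first 20 DCT coefficients (dropping
# coefficient 0 removes the average pitch; truncation smooths detail)
# -> k-means vector quantization to find recurring fragment shapes.
import numpy as np
from scipy.fftpack import dct
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def fragment_features(contour):
    """DCT(1:20) of a pitch contour: transposition-invariant, smoothed."""
    return dct(contour, norm='ortho')[1:21]

frames = 250                            # ~5 s at 50 frames/s
rising = [np.linspace(60, 72, frames) + rng.normal(0, 0.5, frames)
          for _ in range(20)]
falling = [np.linspace(72, 60, frames) + rng.normal(0, 0.5, frames)
           for _ in range(20)]
X = np.array([fragment_features(c) for c in rising + falling])

vq = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```

On this toy data the two clusters recover the rising and falling contour shapes regardless of absolute pitch, which is the point of dropping the DC term.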
Melody Clustering Results
• Clusters match underlying contour:
• Some interesting
matches:
e.g. Pink + Nsync
3. Music Similarity
with Mike Mandel and Adam Berenzweig
• Can we predict which songs “sound alike” to a listener?
.. based on the audio waveforms?
many aspects to subjective similarity
• Applications
query-by-example
automatic playlist generation
discovering new music
• Problems
the right representation
modeling individual similarity
Music Similarity Features
• Need “timbral” features:
Mel-Frequency Cepstral Coeffs (MFCCs)
[diagram: audio → auditory-like frequency warping → log-domain magnitude → discrete cosine transform (orthogonalization) → MFCCs]
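A from-scratch sketch of those three stages — mel warping, log compression, DCT — with simplified filterbank details that are assumptions rather than the exact MFCC recipe used here:

```python
# Minimal MFCC for one frame: power spectrum -> triangular mel filterbank
# (auditory-like warping) -> log -> DCT (orthogonalization).
import numpy as np
from scipy.fftpack import dct

def mel(f):  return 2595 * np.log10(1 + f / 700.0)
def imel(m): return 700 * (10 ** (m / 2595.0) - 1)

def mfcc_frame(frame, sr=16000, n_mels=40, n_ceps=13):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    edges = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    fbank = np.zeros(n_mels)
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        # triangular filter rising lo->mid, falling mid->hi
        w = np.maximum(0, np.minimum((freqs - lo) / (mid - lo + 1e-9),
                                     (hi - freqs) / (hi - mid + 1e-9)))
        fbank[i] = (w * spec).sum()
    return dct(np.log(fbank + 1e-9), norm='ortho')[:n_ceps]

tone = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)
c = mfcc_frame(tone)
print(c.shape)
```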
 
Timbral Music Similarity
• Measure similarity of feature distribution
i.e. collapse across time to get density p(xi)
compare by e.g. KL divergence
• e.g. Artist Identification
learn artist model p(xi | artist X) (e.g. as GMM)
classify unknown song to closest model
[diagram: training songs → MFCCs → per-artist GMMs; test song → MFCCs → KL divergence to each artist model → min-KL artist]
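A minimal sketch of the scheme with synthetic "MFCC" frames: one GMM per artist, and the test song is assigned to the model with the highest average log-likelihood (the model-dependent term of the KL divergence between song and artist distributions). The artist names and feature distributions are synthetic assumptions:

```python
# Artist ID by density modeling: fit a GMM to each artist's frames,
# then score an unknown song under every model and take the best.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
d = 13                                   # MFCC dimensionality

artist_frames = {
    'artist1': rng.normal(0.0, 1.0, (500, d)),
    'artist2': rng.normal(3.0, 1.0, (500, d)),
}
models = {a: GaussianMixture(n_components=4, random_state=0).fit(f)
          for a, f in artist_frames.items()}

test_song = rng.normal(3.0, 1.0, (200, d))   # frames drawn near artist2
scores = {a: m.score(test_song) for a, m in models.items()}
print(max(scores, key=scores.get))
```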
“Anchor Space”
• Acoustic features describe each song
.. but from a signal, not a perceptual, perspective
.. and not the differences between songs
• Use genre classifiers to define new space
prototype genres are “anchors”
[diagram: audio input (class i, class j) → GMM modeling → posteriors p(a1|x), p(a2|x), …, p(an|x), an n-dimensional vector in “Anchor Space”; song models compared by KL-d, EMD, etc.]
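A sketch of the anchor-space conversion: per-genre GMMs ("anchors") turn each feature frame x into a posterior vector [p(a1|x) … p(an|x)]. The two anchors and their training data are synthetic assumptions:

```python
# Convert frames to anchor space: log-likelihood under each anchor GMM,
# then Bayes' rule with equal priors to get per-frame genre posteriors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
anchors = {
    'electronica': GaussianMixture(2, random_state=0).fit(rng.normal(0, 1, (300, 5))),
    'country':     GaussianMixture(2, random_state=0).fit(rng.normal(4, 1, (300, 5))),
}

def to_anchor_space(frames):
    """Posterior over anchor genres for each frame (equal priors)."""
    logp = np.column_stack([m.score_samples(frames) for m in anchors.values()])
    logp -= logp.max(axis=1, keepdims=True)      # stabilize the softmax
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

song = rng.normal(4, 1, (10, 5))                 # frames near "country"
A = to_anchor_space(song)
print(A.shape)
```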
Anchor Space
• Frame-by-frame high-level categorizations
compare to raw features?
[scatter plots: Madonna vs. Bowie frames plotted in anchor space (Electronica vs. Country) and as raw cepstral features (third vs. fifth cepstral coefficients)]
properties in distributions? dynamics?
‘Playola’ Similarity Browser
Ground-truth data
• Hard to evaluate Playola’s ‘accuracy’
user tests...
ground truth?
• “Musicseer” online survey:
ran for 9 months in 2002
> 1,000 users, > 20k judgments
http://labrosa.ee.columbia.edu/projects/musicsim/
Evaluation
• Compare classifier measures against Musicseer subjective results
“triplet” agreement percentage
Top-N ranking agreement score:
s_i = ( Σ_{r=1}^{N} α_r α_{c_r} ) / ( Σ_{r=1}^{N} α_r² ), with rank weight α_r = (1/2)^{r/3}
First-place agreement percentage
[bar chart: top-rank agreement % (simple significance test) for measures cei, cmb, erd, e3d, opn, kn2, rnd, ANK over data sets SrvKnw 4789×3.58, SrvAll 6178×8.93, GamKnw 7410×3.96, GamAll 7421×8.92]
Using SVMs for Artist ID
• Support Vector Machines (SVMs) find
hyperplanes in a high-dimensional space
rely only on a matrix of
distances between points
much ‘smarter’ than
nearest-neighbor/overlap
want diversity of reference
vectors...
(w  x) + b = + 1
(w x) + b = –1
x1
x2
yi = – 1
y i = +1
w
(w x) + b = 0
Music Info Extraction - Ellis
2005-09-20
p. 21/25
Song-Level SVM Artist ID
• Instead of one model per artist/genre,
use every training song as an ‘anchor’
then SVM finds best support for each artist
[diagram: training songs → MFCCs → song features; distances D to the songs of Artist 1, Artist 2, … feed a DAG SVM that assigns the test song's artist]
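A rough sketch of the song-as-anchor idea, simplifying each song's MFCC distribution to its mean and using a plain linear SVM rather than the DAG SVM: the feature vector for a song is its distance to every training song. All data here is synthetic:

```python
# Song-level artist ID: every training song is an anchor, and a song's
# feature vector is its distance to each anchor; the SVM then picks
# which anchors best support each artist.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)

def song_summary(frames):
    """Collapse a song's MFCC frames to a single vector (mean here)."""
    return frames.mean(axis=0)

# 10 synthetic songs per artist, 100 frames x 13 "MFCC" dims each
train_songs = ([rng.normal(0, 1, (100, 13)) for _ in range(10)] +
               [rng.normal(2, 1, (100, 13)) for _ in range(10)])
y = np.array([0] * 10 + [1] * 10)          # artist labels
anchor_songs = np.array([song_summary(s) for s in train_songs])

def distance_features(song):
    """Feature vector = distance from this song to every training song."""
    return np.linalg.norm(anchor_songs - song_summary(song), axis=1)

X = np.array([distance_features(s) for s in train_songs])
clf = SVC(kernel='linear').fit(X, y)

test_song = rng.normal(2, 1, (100, 13))    # new song by the second artist
print(clf.predict([distance_features(test_song)])[0])
```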
Artist ID Results
• ISMIR/MIREX 2005 also evaluated Artist ID
• 148 artists, 1800 files (split train/test)
from ‘uspop2002’
• Song-level SVM clearly dominates
using only MFCCs!
Results of the formal MIREX 2005 Audio Artist ID evaluation (USPOP2002), from http://www.music-ir.org/evaluation/mirex-results/audio-artist/:

Rank  Participant  Raw Accuracy  Normalized  Runtime / s
1     Mandel       68.3%         68.0%       10240
2     Bergstra     59.9%         60.9%       86400
3     Pampalk      56.2%         56.0%       4321
4     West         41.0%         41.0%       26871
5     Tzanetakis   28.6%         28.5%       2443
6     Logan        14.8%         14.8%       ?
7     Lidy         Did not complete

References
- Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How high's the sky? Journal of Negative Results in Speech and Audio Sciences, 2004.
- John C. Platt, Nello Cristianini, and John Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems, 2000.
Playlist Generation
• SVMs are well suited to “active learning”
solicit labels on items closest to current boundary
• Automatic player
with “skip”
= Ground truth
data collection
active-SVM
automatic playlist
generation
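A toy sketch of the skip-as-label loop: each judgment updates an SVM, and the next song queried is the unlabeled one nearest the decision boundary. The "user preference" here is a synthetic rule invented for illustration:

```python
# Active learning with an SVM player: fit on the labels so far, then
# query the most uncertain (smallest-margin) unlabeled song.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
songs = rng.normal(0, 1, (200, 8))              # song feature vectors
likes = (songs[:, 0] > 0).astype(int)           # hidden preference rule

# Seed with a handful of judgments guaranteed to include both classes
labeled = sorted({int(np.argmax(likes)), int(np.argmin(likes)), *range(8)})
for _ in range(20):
    clf = SVC(kernel='linear').fit(songs[labeled], likes[labeled])
    pool = [i for i in range(len(songs)) if i not in labeled]
    margins = np.abs(clf.decision_function(songs[pool]))
    labeled.append(pool[int(np.argmin(margins))])  # query most uncertain

acc = float((clf.predict(songs) == likes).mean())
print(acc)
```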
Conclusions
[roadmap diagram repeated: Music audio → Melody / Drums / Event extraction → Fragment clustering, Eigenrhythms → Synthesis/generation (?); Anchor models → Semantic bases → Similarity/recommend'n]
• Lots of data
+ noisy transcription
+ weak clustering
musical insights?