Music Information Retrieval for Jazz
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
{dpwe,thierry}@ee.columbia.edu
http://labrosa.ee.columbia.edu/
2012-11-15

1. Music Information Retrieval
2. Automatic Tagging
3. Musical Content
4. Future Work

Machine Listening
• Extracting useful information from sound ... like (we) animals do
  [Diagram: tasks (Detect, Classify, Describe) crossed with domains (Speech, Music, Environmental Sound), with examples such as VAD, speech/music detection, ASR, music transcription, emotion, music recommendation, environment awareness, automatic narration, and "Sound Intelligence"]

1. The Problem
• We have a lot of music. Can computers help?
• Applications
  ◦ archive organization
  ◦ music recommendation
  ◦ musicological insight?

Music Information Retrieval (MIR)
• Small field that has grown since ~2000
  ◦ musicologists, engineers, librarians
  ◦ significant commercial interest
• MIR as the musical analog of text IR: find stuff in large archives
• Popular tasks
  ◦ genre classification
  ◦ chord, melody, full transcription
  ◦ music recommendation
• Annual evaluations
  ◦ "standard" test corpora - pop

2. Automatic Tagging
• Statistical pattern recognition: finding matches to training examples
  [Block diagram: sensor signal -> pre-processing/segmentation -> segment -> feature extraction -> feature vector -> classification -> class -> post-processing]
• Need
  ◦ feature design
  ◦ labeled training examples
• Applications
  ◦ genre
  ◦ instrumentation
  ◦ artist
  ◦ studio ...

Features: MFCC
• Mel-Frequency Cepstral Coefficients: the standard features from speech recognition
  [Pipeline: sound -> FFT -> spectra |X[k]| -> Mel-scale frequency warp -> log -> audspec -> IFFT -> cepstra -> truncate -> MFCCs]

Representing Audio
• MFCCs are short-time features (25 ms)
• Sound is a "trajectory" in MFCC space
• Describe a whole track by its statistics
  [Figure: spectrogram, MFCC features, and MFCC covariance matrix for an example track]

MFCCs for Music
• Can resynthesize MFCCs by shaping noise
  ◦ gives an idea of the information retained
  [Figure: "Freddie Freeloader", original vs. MFCC-resynthesis spectrograms]

Ground Truth (Mandel & Ellis '08)
• MajorMiner: free-text tags for 10 s clips
  ◦ 400 users, 7500 unique tags, 70,000 taggings
• Example: drum, bass, piano, jazz, slow, instrumental, saxophone, soft, quiet, club, ballad, smooth, soulful, easy_listening, swing, improvisation, 60s, cool, light

Classification
[Pipeline: sound -> chop into 10 s blocks -> MFCC (20 dims) -> mean, Δ, Δ², and covariance (60 dims) -> standardize across the train set (399 samples) -> one-vs-all SVM (C, γ) -> average precision; ground-truth tags supply the labels]
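As a rough sketch of the MFCC front end on the "Features: MFCC" slide (FFT, mel warp, log, inverse transform, truncate), the snippet below uses librosa's built-in MFCC routine; the 25 ms window and 20 coefficients follow the slides, while the 10 ms hop and sample rate are illustrative choices, not settings stated in the talk.

    import librosa

    def mfcc_features(path, sr=22050, n_mfcc=20):
        # Load audio and compute MFCCs over 25 ms windows with a 10 ms hop
        y, sr = librosa.load(path, sr=sr)
        n_fft = int(0.025 * sr)
        hop = int(0.010 * sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=n_fft, hop_length=hop)  # shape (n_mfcc, n_frames)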
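And a minimal sketch of the tagging pipeline just above: per-clip statistics of the MFCCs (means of MFCC, Δ, Δ² plus the covariance) feeding one-vs-all SVMs, assuming scikit-learn. The function names, kernel choice, and hyperparameters are illustrative; only the overall layout follows the slide.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.multiclass import OneVsRestClassifier

    def clip_vector(mfcc):
        """Summarize one 10 s clip of MFCCs (n_coef x n_frames) by its statistics."""
        d1 = np.diff(mfcc, axis=1)           # delta
        d2 = np.diff(mfcc, n=2, axis=1)      # delta-delta
        means = np.concatenate([mfcc.mean(axis=1), d1.mean(axis=1), d2.mean(axis=1)])  # 60 dims
        cov = np.cov(mfcc)                   # MFCC covariance matrix
        iu = np.triu_indices_from(cov)       # keep the upper triangle only
        return np.concatenate([means, cov[iu]])

    def train_taggers(X, Y):
        """X: one feature row per clip; Y: binary clip-by-tag indicator matrix."""
        scaler = StandardScaler().fit(X)     # standardize across the training set
        clf = OneVsRestClassifier(SVC(kernel='rbf', C=1.0, gamma='scale', probability=True))
        clf.fit(scaler.transform(X), Y)      # one SVM per tag
        return scaler, clf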
• In short: MFCC features + human ground truth + standard machine learning tools

Classification Results
• Classifiers trained from the top 50 tags
  [Figure: per-tag classifier outputs over time for "Soul Eyes", with tags ranging from drum, guitar, male, synth, rock, pop, ... through jazz, piano, saxophone, ... to club, trance, _90s]

3. Musical Content
• MFCCs (and speech recognizers) don't respect pitch
  ◦ but pitch is important, and visible in the spectrogram
• Pitch-related tasks
  ◦ note transcription
  ◦ chord transcription
  ◦ matching by musical content ("cover songs")

Note Transcription (Poliner & Ellis '05, '06, '07)
• Training data and features
  ◦ MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 min of training audio)
  ◦ normalized magnitude STFT
• Classification
  ◦ N binary SVMs (one for each note)
  ◦ independent frame-level classification on a 10 ms grid
  ◦ distance to the class boundary as posterior
• Temporal smoothing
  ◦ two-state (on/off) independent HMM for each note; parameters learned from training data
  ◦ find the Viterbi sequence for each note
  [Pipeline: feature representation -> feature vector -> classification -> posteriors -> HMM smoothing]

Polyphonic Transcription
• Real music excerpts + ground truth (MIREX 2007)
• Frame-level transcription: estimate the fundamental frequency of all notes present on a 10 ms grid
  [Bar chart: precision, recall, accuracy, Etot, Esubs, Emiss, Efa]
• Note-level transcription: group frame-level predictions into note-level transcriptions by estimating onset/offset
  [Bar chart: precision, recall, average F-measure, average overlap]

Chroma Features (Fujishima 1999; Warren et al. 2003)
• Idea: project onto 12 semitones regardless of octave
  ◦ maintains the main "musical" distinctions
  ◦ invariant to octave equivalence
  ◦ no need to worry about harmonics?
  [Figure: spectrogram vs. chroma representation]

  C(b) = \sum_{k=0}^{N} B(12 \log_2(k/k_0) - b) \, W(k) \, |X[k]|

  where W(k) is a spectral weighting and B(·) selects the bins falling on pitch class b, mod 12.

Chroma Resynthesis (Ellis & Poliner 2007)
• Chroma describes the notes in an octave ... but not the octave
• Can resynthesize by presenting all octaves ... with a smooth envelope
  ◦ "Shepard tones": the octave is ambiguous (endless-sequence illusion)
  [Figure: Shepard tone spectra and Shepard tone resynthesis]

  y_b(t) = \sum_{o=1}^{M} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2\pi \, 2^{\,o + b/12} \, \omega_0 t\right)

Chroma Example
• Simple Shepard-tone resynthesis
  ◦ can also reimpose the broad spectrum from MFCCs
  [Figure: "Freddie Freeloader", chroma and resynthesized spectrograms]

Onset Detection (Bello et al. 2005)
• Simplest thing is the energy envelope

  e(n_0) = \sum_{n=-W/2}^{W/2} w[n] \, |x(n + n_0)|^2

• emphasis on high frequencies?
  [Figure: Harnoncourt and Maracatu excerpts, spectrogram |X(f,t)| vs. frequency-weighted f·|X(f,t)|]
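A direct numpy rendering of the chroma projection C(b) above: each FFT bin k is mapped to a pitch class via 12·log2(k/k0) and its weighted magnitude is accumulated mod 12. A minimal sketch; the reference pitch f_ref and the Gaussian spectral weighting W(k) are illustrative choices.

    import numpy as np

    def chroma_frame(X, sr, n_fft, f_ref=27.5):
        """X: magnitude spectrum of one frame (length n_fft//2 + 1)."""
        C = np.zeros(12)
        freqs = np.arange(1, len(X)) * sr / n_fft            # skip the DC bin
        semis = 12.0 * np.log2(freqs / f_ref)                # semitones above the reference (A0)
        bins = np.round(semis).astype(int) % 12              # B(.): nearest pitch class, mod 12
        W = np.exp(-0.5 * ((freqs - 1000.0) / 1500.0) ** 2)  # broad spectral weighting W(k)
        np.add.at(C, bins, W * X[1:])                        # accumulate weighted magnitudes
        return C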
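The chroma resynthesis equation y_b(t) above can be sketched the same way: each pitch class b is rendered at every octave o under a smooth spectral envelope W, so the octave stays ambiguous. The raised-cosine envelope and the 55 Hz reference below are illustrative, not the talk's exact settings.

    import numpy as np

    def shepard_tone(b, dur=1.0, sr=22050, f0=55.0, n_oct=7):
        """Shepard tone for pitch class b (0..11): all octaves under a smooth envelope."""
        t = np.arange(int(dur * sr)) / sr
        y = np.zeros_like(t)
        for o in range(1, n_oct + 1):
            pos = o + b / 12.0
            W = 0.5 - 0.5 * np.cos(2 * np.pi * pos / n_oct)       # smooth envelope over octaves
            y += W * np.cos(2 * np.pi * (2.0 ** pos) * f0 * t)    # 2^(o + b/12) * f0
        return y / np.max(np.abs(y))

    def render_chroma(chroma_vec, **kw):
        """Mix Shepard tones weighted by a 12-element chroma vector."""
        return sum(c * shepard_tone(b, **kw) for b, c in enumerate(chroma_vec))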
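For the onset envelope, a minimal scipy sketch along the lines of the slide: a short-time magnitude envelope (optionally weighted by frequency, as in f·|X(f,t)|, to emphasize high frequencies) followed by a half-wave-rectified first difference. Window sizes and the weighting are illustrative.

    import numpy as np
    from scipy.signal import stft

    def onset_envelope(y, sr, freq_weight=True):
        f, t, Z = stft(y, fs=sr, nperseg=1024, noverlap=768)
        S = np.abs(Z)
        if freq_weight:
            S = S * f[:, None]                    # f * |X(f,t)|: emphasize high frequencies
        env = S.sum(axis=0)                       # magnitude-based envelope per frame
        flux = np.maximum(0.0, np.diff(env))      # rises in the envelope mark onsets
        return flux, t[1:]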
Tempo Estimation
• Beat tracking (may) need a global tempo period τ_p
  ◦ otherwise the problem lacks "optimal substructure"
• Pick a peak in the onset-envelope autocorrelation
  ◦ after applying a "human preference" window
  ◦ check for sub-beat
  [Figure: onset strength envelope (part), raw autocorrelation, and windowed autocorrelation with primary and secondary tempo periods]

Beat Tracking by Dynamic Programming
• To optimize

  C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)

  define C*(t) as the best score up to time t, then build it up recursively (with traceback P(t)):

  C^*(t) = O(t) + \max_\tau \{ \alpha F(t - \tau, \tau_p) + C^*(\tau) \}
  P(t) = \arg\max_\tau \{ \alpha F(t - \tau, \tau_p) + C^*(\tau) \}

  The final beat sequence {t_i} is the best C* plus the back-trace.

Beat Tracking Results
• Prefers drums & steady tempo
  [Figure: "Soul Eyes" onset envelope and tracked beats]

Beat-Synchronous Chroma
• Record one chroma vector per beat
  ◦ compact representation of harmonies
  [Figure: "Freddie Freeloader", spectrogram, beat-synchronous chroma, resynthesis, and resynthesis + MFCC]

Chord Recognition
• Beat-synchronous chroma look like chords
  ◦ e.g. C-E-G, B-D-G, A-C-E, A-C-D-F ...
  ◦ can we transcribe them?
• Two approaches
  ◦ manual templates (prior knowledge)
  ◦ learned models (from training data)

Chord Recognition System (Sheh & Ellis 2003)
• Analogous to speech recognition
  ◦ Gaussian models of features for each chord
  ◦ Hidden Markov Models for chord transitions
  [Pipeline: test audio -> beat track -> chroma from 100-1600 Hz and 25-400 Hz bandpass filters -> beat-synchronous chroma features -> HMM Viterbi -> chord labels. Training: labels -> root normalize -> 24 Gaussian models -> unnormalize; count transitions -> 24x24 transition matrix; resample.]

Chord Recognition
• Often works
  [Figure: "Let It Be", beat-synchronous chroma with recognized vs. ground-truth chords (C, G, A:min, F, C, G, F, C, A:min/b7, F:maj7, F:maj6 ...)]
• But only about 60% of the time

What did the models learn?
• Chord model centers (means) indicate chord "templates"
  [Figure: model means for the DIM, DOM7, MAJ, MIN, and MIN7 families, for C-root chords]
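A minimal numpy sketch of the tempo estimator described above: autocorrelate the onset envelope and pick the peak after weighting by a log-Gaussian "human preference" window. The 120 BPM center and one-octave width are illustrative, not the talk's exact values.

    import numpy as np

    def estimate_tempo(onset_env, frame_rate, bpm_center=120.0, octave_width=1.0):
        ac = np.correlate(onset_env, onset_env, mode='full')[len(onset_env) - 1:]
        lags = np.arange(1, len(ac))                          # lags in frames
        bpm = 60.0 * frame_rate / lags                        # candidate tempi
        prefer = np.exp(-0.5 * (np.log2(bpm / bpm_center) / octave_width) ** 2)
        best = 1 + np.argmax(prefer * ac[1:])                 # skip lag 0
        return 60.0 * frame_rate / best                       # tempo in BPM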
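The dynamic-programming recursion for C*(t) can likewise be written out directly. This sketch uses a squared-log deviation penalty for F (a common choice; the talk does not spell F out), and the weight alpha and search range are illustrative.

    import numpy as np

    def track_beats(onset_env, tau_p, alpha=100.0):
        """onset_env: onset strength per frame; tau_p: target beat period in frames."""
        n = len(onset_env)
        cstar = onset_env.copy()               # C*(t), initialized to O(t)
        backlink = -np.ones(n, dtype=int)      # traceback P(t)
        lo, hi = int(round(0.5 * tau_p)), int(round(2.0 * tau_p))
        for t in range(lo, n):
            taus = np.arange(max(0, t - hi), t - lo + 1)
            F = -(np.log((t - taus) / tau_p)) ** 2        # transition score F(t - tau, tau_p)
            scores = alpha * F + cstar[taus]
            best = np.argmax(scores)
            cstar[t] = onset_env[t] + scores[best]        # C*(t) = O(t) + max{...}
            backlink[t] = taus[best]
        # back-trace from the best-scoring frame to recover the beat sequence {t_i}
        beats = [int(np.argmax(cstar))]
        while backlink[beats[-1]] >= 0:
            beats.append(backlink[beats[-1]])
        return np.array(beats[::-1])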
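And a stripped-down sketch of the chord decoder: per-beat chroma is scored against per-chord templates and smoothed with a Viterbi pass over a chord transition matrix. This replaces the system's per-chord Gaussians and learned transition counts with cosine-similarity scores and a simple "sticky" transition matrix, so it is an illustration of the idea, not the actual system.

    import numpy as np

    def viterbi_chords(beat_chroma, templates, self_bias=0.9):
        """beat_chroma: (n_beats, 12); templates: (n_chords, 12) chord means/templates."""
        n_beats, _ = beat_chroma.shape
        n_chords = templates.shape[0]
        # frame scores: cosine similarity between each beat's chroma and each template
        bc = beat_chroma / (np.linalg.norm(beat_chroma, axis=1, keepdims=True) + 1e-9)
        tp = templates / (np.linalg.norm(templates, axis=1, keepdims=True) + 1e-9)
        loglik = np.log(1e-6 + bc @ tp.T)
        # transition matrix: sticky self-transitions, uniform otherwise
        trans = np.full((n_chords, n_chords), (1 - self_bias) / (n_chords - 1))
        np.fill_diagonal(trans, self_bias)
        logtrans = np.log(trans)
        # Viterbi decoding
        delta = loglik[0].copy()
        psi = np.zeros((n_beats, n_chords), dtype=int)
        for t in range(1, n_beats):
            scores = delta[:, None] + logtrans
            psi[t] = np.argmax(scores, axis=0)
            delta = scores[psi[t], np.arange(n_chords)] + loglik[t]
        path = [int(np.argmax(delta))]
        for t in range(n_beats - 1, 0, -1):
            path.append(psi[t][path[-1]])
        return np.array(path[::-1])    # chord index per beat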
Chords for Jazz
• How many types?
  [Figure: "Freddie Freeloader", log-frequency spectrogram, beat-synchronous chroma, chord likelihoods with Viterbi path, and chord-based chroma reconstruction]

Future Work
• Matching items
  ◦ cover songs / standards
  ◦ similar instruments, styles
  [Figure: "Between the Bars", Elliott Smith vs. the Glenn Phillips cover: beat-synchronous chroma, aligned by transposing 2 semitones and shifting -17 beats; pointwise product over the 12 x 300 patch sums to 19.34]
• Analyzing musical content
  ◦ solo transcription & modeling
  ◦ musical structure
• And so much more...

Summary
• Finding Musical Similarity at Large Scale
  ◦ Music audio -> low-level features -> melody and notes, key and chords, tempo and beat
  ◦ Classification and similarity: browsing, discovery, production
  ◦ Music structure discovery: modeling, generation, curiosity
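Returning to the cover-song matching idea on the Future Work slide, a minimal numpy sketch: compare two beat-synchronous chroma matrices by a normalized pointwise product over all 12 transpositions and a range of beat offsets (as in the "-17 beats, transpose 2" alignment shown). The offset search range and normalization are illustrative.

    import numpy as np

    def cover_match_score(chroma_a, chroma_b, max_offset=40):
        """chroma_a, chroma_b: (12, n_beats) beat-synchronous chroma.
        Returns (score, transposition in semitones, beat offset)."""
        best = (-np.inf, 0, 0)
        for semis in range(12):
            rb = np.roll(chroma_b, semis, axis=0)            # transpose by rotating pitch classes
            for off in range(-max_offset, max_offset + 1):
                if off >= 0:
                    a, b = chroma_a[:, off:], rb
                else:
                    a, b = chroma_a, rb[:, -off:]
                n = min(a.shape[1], b.shape[1])
                if n <= 0:
                    continue
                score = np.sum(a[:, :n] * b[:, :n]) / n      # normalized pointwise product
                if score > best[0]:
                    best = (score, semis, off)
        return best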