Perceptually-Inspired Music Audio Analysis
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/

1. Perceptually-Inspired Analysis
2. The Acoustic Structure of Music
3. Music Scene Analysis
4. Large Music Collections
5. Open Issues

Music Audio Analysis - Dan Ellis 2012-01-07 1/55

1. Perceptually-Inspired Analysis
• Machine Listening: extracting useful information from sound
  - "Sound Intelligence" tasks: detect (VAD, speech/music), classify, ASR, music transcription, automatic narration, music recommendation, emotion, environment awareness
  - domains: speech, environmental sound, music
  [figure: map of machine-listening tasks vs. sound domains]

Listening to Mixtures (Bregman '90)
• The world is cluttered
  - sound is transparent, so mixtures are inevitable
• Useful information is structured by 'sources'
  - a specific definition of a 'source': intentional independence

Scene Analysis (Pierce '83)
• Detect separate events
  - common onset
  - common harmonicity
  - instruments & timbre
  [figure: spectrogram, 0-8 kHz over 9 s]

2. The Acoustic Structure of Music
• Pitches, voices, rhythm
  [figure: four spectrograms, 0-4 kHz over 20 s]

Pitch, Harmony, Consonance
• Musical intervals relate to harmonic proximity
• Pitch helix
  [figure: the pitch helix]

Timbre (Grey '75)
• The property that distinguishes instruments
  - spectrum, noise, onset, dynamics, ...
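The "harmonic proximity" idea can be made concrete by counting how many harmonics of two fundamentals (nearly) coincide: consonant intervals such as the perfect fifth share several, dissonant ones few. A minimal sketch — the harmonic count and cent tolerance are illustrative choices, not values from the talk:

```python
import math

def shared_harmonics(f1, f2, n_harm=10, tol_cents=20.0):
    """Count pairs of harmonics of f1 and f2 that lie within
    tol_cents of each other (n_harm, tol_cents are illustrative)."""
    count = 0
    for i in range(1, n_harm + 1):
        for j in range(1, n_harm + 1):
            cents = abs(1200 * math.log2((i * f1) / (j * f2)))
            if cents < tol_cents:
                count += 1
    return count

# a perfect fifth (3:2) shares more near-coincident harmonics
# than a tritone (~2^(1/2))
fifth = shared_harmonics(220.0, 330.0)
tritone = shared_harmonics(220.0, 311.13)
```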
Rhythm (McKinney & Moelants '06)
• Periodic "events" perceived as a structure
  - hierarchy, swing
  [figure: Harnoncourt excerpt spectrogram; histogram of tapped tempi]

Sequences & Streaming
• Perceptual effects of sequences, e.g. streaming
  - alternating 1 kHz tones; tone repetition time 60-150 ms, Δf up to 2 octaves
• Music is built of sequences at many different levels

Music and Scene Analysis
• Music appears designed to "defeat" auditory scene analysis
  - harmonic relations → overlapped harmonics
  - rhythmic playing → synchronized onsets
  - co-ordinated ensembles → mutual dependence of sources
  [figure: natural scene (city22.wav) vs. musical scene (brodsky.wav) spectrograms]
• Maybe that's why we like it!

3. Music Scene Analysis
• Interesting music audio analysis tasks are perceptually defined
  - what do listeners hear?
  [figure: Let It Be (final verse): signal, melody, piano notes, onsets & beats, per-frame chroma, per-beat normalized chroma]

Note Transcription
• Goal: recover the score (notes, timing, voices)
  - musicians can (be trained to) do it
• Framework: find the best-matching synthesis parameters?
  - note events {tk, pk, ik} → synthesis ↔ observations X[k,n]
  - note template + 2-D convolution

Note Transcription Problems
• "Oh, I'm just about to lose my mind..."
  - noise / multiple f0s
  - unclear f0
  - note segmentation
  - voice activity detection
  [figure: spectrogram with spectral slices and waveform details]

Pitch Templates
• Harmonic series as patterns on log-frequency spectrograms
  - matched filters can enhance the fundamental
  - then look for the largest peak?
  [figure: log-frequency spectrogram convolved with a harmonic template]

Sinusoid Tracks (Maher & Beauchamp '94)
• Notes generate multiple harmonics in sinusoid analysis
  - find pitches by grouping them?
• Problems
  - when to "break" tracks
  - how to group (harmonicity, onset)

Iterative Removal (Klapuri '01, '06)
• At each frame:
  - estimate the dominant f0 by checking for harmonics
  - cancel it from the spectrum
  - repeat until no f0 is prominent
  [figure: harmonics enhancement → predominant-f0 estimation → spectral smoothing → subtract & iterate; stop when no more prominent f0s]

Probabilistic Model (Goto '97, '00)
• Generative model:
  p(x(f)) = Σ_m ∫ w(F, m) p(x(f) | F, m) dF
  - spectrum = weighted combination of tone models at specific f0s
  - 'knowledge' lives in the tone models & prior distributions for f0
  [figure: tone models with harmonic weights c(t)(h | F, m) at F, F+1200, F+1902, F+2400 cents]
• Is it perceptually relevant?
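The iterative-removal loop above can be sketched on a discrete magnitude spectrum. The salience measure here (a plain sum over harmonic bins) and the `n_harm`/`threshold` parameters are simplifications of Klapuri's weighted harmonic sum, for illustration only:

```python
def harmonic_salience(spec, f0_bin, n_harm=5):
    """Salience of a candidate f0: summed magnitude at its harmonic bins."""
    return sum(spec[h * f0_bin] for h in range(1, n_harm + 1)
               if h * f0_bin < len(spec))

def iterative_f0s(spec, f0_range, threshold, n_harm=5):
    """Pick the most salient f0, cancel its harmonics from the
    spectrum, and repeat while any candidate stays above threshold."""
    spec = list(spec)                       # work on a copy
    found = []
    while True:
        best = max(f0_range, key=lambda b: harmonic_salience(spec, b, n_harm))
        if harmonic_salience(spec, best, n_harm) < threshold:
            break                           # no prominent f0 left
        found.append(best)
        for h in range(1, n_harm + 1):      # cancel the note's harmonics
            if h * best < len(spec):
                spec[h * best] = 0.0
    return found

# usage: two synthetic notes with f0 at bins 20 and 27
spec = [0.0] * 200
for f0 in (20, 27):
    for h in range(1, 6):
        spec[h * f0] += 1.0
found = iterative_f0s(spec, range(15, 40), threshold=2.0)
```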
Trained Pitch Classifier (Poliner & Ellis '05, '06, '07)
• Exchange signal models for data
  - transcription as a pure classification problem
• Training data and features:
  - MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 min of training audio)
  - normalized magnitude STFT
• Classification:
  - N binary SVMs (one for each note)
  - independent frame-level classification on a 10 ms grid
  - distance to the class boundary as posterior
• Temporal smoothing:
  - two-state (on/off) independent HMM for each note; parameters learned from training data
  - find the Viterbi sequence for each note

Instrument Modeling (Grindlay & Ellis '09, '11)
• Use NMF to model the spectrum as templates + activations: X = W · H
• Eigeninstrument bases constrain the instrument spectra

Rhythm Tracking
• Rhythm/beat tracking has 2 main components:
  - front end: extract 'events' from the audio
  - back end: find a plausible beat sequence to match, using musical knowledge
  [diagram: audio → onset event detection → beat marking → beat times etc.]
• Other outputs
  - tempo, time signature, metrical level(s)

Onset Detection
• Simplest thing is the energy envelope
  e(n0) = Σ_{n=-W/2}^{W/2} w[n] |x(n + n0)|²
• Emphasis on high frequencies? compare |X(f, t)| with f · |X(f, t)|
  [figure: Harnoncourt and Maracatu spectrograms with energy-based onset functions]

Multiband Derivatives (Puckette et al. '98)
• Sometimes energy just "shifts" between bands
  - calculate & sum onsets in multiple bands
  - use a ratio instead of a difference, i.e. normalize energy:
    o(t) = Σ_f W(f) · max(0, |X(f, t)| / |X(f, t-1)| − 1)
  - as in bonk~

Phase Deviation (Bello et al. '05)
• When amplitudes don't change much, a phase discontinuity may signal a new note
• Can detect by comparing the actual phase with an extrapolation from the past:
  X̂(f, tn+1) = X(f, tn)² / X(f, tn-1)
  - combine with amplitude: complex deviation
  [figure: predicted vs. actual X(f, tn+1) in the complex plane]

Rhythm Tracking
• Earliest systems were rule based
  - based on musicology (Desain & Honing 1999)
  - inspired by linguistic grammars à la Chomsky (Longuet-Higgins & Lee 1982)
  - input: event sequence (MIDI); output: quarter notes, downbeats

Tempo Estimation
• Perception of beat comes from regular spacing
  - the kind of thing we detect with autocorrelation
• Pick the peak in the onset envelope autocorrelation after applying a "human preference" window
  - check for sub-beat (primary vs. secondary tempo period)
  [figure: onset strength envelope; raw and windowed autocorrelation]

Resonators for Beat Tracking (Scheirer '98)
• How to address:
  - build-up of rhythmic evidence
  - "ghost events"
• Reminiscent of a comb filter...
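The tempo-estimation recipe above — autocorrelate the onset envelope, apply a preference window, pick the peak — might be sketched as follows. The log-lag Gaussian window and its `bpm_center`/`octave_width` parameters are illustrative stand-ins for the "human preference" weighting, not the talk's exact form:

```python
import math

def tempo_period(onset_env, sr, bpm_center=120.0, octave_width=1.0):
    """Score each lag by raw autocorrelation times a log-lag Gaussian
    'preference' window centred on bpm_center; return the best lag."""
    n = len(onset_env)
    pref_lag = sr * 60.0 / bpm_center        # preferred period in frames
    best_lag, best_score = 0, float("-inf")
    for lag in range(1, n // 2):
        # raw autocorrelation at this lag
        ac = sum(onset_env[i] * onset_env[i - lag] for i in range(lag, n))
        # Gaussian in log-lag around the preferred period
        w = math.exp(-0.5 * (math.log2(lag / pref_lag) / octave_width) ** 2)
        if w * ac > best_score:
            best_score, best_lag = w * ac, lag
    return best_lag                           # tempo in BPM = 60 * sr / lag

# usage: onsets every 50 frames at a 100 Hz frame rate -> 120 BPM
env = [1.0 if i % 50 == 0 else 0.0 for i in range(500)]
lag = tempo_period(env, 100)
```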
  - a resonant filterbank: y(t) = α·y(t − T) + (1 − α)·x(t), one resonator for each possible period T

Multi-Hypothesis Systems (Goto & Muraoka 1994; Goto 2001; Dixon 2001)
• Beat is ambiguous → develop several alternatives
  - agent architecture: A/D conversion & frequency analysis → onset-time finders & vectorizers → beat-prediction agents, with higher-level checkers, a drum-sound finder, and a manager
  - inputs: music audio; outputs: beat times, downbeats, bass/snare-drum patterns, ...

Objective Function Optimization (Ellis 2007)
• Re-cast beat tracking as optimization: find beat times {ti} to maximize
  C({ti}) = Σ_{i=1..N} O(ti) + α Σ_{i=2..N} F(ti − ti-1, τp)
  - O(t) is the onset strength function
  - F(Δt, τ) is a tempo consistency score, e.g. F(Δt, τ) = −(log(Δt/τ))²
    (needs a tempo estimate for τp)
• Looks like an exponential search over all {ti}...
  - but Dynamic Programming saves us

Beat Tracking by DP
• To optimize C({ti}), define C*(t) as the best score up to time t
• Build it up recursively, with traceback P(t):
  C*(t) = O(t) + max_τ { αF(t − τ, τp) + C*(τ) }
  P(t) = argmax_τ { αF(t − τ, τp) + C*(τ) }
• The final beat sequence {ti} is the best C* plus its back-trace

beatsimple: beat tracking in 15 lines of Matlab

function beats = beatsimple(localscore, period, alpha)
% beats = beatsimple(localscore, period, alpha)
% Core of the DP-based beat tracker
% <localscore> is the onset strength envelope
% <period> is the target tempo period (in samples)
% <alpha> is the weight applied to transition cost
% <beats> returns the chosen beat sample times.
% 2007-06-19 Dan Ellis dpwe@ee.columbia.edu

% backlink(time) is best predecessor for this point
% cumscore(time) is total cumulated score to this point
backlink = -ones(1,length(localscore));
cumscore = localscore;
% Search range for previous beat
prange = round(-2*period):-round(period/2);
% Log-gaussian window over that range
txcost = (-alpha*abs((log(prange/-period)).^2));
for i = max(-prange + 1):length(localscore)
  timerange = i + prange;
  % Search over all possible predecessors
  % and apply transition weighting
  scorecands = txcost + cumscore(timerange);
  % Find best predecessor beat
  [vv,xx] = max(scorecands);
  % Add on local score
  cumscore(i) = vv + localscore(i);
  % Store backtrace
  backlink(i) = timerange(xx);
end
% Start backtrace from best cumulated score
[vv,beats] = max(cumscore);
% .. then find all its predecessors
while backlink(beats(1)) > 0
  beats = [backlink(beats(1)),beats];
end
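For readers without Matlab, the dynamic program above transliterates almost line-for-line to plain Python. This is a sketch of the same algorithm, not the original code; note that Matlab's 1-based indexing becomes 0-based:

```python
import math

def beatsimple(localscore, period, alpha):
    """DP beat tracker: localscore is the onset strength envelope,
    period the target tempo period in samples, alpha the
    transition-cost weight."""
    n = len(localscore)
    backlink = [-1] * n                      # best predecessor per sample
    cumscore = list(localscore)              # best cumulated score per sample
    # search window for the previous beat: -2*period .. -period/2
    prange = range(-2 * period, -(period // 2) + 1)
    # log-Gaussian transition cost over that window
    txcost = [-alpha * (math.log(-p / period)) ** 2 for p in prange]
    for i in range(2 * period, n):
        # score every candidate predecessor and keep the best
        best_s, best_t = float("-inf"), -1
        for c, p in zip(txcost, prange):
            s = c + cumscore[i + p]
            if s > best_s:
                best_s, best_t = s, i + p
        cumscore[i] = best_s + localscore[i]
        backlink[i] = best_t
    # backtrace from the best cumulated score
    beats = [max(range(n), key=lambda t: cumscore[t])]
    while backlink[beats[0]] > 0:
        beats.insert(0, backlink[beats[0]])
    return beats

# usage: an impulse train with period 10 yields beats every 10 samples
env = [1.0 if i % 10 == 0 else 0.0 for i in range(100)]
beats = beatsimple(env, 10, 100.0)
```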
Results
• Verify mean output BPM against human tapping data
  - bonus6 (Jazz) and bonus1 (Choral): target BPM vs. output BPM
  - beat accuracy and spread of inter-beat-intervals as the tempo estimate τ and the tradeoff weight α (100 vs. 400) are varied
  [figure: tempo-tracking and accuracy curves over target BPM 30-400]

Chord Recognition
• Do people hear simultaneous notes, or do they learn the sound of chords?
  - music limits the likely combinations
  - chords have a definite "color"
• Recognize chords instead of notes?
  - labeled data is available
  - analogous to speech recognition
  [figure: Beatles - Beatles For Sale - Eight Days a Week: pitch-class intensities with true/aligned/recognized chord labels (E, G, D, Bm, Em7, Am, ...)]

Chord Features: Chroma (Fujishima 1999)
• Idea: project all energy onto 12 semitones, regardless of octave
  - chroma maintains the main "musical" distinction
  - invariant to musical equivalence
  - no need to worry about harmonics?
  C(b) = Σ_k B(12 log2(k/k0) − b) · W(k) · |X[k]|
  - W(k) is a weighting; B(·) selects bins a multiple of 12 semitones from b

Chroma Resynthesis (Ellis & Poliner 2007)
• Chroma describes the notes in an octave... but not the octave
• Can resynthesize by presenting all octaves with a smooth envelope W: "Shepard tones"
  y_b(t) = Σ_{o=1..M} W(o + b/12) cos(2π · 2^{o + b/12} · w0 · t)
  - the octave is ambiguous: endless-sequence illusion
  [figure: Shepard tone spectra and resynthesis spectrogram]

Chroma Resynthesis
• Resynthesis illustrates what has been captured
  - can combine with MFCC features for the coarse spectrum
  [figure: Let It Be: log-freq spectrogram, chroma features, Shepard-tone resynthesis, MFCC-filtered Shepard tones]

Beat-Synchronous Chroma (Bartsch & Wakefield '01)
• Store just one chroma frame per beat
  - a compact representation of musical content
  [figure: log-freq spectrogram → onset envelope + beat times → beat-synchronous chroma → Shepard resynthesis]

Chord Recognition System (Sheh & Ellis '03)
• Analogous to speech recognition
  - Gaussian models of features for each chord
  - Hidden Markov Models for chord transitions
• Train: beat-synchronous chroma + chord labels → root-normalized Gaussian models (24 chords); count transitions → 24x24 transition matrix
• Test: beat track → chroma → HMM Viterbi decode → chord labels

Key Normalization
• Chord transitions depend on the key of the piece
  - dominant, relative minor, etc...
• Transition probabilities should be key-relative
  - estimate the main key of the piece
  - rotate all chroma features into alignment
  - learn models over the aligned chroma
  [figure: per-song chord-transition chroma patterns (Taxman, I'm Only Sleeping, Love You To, Eleanor Rigby, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, Yellow Submarine) vs. the aligned global model]

Chord Recognition
• Often works:
  [figure: Let It Be/06-Let It Be: spectrogram, beat-synchronous chroma, recognized vs. ground-truth chords (C, G, A:min, F:maj7, ...)]
• But not always

Eigenrhythms: Drum Pattern Space (Ellis & Arroyo '04)
• Pop songs are built on a repeating "drum loop"
  - variations on a few bass, snare, hi-hat patterns
• Eigen-analysis (or ...) to capture the variations?
  - by analyzing lots of (MIDI) data, or from audio
• Applications
  - music categorization
  - "beat box" synthesis
  - insight

Aligning the Data
• Need to align patterns prior to modeling...
  - tempo (stretch): by inferring BPM & normalizing
  - downbeat (shift): correlate against a 'mean' template

Eigenrhythms (PCA)
• Need 20+ eigenvectors for good coverage of 100 training patterns (1200 dimensions)
• Eigenrhythms both add and subtract

Posirhythms (NMF)
• Nonnegative: only adds beat-weight
• Capturing some structure...
  [figure: six posirhythms as hi-hat/snare/bass-drum activations, samples @ 200 Hz / beats @ 120 BPM]
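The two chroma steps above — folding energy onto 12 pitch classes (Fujishima-style) and rotating the result so the estimated key root lands on bin 0 — can be sketched in a few lines. The C-anchored reference frequency and the crude strongest-bin key estimate are simplifying assumptions; a real system would correlate against learned key profiles:

```python
import math

def chroma_from_partials(partials, f_ref=261.63):
    """Fold sinusoidal components (freq_hz, magnitude) onto 12 pitch
    classes, octave-invariantly; f_ref pins class 0 to C4 (assumed)."""
    chroma = [0.0] * 12
    for f, mag in partials:
        b = round(12 * math.log2(f / f_ref)) % 12  # nearest semitone, mod 12
        chroma[b] += mag
    return chroma

def crude_key(frames):
    """Very crude key estimate: the pitch class with the most total
    energy across frames (a real system would use key profiles)."""
    totals = [sum(f[b] for f in frames) for b in range(12)]
    return max(range(12), key=lambda b: totals[b])

def key_normalize(frames):
    """Rotate every 12-bin chroma frame so the estimated key root
    lands on bin 0."""
    k = crude_key(frames)
    return [f[k:] + f[:k] for f in frames]

# usage: an A major triad (A4, C#5, E5) lands on classes 9, 1, 4;
# normalization rotates the strongest class (A) to bin 0
triad = chroma_from_partials([(440.0, 1.0), (554.37, 0.8), (659.25, 0.9)])
aligned = key_normalize([triad])
```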
Eigenrhythms for Classification
• Projections in eigen-space / LDA space
  - PCA(1,2) projection: 16% correct; LDA(1,2) projection: 33% correct
  - genres: blues, country, disco, hiphop, house, newwave, pop, punk, rnb, rock
• 10-way genre classification (nearest neighbor):
  - PCA3: 20% correct; LDA4: 36% correct

Eigenrhythm BeatBox
• Resynthesize rhythms from the eigen-space

4. Large Music Audio Datasets
• Music Information Retrieval (MIR) is a vibrant new field (Bertin-Mahieux et al. '11)
  - many commercial opportunities
• But: music audio is hard to share
  - copyright owners have been burned
  - researchers use personal collections...
• Idea: Million Song Dataset (MSD)
  - commercial scale, available to all, many different "facets"
  - http://labrosa.ee.columbia.edu/millionsong

MSD Facets
• Features, Lyrics, Tags, Covers, Listeners, ...
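The 10-way genre classification above uses a nearest-neighbour rule in the projected space. A minimal sketch, where `train` holds already-projected (vector, label) pairs and the PCA/LDA projection is assumed to have happened elsewhere:

```python
def nearest_neighbor(train, query):
    """Return the label of the training point closest to query
    (squared Euclidean distance); train = [(vector, label), ...]."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda vl: dist2(vl[0], query))
    return label

# usage with toy 2-D projections (hypothetical genre labels)
train = [([0.0, 0.0], "rock"), ([5.0, 5.0], "disco")]
```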
MSD Audio Features
• Use Echo Nest "Analyze" features
  - segment audio into variable-length "events"
  - represent each event by 12 chroma + 12 "timbre" dimensions
  - supports a crude resynthesis
  [figure: TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit: original vs. resynthesized spectrograms and EN features]

MSD Metadata
• Echo Nest metadata: artist, release, title, track id, key, mode, loudness, tempo, time signature, duration, sample rate, audio MD5, 7digital id, familiarity, year
• Last.fm tags with weights (cover, covers, female vocalists, piano, alternative, singer-songwriter, ...)
• SHS (SecondHandSongs) cover cliques (e.g. "Smells Like Teen Spirit": Nirvana, Weird Al Yankovic, Pleasure Beach, The Flying Pickets, Rhythms Del Mundo feat. Shanade, The Bad Plus, Tori Amos)
• MxM (musiXmatch) lyric bag-of-words counts

Melodic-Harmonic Mining (Bertin-Mahieux et al. '10)
• What can you find in a million songs?
  - what characterizes the content?
• Pipeline: music audio → beat tracking → chroma features → key normalization → landmark identification → locality-sensitive hash table
• Frequent clusters of 12 x 8 binarized event-chroma patches
  [figure: the most frequent patch clusters, with counts from 3491 down]

Results - Beatles
• Over 86 Beatles tracks
  - all beat offsets = 41,705 patches
  - LSH takes 300 sec - approximately N log N in the number of patches?
• High-pass filter along time to avoid sustained notes
• Remove hits within the same track
  [figure: matching patches from "I Should Have Known Better", "Here There And Everywhere", "Martha My Dear", "Piggies"]

5. Outstanding Issues
• Perceptually inspired? Music perception is complex:
  - structure
  - expectation
  - memory
  - enjoyment
• Many problems still to solve
  - structure, metrical hierarchy
  - music similarity & preference

Summary
• Machine Listening: getting useful information from sound
• Musical sound
  - ... constructed to confound scene analysis?
• Transcription tasks
  - ... recover notes, beats, chords etc.
• Million Song Dataset for research
  - ... large-scale, multiple facets

References
• M. A. Bartsch & G. H. Wakefield, "To catch a chorus: Using chroma-based representations for audio thumbnailing," Proc. IEEE WASPAA, 15-18, Mohonk, 2001.
• J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, & M. B. Sandler, "A tutorial on onset detection in music signals," IEEE Tr. Speech and Audio Proc., 13(5):1035-1047, 2005.
• T. Bertin-Mahieux, D. Ellis, B. Whitman, & P. Lamere, "The Million Song Dataset," Int. Symp. Music Inf. Retrieval ISMIR, Miami, 2011.
• A. Bregman, Auditory Scene Analysis, MIT Press, 1990.
• P. Desain & H. Honing, "Computational models of beat induction: The rule-based approach," J. New Music Research, 28(1):29-42, 1999.
• S. Dixon, "Automatic extraction of tempo and beat from expressive performances," J. New Music Research, 30(1):39-58, 2001.
• D. Ellis, "Beat tracking by dynamic programming," J. New Music Research, 36(1):51-60, 2007.
• D. Ellis & J. Arroyo, "Eigenrhythms: Drum pattern basis sets for classification and generation," Int. Symp. Music Inf. Retrieval ISMIR, 101-106, Barcelona, 2004.
• D. Ellis & G. Poliner, "Identifying cover songs with chroma features and dynamic programming beat tracking," Proc. ICASSP, IV-1429-1432, Hawai'i, 2007.
• T. Fujishima, "Realtime chord recognition of musical sound: A system using Common Lisp Music," Proc. Int. Comp. Music Conf., 464-467, Beijing, 1999.
• M. Goto, "A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings," Proc. ICASSP, II-757-760, 2000.
• M. Goto, "An audio-based real-time beat tracking system for music with or without drum-sounds," J. New Music Research, 30(2):159-171, 2001.
• J. M. Grey, An Exploration of Musical Timbre, Ph.D. dissertation, Stanford CCRMA, No. STAN-M-2, 1975.
• G. Grindlay & D. Ellis, "Transcribing multi-instrument polyphonic music with hierarchical eigeninstruments," IEEE J. Sel. Topics in Sig. Process., 5(6):1159-1169, 2011.
• A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," Proc. Int. Symp. Music IR, 216-221, 2006.
• R. Maher & J. Beauchamp, "Fundamental frequency estimation of musical signals using a two-way mismatch procedure," J. Acoust. Soc. Am., 95(4):2254-2263, 1994.
• M. F. McKinney & D. Moelants, "Ambiguity in tempo perception: What draws listeners to different metrical levels?" Music Perception, 24(2):155-166, 2006.
• J. R. Pierce, The Science of Musical Sound, Scientific American Press, 1983.
• G. Poliner & D. Ellis, "A discriminative model for polyphonic piano transcription," EURASIP J. Advances in Signal Processing, article id 48317, 2007.
• M. Puckette, T. Apel, & D. Zicarelli, "Real-time audio analysis tools for Pd and MSP," Proc. Int. Comp. Music Conf., 109-112, Ann Arbor, 1998.
• E. D. Scheirer, "Tempo and beat analysis of acoustic musical signals," J. Acoust. Soc. Am., 103:588-601, 1998.
• A. Sheh & D. Ellis, "Chord segmentation and recognition using EM-trained hidden Markov models," Int. Symp. Music Inf. Retrieval ISMIR, 185-191, Baltimore, 2003.