Searching for Similar Phrases in Music Audio

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/

1. Motivation: Similar Phrases
2. Phrase Matching System
3. Experiments
4. Conclusions & Future

Similar Phrases in Music - Ellis 2007-12-18 p. 1 /12


1. Motivation: Similar Phrases

• Idea: Music is a sequence of reused pieces
  - e.g. melodic runs, chord sequences, ...
• Can we identify them in large music databases?
  - ... which we have
  - i.e. machine learning
• Applications
  - classification and matching of pieces
  - compressed representation
  - data-driven musicology


Common Phrase Discovery

[Block diagram: Music audio → Beat tracking → Chroma features → Key normalization → Landmark identification → Locality Sensitive Hash Table]

• Chop up music into short descriptions of musical content
  - 24-beat beat-chroma matrices?
• Choose a few that appear to be “starts”
• Put into LSH table (similar items fall in same bin)
• Find the bins with most entries


2. Phrase Matching: Beat Tracking (Ellis ’06, ’07)

• Goal: One feature vector per ‘beat’ (tatum)
  - for tempo normalization, efficiency
• “Onset Strength Envelope”
  - $\sum_f \max(0, \mathrm{diff}_t(\log |X(t, f)|))$
  [Figure: spectrogram, freq / mel vs. time / sec]
• Autocorr. + window → global tempo estimate
  [Figure: autocorrelation vs. lag / 4 ms samples, peak marked 168.5 BPM]


Chroma Features

• Chroma features convert spectral energy into musical weights in a canonical octave
  - i.e. 12 semitone bins
  [Figure: Piano chromatic scale, spectrogram (freq / kHz vs. time / sec) and IF chroma (chroma vs. time / frames)]
• Can resynthesize as “Shepard Tones”
  - all octaves at once
  [Figure: 12 Shepard tone spectra (level / dB vs. freq / Hz); Shepard tone resynth (freq / kHz vs. time / sec)]


Key Estimation (Ellis ICASSP ’07)

• Covariance of chroma reflects key
• Normalize by transposing for best fit
  - single Gaussian model of one piece
  - find ML rotation of other pieces
  - model all transposed pieces
  - iterate until convergence

[Figure: aligned chroma vs. aligned chroma panels for Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, Yellow Submarine, and Aligned Global model]


Landmark Location

• Looking for “beginnings” of phrases
  - e.g. abrupt change in harmony, instruments, etc.
  - use likelihood ratio test: weighted windows either side of boundary vs. all
• Choose top 10 locally-normalized peaks
  - .. to control data size

[Figure: Come Together - Spectrogram, Beat-sync chromogram, and top 10 segment points (freq / kHz and chroma bin vs. time / sec)]


Locality Sensitive Hashes

• Goal: Quantize high-dimensional data so ‘similar’ items fall into same bin
  - .. for fast and scalable nearest-neighbor search
• Idea: Multiple random scalar projections
  - each one will tend to keep neighbors nearby
  - items close together in all projections are probably neighbors

[Figure: from Slaney & Casey ’08]


3. Experiments

• Data
  - “artist20”: 20 artists x 6 albums = 1413 tracks
  - (up to) 10 landmarks/track = 14,078 patches
  - each patch = 12 chroma bins x 24 beats (288 dims)
• Performance
  - feature calculation: ~ 60 min
  - LSH 14k NNs: ~ 30 sec
  - 51 patches have >10 NNs within r = 2.0

[Figure: # neighbors within 2.0 - 14078 a20 patches (count vs. # near neighbors)]


Results - artist20

[Figure: four matched beat-chroma patches (chroma bin vs. beat):
  radiohead 01-You 192.7-198.5s
  green day 02-Blood Sex And Booze 122.3-129.8s
  radiohead 07-Ripcord 177.4-182.5s
  radiohead 11-Genchildren hidden 30.9-35.0s]

mainly sustained notes


Results - Beatles

• Only the 86 Beatles tracks
• All beat offsets = 41,705 patches
  - LSH takes 300 sec - approx NlogN in patches?
• High-pass along time
  - to avoid sustained notes
• Song filter
  - remove hits in same track

[Figure: four matched beat-chroma patches (chroma bin vs. beat):
  09-Martha My Dear 90.9-98.6s
  05-Here There And Everywhere 12.1-20.5s
  12-Piggies 22.0-29.6s
  02-I Should Have Known Better 92.4-97.7s]


Summary / Conclusions

[Block diagram: Music audio → Beat tracking → Chroma features → Key normalization → Landmark identification → Locality Sensitive Hash Table]

• Lots of data
  - find motifs by counting near neighbors
• Common patterns
  - e.g. melodic/harmonic-beat sequences
• Future
  - different features and/or pre-emphasis
  - better landmark points
  - complete dictionary
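

Sketch: onset strength and tempo

The Beat Tracking slide gives the onset strength envelope as a sum over frequency of the half-wave rectified time difference of log magnitude, followed by an autocorrelation to pick a global tempo. The plain-Python sketch below illustrates that idea and is not the code used in the talk. The function names, the `eps` floor, and the 40-240 BPM search range are assumptions, and the perceptual weighting window mentioned on the slide is left out.

```python
import math

def onset_strength(spec, eps=1e-10):
    """spec: list of frames, each a list of non-negative band magnitudes |X(t, f)|.
    Returns sum_f max(0, diff_t(log |X(t, f)|)), one value per frame transition."""
    # eps is an assumed floor that keeps log() defined for silent bands.
    logs = [[math.log(max(m, eps)) for m in frame] for frame in spec]
    return [sum(max(0.0, c - p) for p, c in zip(prev, cur))
            for prev, cur in zip(logs, logs[1:])]

def tempo_bpm(env, frame_period_s, min_bpm=40.0, max_bpm=240.0):
    """Pick the lag with the largest (unweighted) autocorrelation of the
    mean-removed envelope and convert it to beats per minute."""
    mean = sum(env) / len(env)
    x = [v - mean for v in env]
    lo = max(1, int(60.0 / (max_bpm * frame_period_s)))       # shortest beat period, in frames
    hi = min(len(x) - 1, int(60.0 / (min_bpm * frame_period_s)))  # longest beat period, in frames
    if hi < lo:
        raise ValueError("envelope too short for the requested tempo range")
    best_lag = max(range(lo, hi + 1),
                   key=lambda lag: sum(a * b for a, b in zip(x, x[lag:])))
    return 60.0 / (best_lag * frame_period_s)
```

With 4 ms frames, as on the slide's lag axis, an envelope with a pulse every 125 frames corresponds to a 0.5 s beat period. Without the weighting window, a raw autocorrelation can favour a multiple or a fraction of the true beat period on real audio.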
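

Sketch: key normalization by chroma rotation

The Key Estimation slide fits a single Gaussian to one piece and then finds the maximum-likelihood rotation (transposition) of each other piece. The sketch below shows only that rotation search. To stay short and dependency-free it swaps in a diagonal-covariance Gaussian, which is a simplification: the slide's point is that the full chroma covariance reflects key. The function names and the variance floor are assumptions, and the outer loop (refit on all transposed pieces, iterate until convergence) is omitted.

```python
import math

def rotate(frame, k):
    """Circularly shift a 12-bin chroma vector up by k semitones."""
    k %= 12
    return list(frame[-k:]) + list(frame[:-k]) if k else list(frame)

def fit_diag_gaussian(frames, var_floor=1e-3):
    """Per-bin mean and variance of a list of 12-bin chroma frames.
    var_floor is an assumed regularizer so no variance is zero."""
    n = len(frames)
    mean = [sum(f[i] for f in frames) / n for i in range(12)]
    var = [sum((f[i] - mean[i]) ** 2 for f in frames) / n + var_floor
           for i in range(12)]
    return mean, var

def log_likelihood(frames, mean, var):
    """Total log-likelihood of the frames under the diagonal Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for f in frames for x, m, v in zip(f, mean, var))

def best_rotation(frames, mean, var):
    """Transposition (0-11 semitones) that best fits the reference model."""
    return max(range(12),
               key=lambda k: log_likelihood([rotate(f, k) for f in frames],
                                            mean, var))
```

A piece that is the reference transposed up by 3 semitones should be rotated by a further 9 semitones to line up with the model again.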
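

Sketch: LSH with random scalar projections

The Locality Sensitive Hashes slide describes several random scalar projections, with items that stay close in all of them treated as probable neighbors. One common way to realize this, assumed here and not confirmed as the scheme used in the talk, is to quantize each projection into buckets of a fixed width and use the tuple of bucket indices as the hash key. All names and parameters (`n_proj`, `width`, `seed`) are hypothetical. Candidates from the same bin are then checked against an exact radius, as in the "NNs within r = 2.0" count on the Experiments slide.

```python
import math
import random
from collections import defaultdict

def make_projections(dim, n_proj, seed=0):
    """n_proj random Gaussian directions, each with a random offset in [0, 1)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.random())
            for _ in range(n_proj)]

def lsh_key(vec, projections, width):
    """Tuple of quantized scalar projections; nearby vectors tend to share it."""
    return tuple(
        math.floor(sum(a * x for a, x in zip(direction, vec)) / width + offset)
        for direction, offset in projections)

def build_table(patches, projections, width):
    """Map each hash key to the indices of the patches that fall in that bin."""
    table = defaultdict(list)
    for idx, patch in enumerate(patches):
        table[lsh_key(patch, projections, width)].append(idx)
    return table

def near_neighbors(patches, table, projections, width, query_idx, radius):
    """Patches in the query's bin whose Euclidean distance is within radius."""
    key = lsh_key(patches[query_idx], projections, width)
    return [j for j in table.get(key, [])
            if j != query_idx
            and math.dist(patches[j], patches[query_idx]) <= radius]
```

Each patch would be a flattened 12 x 24 beat-chroma matrix (288 values). A single table like this can miss true neighbors that straddle a bucket boundary, which is why practical systems usually query several independent tables.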