Novelty detection Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University 1 Novelty detection Energy burst • Find the start time (onset) of new events (notes) in the music signal. Short duration Steady state • Onset: single instant chosen to mark the start of the (attack) transient. Unstability 2 Applications • MIR: can be used for segmentation; one layer of rhythmic analysis (e.g. beat). • Computational Musicology: see the groundbreaking work at ÖFAI (1, 2) • Sound manipulation and synthesis: http://www.music.mcgill.ca/~hockman/ projects/ARTMA/index.html • Computational biology?!! http://isophonics.net/content/calcium-signalanalyser 3 Difficulties • extended transient (e.g. violin, flute) • ambiguous events (vibrato, tremolo, glissandi) • polyphonies -> (a)synchronous onsets • perceptual vs physical 4 Difficulties • extended transient (e.g. violin, flute) • ambiguous events (vibrato, tremolo, glissandi) • polyphonies -> (a)synchronous onsets • perceptual vs physical 5 Difficulties • extended transient (e.g. violin, flute) • ambiguous events (vibrato, tremolo, glissandi) • polyphonies -> (a)synchronous onsets • perceptual vs physical 6 Architecture Audio signal Reduction Novelty (detection) function Detection (peak picking) Onsets 7 Time-domain • Onsets: often characterized by an amplitude increase • Envelope following (full-wave rectification + smoothing): 1 E0 (m) = N N/2 ! n=−N/2 |x(n + mh)|w(n) • Squaring instead of rectifying, results in the local energy: 1 E(m) = N N/2 ! 2 (x(n + mh)) w(n) n=−N/2 8 Time-domain 9 Time-domain • We can use the derivative of energy w.r.t. time -> sharp peaks during energy rise • Detectable changes in loudness are proportional to the overall loudness of the sound. ∂E(m)/∂m ∂log(E(m)) = E(m) ∂m • Simulates the ear’s perception of loudness (Klapuri, 1999) 10 Time-domain 11 Frequency-domain • Impulsive noise in time -> wide band noise in frequency • More noticeable in high frequencies. • Linear weighting of Energy N/2 2 ! HF C(m) = |Xk (m)|2 k N k=0 12 Frequency-domain • We can also measure change (flux) in spectral content (Duxbury, 02). N/2 2 ! SF (m) = (|Xk (m)| −| Xk (m − 1)|) N k=0 • Use half-wave rectification to only take energy increases into account SFR (m) = 2 N !N/2 k=0 H (|Xk (m)| −| Xk (m − 1)|) H(x) = (x + |x|)/2 • Use the L2 norm. N/2 2 ! 2 SFRL2 (m) = [H (|Xk (m)| −| Xk (m − 1)|)] N k=0 13 Spectral Flux 14 Phase deviation • The change in instantaneous frequency in bin k can be used to detect onsets (Bello, 03) ∆φk (m − 1) ∆φk (m) N/2 N/2 2 ! !! 2 ! P D(m) = |φk (m)| = |∆φk (m) − ∆φk (m − 1)| N N k=0 k=0 N/2 2 ! = |princarg (φk (m) − 2φk (m − 1) + φk (m − 2)) | N k=0 15 Phase deviation • This function can be improved by weighting frequency bins by their magnitude (Dixon, 06): N/2 2 ! P DW (m) = |Xk (m)φ!!k (m)| N k=0 • and normalized: P DW N (m) = !N/2 !! |X (m)φ k k (m)| k=0 !N/2 k=0 |Xk (m)| 16 Phase deviation 17 Complex domain • We can combine the spectral flux and phase deviation strategies, such that: X̂k (m) = |X̂k (m)|ej φ̂k (m) where, |X̂k (m)| = |Xk (m − 1)| φ̂k (m) = princarg (2φk (m − 1) − φk (m − 2)) such that, CD(m) = 2 N !N/2 k=0 |Xk (m) − X̂k (m)| 18 Complex domain • As before, we can use half-wave rectification to improve the function (Dixon, 06): CD(m) = 2 N where, RCDk (m) = !N/2 k=0 " RCDk (m) Xk (m) 0 X̂k (m) if Xk (m) otherwise Xk (m 1) 19 Complex domain 20 Peak picking • The function is post-processed to facilitate peak picking: • Smoothing -> decrease jaggedness • Normalization -> generalization of threshold values • Thresholding -> eliminate spurious peaks 21 Peak picking • Adaptive thresholding is a more robust choice, typically defined as: δ(m) = α + f (m̄), m − βL ≤ m̄ ≤ m + L • where f can be, e.g., the local mean or median; β increases the window length before the peak; and α is an offset value • Peak picking reduces to selecting local maxima above the threshold 22 Peak picking 23 Comparing detection functions 24 Comparing detection functions 25 Comparing detection functions 26 Benchmarking false negative f- correct c false positive f+ c P = c + f+ c R= c + f− 2P R F = P +R 27 References • Dixon, S. “Onset Detection Revisited”. Proceedings of the 9th International Conference on Digital Audio Effects (DAFx06), Montreal, Canada, 2006. • Collins, N. “A Comparison of Sound Onset Detection Algorithms with Emphasis on Psycho-Acoustically Motivated Detection Functions”. Journal of the Audio Engineering Society, 2005. • Bello , J.P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M. and Sandler, M.B. “A tutorial on onset detection in music signals”. IEEE Transactions on Speech and Audio Processing. 13(5), Part 2, pages 1035-1047, September, 2005. • Bello , J.P., Duxbury, C., Davies, M. and Sandler, M. On the use of phase and energy for musical onset detection in the complex domain. IEEE Signal Processing Letters. 11(6), pages 553-556, June, 2004. • Klapuri, A. “Sound Onset Detection by Applying Psychoacoustic Knowledge”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Phoenix, Arizona, 1999. 28