m - NYU

advertisement
Novelty detection
Juan Pablo Bello
MPATE-GE 2623 Music Information Retrieval
New York University
1
Novelty detection
Energy burst
• Find the start time (onset)
of new events (notes) in
the music signal.
Short duration
Steady state
• Onset: single instant
chosen to mark the start
of the (attack) transient.
Unstability
2
Applications
• MIR: can be used for segmentation; one layer of rhythmic analysis (e.g. beat).
• Computational Musicology: see the groundbreaking work at ÖFAI (1, 2)
• Sound manipulation and synthesis: http://www.music.mcgill.ca/~hockman/
projects/ARTMA/index.html
• Computational biology?!! http://isophonics.net/content/calcium-signalanalyser
3
Difficulties
• extended transient (e.g. violin,
flute)
• ambiguous events (vibrato,
tremolo, glissandi)
• polyphonies -> (a)synchronous
onsets
• perceptual vs physical
4
Difficulties
• extended transient (e.g. violin,
flute)
• ambiguous events (vibrato,
tremolo, glissandi)
• polyphonies -> (a)synchronous
onsets
• perceptual vs physical
5
Difficulties
• extended transient (e.g. violin,
flute)
• ambiguous events (vibrato,
tremolo, glissandi)
• polyphonies -> (a)synchronous
onsets
• perceptual vs physical
6
Architecture
Audio signal
Reduction
Novelty (detection) function
Detection
(peak picking)
Onsets
7
Time-domain
• Onsets: often characterized by an amplitude increase
• Envelope following (full-wave rectification + smoothing):
1
E0 (m) =
N
N/2
!
n=−N/2
|x(n + mh)|w(n)
• Squaring instead of rectifying, results in the local energy:
1
E(m) =
N
N/2
!
2
(x(n + mh)) w(n)
n=−N/2
8
Time-domain
9
Time-domain
• We can use the derivative of energy w.r.t. time -> sharp peaks during energy
rise
• Detectable changes in loudness are proportional to the overall loudness of
the sound.
∂E(m)/∂m
∂log(E(m))
=
E(m)
∂m
• Simulates the ear’s perception of loudness (Klapuri, 1999)
10
Time-domain
11
Frequency-domain
• Impulsive noise in time ->
wide band noise in frequency
• More noticeable in high
frequencies.
• Linear weighting of Energy
N/2
2 !
HF C(m) =
|Xk (m)|2 k
N
k=0
12
Frequency-domain
• We can also measure change (flux) in spectral content (Duxbury, 02).
N/2
2 !
SF (m) =
(|Xk (m)| −| Xk (m − 1)|)
N
k=0
• Use half-wave rectification to only take energy increases into account
SFR (m) =
2
N
!N/2
k=0
H (|Xk (m)| −| Xk (m − 1)|)
H(x) = (x + |x|)/2
• Use the L2 norm.
N/2
2 !
2
SFRL2 (m) =
[H (|Xk (m)| −| Xk (m − 1)|)]
N
k=0
13
Spectral Flux
14
Phase deviation
• The change in instantaneous frequency in bin k can be used to detect onsets
(Bello, 03)
∆φk (m − 1)
∆φk (m)
N/2
N/2
2 ! !!
2 !
P D(m) =
|φk (m)| =
|∆φk (m) − ∆φk (m − 1)|
N
N
k=0
k=0
N/2
2 !
=
|princarg (φk (m) − 2φk (m − 1) + φk (m − 2)) |
N
k=0
15
Phase deviation
• This function can be improved by weighting frequency bins by their
magnitude (Dixon, 06):
N/2
2 !
P DW (m) =
|Xk (m)φ!!k (m)|
N
k=0
• and normalized:
P DW N (m) =
!N/2
!!
|X
(m)φ
k
k (m)|
k=0
!N/2
k=0 |Xk (m)|
16
Phase deviation
17
Complex domain
• We can combine the spectral flux and phase deviation strategies, such that:
X̂k (m) = |X̂k (m)|ej φ̂k (m)
where,
|X̂k (m)| = |Xk (m − 1)|
φ̂k (m) = princarg (2φk (m − 1) − φk (m − 2))
such that,
CD(m) =
2
N
!N/2
k=0
|Xk (m) − X̂k (m)|
18
Complex domain
• As before, we can use half-wave rectification to improve the function (Dixon,
06):
CD(m) =
2
N
where,
RCDk (m) =
!N/2
k=0
"
RCDk (m)
Xk (m)
0
X̂k (m)
if Xk (m)
otherwise
Xk (m
1)
19
Complex domain
20
Peak picking
• The function is post-processed to facilitate peak picking:
• Smoothing -> decrease jaggedness
• Normalization -> generalization of threshold values
• Thresholding -> eliminate spurious peaks
21
Peak picking
• Adaptive thresholding is a more robust choice, typically defined as:
δ(m) = α + f (m̄),
m − βL ≤ m̄ ≤ m + L
• where f can be, e.g., the local mean or median; β increases the window
length before the peak; and α is an offset value
• Peak picking reduces to selecting local maxima above the threshold
22
Peak picking
23
Comparing detection functions
24
Comparing detection functions
25
Comparing detection functions
26
Benchmarking
false
negative
f-
correct
c
false
positive
f+
c
P =
c + f+
c
R=
c + f−
2P R
F =
P +R
27
References
• Dixon, S. “Onset Detection Revisited”. Proceedings of the 9th International
Conference on Digital Audio Effects (DAFx06), Montreal, Canada, 2006.
• Collins, N. “A Comparison of Sound Onset Detection Algorithms with Emphasis on
Psycho-Acoustically Motivated Detection Functions”. Journal of the Audio
Engineering Society, 2005.
• Bello , J.P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M. and Sandler, M.B. “A
tutorial on onset detection in music signals”. IEEE Transactions on Speech and
Audio Processing. 13(5), Part 2, pages 1035-1047, September, 2005.
• Bello , J.P., Duxbury, C., Davies, M. and Sandler, M. On the use of phase and energy
for musical onset detection in the complex domain. IEEE Signal Processing Letters.
11(6), pages 553-556, June, 2004.
• Klapuri, A. “Sound Onset Detection by Applying Psychoacoustic Knowledge”. IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Phoenix, Arizona, 1999.
28
Download