Matthew D. Hoffman, David M. Blei, Perry R. Cook
Presented by Lu Ren
Electrical and Computer Engineering
Duke University
Outline
Introduction
Previous work: breaking audio spectrograms into separate sources of sound, for tasks such as
• Identifying individual instruments and notes
• Predicting hidden or distorted signals
• Source separation
Specifying the number of sources: a Bayesian nonparametric approach
Gamma Process Nonnegative Matrix Factorization (GaP-NMF)
Computational challenge: non-conjugate pairs of distributions
• distributions are chosen because they suit spectrogram data, not for computational convenience
• a bigger variational family gives an analytic coordinate ascent algorithm
GaP-NMF Model
Observation: Fourier power spectrogram of an audio signal
X: M by N matrix of nonnegative reals
X_mn: power at time window n and frequency bin m
A window of 2(M-1) samples → DFT → squared magnitude in each frequency bin → keep only the first M bins
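The pipeline above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes a rectangular window and a hypothetical `hop` parameter (the slides do not say how windows overlap), and it uses a naive DFT so that it needs nothing beyond the standard library.

```python
import cmath
import math


def power_spectrum(window):
    """Squared DFT magnitude of one window, keeping only the first M bins."""
    size = len(window)          # window length, assumed to be 2 * (M - 1)
    num_bins = size // 2 + 1    # M: bins 0 .. size/2 of a real-valued signal
    spectrum = []
    for m in range(num_bins):
        # Naive O(size^2) DFT; a real system would use an FFT routine.
        coeff = sum(
            window[t] * cmath.exp(-2j * math.pi * m * t / size)
            for t in range(size)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def power_spectrogram(signal, num_bins, hop):
    """Return X as a list of M rows, each holding N window powers.

    `hop` (samples between window starts) is an assumed parameter.
    """
    size = 2 * (num_bins - 1)
    starts = range(0, len(signal) - size + 1, hop)
    columns = [power_spectrum(signal[s:s + size]) for s in starts]
    # Transpose so that X[m][n] is the power at frequency bin m, window n.
    return [list(row) for row in zip(*columns)]
```

For example, `power_spectrogram(signal, num_bins=5, hop=8)` cuts the signal into windows of 8 samples and returns 5 rows, one per retained frequency bin.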
Assume K static sound sources
W: describes these sources; W_mk is the average amount of energy source k exhibits at frequency m
H: amplitude of each source changing over time; H_kn is the gain of source k at time n
GaP-NMF Model
Mixing K sound sources in the time domain (under certain assumptions), the spectrogram is exponentially distributed (see footnote 1): X_mn ~ Exponential(Σ_k W_mk H_kn), with mean Σ_k W_mk H_kn
Infer both the characteristics and the number of latent audio sources
L: truncation level
1 Abdallah & Plumbley (2004) and Févotte et al. (2009)
GaP-NMF Model
θ: drawn from a gamma process as the truncation level L → ∞
The number of elements of θ greater than some ε > 0 is finite almost surely.
If L is sufficiently large relative to α (the shape parameter of the gamma process), only a few elements of θ are substantially greater than 0.
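The sparsity of θ can be illustrated by sampling from a truncated prior. The sketch below assumes θ_k ~ Gamma(α/L, αc), with shape α/L and rate αc, which is how the truncated gamma process is usually written; the prior equation is not legible in these slides, so treat that parameterization, and the names `sample_theta` and `count_active`, as assumptions.

```python
import random


def sample_theta(alpha, c, truncation, rng):
    """Draw L = truncation weights theta_k ~ Gamma(shape=alpha/L, rate=alpha*c).

    The parameterization is an assumption; see the note above.
    """
    shape = alpha / truncation
    scale = 1.0 / (alpha * c)   # random.gammavariate takes a scale, not a rate
    return [rng.gammavariate(shape, scale) for _ in range(truncation)]


def count_active(theta, eps):
    """Number of components whose weight exceeds the threshold eps."""
    return sum(1 for value in theta if value > eps)


rng = random.Random(0)
theta = sample_theta(alpha=1.0, c=1.0, truncation=100, rng=rng)
print(count_active(theta, eps=0.01), "of", len(theta), "weights exceed 0.01")
```

Under this parameterization each weight has mean 1/(Lc), so the expected total weight is 1/c regardless of the truncation level, while most individual weights are close to zero.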
Setting :
Variational Inference
Variational distribution: expanded family
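Generalized Inverse-Gaussian (GIG): GIG(y; γ, ρ, τ) = exp{(γ-1) log y - ρy - τ/y} ρ^(γ/2) / (2 τ^(γ/2) K_γ(2√(ρτ))), where K_γ(·) denotes a modified Bessel function of the second kind
Gamma family is a special case of the GIG family where γ > 0, τ → 0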
Variational Inference
Lower bound of GaP-NMF model:
If :
GIG family sufficient statistics: y, 1/y, and log y
Gamma family sufficient statistics: y and log y
Variational Inference
The likelihood term expands to:
With Jensen’s inequality:
Variational Inference
With a first order Taylor approximation:
ω_mn: an arbitrary positive point
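Both bounds rest on elementary inequalities. For positive x_k and weights φ_k that sum to 1, concavity of -1/x gives -1/Σ_k x_k ≥ -Σ_k φ_k²/x_k (Jensen), and the tangent line of -log at any positive point ω gives -log x ≥ -log ω - (x - ω)/ω (first-order Taylor). The sketch below checks both numerically on deterministic values; the function names are hypothetical, and the expectations that appear in the actual variational bound are left out.

```python
import math
import random


def jensen_bound(x, phi):
    """Lower bound on -1 / sum(x) from the concavity of -1/x."""
    return -sum(p * p / v for p, v in zip(phi, x))


def taylor_bound(total, omega):
    """Lower bound on -log(total) from a tangent line at omega > 0."""
    return -math.log(omega) - (total - omega) / omega


def tight_phi(x):
    """Weights proportional to x; these make the Jensen bound an equality."""
    total = sum(x)
    return [v / total for v in x]


rng = random.Random(0)
x = [rng.uniform(0.1, 5.0) for _ in range(4)]
raw = [rng.random() + 1e-3 for _ in x]
phi = [r / sum(raw) for r in raw]          # arbitrary point on the simplex

# Both bounds hold for any valid phi and any positive omega.
assert jensen_bound(x, phi) <= -1.0 / sum(x) + 1e-12
assert taylor_bound(sum(x), omega=2.0) <= -math.log(sum(x)) + 1e-12
```

In this deterministic setting the Jensen bound is tight when φ is proportional to x, and the Taylor bound is tight when ω equals the sum itself, which is the idea behind re-optimizing the auxiliary parameters during inference.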
Variational Inference
Tightening the likelihood bound
Optimizing the variational distributions
For example:
Evaluation
Compare GaP-NMF to two variations:
1. Finite Bayesian model
2. Finite non-Bayesian model
Itakura-Saito Nonnegative Matrix Factorization (IS-NMF): maximizes the likelihood in the formula above
Compare with two other NMF algorithms:
EU-NMF: minimizes the sum of squared Euclidean distances
KL-NMF: minimizes the generalized KL-divergence
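The three objectives differ only in how they score a reconstruction Y against the spectrogram X. A minimal sketch follows, assuming X and Y are positive matrices stored as lists of rows and that the likelihood in question is exponential with mean Y_mn; under that assumption the negative log-likelihood and the Itakura-Saito divergence differ by a term that does not depend on Y, so maximizing one minimizes the other. Function names are illustrative.

```python
import math


def _pairs(x, y):
    """Yield matching entries (a, b) of two equally shaped matrices."""
    for row_x, row_y in zip(x, y):
        for a, b in zip(row_x, row_y):
            yield a, b


def is_divergence(x, y):
    """Itakura-Saito divergence (IS-NMF objective)."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in _pairs(x, y))


def squared_euclidean(x, y):
    """Sum of squared differences (EU-NMF objective)."""
    return sum((a - b) ** 2 for a, b in _pairs(x, y))


def generalized_kl(x, y):
    """Generalized KL-divergence (KL-NMF objective)."""
    return sum(a * math.log(a / b) - a + b for a, b in _pairs(x, y))


def exponential_neg_log_likelihood(x, y):
    """Negative log-likelihood of x under Exponential with mean y (assumed model)."""
    return sum(a / b + math.log(b) for a, b in _pairs(x, y))
```

All three divergences are zero when Y equals X. Only the Itakura-Saito divergence is scale invariant, meaning that multiplying both X and Y by the same constant leaves it unchanged.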
1. Synthetic Data
Evaluation
2. Marginal Likelihood & Bandwidth Expansion
Evaluation
3. Blind Monophonic Source Separation
Conclusions
Related work
Bayesian nonparametric model GaP-NMF
Applicable to other types of audio