Bayesian Nonparametric Matrix Factorization for Recorded Music
Authors: Matthew D. Hoffman, David M. Blei, Perry R. Cook
Princeton University, Department of Computer Science, 35 Olden St., Princeton, NJ 08540 USA
Reading Group Presenter: Shujie Hou, Cognitive Radio Institute
Friday, October 15, 2010

Outline
■ Introduction
  □ Terminology
  □ Problem statement and contribution of this paper
■ GaP-NMF Model (Gamma Process Nonnegative Matrix Factorization)
■ Variational Inference
  □ Definition
  □ Variational objective function
  □ Coordinate ascent optimization
■ Other Approaches
■ Evaluation

Terminology (1)
■ Nonparametric statistics:
  □ The term "nonparametric" does not imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance.
■ Nonnegative matrix factorization:
  □ Nonnegative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra in which a matrix X is factorized into (usually) two matrices W and H with all elements greater than or equal to 0: X ≈ WH.
(The above two definitions are cited from Wikipedia.)

Terminology (2)
■ Variational inference:
  □ Variational inference approximates the posterior distribution with a simpler distribution, whose parameters are optimized to be close to the true posterior.
■ Mean-field variational inference:
  □ In mean-field variational inference, each variable is given an independent distribution, usually of the same family as its prior.

Problem Statement and Contribution
■ Research topic:
  □ Breaking audio spectrograms into separate sources of sound using latent variable decompositions, e.g., matrix factorization.
■ A potential problem:
  □ The number of latent variables must be specified in advance, which is not always possible.
■ Contribution of this paper:
  □ The paper develops Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms.

Dataset for the GaP-NMF Model
■ What is given is an M-by-N matrix X in which X_{mn} is the power of the audio signal at time window n and frequency bin m.

If the number of latent variables is specified in advance:
■ Assume the audio signal is composed of K static sound sources. The problem is to decompose X ≈ WH, in which W is an M-by-K matrix and H is a K-by-N matrix. Cell W_{mk} is the average amount of energy source k exhibits at frequency m; cell H_{kn} is the gain of source k at time n. (A sketch of this fixed-K decomposition follows below.)
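A minimal sketch of this fixed-K decomposition in Python, using the classic multiplicative updates for the generalized KL divergence (the KL-NMF baseline listed later under Other Approaches), not the paper's Bayesian algorithm; the function name and defaults here are illustrative:

import numpy as np

def kl_nmf(X, K, n_iters=200, eps=1e-10, seed=0):
    """Factor a nonnegative spectrogram X (M x N) as X ~= W @ H,
    with W (M x K) and H (K x N), via multiplicative updates that
    decrease the generalized KL divergence D(X || WH)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iters):
        # H update: H <- H * (W^T (X / WH)) / (W^T 1)
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # W update: W <- W * ((X / WH) H^T) / (1 H^T)
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Usage: X is typically the squared magnitude of a short-time Fourier
# transform, e.g. X = np.abs(stft_matrix) ** 2, with a guessed K.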
■ This problem is solved by the GaP-NMF model.

GaP-NMF Model
If the number of latent variables is not specified in advance:
■ GaP-NMF assumes that the data is drawn according to the following generative process, where the hyperparameter c sets the overall scale:

  W_{ml} \sim \mathrm{Gamma}(a, a)
  H_{ln} \sim \mathrm{Gamma}(b, b)
  \theta_l \sim \mathrm{Gamma}(\alpha/L, \alpha c), \quad l = 1, \ldots, L
  X_{mn} \sim \mathrm{Exponential}\big(\textstyle\sum_l \theta_l W_{ml} H_{ln}\big)

■ \theta_l is the overall gain of the corresponding source l; the concentration parameter \alpha is used to control the number of latent variables.
■ The exponential likelihood for the power X_{mn} is based on the formula of Abdallah & Plumbley (2004).

GaP-NMF Model (Kingman, 1993)
■ The number of nonzero \theta_l is the number of latent variables K.
■ As L increases towards infinity, the number of \theta_l with non-negligible mass, K, remains finite (Kingman, 1993), so the model can infer K from the data.

Definition of Variational Inference
■ Variational inference approximates the posterior distribution with a simpler distribution, whose parameters are optimized to be close to the true posterior.
■ Under this paper's conditions:
  □ The posterior distribution p(\theta, W, H | X) is what we want; X is what is measured.
  □ A variational distribution, an assumed family with free parameters, approximates the posterior.
  □ The free parameters are adjusted to bring the variational distribution close to the posterior.

Variational Objective Function
■ Assume each variable obeys the following Generalized Inverse-Gaussian (GIG) family:

  \mathrm{GIG}(y; \gamma, \rho, \tau) = \frac{(\rho/\tau)^{\gamma/2}}{2 K_\gamma(2\sqrt{\rho\tau})} \, y^{\gamma-1} \exp\{-\rho y - \tau/y\}

  □ K_\gamma denotes a modified Bessel function of the second kind.
  □ When \tau = 0, the GIG family reduces to the Gamma family.

Deduction (1) (from Jordan et al., 1999)
■ For any variational distribution q,

  \log p(X) \ge E_q[\log p(X, \Theta)] - E_q[\log q(\Theta)]

■ The difference between the left and right sides is the Kullback-Leibler divergence between the true posterior and the variational distribution q.
■ Kullback-Leibler divergence: for probability distributions P and Q of a discrete random variable, their KL divergence is defined to be

  D_{\mathrm{KL}}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}

Deduction (2)
■ The bound follows from Jensen's inequality:

  \log p(X) = \log \int q(\Theta) \frac{p(X, \Theta)}{q(\Theta)} \, d\Theta \ge E_q\!\left[\log \frac{p(X, \Theta)}{q(\Theta)}\right]

Objective Function
■ The lower bound is

  \mathcal{L} = E_q[\log p(X, \theta, W, H)] - E_q[\log q(\theta, W, H)]

■ The exponential likelihood contributes the terms -E_q[\log \sum_l \theta_l W_{ml} H_{ln}] and -X_{mn} E_q[1/\sum_l \theta_l W_{ml} H_{ln}], which lack closed forms, so the objective is further bounded using auxiliary parameters \omega_{mn} and \phi_{mnl}:
  □ \log y \le \log \omega + (y - \omega)/\omega (first-order Taylor bound on the concave logarithm);
  □ 1/\sum_l y_l \le \sum_l \phi_l^2 / y_l for \phi_l \ge 0, \sum_l \phi_l = 1 (Jensen's inequality applied to the convex function 1/y).
■ Maximize the bounded objective with respect to the variational parameters and the bound parameters. (A sketch of the GIG expectations needed to evaluate these terms follows below.)
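Evaluating the bounded objective requires the expectations E_q[y] and E_q[1/y] under each GIG factor. A small sketch using the standard Bessel-function identities; the function name is mine, and this is not the authors' code:

import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_expectations(gamma, rho, tau):
    """E[y] and E[1/y] for GIG(y; gamma, rho, tau), whose density is
    proportional to y**(gamma - 1) * exp(-rho*y - tau/y)."""
    s = 2.0 * np.sqrt(rho * tau)
    bessel = kv(gamma, s)
    e_y = np.sqrt(tau / rho) * kv(gamma + 1.0, s) / bessel
    e_y_inv = np.sqrt(rho / tau) * kv(gamma - 1.0, s) / bessel
    return e_y, e_y_inv

# When tau = 0 the GIG reduces to Gamma(gamma, rho), where E[y] = gamma/rho
# and E[1/y] = rho/(gamma - 1); that case needs a separate branch because
# kv(gamma, 0) diverges.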
■ Setting derivatives to zero, the optimal variational distributions q(W), q(H), and q(\theta) are obtained, each in the GIG family.
■ Because these three distributions are independent under the mean-field assumption, their product q(W) q(H) q(\theta) approximates the posterior p(W, H, \theta | X).

Coordinate Ascent Algorithm (1)
■ Setting the derivative of the objective function with respect to each variational parameter equal to zero yields that parameter's closed-form update.
■ Similar updates hold for the remaining variational parameters.

Coordinate Ascent Algorithm (2)
■ Using Lagrange multipliers for the constraint \sum_l \phi_{mnl} = 1, the optimal bound parameters are obtained.
■ The bound parameters and variational parameters are then updated according to equations 14, 15, 16, 17, and 18 of the paper, ultimately reaching a local optimum of the variational objective. (A sketch of the bound-parameter updates appears in the backup slide at the end.)

Other Approaches (baselines compared in the paper)
■ Finite Bayesian model (also called GIG-NMF).
■ Finite non-Bayesian model.
■ EU-NMF: nonnegative matrix factorization minimizing the Euclidean distance.
■ KL-NMF: nonnegative matrix factorization minimizing the generalized KL divergence.

Evaluation on Synthetic Data
■ The data is generated according to the model described above.
[Figures: results on synthetic data]

Evaluation on Recorded Music
[Figures: results on recorded music]

Conclusion
■ The GaP-NMF model is capable of determining the number of latent sources automatically.
■ The key step of the paper is using a variational distribution to approximate the posterior distribution.
■ GaP-NMF works well for analyzing and processing recorded music and may be applicable to other types of audio.

■ Thank you!
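Backup slide: the bound-parameter updates from Coordinate Ascent Algorithm (2) have simple closed forms. Minimizing the Taylor bound over \omega gives \omega_{mn} = E_q[\sum_l \theta_l W_{ml} H_{ln}], and the Lagrange-multiplier condition gives \phi_{mnl} \propto 1/E_q[1/(\theta_l W_{ml} H_{ln})]. A sketch assuming the mean-field factorization (so expectations factor across \theta, W, H); the names and array layout are my own illustration, not the authors' code:

import numpy as np

def update_bound_parameters(Ew, Ewinv, Eh, Ehinv, Et, Etinv):
    """Closed-form updates for the auxiliary bound parameters.
    Ew, Ewinv: (M, L) arrays of E[W] and E[1/W];
    Eh, Ehinv: (L, N) arrays of E[H] and E[1/H];
    Et, Etinv: length-L arrays of E[theta] and E[1/theta]."""
    # omega (M, N) tightens log(sum y) <= log(omega) + (sum y - omega)/omega;
    # the optimum is the expected sum: omega_mn = sum_l E[theta_l]E[W_ml]E[H_ln].
    omega = (Ew * Et[None, :]) @ Eh
    # phi (M, N, L) tightens 1/(sum y) <= sum_l phi_l**2 / y_l, sum_l phi_l = 1;
    # the optimum sets phi_mnl proportional to 1 / E[1/(theta_l W_ml H_ln)].
    inv = 1.0 / (Ewinv[:, None, :] * Ehinv.T[None, :, :] * Etinv[None, None, :])
    phi = inv / inv.sum(axis=2, keepdims=True)
    return omega, phi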