Codebook-based Scalable Music Tagging with Poisson Matrix Factorization

Dawen Liang, John Paisley, Dan Ellis • Columbia University
Laboratory for the Recognition and Organization of Speech and Audio (LabROSA)

Summary

✦ Turn music tagging into matrix completion: treat each track as one row (acoustic features + bag-of-tags) and regard the tag information as incomplete
✦ Stochastic variational inference: when we can learn from more data, the model can do better
✦ Interpret the vector-quantization histogram, which can potentially help with other tasks (e.g. recommendation)

Music tagging with PMF

✦ Poisson matrix factorization (PMF): a low-rank approximation to the data in high dimension

    y (N × D) ≈ θ (N × K) × β (K × D)

  where each row of the incoming data y holds a song's vector-quantized acoustic features concatenated with its bag-of-tags (with missing tags to be completed), θ holds the per-song weights, and β holds the latent factors

✦ Poisson likelihood model: maximizing it is equivalent to minimizing the generalized KL-divergence
✦ Gamma priors induce sparsity
✦ Implicitly downweights zeros, which is especially helpful for implicit feedback
✦ Exploit the shared latent structure between the acoustic features and the semantic tags:

    θ_{n,k} ~ Gamma(a, a)
    β_{k,d} ~ Gamma(b, b)
    y_{n,d} ~ Poisson(c · θ_nᵀ β_d)

  (a generative-model sketch appears at the end of this poster)

Inference

✦ The posterior is intractable, so we use a mean-field variational approximation:

    p(θ, β | y) ≈ q(θ, β | λ) = q(θ) q(β)

✦ Minimize KL(q ‖ p); all updates are in closed form (see the batch-inference sketch at the end)
✦ Stochastic variational inference (see the SVI sketch at the end):
  ✦ Process massive data in small mini-batches
  ✦ Pre-condition the gradient by the inverse Fisher information matrix
  ✦ Natural-gradient steps, fast convergence

Results

MSD + Last.fm
✦ Tagging vocabulary of 561 words after pre-selection
✦ 371,209 songs for training and 2,757 songs for testing
✦ Data-splitting and generation scripts available online
✦ Evaluation on both annotation and retrieval

Large-scale tagging on MSD
✦ If we can learn from even more data, can we do better? Short answer: maybe, but for substantial improvement we need a more powerful model

Interpret the vector-quantization representation
✦ VQ histograms are hard to interpret on their own
✦ Look at the highly probable words in the bag-of-tags factors to understand what portion of the acoustic codeword space is being captured
✦ Fun fact: "prog rock" and "songs over 20 minutes long" land in the same factor

Code
✦ Python code available: http://github.com/dawenl/stochastic_PMF
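
Illustrative sketches

The generative model above can be written out directly in NumPy. This is a minimal sketch: the dimensions, hyperparameter values, and seed are illustrative, not the settings used in our experiments.

    # PMF generative model: theta ~ Gamma(a, a), beta ~ Gamma(b, b),
    # y ~ Poisson(c * theta^T beta). All values below are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    N, K, D = 1000, 50, 512   # songs, latent factors, codewords + tag vocabulary
    a, b, c = 0.1, 0.1, 1.0   # Gamma hyperparameters and Poisson scale

    # NumPy parameterizes the Gamma by (shape, scale); Gamma(a, a) in
    # shape-rate form has scale = 1/a
    theta = rng.gamma(shape=a, scale=1.0 / a, size=(N, K))  # per-song weights
    beta = rng.gamma(shape=b, scale=1.0 / b, size=(K, D))   # latent factors
    y = rng.poisson(c * (theta @ beta))                     # VQ counts + tag counts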
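
The closed-form updates referenced in the Inference section are the standard coordinate-ascent updates for the Gamma-Poisson model: each count y_{n,d} is softly allocated across the K factors in proportion to exp(E[log θ_{n,k}] + E[log β_{k,d}]). Below is a minimal sketch with the scale c fixed to 1; the function name and initialization are mine, and the authors' full implementation lives in the repository linked above.

    import numpy as np
    from scipy.special import digamma

    def cavi_pmf(y, K=50, a=0.1, b=0.1, n_iters=100, seed=0):
        """Batch mean-field inference with Gamma factors q(theta) q(beta)."""
        rng = np.random.default_rng(seed)
        N, D = y.shape
        # Gamma variational parameters, stored as (shape, rate)
        t_shp = a + rng.random((N, K))
        t_rte = np.full((N, K), a, dtype=float)
        b_shp = b + rng.random((K, D))
        b_rte = np.full((K, D), b, dtype=float)
        for _ in range(n_iters):
            # exp(E[log x]) for x ~ Gamma(shape, rate) is exp(psi(shape)) / rate
            Et = np.exp(digamma(t_shp)) / t_rte
            Eb = np.exp(digamma(b_shp)) / b_rte
            # Update q(theta): allocate each count, then add the prior shape/rate
            t_shp = a + Et * ((y / (Et @ Eb)) @ Eb.T)
            t_rte = a + (b_shp / b_rte).sum(axis=1)           # broadcasts over songs
            # Update q(beta) with the refreshed q(theta)
            Et = np.exp(digamma(t_shp)) / t_rte
            b_shp = b + Eb * (Et.T @ (y / (Et @ Eb)))
            b_rte = b + (t_shp / t_rte).sum(axis=0)[:, None]  # broadcasts over words
        return t_shp, t_rte, b_shp, b_rte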
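
Stochastic variational inference swaps the full-data update of the global factors β for a noisy step computed from a mini-batch; pre-conditioning by the inverse Fisher information turns that step into a simple convex combination of the old parameters and an intermediate estimate. A minimal sketch, reusing the update rules from cavi_pmf above; the step size, local-iteration count, and names are illustrative.

    import numpy as np
    from scipy.special import digamma

    def svi_step(y_batch, b_shp, b_rte, N, a=0.1, b=0.1, rho=0.01, local_iters=10):
        """One mini-batch natural-gradient update of the global q(beta)."""
        B, D = y_batch.shape
        K = b_shp.shape[0]
        Eb = np.exp(digamma(b_shp)) / b_rte
        # Local step: fit this batch's q(theta) with q(beta) held fixed.
        # The theta rate does not depend on q(theta), so it is set once.
        t_shp = np.full((B, K), a + 1.0)
        t_rte = a + (b_shp / b_rte).sum(axis=1)
        for _ in range(local_iters):
            Et = np.exp(digamma(t_shp)) / t_rte
            t_shp = a + Et * ((y_batch / (Et @ Eb)) @ Eb.T)
        # Intermediate global parameters: treat the batch as if it were
        # the whole corpus by rescaling its statistics by N / B
        Et = np.exp(digamma(t_shp)) / t_rte
        hat_shp = b + (N / B) * Eb * (Et.T @ (y_batch / (Et @ Eb)))
        hat_rte = b + (N / B) * (t_shp / t_rte).sum(axis=0)[:, None]
        # Natural-gradient step with step size rho
        b_shp = (1.0 - rho) * b_shp + rho * hat_shp
        b_rte = (1.0 - rho) * b_rte + rho * hat_rte
        return b_shp, b_rte

A Robbins-Monro schedule such as rho_t = (t0 + t) ** (-kappa) with kappa in (0.5, 1] keeps the noisy updates convergent while letting early mini-batches move the global parameters quickly.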