Codebook-based Scalable Music Tagging with Poisson Matrix Factorization
Dawen Liang, John Paisley, Dan Ellis • Columbia University
Laboratory for the Recognition and Organization of Speech and Audio

Summary
✦ Turn music tagging into matrix completion
✦ Treat each track as a row (acoustic features + bag-of-tags)
✦ Regard the tag information as incomplete
✦ Stochastic variational inference for scalability
✦ Interpret the vector-quantization histogram
✦ Potentially helps with other tasks (e.g. recommendation)

Results

MSD + Last.fm
✦ Tagging vocabulary of 561 words after pre-selection
✦ 371,209 songs for training and 2,757 songs for testing
✦ Data splitting + generating scripts available online
✦ Evaluation on both annotation and retrieval

Large-scale tagging on MSD
✦ When we can learn from more data, the model can do better
Music tagging with PMF

Poisson matrix factorization (PMF)
✦ Low-rank approximation to high-dimensional data
✦ Poisson likelihood model ⟺ generalized KL-divergence (numeric check below)
✦ Gamma priors induce sparsity
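The likelihood/divergence bullet above can be checked numerically. A small illustrative snippet (toy data; none of these names come from the released code) showing that the negative Poisson log-likelihood and the generalized KL-divergence differ only by terms that do not depend on the reconstruction yhat:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=(4, 6)).astype(float)   # toy count data
yhat = rng.gamma(2.0, 2.0, size=y.shape)          # toy low-rank reconstruction

# Negative Poisson log-likelihood: sum(yhat - y*log(yhat) + log(y!))
neg_loglik = np.sum(yhat - y * np.log(yhat) + gammaln(y + 1.0))

# Generalized KL-divergence: sum(y*log(y/yhat) - y + yhat), with 0*log(0) = 0
safe_y = np.where(y > 0, y, 1.0)                  # zero terms vanish anyway
gen_kl = np.sum(y * np.log(safe_y / yhat) - y + yhat)

# The two differ only by a constant that does not involve yhat
const = np.sum(gammaln(y + 1.0) - y * np.log(safe_y) + y)
assert np.allclose(neg_loglik, gen_kl + const)
```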
[Figure: the incoming data y (N × D) — one row per song, with columns split into acoustic features and bag-of-tags, and missing tag entries marked "?" — is approximated as y ≈ θ ✕ β, where θ (N × K) holds the per-song weights and β (K × D) the latent factors.]
Exploit the shared latent structure between the acoustic features and the semantic tags:

θ_nk ~ Gamma(a, a)
β_kd ~ Gamma(b, b)
y_nd ~ Poisson(c · θ_n^T β_d)
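To make the generative story concrete, a minimal sketch that samples toy data from the model above (the toy dimensions, the 512-codeword split, and the hyperparameter values are placeholders; only the symbols N, K, D, a, b, c come from the equations):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 20                    # songs, latent factors
D = 512 + 561                     # VQ codewords + tag vocabulary (placeholder split)
a, b, c = 0.1, 0.1, 1.0           # Gamma hyperparameters and Poisson scale (placeholders)

theta = rng.gamma(shape=a, scale=1.0 / a, size=(N, K))   # theta_nk ~ Gamma(a, a)
beta = rng.gamma(shape=b, scale=1.0 / b, size=(K, D))    # beta_kd ~ Gamma(b, b)
y = rng.poisson(c * theta @ beta)                        # y_nd ~ Poisson(c * theta_n^T beta_d)
```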
Interpret the vector-quantization representation

✦ VQ histograms are hard to interpret on their own
✦ Look at the highly probable words in the bag-of-tags part of each factor to understand what portion of the acoustic codeword space is being captured (see the sketch below)
✦ Fun fact: “prog rock” and “songs over 20 minutes long” are in the same factor
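A hypothetical helper for the factor inspection just described, assuming a fitted factor matrix beta of shape (K, D) whose last columns correspond to the tag vocabulary (all names here are illustrative, not from the released code):

```python
import numpy as np

def top_tags_per_factor(beta, tag_names, n_vq, top_n=5):
    """Print the most probable tags for each latent factor.

    beta:      (K, D) factor matrix; columns [:n_vq] are VQ codewords,
               columns [n_vq:] are the bag-of-tags vocabulary.
    tag_names: list of tag strings of length D - n_vq.
    """
    tag_part = beta[:, n_vq:]
    for k, row in enumerate(tag_part):
        top = np.argsort(row)[::-1][:top_n]
        print(f"factor {k:2d}: " + ", ".join(tag_names[i] for i in top))
```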
Inference

✦ The posterior is intractable, so we use a mean-field variational approximation:
   p(θ, β | y) ≈ q(θ, β | λ) = q(θ) q(β)
✦ Minimize KL(q || p); all updates are available in closed form (sketched below)
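A compact sketch of those closed-form updates, written as standard mean-field coordinate ascent for Poisson factorization with Gamma(a, a) and Gamma(b, b) priors (an illustrative reimplementation that ignores the scale c; the released code linked at the bottom is the authors' actual implementation):

```python
import numpy as np
from scipy.special import digamma

def fit_pmf_vi(y, K, a=0.1, b=0.1, n_iters=50, seed=0):
    """Mean-field VI for y_nd ~ Poisson(theta_n^T beta_d) with Gamma priors.

    q(theta_nk) = Gamma(g_shp, g_rte), q(beta_kd) = Gamma(l_shp, l_rte).
    """
    rng = np.random.default_rng(seed)
    N, D = y.shape
    g_shp = a + rng.random((N, K));  g_rte = np.full((N, K), a + 1.0)
    l_shp = b + rng.random((K, D));  l_rte = np.full((K, D), b + 1.0)

    for _ in range(n_iters):
        Elog_t = digamma(g_shp) - np.log(g_rte)              # E_q[log theta], (N, K)
        Elog_b = digamma(l_shp) - np.log(l_rte)               # E_q[log beta],  (K, D)
        # Auxiliary multinomial: phi_ndk ∝ exp(Elog_t_nk + Elog_b_kd)
        log_phi = Elog_t[:, :, None] + Elog_b[None, :, :]     # (N, K, D)
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        y_phi = y[:, None, :] * phi                           # expected latent counts
        # Closed-form Gamma updates
        g_shp = a + y_phi.sum(axis=2)
        g_rte = a + (l_shp / l_rte).sum(axis=1)[None, :]
        l_shp = b + y_phi.sum(axis=0)
        l_rte = b + (g_shp / g_rte).sum(axis=0)[:, None]
    return g_shp / g_rte, l_shp / l_rte                       # E[theta], E[beta]
```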
[Figure: songs with both acoustic features and bag-of-tags, and songs with acoustic features only, all sharing the latent factors β (K × D).]

✦ The Poisson likelihood implicitly downweights the “0” entries, which is especially helpful for implicit feedback

If we have even more data, can we do better?
✦ Short answer: maybe, but for a substantial improvement we need a more powerful model
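To illustrate the prediction setting in the figure above, a hedged sketch of how tags might be scored for a new song given only its acoustic (VQ) counts: hold the learned E[beta] fixed, fold in the song with a few variational updates over its weights theta, and read off the tag block. It reuses the illustrative names from the earlier sketches and is not necessarily the paper's exact procedure.

```python
import numpy as np
from scipy.special import digamma

def predict_tags(y_acoustic, Ebeta, n_vq, a=0.1, n_iters=20):
    """Fold in new songs from their acoustic counts only, then score the tags.

    y_acoustic: (M, n_vq) VQ histograms of new songs.
    Ebeta:      (K, D) learned E[beta]; columns [:n_vq] acoustic, [n_vq:] tags.
    """
    beta_vq, beta_tag = Ebeta[:, :n_vq], Ebeta[:, n_vq:]
    M, K = y_acoustic.shape[0], Ebeta.shape[0]
    g_shp = np.full((M, K), a);  g_rte = np.full((M, K), a)
    log_beta_vq = np.log(beta_vq + 1e-12)                     # guard against zeros
    for _ in range(n_iters):
        Elog_t = digamma(g_shp) - np.log(g_rte)
        log_phi = Elog_t[:, :, None] + log_beta_vq[None, :, :]
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        g_shp = a + (y_acoustic[:, None, :] * phi).sum(axis=2)
        g_rte = a + beta_vq.sum(axis=1)[None, :]
    Etheta = g_shp / g_rte
    return Etheta @ beta_tag                                  # (M, n_tags) tag scores
```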
Stochastic variational inference

✦ Process massive data in small mini-batches
✦ Pre-condition the gradient with the inverse Fisher information matrix
✦ Natural gradient steps → fast convergence (one update sketched below)
Python code available: http://github.com/dawenl/stochastic_PMF
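For reference, a schematic of one SVI step under the same toy setup (illustrative only; the repository above contains the actual implementation): run the local theta updates on a mini-batch, rescale the mini-batch sufficient statistics by N / batch size to form an intermediate estimate of the global q(beta) parameters, and blend it in with a decaying step size rho, which corresponds to a natural-gradient step for this conjugate model.

```python
import numpy as np
from scipy.special import digamma

def svi_step(y_batch, l_shp, l_rte, N, a=0.1, b=0.1, rho=0.01, local_iters=10):
    """One stochastic variational inference update of the global factors beta.

    y_batch:       (B, D) mini-batch of rows of the data matrix.
    l_shp, l_rte:  (K, D) current Gamma parameters of q(beta).
    N:             total number of songs (to rescale the mini-batch statistics).
    rho:           step size, e.g. rho_t = (t0 + t) ** -kappa with kappa in (0.5, 1].
    """
    B, D = y_batch.shape
    K = l_shp.shape[0]
    # Local step: fit q(theta) for the mini-batch with q(beta) held fixed
    g_shp = np.full((B, K), a);  g_rte = np.full((B, K), a)
    Elog_b = digamma(l_shp) - np.log(l_rte)
    for _ in range(local_iters):
        Elog_t = digamma(g_shp) - np.log(g_rte)
        log_phi = Elog_t[:, :, None] + Elog_b[None, :, :]
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        y_phi = y_batch[:, None, :] * phi
        g_shp = a + y_phi.sum(axis=2)
        g_rte = a + (l_shp / l_rte).sum(axis=1)[None, :]
    # Global step: noisy natural gradient = rescaled intermediate estimate
    shp_hat = b + (N / B) * y_phi.sum(axis=0)
    rte_hat = b + (N / B) * (g_shp / g_rte).sum(axis=0)[:, None]
    l_shp = (1.0 - rho) * l_shp + rho * shp_hat
    l_rte = (1.0 - rho) * l_rte + rho * rte_hat
    return l_shp, l_rte
```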