Online Learning for Latent Dirichlet Allocation
Matthew D. Hoffman, David M. Blei and Francis Bach
NIPS 2010
Presented by Lingbo Li

Latent Dirichlet Allocation (LDA)
1) Draw each topic beta_k ~ Dirichlet(eta), for k = 1, ..., K
2) For each document d:
   1) Draw topic proportions theta_d ~ Dirichlet(alpha)
   2) For each word n in document d:
      1) Draw a topic assignment z_dn ~ Multinomial(theta_d)
      2) Draw the word w_dn ~ Multinomial(beta_{z_dn})
(A code sketch of this generative process appears at the end of these notes.)

Batch variational Bayes for LDA
For a collection of documents, infer:
• Per-word topic assignments z_dn
• Per-document topic proportions theta_d
• Per-corpus topic distributions beta_k
The true posterior p(z, theta, beta | w, alpha, eta) is intractable. It is approximated by a factorized variational distribution q(z, theta, beta) with parameters phi (assignments), gamma (proportions), and lambda (topics). Optimize the evidence lower bound (ELBO) over these variational parameters, alternating per-document E steps (update phi and gamma) with M steps over the whole corpus (update lambda).

Online variational inference for LDA
• Mini-batches: rather than a full pass over the corpus before each update of lambda, fit gamma and phi on a small mini-batch of documents, then move lambda toward the estimate that mini-batch suggests, with step size rho_t = (tau_0 + t)^(-kappa). (See the online-update sketch at the end of these notes.)
• Hyperparameter estimation: the hyperparameters alpha and eta can be updated online with the same step-size schedule.

Analysis of convergence
• Multiplying the gradients by the inverse of an appropriate positive definite matrix H can speed up stochastic gradient algorithms.
• Choosing H to be the Fisher information matrix of the variational distribution q gives the natural gradient. For LDA the natural gradient has a simple closed form (for the topic parameters, lambda-hat minus lambda), so each step is cheap to compute.

Experiments
Use perplexity on held-out documents as the measure of model fit:
• perplexity(n^test, lambda) = exp( - (sum_d log p(n_d^test | alpha, beta)) / (sum_d sum_w n_dw^test) )
• The per-document parameters gamma are fit using the E step in Algorithm 2. (See the perplexity sketch at the end of these notes.)

Evaluating learning parameters
• Two corpora: 352,549 documents from the journal Nature, and 100,000 documents from the English version of Wikipedia.
• For each corpus, set aside a 1,000-document test set and a separate 1,000-document validation set.
• Run online LDA for five hours on the remaining documents from each corpus for a range of learning parameters (mini-batch size, tau_0, and kappa), choosing the settings with the best validation perplexity.

Results
[Figures: "Compare batch and online on fixed corpora" and "True online"; plots not reproduced in these notes.]
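
To make the generative process on the LDA slide concrete, here is a minimal NumPy sampler for it. The function name, the corpus sizes, and the hyperparameter values are toy choices for illustration, not anything from the paper:

```python
import numpy as np

def sample_lda_corpus(n_docs=100, n_topics=5, vocab_size=200,
                      doc_len=50, alpha=0.1, eta=0.01, seed=0):
    """Toy sampler for the LDA generative process (all sizes made up)."""
    rng = np.random.default_rng(seed)
    # 1) Draw each topic beta_k ~ Dirichlet(eta)
    beta = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    docs = []
    for _ in range(n_docs):
        # 2.1) Draw topic proportions theta_d ~ Dirichlet(alpha)
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # 2.2.1) Draw each topic assignment z_dn ~ Multinomial(theta_d)
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # 2.2.2) Draw each word w_dn ~ Multinomial(beta_{z_dn})
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
    return beta, docs
```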
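
The online update itself is compact. Below is a minimal NumPy sketch in the spirit of the paper's Algorithm 2: an E step that fits gamma on a mini-batch (phi is kept implicit in the sufficient statistics), followed by a weighted step on lambda with rate rho_t = (tau_0 + t)^(-kappa), which is the natural-gradient step on the ELBO. Class and variable names, the toy mini-batches in the usage stub, and defaults such as tau0 = 1024 and kappa = 0.7 are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.special import digamma

def dirichlet_expectation(x):
    """E[log theta] for theta ~ Dirichlet(x); rows are treated as
    independent Dirichlet parameter vectors when x is a matrix."""
    if x.ndim == 1:
        return digamma(x) - digamma(x.sum())
    return digamma(x) - digamma(x.sum(axis=1, keepdims=True))

class OnlineLDA:
    """Sketch of online variational Bayes for LDA (Hoffman et al. 2010).
    corpus_size D is the (assumed) number of documents in the corpus."""

    def __init__(self, n_topics, vocab_size, corpus_size,
                 alpha=0.1, eta=0.01, tau0=1024.0, kappa=0.7, seed=0):
        rng = np.random.default_rng(seed)
        self.K, self.W, self.D = n_topics, vocab_size, corpus_size
        self.alpha, self.eta = alpha, eta
        self.tau0, self.kappa = tau0, kappa
        self.t = 0  # mini-batch counter
        # lambda: variational Dirichlet parameters for the topics beta_k
        self.lam = rng.gamma(100.0, 0.01, size=(self.K, self.W))

    def e_step(self, docs, n_iter=50):
        """Fit gamma for each doc; docs is a list of (word_ids, counts)
        pairs with unique word ids. Returns gammas and the sufficient
        statistics needed to update lambda."""
        Elog_beta = dirichlet_expectation(self.lam)            # K x W
        sstats = np.zeros_like(self.lam)
        gammas = []
        for ids, cts in docs:
            exp_Elog_beta_d = np.exp(Elog_beta[:, ids])        # K x Nd
            gamma = np.ones(self.K)
            for _ in range(n_iter):
                exp_Elog_theta = np.exp(dirichlet_expectation(gamma))
                # phi_{nk} is proportional to exp(Elog_theta_k + Elog_beta_kw)
                phinorm = exp_Elog_theta @ exp_Elog_beta_d + 1e-100
                gamma = self.alpha + exp_Elog_theta * (
                    exp_Elog_beta_d @ (cts / phinorm))
            exp_Elog_theta = np.exp(dirichlet_expectation(gamma))
            phinorm = exp_Elog_theta @ exp_Elog_beta_d + 1e-100
            sstats[:, ids] += (np.outer(exp_Elog_theta, cts / phinorm)
                               * exp_Elog_beta_d)
            gammas.append(gamma)
        return gammas, sstats

    def update(self, docs):
        """One online step: mini-batch E step, then blend lambda toward
        lambda-hat at rate rho_t (a natural-gradient step on the ELBO)."""
        gammas, sstats = self.e_step(docs)
        rho = (self.tau0 + self.t) ** (-self.kappa)
        lam_hat = self.eta + (self.D / len(docs)) * sstats
        self.lam = (1.0 - rho) * self.lam + rho * lam_hat
        self.t += 1
        return gammas

if __name__ == "__main__":
    # toy usage with random mini-batches (hypothetical data)
    rng = np.random.default_rng(1)
    model = OnlineLDA(n_topics=10, vocab_size=500, corpus_size=10_000)
    for _ in range(20):
        batch = [(rng.choice(500, size=40, replace=False),
                  rng.integers(1, 5, size=40).astype(float))
                 for _ in range(16)]
        model.update(batch)
```

Note the role of lam_hat: it is the value lambda would converge to if the entire corpus of D documents consisted of copies of this mini-batch. Blending it with the old lambda at the decreasing rate rho_t is what makes the algorithm online while still converging to a stationary point of the batch objective.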
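
Finally, a rough sketch of the held-out evaluation, building on the OnlineLDA class above: gamma is fit with the E step (as stated on the Experiments slide), and words are then scored under the posterior-mean theta and beta. This point-estimate scoring is a simplification; the paper evaluates a variational (ELBO) bound on log p instead:

```python
import numpy as np

def held_out_perplexity(model, docs):
    """Approximate per-word perplexity on held-out docs, using the
    OnlineLDA sketch above. Simplified: scores words under posterior
    means rather than the paper's variational bound."""
    gammas, _ = model.e_step(docs)                           # fit gamma
    beta = model.lam / model.lam.sum(axis=1, keepdims=True)  # mean topics
    log_lik, n_words = 0.0, 0.0
    for (ids, cts), gamma in zip(docs, gammas):
        theta = gamma / gamma.sum()                          # mean proportions
        log_lik += cts @ np.log(theta @ beta[:, ids] + 1e-100)
        n_words += cts.sum()
    return float(np.exp(-log_lik / n_words))
```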