Conditional Topic Random Fields
Jun Zhu and Eric P. Xing (@cs.cmu.edu), ICML 2010
Presentation and Discussion by Eric Wang, January 12, 2011

Overview
• Introduction – nontrivial input features for text
• Conditional Random Fields
• CdTM and CTRF
• Model Inference
• Experimental Results

Introduction
• Topic models such as LDA are not "feature-based": they cannot efficiently incorporate nontrivial features (contextual or summary features).
• Further, they assume a bag-of-words construction, discarding word-order information that may be important.
• The authors propose a model that addresses both the feature and the independence limitations by using a conditional random field (CRF) rather than a fully generative model.

Conditional Random Fields
• A conditional random field (CRF) is a way to label and segment structured data that removes the independence assumptions imposed by HMMs.
• The underlying idea of CRFs is that a sequence of random variables Y is globally conditioned on a sequence of observations X.
Image source: Hanna M. Wallach, Conditional Random Fields: An Introduction. Technical report, Department of Computer and Information Science, University of Pennsylvania, 2004.

Conditional Topic Model (CdTM)
• Assume a set of features a denoting arbitrary local and global features of the document.
• The topic weight vector θ is defined through a log-linear (softmax) transformation,
θ_k = exp{ λ'f(a, k) } / Σ_j exp{ λ'f(a, j) },
where f is a vector of feature functions defined on the features a and the topic index k, and λ is a vector of feature weights.

Conditional Topic Model
• The inclusion of Y follows sLDA, where the topic model regresses to a continuous or discrete response.
• β denotes the standard topic distributions over words.
• This model does not impose word-order dependence.

Feature Functions
• Consider, for example, the set of word features "positive adjective", "negative adjective", "positive adjective with an inverting word", and "negative adjective with an inverting word", so M = 4.
• The word "good" yields the feature-function vector [1 0 0 0]', while "not bad" yields [0 0 0 1]'.
• The features are then concatenated into topic-specific blocks depending on the topic assignment z_n of the word. Suppose z_n = h; then the feature vector f for "good" is a length-MK vector whose only nonzero block is the h-th one (a code sketch of this construction follows the CTRF slides below):
[ 0 0 0 0 | 0 0 0 0 | … | 1 0 0 0 | … | 0 0 0 0 | 0 0 0 0 ]'
   k=1       k=2           k=h          k=K−1      k=K

Conditional Topic Random Fields
• The generative process of CTRF for a single document: for each sentence s, draw the topic assignments z_s jointly from the sentence-level conditional random field given the document features a; for each word n, draw w_n from the topic-specific word distribution β_{z_n}; in the supervised variant, draw the response Y from its sLDA-style regression on the topic assignments.
• The term p(z_s | θ, a) is a conditional topic random field over the topic assignments of all the words in one sentence and takes the linear-chain form
p(z_s | θ, a) ∝ exp{ Σ_n μ'f(z_n, a) + Σ_n η'g(z_n, z_{n+1}, a) },
normalized over all joint topic assignments of the sentence.
• In the linear-chain CTRF, the authors consider both singleton feature functions f(z_n, a) and pairwise feature functions g(z_n, z_{n+1}, a).
• The cumulative feature-function value on a sentence is the sum of the per-word feature vectors, f(s) = Σ_{n∈s} f(z_n, a).
• The pairwise feature function g is assumed to be zero if z_n ≠ z_{n+1}, so it only rewards adjacent words that share a topic.
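To make the two pieces above concrete — the length-MK singleton feature vectors and the sentence-level random field with topic-agreement pairwise terms — here is a minimal toy sketch. The function names, the random weights, and the brute-force normalization are illustrative assumptions, not the authors' implementation; the paper relies on variational inference rather than enumeration.

```python
import itertools

import numpy as np

M, K = 4, 3  # M word features (as on the slides), K topics (small for the toy)

def topic_feature_vector(word_feats, z):
    """Place the M word features into the block for topic z, giving the
    length-MK singleton feature vector f(z, a) from the slides."""
    f = np.zeros(M * K)
    f[z * M:(z + 1) * M] = word_feats
    return f

def sentence_log_score(zs, word_feats_seq, mu, eta):
    """Unnormalized log-score of one joint topic assignment zs for a sentence:
    singleton terms mu'f(z_n, a) plus pairwise terms that are zero unless
    adjacent words share a topic (g(z_n, z_{n+1}) = 0 if z_n != z_{n+1})."""
    score = sum(mu @ topic_feature_vector(wf, z)
                for z, wf in zip(zs, word_feats_seq))
    score += sum(eta[z1] for z1, z2 in zip(zs, zs[1:]) if z1 == z2)
    return score

def sentence_topic_posterior(word_feats_seq, mu, eta):
    """p(z_s | a) by brute-force normalization over all K^n assignments
    (fine for a toy sentence; a real linear chain would use forward-backward)."""
    assignments = list(itertools.product(range(K), repeat=len(word_feats_seq)))
    scores = np.array([sentence_log_score(zs, word_feats_seq, mu, eta)
                       for zs in assignments])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return dict(zip(assignments, probs))

# Toy sentence from the slides: "good" -> [1 0 0 0]', "not bad" -> [0 0 0 1]'
sentence = [np.array([1., 0., 0., 0.]), np.array([0., 0., 0., 1.])]
rng = np.random.default_rng(0)
mu, eta = rng.normal(size=M * K), rng.normal(size=K)
posterior = sentence_topic_posterior(sentence, mu, eta)
print(max(posterior, key=posterior.get))  # most probable joint topic assignment
```

Note how the pairwise term collapses to a single per-topic weight eta[z] precisely because g is zero whenever adjacent topics differ, which is what couples the topic assignments within a sentence.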
Model Inference
• Inference is performed in a variational fashion similar to that of Correlated Topic Models (CTM).
• The authors introduce a relaxation of the lower bound due to the introduction of the CRF, although for the univariate CdTM the variational posterior can be computed exactly.
• A closed-form solution is not available for the resulting variational updates, so an efficient gradient-descent approach is used instead.

Empirical Results
• The authors use a hotel-review dataset built by crawling TripAdvisor.
• The dataset consists of 5000 reviews with lengths between 1500 and 6000 words, and includes an integer (1–5) rating for each review. Each rating level is represented by 1000 documents.
• POS tags were employed to find adjectives.
• Noun-phrase chunking was used to associate words with good or bad connotations. The authors also extracted whether an inverting word appears within 4 words of each adjective.
• The lexicon size was 12000 after rare words and stop words were removed.

Comparison of Rating Prediction Accuracy
Equation source: Blei, D. & McAuliffe, J. Supervised topic models. NIPS, 2007.

Topics

Ratings and Topics
• Here, the authors show that supervised CTRF (sCTRF) achieves good separation of rating scores among the topics (top row) compared to MedLDA (bottom row).

Feature Weights
• Five features were considered: Default, equal to one for any word; Pos-JJ, a positive adjective; Neg-JJ, a negative adjective; Re-Pos-JJ, a positive adjective preceded by an inverting (denying) word; and Re-Neg-JJ, a negative adjective preceded by an inverting word. (A rough extraction sketch follows below.)
• The Default feature dominates when the model is truncated to 5 topics, but becomes less important at higher truncation levels.
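As a rough sketch of how these five word features could be extracted, here is a minimal example using NLTK's POS tagger. The sentiment and inverting-word lists are illustrative assumptions, not the authors' actual lexicons, and the authors' noun-phrase chunking step is omitted.

```python
import nltk  # assumes the 'averaged_perceptron_tagger' model has been downloaded

POSITIVE = {"good", "great", "clean"}        # illustrative lexicons only,
NEGATIVE = {"bad", "dirty", "noisy"}         # not the authors' actual word lists
INVERTERS = {"not", "no", "never", "hardly"}

def word_features(tokens):
    """Active features per token: Default fires for every word, and the four
    adjective features fire as described on the Feature Weights slide."""
    feats = []
    for i, (word, tag) in enumerate(nltk.pos_tag(tokens)):
        active = ["Default"]
        if tag.startswith("JJ"):  # adjective by POS tag
            # is an inverting word within 4 words before the adjective?
            window = {w.lower() for w in tokens[max(0, i - 4):i]}
            inverted = bool(window & INVERTERS)
            if word.lower() in POSITIVE:
                active.append("Re-Pos-JJ" if inverted else "Pos-JJ")
            elif word.lower() in NEGATIVE:
                active.append("Re-Neg-JJ" if inverted else "Neg-JJ")
        feats.append(active)
    return feats

print(word_features("the room was not clean".split()))
# e.g. [['Default'], ['Default'], ['Default'], ['Default'], ['Default', 'Re-Pos-JJ']]
# (assuming the tagger labels 'clean' as JJ)
```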