
Conditional Topic Random Fields
Jun Zhu and Eric P. Xing
(@cs.cmu.edu)
ICML 2010
Presentation and Discussion by Eric Wang
January 12, 2011
Overview
• Introduction – nontrivial input features for text.
• Conditional Random Fields
• CdTM and CTRF
• Model Inference
• Experimental Results
Introduction
• Topic models such as LDA are not “feature-based”: they cannot
efficiently incorporate nontrivial input features
(contextual or summary features).
• Further, they assume a bag-of-words construction, discarding
order information that may be important.
• The authors propose a model that addresses both the feature and
the independence limitations using a conditional random field
(CRF) rather than a fully generative model.
Conditional Random Fields
• A conditional random field (CRF) is a way to label and
segment structured data that removes independence
assumptions imposed by HMMs.
• The underlying idea of CRFs is that a sequence of random
variables Y is globally conditioned on a sequence of
observations X.
Image source: Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report, Department of Computer and Information Science, University of Pennsylvania, 2004.
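• For reference, the linear-chain CRF conditional distribution has the standard log-linear form (a sketch following Wallach's tutorial, with weights \lambda_k and feature functions f_k):

p(y \mid x) = \frac{1}{Z(x)} \exp\Big\{ \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \Big\},
\qquad Z(x) = \sum_{y'} \exp\Big\{ \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \Big\}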
Conditional Topic Model
• Assume a set of features a denoting arbitrary local and global
features.
• The topic weight vector is defined as

\theta_k(\lambda, a) = \frac{\exp\{\lambda^\top f(z = k, a)\}}{\sum_{k'=1}^{K} \exp\{\lambda^\top f(z = k', a)\}}

where f is a vector of feature functions defined on the features a
and \lambda is the corresponding vector of feature weights.
Conditional Topic Model
• The inclusion of Y follows sLDA, where the topic model
regresses a continuous or discrete response onto the topics.
• \beta_{1:K} are the standard topic
distributions over words.
• This model does not impose word
order dependence.
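• A minimal sketch of the topic-weight computation, assuming the softmax form above (illustrative numpy code; F stacks the per-topic feature vectors):

import numpy as np

def topic_weights(lam, F):
    """theta_k ∝ exp(lam^T f(z=k, a)). F is a (K, M*K) matrix whose
    k-th row is the concatenated feature vector f(z=k, a)."""
    scores = F @ lam          # lam^T f(z=k, a) for each topic k
    scores -= scores.max()    # stabilize the softmax
    theta = np.exp(scores)
    return theta / theta.sum()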
Feature Functions
• Consider, for example, the set of word features “positive
adjective”, “negative adjective”, “positive adjective with an
inverting word”, “negative adjective with an inverting word”,
so M=4.
• The word “good” will yield the feature function vector
[ 1 0 0 0 ]’
while “not bad” will yield
[ 0 0 0 1 ]’
• The features are then concatenated depending on the topic
assignment z of the word. Suppose z = h; then the feature vector f
for “good” is a length-MK vector:
[ 0 0 0 0 | 0 0 0 0 | … | 1 0 0 0 | … | 0 0 0 0 | 0 0 0 0 ]’
   k=1       k=2           k=h           k=K-1      k=K
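• A small sketch of this concatenation (illustrative Python; M = 4 word features and K topics as above):

import numpy as np

M, K = 4, 10  # 4 word features, K topics (illustrative sizes)

def feature_vector(local_feats, z):
    """Place the length-M local feature vector in the block for topic z,
    yielding the length-MK concatenated vector f(z, a)."""
    f = np.zeros(M * K)
    f[z * M:(z + 1) * M] = local_feats
    return f

# "good" fires the positive-adjective feature; suppose its topic is z = h
h = 2
f_good = feature_vector(np.array([1, 0, 0, 0]), h)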
Conditional Topic Random Fields
• The generative process of CTRF for a single document: form the
topic weight vector θ from the document features as above; for each
sentence, draw the topic assignments z jointly from the conditional
topic random field; draw each word w_n from the topic distribution
β_{z_n}; and, in the supervised variant, draw the response Y as in sLDA.
Conditional Topic Random Fields
• The term p(z | θ, a) is a conditional topic random field over
the topic assignments of all the words in one sentence and has
the form

p(z \mid \theta, a) = \frac{1}{Z(\theta, a)} \exp\Big\{ \sum_{n} \big( \theta_{z_n} + \lambda^\top f(z_n, a) \big) + \sum_{n} \mu^\top g(z_n, z_{n+1}, a) \Big\}

• In the linear-chain CTRF, the authors consider both singleton
feature functions f(z_n, a), defined on one topic assignment, and
pairwise feature functions g(z_n, z_{n+1}, a), defined on adjacent
topic assignments.
• The cumulative feature function value on a sentence is the sum of
the singleton and pairwise feature values over all word positions.
• The pairwise feature function g(z_n, z_{n+1}, a) is assumed to be
zero if z_n ≠ z_{n+1}.
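• A sketch of the unnormalized sentence score under the decomposition above (illustrative Python; the feature callables f and g are stand-ins):

import numpy as np

def sentence_score(z, theta, lam, mu, f, g):
    """Unnormalized log-potential of one sentence's topic assignments z:
    singleton terms theta[z_n] + lam^T f(z_n, n) plus pairwise terms
    mu^T g(z_n, n), which vanish when adjacent topics differ."""
    score = sum(theta[k] + lam @ f(k, n) for n, k in enumerate(z))
    for n in range(len(z) - 1):
        if z[n] == z[n + 1]:  # pairwise features are zero otherwise
            score += mu @ g(z[n], n)
    return score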
Model Inference
• Inference is performed in a variational fashion similar to
Correlated Topic Models (CTM).
• The authors introduce a relaxation of the lower bound due to
the introduction of the CRF, although for the univariate CdTM,
the variational posterior can be computed exactly.
• A closed-form solution is not available for these updates, so an
efficient gradient-descent approach is used instead.
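• A generic sketch of such a gradient update (not the paper's derivation; grad is a stand-in for the gradient of the variational bound):

import numpy as np

def gradient_ascent(grad, w0, lr=0.1, iters=100):
    """Generic first-order update for parameters that lack a
    closed-form solution under the variational bound."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        w += lr * grad(w)  # ascend the lower bound
    return w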
Empirical Results
• The authors use a hotel review dataset built by crawling TripAdvisor.
• The dataset consists of 5000 reviews with lengths between
1500 and 6000 words. The dataset also includes an integer (1–5)
rating for each review. Each rating was represented by
1000 documents.
• POS tags were employed to find adjectives.
• Noun phrase chunking was used to associate words with good
or bad connotations. The authors also extracted whether an
inverting word is within 4 words of each adjective.
• The lexicon size was 12,000 after rare words and stop words were
removed.
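• A sketch of the inverting-word extraction (hypothetical word list and tokenization; the paper's exact lexicons are not given here):

INVERTERS = {"not", "no", "never", "hardly"}  # hypothetical inverting words

def inverted(tokens, adj_index, window=4):
    """True if an inverting word appears within `window` words before the
    adjective at adj_index (cf. the Re-Pos-JJ / Re-Neg-JJ features)."""
    return any(t in INVERTERS for t in tokens[max(0, adj_index - window):adj_index])

# e.g. inverted(["the", "room", "was", "not", "very", "clean"], 5) -> True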
Comparison of Rating Prediction Accuracy
Equation source: Blei, D. & McAuliffe, J. Supervised Topic Models. NIPS, 2007.
Topics
Ratings and Topics
• Here, the authors show that supervised CTRF (sCTRF) achieves
good separation of rating scores among the topics (top row)
compared to MedLDA (bottom row).
Feature Weights
• Five features were considered: Default, equal to one for any
word; Pos-JJ, a positive adjective; Neg-JJ, a negative adjective;
Re-Pos-JJ, a positive adjective preceded by an inverting word;
and Re-Neg-JJ, a negative adjective preceded by an inverting word.
• The default feature dominates when truncated to 5 topics, but
becomes less important at higher truncation levels.