SENTIMENT ANALYSIS OF BLOGS BY COMBINING LEXICAL KNOWLEDGE WITH TEXT CLASSIFICATION
By Prem Melville, Wojciech Gryc, Richard D. Lawrence
Presented by Rohan Kothari

PURPOSE
- Explosion of user-generated content on the web: blogs, reviews, consumer forums.
- It is important for companies to monitor the discussion around their products.

INTRODUCTION
- The tracking of such discussion on weblogs can be done using "sentiment analysis".
- Sentiment analysis (lexical knowledge) focuses on identifying whether a piece of text expresses a positive or a negative opinion, and on relating that opinion to the respective subject matter.

ALTERNATE APPROACH
- Sentiment analysis (text classification): text classification is the task of assigning predefined categories to free-text documents.
- A widespread application of text classification is spam filtering, where email messages are classified into the two categories of spam and non-spam.

SENTIMENT ANALYSIS ON BLOGS
Focusing on sentiment analysis of blogs raises the following questions:
- How do we identify the subset of blogs that discuss a specific product?
- How do we detect and characterize the specific sentiment expressed about an entity?
- Having identified a subset of relevant blogs, how do we identify the most authoritative or influential bloggers?

PROPOSED SOLUTION
- A generative model based on a lexicon of sentiment-laden words, and a second model trained on labeled documents.
- The distributions from these two models are adaptively pooled to create a composite multinomial Naive Bayes classifier that captures both sources of information.
- For the purposes of the paper, sentiment detection is treated as binary polarity classification (positive vs. negative sentiment classes).

BASELINE APPROACHES
Two baseline approaches to using background knowledge for this kind of document classification:
- Lexical classification
- Feature supervision

LEXICAL CLASSIFICATION
- Assumes an absence of labeled data in the domain and relies solely on a lexicon defining the polarity of words.
- Given a lexicon of positive and negative terms, a straightforward way to use this information is to measure the frequency of occurrence of these terms in each document.
- The probability that a test document D belongs to the positive class is
  P(+ \mid D) = \frac{a}{a + b},
  where a and b are the numbers of occurrences of positive and negative terms in the document.
- A document is then classified as positive if P(+ \mid D) > t, where t is the classification threshold; otherwise it is classified as negative.
- In the absence of any prior information on the relative positivity or negativity of the terms, we use t = 0.5, i.e., we assume a document is positive if it contains more positive terms than negative terms.

FEATURE SUPERVISION
- Given a representative set of words for each class (i.e., the lexicon), we create a representative document for each class containing all of that class's representative words.
- We then compute the cosine similarity between each document in the unlabeled set and the representative documents.
- Each unlabeled document is assigned to the class with the highest similarity, and a Naive Bayes classifier is trained on these pseudo-labeled examples.

POOLING MULTINOMIALS
The multinomial Naive Bayes classifier commonly used for text categorization relies on three assumptions (a minimal sketch follows this list):
- Documents are produced by a mixture model.
- There is a one-to-one correspondence between mixture components and classes.
- Each mixture component is a multinomial distribution over words, i.e., given a class, the words in a document are produced independently of each other.
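To make the multinomial model just described concrete, here is a minimal sketch of a count-based Naive Bayes text classifier trained on labeled documents. The whitespace tokenizer, Laplace smoothing, and the tiny example corpus are illustrative assumptions of this sketch, not details taken from the paper.

    # Minimal multinomial Naive Bayes sketch (illustrative; whitespace tokenizer
    # and Laplace smoothing are assumptions, not details from the paper).
    import math
    from collections import Counter

    def tokenize(text):
        return text.lower().split()

    def train_multinomial_nb(docs, labels):
        """Estimate class priors P(c) and word conditionals P(w|c) from labeled docs."""
        classes = set(labels)
        vocab = set()
        word_counts = {c: Counter() for c in classes}
        class_counts = Counter(labels)
        for text, c in zip(docs, labels):
            words = tokenize(text)
            vocab.update(words)
            word_counts[c].update(words)
        priors = {c: class_counts[c] / len(labels) for c in classes}
        conditionals = {}
        for c in classes:
            total = sum(word_counts[c].values())
            # Laplace smoothing so no word conditional is zero.
            conditionals[c] = {w: (word_counts[c][w] + 1) / (total + len(vocab))
                               for w in vocab}
        return priors, conditionals, vocab

    def classify(text, priors, conditionals, vocab):
        """Pick argmax_c of log P(c) + sum_i n_i * log P(w_i|c), ignoring unseen words."""
        scores = {}
        for c, prior in priors.items():
            score = math.log(prior)
            for w in tokenize(text):
                if w in vocab:
                    score += math.log(conditionals[c][w])
            scores[c] = score
        return max(scores, key=scores.get)

    if __name__ == "__main__":
        docs = ["great product works well", "terrible support awful experience",
                "love the new release", "buggy and disappointing"]
        labels = ["pos", "neg", "pos", "neg"]
        priors, conds, vocab = train_multinomial_nb(docs, labels)
        print(classify("awful buggy release", priors, conds, vocab))  # expected: neg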
- The Naive Bayes classification rule uses Bayes' theorem to compute the class membership probability of each class:
  P(c_j \mid D) = \frac{P(c_j)\, P(D \mid c_j)}{P(D)} \propto P(c_j) \prod_i P(w_i \mid c_j)^{n_{i,D}},
  where P(c_j) is the prior probability of class c_j, P(D \mid c_j) is the probability of the document given the class, and n_{i,D} is the number of occurrences of word w_i in D.

COMBINING PROBABILITY DISTRIBUTIONS
- Pooling distributions is a general approach for combining information from multiple sources or experts, each represented by a probability distribution.
- We consider two experts: a model learned from the labeled training data, and a generative model that explains the lexicon.
- Two approaches to combining the probability distributions: the linear opinion pool and the logarithmic opinion pool.
- In the linear opinion pool, the aggregate probability is
  P(w_i \mid c_j) = \sum_{k=1}^{K} \alpha_k P_k(w_i \mid c_j),
  where K is the number of experts, P_k(w_i \mid c_j) is the probability assigned by expert k to word w_i occurring in a document of class c_j, and the weights \alpha_k sum to 1.
- The combined probability using the logarithmic opinion pool is
  P(w_i \mid c_j) = Z^{-1} \prod_{k=1}^{K} P_k(w_i \mid c_j)^{\alpha_k},
  where Z is a normalizing constant.

A GENERATIVE BACKGROUND KNOWLEDGE MODEL
- In this model we focus only on the conditional probabilities of each word given the class.
- We assume that the feature-class associations provided in the lexicon are implicitly arrived at by human experts by examining the sentiment of documents.
- The exact values of these conditionals are derived from the set of properties these distributions must satisfy, listed below.

PROPERTY 1
- Since we do not know the relative polarity of the terms in the dictionary, we assume all positive terms are equally likely to occur in a positive document, and the same is true for negative documents.

PROPERTY 2
- If a document D_i has α positive terms and β negative terms, and a document D_j has β positive terms and α negative terms (with α > β), we would like D_i to be classified as a positive document and D_j as a negative document.

PROPERTY 3
- A positive document is more likely to contain a positive term than a negative term, and vice versa. The polarity level r measures how much more likely it is for a positive term to occur in a positive document compared to a negative term.

PROPERTY 4
- Since each component of the mixture model is a probability distribution, the conditionals for each class must satisfy the normalization constraint \sum_{w \in V} P(w \mid c_j) = 1, where V is the vocabulary.

CONDITIONS
- The exact conditional probabilities P(w \mid c_j) implied by Properties 1-4.

EMPIRICAL EVALUATION
Data sets: to demonstrate the generality of the approach, three very different domains were chosen.
- Blogs discussing enterprise-software products.
- Political blogs discussing US presidential candidates.
- Online movie reviews.

RESULTS

CONCLUSION
- An effective framework for incorporating lexical knowledge into supervised learning for text categorization.
- The results demonstrate that, even when provided with only a few training examples, combining background lexical information with supervised learning can produce better results.

THANK YOU!!
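APPENDIX: LINEAR OPINION POOL SKETCH
For reference, a minimal sketch of the linear opinion pool described above, combining a lexicon-based expert with word conditionals learned from labeled data (for example, by the Naive Bayes sketch earlier). The lexicon word probabilities below are one simple choice that gives equal probability within each lexicon group, a ratio of r between groups, and per-class normalization over the lexicon vocabulary (Properties 1, 3 and 4); they are an illustrative assumption rather than the exact values derived in the paper, and the word lists, polarity level and pooling weight are hypothetical.

    # Linear opinion pool sketch (illustrative): combine a lexicon-based expert
    # with word conditionals learned from labeled data. The lexicon, polarity
    # level R and pooling weight ALPHA are assumptions, not values from the paper.
    import math

    POSITIVE = {"great", "love", "excellent"}   # hypothetical positive lexicon
    NEGATIVE = {"awful", "terrible", "buggy"}   # hypothetical negative lexicon
    R = 100.0   # polarity level r: a positive term is R times likelier than a
                # negative term in a positive document (and vice versa)
    ALPHA = 0.5  # weight of the lexicon expert in the linear pool (weights sum to 1)

    def lexicon_conditional(word, cls):
        """Lexicon expert P_lex(w|c): words from the class's own lexicon group get R
        times the probability of any other word; normalized over the lexicon."""
        p, n = len(POSITIVE), len(NEGATIVE)
        if cls == "pos":
            return (R if word in POSITIVE else 1.0) / (p * R + n)
        return (R if word in NEGATIVE else 1.0) / (n * R + p)

    def pooled_conditional(word, cls, learned):
        """Linear opinion pool: P(w|c) = ALPHA * P_lex(w|c) + (1 - ALPHA) * P_learned(w|c)."""
        return ALPHA * lexicon_conditional(word, cls) + (1 - ALPHA) * learned[cls].get(word, 1e-6)

    def classify(text, priors, learned):
        """Score each class with log P(c) plus log pooled conditionals of known words."""
        scores = {}
        for cls, prior in priors.items():
            score = math.log(prior)
            for w in text.lower().split():
                if w in POSITIVE or w in NEGATIVE or w in learned[cls]:
                    score += math.log(pooled_conditional(w, cls, learned))
            scores[cls] = score
        return max(scores, key=scores.get)

    if __name__ == "__main__":
        # Toy learned conditionals, e.g. taken from the Naive Bayes sketch above.
        learned = {"pos": {"release": 0.05, "great": 0.10},
                   "neg": {"release": 0.01, "buggy": 0.12}}
        priors = {"pos": 0.5, "neg": 0.5}
        print(classify("great release", priors, learned))  # expected: pos

Setting ALPHA closer to 1 leans on the lexicon expert, which helps when few labeled examples are available, while values closer to 0 lean on the model learned from labels.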