
SENTIMENT ANALYSIS OF BLOGS BY COMBINING LEXICAL KNOWLEDGE WITH TEXT CLASSIFICATION
By Prem Melville, Wojciech Gryc,
Richard D. Lawrence
Presented By
Rohan Kothari
PURPOSE

• Explosion of user-generated content on the web:
  ◦ Blogs
  ◦ Reviews
  ◦ Consumer forums
• It is important for companies to monitor the discussion around their products.
INTRODUCTION

• The tracking of such discussion on weblogs can be done using "sentiment analysis".
• Sentiment Analysis (Lexical Knowledge):
  ◦ Focuses on identifying whether a piece of text expresses a positive or a negative opinion.
  ◦ Relates the opinion to the respective subject matter.
ALTERNATE APPROACH

• Sentiment Analysis (Text Classification):
  ◦ Text classification is the task of assigning predefined categories to free-text documents.
  ◦ A widespread application of text classification is spam filtering, where email messages are classified into the two categories of spam and non-spam, as sketched below.
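A minimal text-classification sketch in this spirit, assuming scikit-learn is available; the messages and labels are made-up toy data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Tiny hypothetical spam-filtering training set.
    messages = ["win a free prize now", "meeting moved to 3pm",
                "cheap pills, click here", "lunch tomorrow?"]
    labels = ["spam", "non-spam", "spam", "non-spam"]

    # Bag-of-words features fed to a multinomial Naive Bayes classifier.
    vec = CountVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(messages), labels)
    print(clf.predict(vec.transform(["free pills, click now"])))   # -> ['spam']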
SENTIMENT ANALYSIS ON BLOGS

• Sentiment analysis on blogs raises the following questions:
  ◦ How do we identify the subset of blogs that discuss a specific product?
  ◦ How do we detect and characterize the specific sentiment expressed about an entity?
  ◦ Having identified a subset of relevant blogs, how do we identify the most authoritative or influential bloggers?
PROPOSED SOLUTION

• A generative model based on a lexicon of sentiment-laden words, and a second model trained on labeled documents.
• The distributions from these two models are adaptively pooled to create a composite multinomial Naïve Bayes classifier that captures both sources of information.
• For the purposes of the paper, sentiment detection is treated as binary polarity classification (positive and negative sentiment classes).
BASELINE APPROACHES


• The baseline approaches to using background knowledge in this kind of document classification are:
  ◦ Lexical classification
  ◦ Feature supervision

Lexical Classification
• Assumes the absence of labeled data in the domain.
• Relies solely on a lexicon defining the polarity of words.



• Given a lexicon of positive and negative terms, a straightforward way to use this information is to measure the frequency of occurrence of these terms in each document.
• The probability that a test document D belongs to the positive class is

      P(+ | D) = a / (a + b)

  where a and b are the numbers of occurrences of positive and negative terms in the document.

• A document is then classified as positive if

      P(+ | D) > t

  where t is the classification threshold; otherwise it is classified as negative.
• In the absence of any prior information about the relative positivity or negativity of the terms, we use t = 0.5, i.e. a document is assumed positive if it contains more positive terms than negative ones. A small sketch of this lexical classifier follows below.
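A minimal sketch of this lexicon-only scoring rule; the mini-lexicon and document are hypothetical:

    def lexical_sentiment(tokens, positive_words, negative_words, t=0.5):
        """Lexicon-only classifier: P(+|D) = a / (a + b), positive if above threshold t."""
        a = sum(tok in positive_words for tok in tokens)   # positive-term occurrences
        b = sum(tok in negative_words for tok in tokens)   # negative-term occurrences
        if a + b == 0:
            return "unknown"                               # no lexicon terms found
        return "positive" if a / (a + b) > t else "negative"

    # Hypothetical mini-lexicon and document.
    pos_lex = {"good", "great", "excellent"}
    neg_lex = {"bad", "poor", "terrible"}
    doc = "a great product with a few bad quirks but great value".split()
    print(lexical_sentiment(doc, pos_lex, neg_lex))        # -> "positive" (a=2, b=1)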
FEATURE SUPERVISION

• Given a representative set of words for each class (i.e. the lexicon), we create a representative document for each class containing all of that class's representative words.
• We then compute the cosine similarity between each document in the unlabeled set and the representative documents.
• Each unlabeled document is assigned to the class with the highest similarity, and a Naïve Bayes classifier is then trained on these pseudo-labeled examples, as sketched below.
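A minimal sketch of this feature-supervision baseline, assuming scikit-learn is available; the lexicons and unlabeled documents are hypothetical toy data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.naive_bayes import MultinomialNB

    pos_lexicon = ["good", "great", "excellent", "love"]
    neg_lexicon = ["bad", "poor", "terrible", "hate"]
    unlabeled = ["great phone, love the screen",
                 "terrible battery, poor build",
                 "good value"]

    # One "representative document" per class, containing all of its lexicon words.
    rep_docs = [" ".join(pos_lexicon), " ".join(neg_lexicon)]

    vec = CountVectorizer()
    X_all = vec.fit_transform(rep_docs + unlabeled)
    X_rep, X_unl = X_all[:2], X_all[2:]

    # Pseudo-label each unlabeled document with its most similar representative document.
    sims = cosine_similarity(X_unl, X_rep)
    pseudo_labels = sims.argmax(axis=1)          # 0 = positive, 1 = negative

    # Train a Naive Bayes classifier on the pseudo-labeled examples.
    clf = MultinomialNB().fit(X_unl, pseudo_labels)
    print(clf.predict(vec.transform(["love this, great buy"])))   # -> [0] (positive)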
POOLING MULTINOMIALS

• The multinomial Naïve Bayes classifier commonly used for text categorization relies on three assumptions:
  ◦ Documents are produced by a mixture model.
  ◦ There is a one-to-one correspondence between mixture components and classes.
  ◦ Each mixture component is a multinomial distribution over words, i.e. given a class, the words in a document are produced independently of each other.

• The Naïve Bayes classification rule uses Bayes' theorem to compute the class membership probability of each class:

      P(cj | D) = P(cj) · P(D | cj) / P(D)

  where P(cj) is the prior probability of class cj and P(D | cj) is the probability of the document given the class. Under the word-independence assumption, P(D | cj) is the product over words wi of P(wi | cj) raised to the number of times wi occurs in D. A small scoring sketch follows below.
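A minimal sketch of this scoring rule in log space; the priors, per-class word distributions, and document counts are hypothetical toy values:

    import math

    def nb_log_posterior(doc_counts, prior, cond):
        """Multinomial Naive Bayes score: log P(c) + sum_i f_i * log P(w_i | c)."""
        score = math.log(prior)
        for word, count in doc_counts.items():
            if word in cond:                     # ignore out-of-vocabulary words
                score += count * math.log(cond[word])
        return score

    priors = {"pos": 0.5, "neg": 0.5}
    cond = {"pos": {"great": 0.6, "poor": 0.1, "movie": 0.3},
            "neg": {"great": 0.1, "poor": 0.6, "movie": 0.3}}
    doc = {"great": 2, "movie": 1}
    pred = max(priors, key=lambda c: nb_log_posterior(doc, priors[c], cond[c]))
    print(pred)                                  # -> "pos"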
COMBINING PROBABILITY DISTRIBUTIONS
• Pooling distributions is a general approach for combining information from multiple sources or experts, each expressed as a probability distribution.
• We consider two experts:
  ◦ A model learned from the labeled training data.
  ◦ A generative model that explains the lexicon.
• There are two approaches to combining the probability distributions:
  ◦ Linear opinion pool
  ◦ Logarithmic opinion pool

• In the linear opinion pool, the aggregate probability is calculated as

      P(wi | cj) = Σk αk · Pk(wi | cj)

  where K is the number of experts, Pk(wi | cj) is the probability assigned by expert k to word wi occurring in a document of class cj, and the weights αk sum to 1.
• The combined probability using the logarithmic opinion pool is

      P(wi | cj) = Z · Πk Pk(wi | cj)^αk

  where Z is a normalizing constant. Both pooling schemes are sketched below.
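A minimal sketch of both pooling schemes over per-class word distributions; the two "expert" distributions and the weights are hypothetical:

    def linear_pool(dists, weights):
        """Linear opinion pool: P(w|c) = sum_k alpha_k * P_k(w|c)."""
        words = set().union(*dists)
        return {w: sum(a * d.get(w, 0.0) for a, d in zip(weights, dists)) for w in words}

    def log_pool(dists, weights, eps=1e-12):
        """Logarithmic opinion pool: P(w|c) proportional to prod_k P_k(w|c)^alpha_k."""
        words = set().union(*dists)
        raw = {w: 1.0 for w in words}
        for a, d in zip(weights, dists):
            for w in words:
                raw[w] *= max(d.get(w, 0.0), eps) ** a
        z = sum(raw.values())                    # renormalize so the pool sums to 1
        return {w: v / z for w, v in raw.items()}

    # Two hypothetical "experts" for P(w | positive): one trained on labeled data,
    # one derived from the sentiment lexicon. The weights must sum to 1.
    p_trained = {"great": 0.5, "poor": 0.1, "movie": 0.4}
    p_lexicon = {"great": 0.7, "poor": 0.3}
    print(linear_pool([p_trained, p_lexicon], [0.6, 0.4]))
    print(log_pool([p_trained, p_lexicon], [0.6, 0.4]))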
A GENERATIVE BACKGROUND
KNOWLEDGE MODEL




• In this model we focus only on the conditional probabilities of each word given the class.
• We assume that the feature-class associations provided in the lexicon were implicitly arrived at by human experts by examining the sentiment of documents.
• The exact values of these conditionals are derived below, based on the set of properties these distributions must satisfy.

• Property 1
  ◦ Since we do not know the relative polarity of terms in the dictionary, we assume all positive terms are equally likely to occur in a positive document, and likewise for negative terms in negative documents.
• Property 2
  ◦ If a document Di has α positive terms and β negative terms, and a document Dj has β positive terms and α negative terms, we would like Di to be considered a positive document and Dj a negative document.


• Property 3
  ◦ A positive document is more likely to contain a positive term than a negative term, and vice versa:

        P(positive term | +) = r · P(negative term | +)

  ◦ r is the polarity level, which measures how much more likely a positive term is than a negative term to occur in a positive document.
• Property 4
  ◦ Since each component of our mixture model is a probability distribution, the conditional word probabilities for each class must sum to 1:

        Σw P(w | cj) = 1 for each class cj
• Conditions: the conditional word probabilities P(w | cj) that jointly satisfy the four properties above.
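As a sketch of where these properties lead (the notation p, n, α, β is assumed here, not taken from the slides): writing p and n for the number of positive and negative terms in the lexicon, and assuming the background model spreads its probability mass only over lexicon terms, Properties 1, 3 and 4 pin down the positive-class conditionals; the negative class is treated symmetrically.

    \begin{align*}
      \text{Property 1:}\quad & P(w \mid +) = \alpha \;\; \forall w \in V^{+}, \qquad
                                P(w \mid +) = \beta \;\; \forall w \in V^{-} \\
      \text{Property 3:}\quad & \alpha = r\,\beta \\
      \text{Property 4:}\quad & p\,\alpha + n\,\beta = 1 \\
      \Rightarrow\quad & \beta = \frac{1}{p\,r + n}, \qquad \alpha = \frac{r}{p\,r + n}
    \end{align*}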
EMPIRICAL EVALUATION

• Data sets: to demonstrate the generality of the approach, three very different domains were chosen.
  ◦ Blogs discussing enterprise-software products
  ◦ Political blogs discussing US presidential candidates
  ◦ Online movie reviews
RESULTS
CONCLUSION


• The paper presents an effective framework for incorporating lexical knowledge into supervised learning for text categorization.
• The results demonstrate that, even when only a few training examples are available, combining background lexical information with supervised learning can produce better results.
THANK YOU!!