Opinion Mining: A Multifaceted Problem

Lei Zhang
University of Illinois at Chicago
Some slides are based on Prof. Bing Liu's presentation
Introduction

Most text information processing methods (e.g., web search, text mining) work with factual information but do not deal with opinion information.

Opinion mining: the computational study of opinions and sentiments expressed in text.

Why opinion mining now? Mainly because of the Web, we can now get huge volumes of opinionated text.
Why opinion mining is important

Whenever we need to make a decision, we would like to hear others' advice.

In the past:
- Individuals: friends and family.
- Businesses: surveys and consultants.

Now: word of mouth on the Web. People can express their opinions in reviews, forum discussions, blogs, etc.
A popular problem

Intellectually challenging, with major applications:
- A popular research topic in recent years in NLP (Natural Language Processing) and Web data mining.
- Pursued by many companies in the US.
- It touches every aspect of NLP and yet is well-scoped.
- Potentially a major application area for NLP.
- But this problem is NOT easy.
An example review

"I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad at me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …"

What do we see? Opinions, targets of opinions, and opinion holders.
Target entity

Definition (entity): an entity e is a product, person, event, or organization. e is represented as a hierarchy of components, sub-components, and so on.
- Each node represents a component and is associated with a set of attributes of the component.
- An opinion can be expressed on any node or on any attribute of a node.
- To simplify the discussion, we use the term "features" to represent both components and attributes.

What is an opinion

An opinion is a quintuple

    (e_j, f_jk, so_ijkl, h_i, t_l),

where
- e_j is a target entity.
- f_jk is a feature of the entity e_j.
- so_ijkl is the sentiment value of the opinion of the opinion holder h_i on feature f_jk of entity e_j at time t_l. so_ijkl is positive, negative, or neutral, or a more granular rating.
- h_i is an opinion holder.
- t_l is the time when the opinion is expressed.
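The quintuple above maps naturally onto a small record type. A minimal sketch (the field names and example values are mine, not from the slides):

```python
# Represent the opinion quintuple (e_j, f_jk, so_ijkl, h_i, t_l) as a record.
from dataclasses import dataclass

@dataclass
class Opinion:
    entity: str      # e_j: the target entity
    feature: str     # f_jk: a feature of the entity
    sentiment: str   # so_ijkl: "+", "-", "neu", or a finer-grained rating
    holder: str      # h_i: the opinion holder
    time: str        # t_l: when the opinion was expressed

# hypothetical example drawn from the iPhone review
op = Opinion("iPhone", "touch screen", "+", "reviewer", "2010-01-05")
```

Mining then amounts to filling in these five fields for every opinion found in the corpus.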
Opinion mining objective

Objective: given an opinionated document,
- discover all quintuples (e_j, f_jk, so_ijkl, h_i, t_l), i.e., mine the five corresponding pieces of information in each quintuple, or
- solve some simpler version of the problem.

With the quintuples: Unstructured Text → Structured Data.
- Traditional data and visualization tools can be used to slice, dice, and visualize the results in all kinds of ways.
- Enables both qualitative and quantitative analysis.

Sentiment classification: doc-level

Classify a document (e.g., a review) based on the overall sentiment expressed by the opinion holder.
- Classes: positive or negative (and neutral).
- It assumes that each document focuses on a single entity and contains opinions from a single opinion holder.
Subjectivity analysis: sentence-level

Sentence-level sentiment analysis has two tasks:
- Subjectivity classification: subjective or objective.
  - Objective: e.g., "I bought an iPhone a few days ago."
  - Subjective: e.g., "It is such a nice phone."
- Sentiment classification: for subjective sentences or clauses, classify as positive or negative.
  - Positive: e.g., "It is such a nice phone."
  - Negative: e.g., "The screen is bad."

Feature-based sentiment analysis

Sentiment classification at both the document and sentence (or clause) levels is NOT sufficient: it does not tell what people like and/or dislike.
- A positive opinion on an entity does not mean that the opinion holder likes everything about it.
- A negative opinion on an entity does not mean that the opinion holder dislikes everything about it.
Feature-based opinion summary

"I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …"

Feature-based summary:

Feature 1: touch screen
  Positive: 212
  - The touch screen was really cool.
  - The touch screen was so easy to use and can do amazing things.
  …
  Negative: 6
  - The screen is easily scratched.
  - I have a lot of difficulty in removing finger marks from the touch screen.
  …
Feature 2: battery life
  …
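A summary of this shape falls out of grouping extracted (feature, polarity, sentence) triples. A hedged sketch with illustrative, made-up triples:

```python
# Group (feature, polarity, sentence) triples into a feature-based summary.
# The triples are toy examples, not real extraction output.
from collections import defaultdict

def summarize(triples):
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, polarity, sentence in triples:
        summary[feature][polarity].append(sentence)
    return summary

triples = [
    ("touch screen", "positive", "The touch screen was really cool."),
    ("touch screen", "negative", "The screen is easily scratched."),
    ("battery life", "negative", "The battery life was not long."),
]
s = summarize(triples)
```

The per-feature list lengths give the "Positive: 212 / Negative: 6" counts shown on the slide.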
Visual comparison

[Figure: bar charts of positive (+) and negative (−) opinions per feature (Voice, Screen, Battery, Size, Weight): a summary of reviews of Cell Phone 1, and a comparison of reviews of Cell Phone 1 vs. Cell Phone 2.]
Opinion mining is challenging

"This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday."
Opinion mining is a multifaceted problem

(e_j, f_jk, so_ijkl, h_i, t_l):
- e_j, an entity: named entity extraction (and more)
- f_jk, a feature of e_j: information extraction
- so_ijkl, the sentiment: sentiment determination
- h_i, the opinion holder: information/data extraction
- t_l, the time: data extraction

Also needed:
- Co-reference resolution
- Relation extraction
- Synonym matching (voice = sound quality) …
Entity extraction (competing entities)

An entity can be a product, service, person, organization, or event in an opinion document.

"This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth."

Nokia and Moto (Motorola) are entities.
Why we need entity extraction

- Without knowing the entity, a piece of opinion has little value.
- Companies want to know their competitors in the market. Entity extraction is the first step to understanding the competitive landscape from opinion documents.
Related work

Named entity recognition (NER) aims to identify entities such as names of persons, organizations, and locations in natural language text. Our problem is similar to the NER problem, but with some differences:
1. Fine-grained entity classes (products, services) rather than coarse-grained entity classes (people, locations, organizations).
2. Only a specific type is wanted: e.g., a particular type of drug name.
3. Neologisms: e.g., "Sammy" (Sony), "SE" (Sony-Ericsson).
4. Feature sparseness (lack of contextual patterns).
5. Data noise (over-capitalization, under-capitalization).
NER methods

Supervised learning methods
- The current dominant techniques for addressing the NER problem:
  - Hidden Markov Models (HMM)
  - Maximum Entropy models (ME)
  - Support Vector Machines (SVM)
  - Conditional Random Fields (CRF)
- Shortcoming: they rely on large sets of labeled examples, and labeling is labor-intensive and time-consuming.
NER methods

Unsupervised learning methods
- Mainly clustering: gathering named entities from clustered groups based on the similarity of context. These techniques rely on lexical resources (e.g., WordNet), on lexical patterns, and on statistics computed over a large unannotated corpus.
- Shortcoming: low precision and recall.
NER methods

Semi-supervised learning methods
- Show promise for identifying and labeling entities. Starting with a set of seed entities, semi-supervised methods use either class-specific patterns to populate an entity class or distributional similarity to find terms similar to the seeds.
- Specific methods:
  - Bootstrapping
  - Co-training
  - Distributional similarity
Set expansion problem

- To find competing entities, the extracted entities must be relevant, i.e., they must be of the same class/type as the user-provided entities.
- The user can only provide a few names because there are so many different brands and models.
- Our problem is thus a set expansion problem, which expands a set of given seed entities.
Set expansion problem

Given a set Q of seed entities of a particular class C, and a set D of candidate entities, we wish to determine which of the entities in D belong to C. That is, we "grow" the class C based on the set of seed examples Q.

This is a classification problem. However, in practice, the problem is often solved as a ranking problem.
Distributional similarity

- Distributional similarity is a classical method for the set expansion problem.
- It compares the distribution of the words surrounding a candidate entity with that of the seed entities, and then ranks the candidate entities by the similarity values.
- Our experiments show this approach is inaccurate.
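A minimal sketch of the distributional-similarity baseline just described: build a bag-of-context-words vector for each entity and compare it to the seed's vector with cosine similarity. The corpus, window size, and entity names are toy assumptions for illustration:

```python
# Compare context-word distributions of candidates against a seed entity.
import math
from collections import Counter

def context_vector(entity, corpus, window=2):
    """Count words appearing within `window` tokens of `entity`."""
    vec = Counter()
    for sent in corpus:
        toks = sent.lower().split()
        for i, t in enumerate(toks):
            if t == entity.lower():
                lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                vec.update(toks[lo:i] + toks[i + 1:hi])
    return vec

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = ["I bought a nokia phone yesterday",
          "she bought a moto phone with bluetooth",
          "the weather was sunny yesterday"]
seed_vec = context_vector("nokia", corpus)
```

A candidate sharing contexts with the seed ("bought", "phone") ranks above one that does not, which is the ranking criterion; the slide's point is that in practice this signal is often too weak to be accurate.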
Positive and unlabeled learning model (PU learning model)

- A two-class classification model.
- Given a set P of positive examples of a particular class and a set U of unlabeled examples (containing hidden positive and negative cases), a classifier is built using P and U for classifying the data in U or future test cases.
- The set expansion problem maps exactly into PU learning.
S-EM algorithm

- S-EM is an algorithm under the PU learning model.
- It is based on naïve Bayes classification and the Expectation-Maximization (EM) algorithm.
- The main idea of S-EM is to use a spy technique to identify some reliable negatives (RN) from the unlabeled set U, and then use an EM algorithm to learn from P, RN, and U − RN.
- We use the classification score to rank entities.
S-EM algorithm (Liu et al., ICML 2002); our algorithm (Li, Zhang, et al., ACL 2010)

Given a positive set P and an unlabeled set U, S-EM produces a Bayesian classifier C, which is used to classify each vector u ∈ U and to assign a probability P(+|u) indicating the likelihood that u belongs to the positive class.
Entity ranking

Rank candidate d: let Md be the median of {P(+|vector 1), P(+|vector 2), P(+|vector 3), …, P(+|vector n)}. The final score (fs) for d is defined as:

    fs(d) = Md × log(1 + n),

where n is the frequency count of candidate entity d in the corpus.
- A high fs(d) implies a high likelihood that d is in the expanded entity set.
- Candidate entities with a higher median score and a higher frequency count in the corpus will be ranked higher.
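The ranking formula is direct to compute once the per-occurrence probabilities are available. A sketch (the probability lists are made-up inputs standing in for S-EM's P(+|vector) outputs):

```python
# fs(d) = Md * log(1 + n): median positive probability times log-frequency.
import math
from statistics import median

def entity_score(probs):
    """probs: P(+|vector) for each occurrence of candidate d in the corpus."""
    n = len(probs)                      # frequency count of d
    return median(probs) * math.log(1 + n)

# a candidate seen often with high positive probabilities outranks a rare one
frequent = entity_score([0.9, 0.8, 0.9, 0.85])
rare = entity_score([0.9])
```

Both factors matter: a high median alone is not enough if the candidate barely occurs, which is exactly what the log(1 + n) term encodes.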
Feature extraction

"The voice on my phone was not so clear, worse than my previous phone. The battery life was long."
Feature indicators

(1) Dependency relations
- Opinion words modify object features, e.g., "This camera takes great pictures."
- We exploit the dependency relations between opinions and features to extract features, using extraction rules.
- Given a set of seed opinion words (no feature input), we can extract features and also opinion words iteratively.
Feature extraction

(2) Part-whole relation patterns
A part-whole pattern indicates that one object is part of another. It is a good indicator of features if the class concept word (the "whole" part) is known.

(3) "No" pattern
A pattern specific to product reviews and forum posts. People often express their comments or opinions on features with this short pattern (e.g., "no noise").
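The "no" pattern lends itself to a simple surface match. An illustrative sketch; the regex and the stopword list are my assumptions, not the authors' exact rules:

```python
# Extract feature candidates from the "no" pattern: a noun following "no"
# in a short phrase ("no noise" -> "noise") is a likely feature.
import re

NO_PATTERN = re.compile(r"\bno\s+([a-z]+)\b")
FALSE_HITS = {"problem", "doubt", "way"}   # "no problem" etc. are not features

def no_pattern_features(sentence):
    return [w for w in NO_PATTERN.findall(sentence.lower())
            if w not in FALSE_HITS]
```

Like the part-whole patterns, this only yields candidates; the ranking step below decides which of them are genuine features.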
Feature ranking

Rank extracted feature candidates by feature importance: if a feature candidate is correct and important, it should be ranked high; unimportant features or noise should be ranked low.

Two major factors affect feature importance:
- Feature relevance: how likely a feature candidate is to be a correct feature.
- Feature frequency: a feature is important if it appears frequently in opinion documents.
HITS algorithm for feature relevance

There is a mutual reinforcement relation between opinion words, part-whole and "no" patterns, and features. If an adjective modifies many correct features, it is highly likely to be a good opinion word. Similarly, if a feature candidate can be extracted by many opinion words, part-whole patterns, or the "no" pattern, it is highly likely to be a correct feature. The Web page ranking algorithm HITS is applicable.
Our algorithm (Zhang, et al., COLING 2010):
(1) Extract features by dependency relations, part-whole patterns, etc.
(2) Compute feature scores using HITS, without considering frequency.
(3) Compute the final score, taking feature frequency into account:

    S = S(f) × log(freq(f)),

where freq(f) is the frequency count of feature f and S(f) is the authority score of feature f.
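The mutual reinforcement idea can be sketched with standard HITS iterations on a bipartite graph: opinion words act as hubs, feature candidates as authorities, with an edge when a word extracts a feature. The graph and frequency counts below are toy data, and the exact setup in the paper may differ:

```python
# HITS over an opinion-word / feature bipartite graph, then the final
# frequency-weighted score S = S(f) * log(freq(f)) from the slide.
import math

def hits(edges, iters=50):
    """edges: dict mapping opinion_word -> set of features it modifies."""
    features = {f for fs in edges.values() for f in fs}
    hub = {w: 1.0 for w in edges}
    auth = {f: 1.0 for f in features}
    for _ in range(iters):
        auth = {f: sum(hub[w] for w in edges if f in edges[w]) for f in features}
        hub = {w: sum(auth[f] for f in edges[w]) for w in edges}
        na = math.sqrt(sum(v * v for v in auth.values()))
        nh = math.sqrt(sum(v * v for v in hub.values()))
        auth = {f: v / na for f, v in auth.items()}
        hub = {w: v / nh for w, v in hub.items()}
    return auth

def final_scores(auth, freq):
    return {f: auth[f] * math.log(freq[f]) for f in auth}

edges = {"great": {"picture", "screen"}, "clear": {"picture"}, "nice": {"screen"}}
freq = {"picture": 20, "screen": 15}
scores = final_scores(hits(edges), freq)
```

With equal authority scores, the more frequent candidate wins, which is the intended effect of the log(freq(f)) factor.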
Identify opinion orientation

For each feature, we identify the sentiment or opinion orientation expressed by the reviewer.
- Almost all approaches make use of opinion words and phrases (lexicon-based methods).
  - Some opinion words have context-independent orientations, e.g., "great".
  - Other opinion words have context-dependent orientations, e.g., "short".
  - There are many ways to use opinion words.
- Machine learning methods for sentiment classification at the sentence and clause levels are also applicable.
Aggregation of opinion words

Input: a pair (f, s), where f is a feature and s is a sentence that contains f.
Output: whether the opinion on f in s is positive, negative, or neutral.

Two steps:
- Step 1: split the sentence if needed based on BUT words (but, except that, etc.).
- Step 2: work on the segment sf containing f. Let the set of opinion words in sf be w1, …, wn. Sum up their orientations (1, -1, 0), and assign the orientation to (f, s) accordingly.

Step 2 can be changed to

    score(f, s) = Σ_{i=1}^{n} w_i.o / d(w_i, f)

with better results, where w_i.o is the opinion orientation of w_i and d(w_i, f) is the distance from f to w_i.
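The distance-weighted variant of Step 2 can be sketched as follows; the tiny lexicon is an illustrative assumption, and distance is measured in tokens:

```python
# score(f, s) = sum_i w_i.o / d(w_i, f): opinion words closer to the
# feature contribute more. The lexicon below is a toy assumption.
LEXICON = {"great": 1, "clear": 1, "cool": 1, "bad": -1, "noisy": -1}

def orientation(sentence, feature):
    toks = sentence.lower().strip(".").split()
    f_pos = toks.index(feature)        # assumes the feature word occurs in s
    score = 0.0
    for i, t in enumerate(toks):
        if t in LEXICON and i != f_pos:
            score += LEXICON[t] / abs(i - f_pos)   # closer words weigh more
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

For example, in "The voice quality was clear", "clear" sits three tokens from "voice" and contributes +1/3, yielding a positive orientation for the voice feature.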
Basic opinion rules (Liu, Ch. in NLP Handbook)

- Negation rules: a negation word or phrase usually reverses the opinion expressed in a sentence. Negation words include "no", "not", etc.
  e.g., "This cellphone is not good."
- But-clause rules: a sentence containing "but" also needs special treatment. The opinions before and after "but" are usually opposite to each other. Phrases such as "except that" and "except for" behave similarly.
  e.g., "I love Nicolas Cage but I really have no desire to see the Sorcerer's Apprentice."
- More…
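A minimal sketch of the two rules above: a negation word flips the orientation within its clause, and "but" splits the sentence into segments scored separately. The word lists are illustrative assumptions, and real rules need much more care (scope, "not only … but also", etc.):

```python
# Negation rule: reverse the clause score; but-clause rule: score segments.
NEGATIONS = {"no", "not", "never"}
POSITIVE = {"good", "love", "great"}
NEGATIVE = {"bad", "terrible"}

def clause_orientation(clause):
    toks = clause.lower().split()
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in toks)
    if any(t in NEGATIONS for t in toks):   # negation reverses the clause
        score = -score
    return score

def sentence_orientations(sentence):
    # score each "but"-segment on its own
    return [clause_orientation(c) for c in sentence.lower().split(" but ")]
```

On "the screen is great but the battery is bad", the two segments score +1 and −1, opposite orientations as the but-clause rule predicts.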
Two main types of opinion

- Direct opinions: direct sentiment expressions on some entity or feature.
  e.g., "The picture quality of this camera is great."
- Comparative opinions: comparisons expressing similarities or differences between more than one entity or feature, usually stating an ordering or preference.
  e.g., "Car X is cheaper than car Y."
Comparative opinions

Gradable:
- Non-equal gradable: relations of the type greater or less than.
  e.g., "The optics of camera A are better than those of camera B."
- Equative: relations of the type equal to.
  e.g., "Camera A and camera B both come in 7MP."
- Superlative: relations of the type greater or less than all others.
  e.g., "Camera A is the cheapest camera available in the market."
Mining comparative opinions (Jindal and Liu, SIGIR 2006; Ding, Liu, Zhang, KDD 2009)

Objective: given an opinionated document d, extract comparative opinions:

    (O1, O2, F, po, h, t),

where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed.
Thank you