Dependency Parsing for Sentiment Classification in Feature
Opinion Mining
Maheshwar
Department of CSE
MVN University, Palwal
maheshwar1524@gmail.com

Sapna Bhatt
Department of CSE
MVN University, Palwal
sapnabhatt.1993@gmail.com
ABSTRACT
Most people want as much information as possible about a
product before they purchase it. They ask their friends,
search the Web and only then decide whether to buy. With the
tremendous increase in e-commerce, almost every company
provides a customer feedback form on its website. Many sites
emphasize user participation, and a growing number of
websites, such as Amazon, Epinions and UCI, encourage people
to write their opinions about the products they are
interested in. As a result, the number of customer product
reviews keeps increasing, and it becomes impossible for
manufacturers to read every review when analyzing a product.
In this paper, we apply POS tagging to each sentence, extract
product features and summarize the opinions about the product.
Our system gives good accuracy because we perform co-reference
(pronoun) resolution before summarizing the semantic
orientation of all features. We first find the features of
the product and then reduce them by grouping similar features
using WordNet similarity.
Keywords
Opinion mining, sentiment orientation, pronoun resolution,
review summarization, WordNet similarity, text mining.
1. INTRODUCTION
As the number of internet users grows day by day, there is a
tremendous increase in e-commerce, for example purchasing
products on the Web. The explosion of social media has
created many opportunities for people to publicly share their
opinions and reviews, but it has also created serious problems
when it comes to making decisions from these opinions.
Manufacturers and citizens do not have an effective technique
to analyze this mass conversation and interact meaningfully
with thousands of others. Nowadays, almost every company
provides a customer feedback form on its website, so that
customers can give their opinion about a product. Before
purchasing a product, people want as much information about
it as possible, either from their friends or from customer
reviews on the internet. They consult the opinions of other
people and web content such as customer reviews and blogs
that express opinions on products and services, which are
collectively referred to as customer feedback data on the
Web. After reading the customer feedback data, the customer
decides whether or not to buy the product. Moreover, customer
feedback data helps not only customers looking for reviews
but also the company that makes the product, which can use it
in its marketing and product development plans.
The star rating assigned to a product may not give much
information to a customer. Instead, the customer has to read
all the reviews to work out which of them describe positive
and which describe negative aspects of the product. Various
traditional sentiment analysis approaches have been proposed
to tackle this challenge to some extent. However, most
classical sentiment analysis techniques fail to identify the
product features liked or disliked by customers; they only
divide customer feedback data into two classes, positive or
negative [9]. Our task is different from traditional text
summarization [1], [2], [3] in a number of ways.
A mined opinion is not only associated with the topic of the
document but also expresses the view of the customer [4].
Sentiment analysis has been used for many purposes, including
predicting the outcome of an election [5].
In this paper, we present an opinion mining system which
uses semantic similarity together with text analysis to
identify key information components in text documents. Our
approach is novel in that we perform pronoun resolution for
product features from reviews before finding the semantic
orientation. After feature extraction, similar features are
grouped using WordNet similarity to remove duplicate
features. Finally, we determine the polarity, expressed as a
numerical score, using a sentiment analysis algorithm [6].
Our task is performed in the following steps:
1. POS tagging [26] of each sentence is done using the
Stanford tagger to identify features of the product.
2. Grouping similar features using WordNet similarity.
3. Building the syntactic parse tree and finding the noun
each pronoun refers to, i.e. pronoun (co-reference) resolution.
4. Creating dependency relations for feature-opinion pair
extraction.
5. Estimating the semantic orientation of each feature-opinion
phrase and assigning the given review to a class (positive,
negative, neutral).
6. Summarizing the results. This step aggregates the results
of the previous steps and presents them.
The remainder of the paper is organized as follows:
Section 2 presents related work on opinion mining and
sentiment analysis. Section 3 presents the architectural
details of the proposed opinion mining system. Section 4
describes the evaluation of the feature and opinion extraction
process. Finally, Section 5 concludes the paper with possible
enhancements to the proposed system.
2. RELATED WORK
2.1 Feature Extraction
Opinion mining is basically concerned with identifying
opinion words in reviews, e.g. nice, good, bad, beautiful and
great. Many researchers have worked on mining such words and
identifying their semantic orientations. Wenhao Zhang, Hua Xu
and Wei Wan [7] extract features and group explicit features
using a morpheme-based method, but they do not consider words
with multiple meanings or co-reference resolution. Muhammad
Abulaish, Jahiruddin, MN Doja and Tanvir Ahmad [21] proposed
an opinion mining system to identify product features and
opinions from review documents, but they did not refine the
rule set to improve the accuracy of the system. In [9], a
bootstrapping approach is proposed which uses a small set of
given seed opinion words to find their synonyms and antonyms
in WordNet. B. Liu and Minqing Hu [9] proposed a method for
feature-opinion summarization, but they did not handle
pronoun resolution, i.e. co-reference resolution, or the
strength of opinions. Ryu, Won, Kyu and Ung [8] used POS
tagging to extract features, then discovered association
rules and provided information using the PMI-IR algorithm.
Sentiment analysis plays an important role and has been
extensively studied and discussed since the 1990s [10], [11],
[16]. There are two main approaches to sentiment analysis:
one is based on semantic analysis [14], [15], [17], [18],
[19], and the other is based on machine learning [11], [12],
[10], [13], [20]. Methods based on machine learning are
commonly used in document-level sentiment analysis.
Although there are various opinion mining methods that
extract features and opinions from document corpora, most of
them do not explicitly show the semantic relationships
between them. Our proposed method differs from all these
approaches in that we compare the dependency trees generated
by the Stanford Parser until we encounter another sentence
that starts with a noun. With this method we are able to
resolve co-references. After that, we use word similarity to
group synonyms. Finally, we determine the polarity
(orientation) of the opinions given on the features using the
PMI (pointwise mutual information) [6] measure.
2.2 Product Feature Grouping
Grouping feature expressions that are domain synonyms is
critical for an effective opinion summary [22]. Since there
are typically hundreds of feature expressions that can be
discovered from text for an opinion mining application, it is
very time-consuming and tedious for human users to group
them into feature categories. Some automated assistance is
needed. Unsupervised learning, or clustering, is the natural
technique for solving the problem. The similarity measures
used in clustering are usually based on some form of
distributional similarity [23], [24].
3. PROPOSED ARCHITECTURE
Let P = {P1, P2, ..., Pn} be a set of products made by
different manufacturers, such as 'Nokia', 'iPhone',
'Google's Nexus One' and 'Samsung's Galaxy'; they are all
cell phones but are made by different companies. For each
product Pi, there exists a set of reviews (customer feedback
data) Ri = {R1, R2, ..., Rm}. Each review Rj may contain some
sentiment-bearing sentences about the features of the
corresponding product. Therefore, let F = {F1, F2, ..., Fn}
be a set of aspects of the product, such as price, battery
life, keypad and touch.
Problem Definition: Given a set of reviews for different
manufacturers' products P, the task is first to identify and
group the feature words f for each aspect A, and second to
determine the pair of each aspect and its sentiment,
O = (A, S).
3.1 Feature Extraction
The Stanford Parser [26] assigns part-of-speech (POS) tags to
every word based on the context in which it appears. It is
basically used to identify nouns (the product features),
adjectives (the opinions) and adverbs (which express the
degree of the opinions). Figure 1 shows how our system works;
its procedure is described below.
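As a rough illustration of how tagged words could be routed into these three roles, the sketch below groups tokens by Penn Treebank tag prefix (NN for nouns, JJ for adjectives, RB for adverbs). The function name split_by_role and the hand-tagged input are hypothetical; in the described system the tags would come from the Stanford tools rather than being typed in.

```python
from typing import Dict, List, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank POS tag)


def split_by_role(tagged: List[TaggedToken]) -> Dict[str, List[str]]:
    """Group words into candidate features, opinions and degree modifiers."""
    roles: Dict[str, List[str]] = {"features": [], "opinions": [], "degrees": []}
    for word, tag in tagged:
        if tag.startswith("NN"):      # NN, NNS, NNP, NNPS: nouns
            roles["features"].append(word)
        elif tag.startswith("JJ"):    # JJ, JJR, JJS: adjectives
            roles["opinions"].append(word)
        elif tag.startswith("RB"):    # RB, RBR, RBS: adverbs
            roles["degrees"].append(word)
    return roles


# Hand-tagged example sentence (assumed tagger output).
sentence = [("The", "DT"), ("camera", "NN"), ("is", "VBZ"),
            ("really", "RB"), ("good", "JJ")]
roles = split_by_role(sentence)
```

Words with other tags (determiners, verbs and so on) are simply ignored at this stage.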
3.2 Procedure
The algorithm works in six steps.
Input to the algorithm is a written customer review.
Output is the classification, i.e. the semantic orientation
(positive or negative).
Steps:
1. Use a part-of-speech tagger to identify features of the
product.
2. Group (cluster) similar features using WordNet similarity.
3. Use the syntactic parse tree to find the noun each pronoun
refers to, i.e. co-reference resolution.
4. Create dependency relations for feature-opinion pair
extraction.
5. Estimate the semantic orientation of each feature-opinion
phrase.
6. Assign the given review to a class (positive, negative or
neutral).
STEP 1: POS tagging and features.
Product features are usually nouns or noun phrases in review
sentences. Thus, part-of-speech tagging is crucial. We used
the NLP processor linguistic parser to parse each review,
split the text into sentences and produce the part-of-speech
tag for each word.
Algorithm. Pseudo-code for extracting product feature
candidates
// Input:  S  - set of tagged sentences; S = s1, s2, ..., sm
//         P  - set of noun phrase patterns
//         GI - set of words in the GI dictionary
// Output: PS - set of product feature candidates
PS = ø
For each tagged sentence sn in S
    PC = ø
    For i = 1 to Length(sn)
        If i < Length(sn) - 2 Then x = 3
        Else If i = Length(sn) - 2 Then x = 2
        Else If i = Length(sn) - 1 Then x = 1
        Else x = 0
        End If
        For j = x down to 0
            GT = Ti to Ti+j   /* POS tags of word i to word i+j of sn */
            GW = word i to word i+j
            If GT in P and GW is not in GI Then
                i = i + j
                PC = PC + GW
                Break
            End If
        End For
    End For
    PS = PS + PC
End For
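The pseudo-code above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tag patterns, the GI word list and the tagged sentences are hypothetical, and the test "GW is not in GI" is interpreted here as "no word of the phrase appears in GI", which is an assumption. The result is returned as an ordered list rather than a set so that the order of discovery stays visible.

```python
from typing import List, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, POS tag) pairs


def extract_feature_candidates(
    sentences: List[TaggedSentence],
    patterns: Set[Tuple[str, ...]],
    gi_words: Set[str],
    max_extra: int = 3,
) -> List[str]:
    """Collect the longest tag-pattern matches, scanning left to right."""
    candidates: List[str] = []
    for sentence in sentences:
        i = 0
        while i < len(sentence):
            # x in the pseudo-code: how many extra words fit after position i
            x = min(max_extra, len(sentence) - 1 - i)
            for j in range(x, -1, -1):  # try the longest window first
                span = sentence[i:i + j + 1]
                tags = tuple(tag for _, tag in span)
                words = [word.lower() for word, _ in span]
                # Assumption: a phrase is rejected if any of its words is in GI
                if tags in patterns and not any(w in gi_words for w in words):
                    candidates.append(" ".join(words))
                    i += j  # skip the words consumed by this match
                    break
            i += 1
    return candidates


# Hypothetical patterns, GI entries and tagged input.
patterns = {("NN",), ("NN", "NN"), ("JJ", "NN")}
gi_words = {"great"}
sentences = [
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("great", "JJ"), ("screen", "NN")],
]
candidates = extract_feature_candidates(sentences, patterns, gi_words)
```

In the second sentence the two-word window "great screen" matches a pattern but is rejected because "great" is in the GI list, so the scan falls back to the single noun "screen".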
STEP 2: Grouping (clustering) similar features using WordNet
similarity.
Algorithm:
For each feature f(k) stored in feature_list
    For each feature f(j), where j = k+1, ..., in feature_list
        If the similarity between f(k) and f(j) > threshold Then
            Group f(k) and f(j)
        End If
    End For
End For
STEP 3: Using the syntactic parse tree and comparing two
parse trees for pronoun recognition, i.e. co-reference
resolution.
We use the Stanford Parser to generate the full syntactic
parse tree of each sentence. We then compare the leftmost
branch of the parse tree of a sentence containing a pronoun
with that of the previous adjacent parse tree containing a
noun. After finding these nodes, we assign the noun found to
the pronoun used for it. This process continues until we find
the next sentence containing a noun, at which point the same
process is repeated. In this way we can determine the nouns
referred to by pronouns.
Algorithm:
procedure find(tree, node):
    label node as traversed
    if node = tagged_PRP or tagged_Noun then
        co_refer = node
    else
        for all edges e in tree.adjacentEdges(node) do
            if edge e is untraversed then
                w ← tree.adjacentNode(node, e)
                if node w is untraversed then
                    label e as a discovery edge
                    recursively call find(tree, w)
                else
                    label e as a back edge
STEP 4: Creating dependency relations for feature-opinion
pair extraction.
This step identifies product feature-opinion candidates. For
each product feature occurring in a dependency relation, we
find the corresponding opinion words. Dependency grammars
represent sentence structures as a set of dependency
relationships. A dependency relationship is an asymmetric
binary relationship between a word called the head or
governor and another word called the modifier or dependent.
The set of dependent words forms a dependency relation [27].
STEP 5: Estimating the semantic orientation of
feature-opinion phrases.
Fig 2. Semantic orientation of the feature-opinion phrases
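To make STEP 4 and STEP 5 more concrete, here is a minimal sketch under several assumptions. Dependency relations are given as hand-written (relation, governor, dependent) triples using the Stanford relation names amod and nsubj; the real system would obtain them from the parser. The orientation score follows the PMI-based formulation of Turney [6], SO = log2((near_pos × hits_neg) / (near_neg × hits_pos)), where near_pos and near_neg count co-occurrences with a positive and a negative seed word. The counts, the 0.01 smoothing constant and all function names are illustrative and are not taken from the paper.

```python
import math
from typing import Dict, List, Set, Tuple

Dependency = Tuple[str, str, str]  # (relation, governor, dependent)


def extract_pairs(deps: List[Dependency], features: Set[str]) -> List[Tuple[str, str]]:
    """Return (feature, opinion word) pairs found in dependency triples."""
    pairs = []
    for relation, governor, dependent in deps:
        if relation == "amod" and governor in features:
            # adjectival modifier, e.g. "great battery": amod(battery, great)
            pairs.append((governor, dependent))
        elif relation == "nsubj" and dependent in features:
            # nominal subject, e.g. "screen is poor": nsubj(poor, screen)
            pairs.append((dependent, governor))
    return pairs


def semantic_orientation(near_pos: float, near_neg: float,
                         hits_pos: float, hits_neg: float,
                         smoothing: float = 0.01) -> float:
    """PMI(word, positive seed) - PMI(word, negative seed), as a log2 ratio."""
    return math.log2(((near_pos + smoothing) * hits_neg) /
                     ((near_neg + smoothing) * hits_pos))


def classify(score: float) -> str:
    """Map a numerical orientation score to a class label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical inputs: parser output and co-occurrence counts.
deps = [("det", "battery", "the"), ("amod", "battery", "great"),
        ("nsubj", "poor", "screen")]
pairs = extract_pairs(deps, {"battery", "screen"})

# opinion word -> (co-occurrences with positive seed, with negative seed)
near_counts: Dict[str, Tuple[int, int]] = {"great": (90, 10), "poor": (5, 95)}
labels = {
    feature: classify(semantic_orientation(*near_counts[opinion], 1000, 1000))
    for feature, opinion in pairs
}
```

The smoothing term only guards against division by zero when a word never co-occurs with one of the seeds; with equal counts on both sides the score is zero and the phrase is labelled neutral.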
4. RESULTS AND EVALUATION
We now discuss the performance of the whole system, which is
analyzed by taking into account the performance of the
feature and opinion extraction process. We count the true
positives TP (number of correct feature-opinion pairs the
system identifies as correct), the false positives FP (number
of incorrect feature-opinion pairs the system falsely
identifies as correct), the true negatives TN (number of
incorrect feature-opinion pairs the system identifies as
incorrect) and the false negatives FN (number of correct
feature-opinion pairs the system fails to identify as
correct). Using these values, we calculate the performance
measures shown in Table 1.
Precision (π): the ratio of true positives among all retrieved
instances [21].
P = TP / (TP + FP)        (1)
Recall (ρ): the ratio of true positives among all positive
instances [21].
R = TP / (TP + FN)        (2)
F1-measure (F1): the harmonic mean of recall and
precision [21].
F1 = 2PR / (P + R)        (3)
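The three measures above can be computed directly from the counts. The short sketch below is only an illustration; the function name and the zero-division guards are our own additions, and the example counts are those reported for the Canon camera in Table 1.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for the Canon camera in Table 1: TP = 18, FP = 2, FN = 6.
p, r, f1 = precision_recall_f1(18, 2, 6)
print(f"P = {p:.2%}, R = {r:.2%}, F1 = {f1:.2%}")
```

Note that TN does not enter any of the three measures, so it is not a parameter of the function.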
Table 1. Performance evaluation of feature-opinion extraction process
Product Name         TP   FP   FN   TN   Recall   Precision   F1-measure
Canon (camera)       18   02   06   07   75%      90%         81.81%
Nokia 6610 (phone)   15   02   04   06   78.94%   78.9%       78.9%
Toyota Camry (car)   14   03   07   05   66.66%   82.35%      73.67%
Average                                  73.53%   83.75%      78.12%
5. CONCLUSION AND FUTURE WORK
In this paper, we have proposed a system to identify product
features and opinions from review documents. The proposed
method also finds the sentiment polarity of opinion
sentences using a sentiment analysis algorithm and provides a
feature-based summary. In future work, we will run further
experiments on our method to improve its accuracy. We will
also investigate natural language processing techniques for
analyzing implicit opinion sentences and complex sentences.
This is needed because a review may be written in short
sentences, which makes it difficult to create and compare
dependency relationships and therefore reduces accuracy for
words with multiple meanings.
REFERENCES
[1] Goldstein, J., Kantrowitz, M., Mittal, V., and Carbonell,
J. 1999. Summarizing Text Documents: Sentence
Selection and Evaluation Metrics. SIGIR'99.
[2] Salton, Singhal, Buckley, C. and Mitra, M.
1996. Automatic Text Decomposition using Text
Segments and Text Themes. ACM Conference on
Hypertext.
[3] Tait, J. 1983. Automatic Summarizing of English
Texts. Ph.D. Dissertation, University of Cambridge.
[4] Esuli, A. 2008. Automatic Generation of Lexical
Resources for Opinion Mining: Models, Algorithms and
Applications. ACM SIGIR Forum, 42(2).
[5] Wanner, F., C. Rohrdantz, F. Mansmann, D. Oelke and
D.A. Keim, 2009. Visual Sentiment Analysis of RSS
News Feeds Featuring the US Presidential Election in
2008. Workshop on Visual Interfaces to the Social and
the Semantic Web, Florida, USA.
[6] Turney, P. D. 2002. Thumbs up or thumbs down?
Semantic orientation applied to unsupervised
classification of reviews. Association for Computational
Linguistics, Morristown, NJ, 417-424.
[7] Wenhao Zhang, Hua Xu, Wei Wan, Weakness Finder:
Expert System with application 39 (2012) 10283-10291.
[8] Ryu, Won, Kyu, Ung, "A Method for Opinion Mining of
Product Reviews using Association Rules", ICIS 2009,
Seoul, Korea. Copyright © 2009 ACM.
[9] Hu, M., Liu, B.: Mining and Summarizing Customer
Reviews. In: Proceedings of ACM SIGKDD
International Conference on Knowledge Discovery and
Data Mining (KDD 2004), USA, pp. 168–177 (2004)
[10] Liu, B. (2010). Sentiment analysis and subjectivity.
Handbook of natural language processing 9781420085921.
[11] Mullen, T., & Collier, N. (2004). Sentiment analysis
using support vector machines with diverse information
sources. In Proceedings of EMNLP (Vol. 4, pp. 412–
418).
[12] Pang, B., & Lee, L. (2008). Opinion mining and
sentiment analysis. Foundations and Trends in
Information Retrieval, 2(1–2), 1–135.
[13] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs
up?: Sentiment classification using machine learning
techniques. Proceedings of the ACL-02 conference on
empirical methods in natural language processing (Vol.
10, pp. 79–86). Association for Computational
Linguistics.
[14] Popescu, A., & Etzioni, O. (2005). Extracting product
features and opinions from reviews. In Proceedings of
the conference on human language technology and
empirical methods in natural language processing (pp.
339–346). Association for Computational Linguistics.
[15] Saleh, M., Valdivia, M. T., Montejo-Ráez, A., & López,
L. A. (2011). Experiments with SVM to classify
opinions in different domains. Expert Systems with
Applications, 38(12), 14799–14804.
[16] Tang, H., Tan, S., & Cheng, X. (2009). A survey
on sentiment detection of reviews. Expert Systems
with Applications, 36(7), 10760–10773.
[17] Yang, D., & Powers, D. (2005). Measuring semantic
similarity in the taxonomy of wordnet. Proceedings of
the twenty-eighth Australasian conference on computer
science. Australian Computer Society, Inc.
[18] Yu, H., & Hatzivassiloglou, V. (2003). Towards
answering opinion questions: Separating facts from
opinions and identifying the polarity of opinion
sentences. Proceedings of the 2003 conference on
empirical methods in natural language processing.
Association for Computational Linguistics.
[19] Zhai, Z., Liu, B., Xu, H., & Jia, P. (2011). Clustering
product features for opinion mining. In Proceedings of
the fourth ACM international conference on web search
and data mining (pp. 347–354). ACM.
[20] Zhang, Z., Ye, Q., Zhang, Z., & Li, Y. (2011).
Sentiment classification of Internet restaurant reviews
written in Cantonese. Expert Systems with Applications,
38(6), 7674–7682.
[21] M. Abulaish, Jahiruddin, MN Doja, T. Ahmad, Feature
and Opinion Mining for Customer Review
Summarization, © Springer-Verlag Berlin Heidelberg
PReMI LNCS 5909, pp. 219–224, 2009.
[22] Liu B, Hu M, and Cheng J. Opinion Observer:
Analyzing and Comparing Opinions on the Web. In
Proceedings of WWW. 2005. 342-351.
[23] Bollegala D, Matsuo Y, and Ishizuka M. Measuring
semantic similarity between words using web search
engines. In Proceedings of WWW. 2007. 757-766.
[24] Pantel P, Crestan E, Borkovsky A, Popescu A, and Vyas
V. Web-scale distributional similarity and entity set
expansion. In Proceedings of EMNLP. 2009. 938-947.
[25] Stanford Tagger Version 1.6. 2008.
http://wwwnlp.staford.edu/software/tagger.shtml
[26] Stanford Parser Version 1.6. 2008.
http://nlp.stanford.edu/software/lex-parser.shtml
[27] Gamgarni, Pattarachai, "Mining Feature-Opinion in
Online Customer Reviews for Opinion Summarization",
Journal of Universal Computer Science, vol. 16, no. 6
(2010),