An Improved Deep Learning-based Approach for Sentiment Mining
Nurfadhlina Mohd Sharef, Mohammad Yasser Shafazand
Department of Computer Science, Faculty of Computer Science and Information Technology,
Universiti Putra Malaysia, Malaysia
nurfadhlina@upm.edu.my, 79.zand@gmail.com
Abstract. Sentiment mining approaches can typically be divided into lexicon-based and machine learning approaches. Recently, an increasing number of approaches combine both to improve on the performance of either used separately. However, such combinations still lack contextual understanding, which has led to the introduction of deep learning approaches. This paper enhances a deep learning approach with the aid of semantic lexicons for situations where training has not been done in the desired domain. Scores are also computed instead of merely nominal classifications. Results suggest that the approach outperforms the original in domains it has not been trained on.
1 Introduction
Sentiment mining revolves around identifying the attitude of the writer and the contextual polarity of the written text. It utilizes natural language processing, text analysis and computational linguistics to identify and extract subjective information from text. Basic tasks include classifying polarity (positive, negative or neutral), classifying features/aspects (such as memory, battery and screen size), and classifying emotions (such as angry, sad and happy).
Existing works on sentiment mining can generally be classified into machine learning and dictionary approaches. The machine learning approach is divided into supervised and unsupervised approaches. The former achieves reasonable effectiveness but is usually domain specific, language dependent, and requires labelled data, which is often labour intensive to produce. The latter is in high demand because publicly available data is often unlabelled and requires robust
solutions. Dictionary approaches extract the polarity of an opinion based on lexicons and dictionaries (bags) of words such as the MPQA lexicon and SentiWordNet. Although dictionary-based approaches do not require any training, they cannot express the meaning of longer phrases in a principled way. Furthermore, compositionality is often lost when more than one aspect and matched lexicon entry are mentioned in the text.
A newly introduced approach based on a recursive deep model, called Stanford Sentiment Mining (SSM), aims to maintain the semantic compositionality of sentiment detection by utilizing a Sentiment Treebank. SSM is built upon vector representations for phrases and computes the sentiment based on how meaning is composed from the sequence and structure of the words in the sentence.
When trained on the Stanford Sentiment Treebank (SST), SSM outperforms all previous methods. SSM is also the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels, for both positive and negative phrases. However, SSM classifies each sentiment into one of only five values (ranging from 0 to 4, where 0 means completely negative and 4 completely positive), so a fine-grained sentiment score cannot be obtained. Training must also be done in the specific scope of discussion; otherwise the classification result will be poor, especially when the correct classification is neutral. Furthermore, scoring information is lost when more than one aspect is mentioned in the text. SSM also performs weakly when the text is short and the structure is small, usually defaulting to negative sentiments as a result of its training. Finally, it uses the Penn tree structure for labelled training data, which is hard to obtain.
Therefore, this paper introduces a hybrid integration of SSM and SentiWordNet to combine the strength of SSM in exploiting sentence structure with the scoring ability of SentiWordNet. This combination increases the likelihood of accurate results in domains the system has never been trained in, while still learning sentiments from sentence structure. The integration, called Sentimetrics, also enables measurement of sentiment intensity. We also extend SSM with an online learning algorithm, so that our engine can incrementally learn and update the scoring representation of phrases. Most sentiment prediction systems simply sum positive and negative points based on the words in a sentence; our engine predicts the sentiment based on both sentence structure and a sentiment word lexicon. This combined model is not as easily fooled as previous models and achieves competitive results compared with existing approaches.
The rest of the paper is structured as follows. Section two discusses related works. Section three describes the Sentimetrics engine, while section four evaluates the performance of the Sentimetrics engine against existing models. Section five concludes the paper.
2 Related Works
Works on sentiment mining have several focuses, namely (i) the learning model, centred on the identification of sentiment polarity; (ii) subjectivity understanding and strength processing, which include aspect/feature identification, negation and semantic orientation; and (iii) applications such as opinion mining, sentiment categorization and review summarization, which are useful for companies offering products or services to analyze published reviews and determine user affinity to their products instead of conducting costly market studies or customer satisfaction analyses.
2.1 Machine learning approaches
Bag-of-words classifiers [1] can work well on longer documents by relying on a few words with strong sentiment like ‘awesome’ or ‘exhilarating’. However, approaches of this kind are constrained mainly because word order is ignored, so they cannot accurately classify hard examples of negation. Machine learning approaches fill this gap by modelling learners on extracted features, such as syntactic, lexical, structural or label features, or a combination of these [2].
Therefore, semi-supervised learning, which uses a large amount of unlabeled
data together with labeled data to build better learning models, has attracted growing attention in sentiment classification. The most well-known machine learning methods for sentiment mining are the support vector machine
(SVM), Naïve Bayes, and the N-gram model [2].
Naive Bayes assumes a stochastic model of document generation. Using Bayes’ rule, the model is inverted in order to predict the most likely class for a new document. SVMs seek a hyperplane, represented by a vector, that separates the positive and negative training vectors of documents with maximum margin. Instead of taking words as the basic unit, the N-gram model takes characters (letters, spaces, or symbols) as the basic unit of the algorithm.
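As an illustration of such bag-of-words learners (not the implementation evaluated in this paper), the following sketch trains a Naïve Bayes and a linear SVM classifier on unigram/bigram counts with scikit-learn; the tiny corpus and labels are invented purely for demonstration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# tiny invented corpus; a real experiment would use a labelled review dataset
train_texts = [
    "an awesome, exhilarating camera",
    "the battery life is terrible",
    "great screen and great value",
    "poor picture quality, very disappointing",
]
train_labels = ["pos", "neg", "pos", "neg"]

for clf in (MultinomialNB(), LinearSVC()):
    # bag-of-words / n-gram counts as features, as described above
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    model.fit(train_texts, train_labels)
    # word order is largely ignored, so negated phrases can still be misread
    print(type(clf).__name__, model.predict(["not an awesome camera"]))
```

As the surrounding text notes, such models can misclassify negated phrases because the features do not encode word order.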
Even though machine-learning approaches have made significant advances, applying them to new texts requires labelled training data sets. Compiling these training data requires considerable time and effort, especially since the data should be current. In contrast, lexicon-based approaches extract the polarity of each sentence in a text; the sense of the opinion words in each phrase is then analyzed in order to group polarities and thus classify its sentiment.
2.2 Lexical-based approaches
Typically, lexicon entries are mapped to semantic values, as adopted in the MPQA lexicon [3] and SentiWordNet [4]. Some works utilize lexicons as part of feature generation for machine learners [5]–[9], while others rely solely on the lexicon for content identification [10]–[14].
Esuli and Sebastiani developed a WordNet of sentiment words called SentiWordNet [4]. SentiWordNet is a popular lexicon-based sentiment analysis resource used by many researchers. Eight ternary classifiers were used to build the lexicon: every WordNet synset was classified as positive, negative or objective, resulting in a fine-grained and exhaustive lexicon. A small subset of the training data was manually tagged, and from this subset the remaining training data was iteratively labelled. The ternary classifiers were produced by applying two classification algorithms (the Rocchio learner and SVM) to four subsets of training data. The three generated categories are positive, negative and objective, and positivity, negativity and objectivity are expressed on a graded scale.
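For illustration, SentiWordNet's graded synset scores can be inspected through the NLTK corpus reader; this is one common way to access the resource, not necessarily the interface used in this work.

```python
import nltk

nltk.download("wordnet", quiet=True)        # SentiWordNet is layered over WordNet synsets
nltk.download("sentiwordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

# each synset carries graded positivity, negativity and objectivity scores
for s in swn.senti_synsets("stunning", "a"):    # "a" restricts the lookup to adjective senses
    print(s.synset.name(), s.pos_score(), s.neg_score(), s.obj_score())
```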
The main advantage of lexicon-based approaches is that once they are built, no training data are necessary. Unfortunately, they also have certain drawbacks. First, most are designed as glossaries of general language words, often based on WordNet, and thus they contain neither technical terms nor colloquial expressions. Secondly, since they are unable to consider context-dependent expressions, such systems achieve very limited accuracy in multi-domain scenarios because the connotation of certain words can be either positive or negative.
Works that combine lexicon and machine learning approaches have achieved higher precision [8], [15]. More recently, works based on deep learning have been proposed, which aim to mitigate the need to prepare huge amounts of labelled training data by automating the exploitation of the lexicon to a greater degree.
3 A Hybrid of Deep Learning and Semantic Lexicon Approaches
The deep learning model is similar to a neural network in its use of layers, trained in a layer-wise way. It is also an approach for unsupervised learning of feature representations at successively higher levels. The model computes compositional vector representations for phrases of variable length and syntactic type; these representations are then used as features to classify each phrase.
A model called the Recursive Neural Tensor Network (RNTN) [16] labels every syntactically plausible phrase generated by the Stanford Parser [17] to train and evaluate compositional vector representations for phrases of variable length and syntactic type drawn from the SST. SSM builds the SST in the Penn tree format, where each node carries a sentiment. The vector representations are then used as features to classify each phrase.
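For concreteness, the RNTN composition rule of [16] combines two child vectors $a$ and $b$ into a parent vector via a tensor term plus a standard layer, $p = \tanh([a;b]^{T} V^{[1:d]} [a;b] + W[a;b])$, and a softmax over the node vector gives the five sentiment classes. The following is a minimal, randomly initialised sketch of that rule with toy dimensions; it is not the trained SSM model.

```python
import numpy as np

d = 4                                                 # toy vector dimension
rng = np.random.default_rng(0)

# parameters are random here; in SSM they are learned on the Sentiment Treebank
V = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))    # tensor V[1..d]
W = rng.normal(scale=0.01, size=(d, 2 * d))           # standard recursive-layer weights
Ws = rng.normal(scale=0.01, size=(5, d))              # softmax weights for the 5 classes

def compose(a, b):
    """RNTN composition of two child phrase vectors into a parent vector."""
    ab = np.concatenate([a, b])                       # [a; b]
    tensor_term = np.array([ab @ V[i] @ ab for i in range(d)])
    return np.tanh(tensor_term + W @ ab)

def sentiment_probs(p):
    """Softmax over the 5 sentiment classes (0 = very negative ... 4 = very positive)."""
    logits = Ws @ p
    e = np.exp(logits - logits.max())
    return e / e.sum()

# compose two (random) word vectors and classify the resulting phrase node
a, b = rng.normal(size=d), rng.normal(size=d)
print(sentiment_probs(compose(a, b)))
```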
However, the RNTN model can only produce a nominal classification and has no means of calculating a score for the text. Therefore, we propose a method called Sentimetrics which improves on this by giving texts a score that reflects the degree of positivity or negativity of the sentiment, helping towards more precise mining. Our hypothesis is that the accuracy of the SSM approach can be increased by merging it with a lexical resource. SSM needs to be trained with a treebank in a given category to give the desired accuracy; using a lexical resource could increase the probability of accurate results in categories the system has never been trained on.
Fig. 1 shows the flowchart of the Sentimetrics engine. The input to the Sentimetrics engine goes through two different sentiment analysis paths. In the first path, the same process used in SSM is applied to the sentence to extract the sentiment tree, but with no extra training involved; the trained system is used as is. The polarity probability matrices [18] are then used to compute a score that is later used in conjunction with the results of the other path. In the second extraction path, the sentence is tokenized and POS tagged so that words appearing in the SentiWordNet scored dictionary can be found and scored according to their tag. In the combination phase, the scores found in the second path replace the corresponding scores in the sentiment tree created in the first path, resulting in a sentiment tree that includes the sentiments found in SentiWordNet. In the final step we recalculate the overall sentiment by traversing the tree and summing the scores of each node; the calculation uses the replaced SentiWordNet scores together with the semi-scores produced by the polarity probability matrices.
Fig. 1. Flowchart of Sentimetrics engine
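As an illustration of the combination and recalculation steps just described, the sketch below traverses a sentiment tree, replaces leaf scores for (word, POS) pairs found in a lexicon, and sums the node scores. The tree structure, lexicon format and node scores are hypothetical placeholders, since the engine's exact data structures are not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    score: float                      # semi-score from the polarity probability matrix
    word: Optional[str] = None        # set only on leaf nodes
    pos: Optional[str] = None         # POS tag of the leaf token
    children: List["Node"] = field(default_factory=list)

def combined_score(node: Node, lexicon: dict) -> float:
    """Sum subtree scores, preferring lexicon scores for tagged leaf words (combination phase)."""
    if node.word is not None:
        # leaf: replace the score when the (word, POS) pair is in the lexicon
        node.score = lexicon.get((node.word, node.pos), node.score)
        return node.score
    return node.score + sum(combined_score(c, lexicon) for c in node.children)

# hypothetical lexicon of (word, POS tag) -> score and a tiny sentiment tree
lexicon = {("stunning", "JJ"): 0.75, ("poor", "JJ"): -0.625}
tree = Node(0.1, children=[Node(0.0, "a", "DT"),
                           Node(0.2, "stunning", "JJ"),
                           Node(0.0, "camera", "NN")])
print(combined_score(tree, lexicon))   # 0.1 + 0.0 + 0.75 + 0.0 = 0.85
```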
We have also proposed an online learning framework in which the SSM Treebank is updated using evaluations from the user, as shown in Fig. 2. The lexicon database is also extended to accept a variety of scored sentiment dictionaries rather than SentiWordNet alone.
Fig. 2. Online learning framework in Sentimetrics Engine
4 Results
We used two customer review datasets, namely the Canon3G and the Apex datasets [19], to evaluate our proposed method, and compared our results with SSM and the benchmark sentiment classifications. SSM has not been trained on these datasets; it uses its default trained model based on movie reviews. The SSM used inside our Sentimetrics engine has the same trained set as the SSM used for comparison. Our experiment measures the improvement gained by combining the deep learning method with a scored dictionary such as SentiWordNet. The confusion matrix shown in Table 1 gives the number of false and true positives and negatives. There is no neutral value for the datasets themselves, as the sentences were selected to be either positive or negative. Table 2 compares the performance of the SSM and Sentimetrics engines.
Table 1 Confusion table of the actual results, SSM and Sentimetrics for the Apex and Canon3G datasets.
                      Sentimetrics                     SSM
                      Positive  Neutral  Negative      Positive  Neutral  Negative
Apex      Positive       62        56       30            46        50        53
          Negative       30        93       74            23        73       101
Canon3G   Positive       84        66       17            67        52        47
          Negative        9        27       16             6        23        23
Table 2 Comparison between SSM and Sentimetrics performance for the Apex and Canon3G datasets.
                          Sensitivity  Accuracy  Precision  Recall  F measure
Apex      SSM                0.667       0.425      0.309    0.324     0.316
          Sentimetrics       0.673       0.394      0.419    0.336     0.372
Canon3G   SSM                0.918       0.413      0.404    0.698     0.511
          Sentimetrics       0.903       0.457      0.503    0.700     0.585
For the Apex dataset, Sentimetrics shows better sensitivity, precision, recall and F measure, while on the Canon3G dataset Sentimetrics outperforms SSM in terms of accuracy, precision, recall and F measure. Note that the number of neutral classifications rises along with the number of false positives and false negatives, which indicates a weakness of SSM when it is applied to domains on which it has not been trained.
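As a sanity check, the Accuracy column of Table 2 can be reproduced directly from the counts in Table 1, taking accuracy as the correctly classified positive and negative sentences over all sentences; the short script below does only that.

```python
# counts copied from Table 1: for each (dataset, engine), one row of
# (positive, neutral, negative) predictions for the actually-positive sentences
# and one for the actually-negative sentences
confusion = {
    ("Apex", "Sentimetrics"):    [(62, 56, 30), (30, 93, 74)],
    ("Apex", "SSM"):             [(46, 50, 53), (23, 73, 101)],
    ("Canon3G", "Sentimetrics"): [(84, 66, 17), (9, 27, 16)],
    ("Canon3G", "SSM"):          [(67, 52, 47), (6, 23, 23)],
}

for (dataset, engine), (pos_row, neg_row) in confusion.items():
    correct = pos_row[0] + neg_row[2]          # true positives + true negatives
    total = sum(pos_row) + sum(neg_row)
    print(f"{dataset:8s} {engine:12s} accuracy = {correct / total:.3f}")
```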
Having a deep learning engine like SSM integrated inside Sentimetrics helps in obtaining the general sentiment score, while the dictionary helps in situations where the score is very close to the border of another class. For example, if the score is 0.89 and the neutral class covers the range 0 to 1, the classifier would deem the sentence neutral, whereas a single word with a positive sentiment could shift the score above 1 and thus make the sentence positive (the positive class covering the range 1 to 2). For instance, the sentences “This camera is worth every penny” and “the g3 looks like a work of art !” are positive in structure and are detected by the deep learning mechanism integrated in Sentimetrics. There are also sentences which SSM is not trained to detect, such as “the functionality on this camera is mind-blowing”, “i am stunned and amazed at the quality of the raw images i am getting from this g3” and “it practically plays almost everything you give it”. For these sentences SSM labels the sentence neutral, while Sentimetrics correctly declares it positive thanks to the supporting SentiWordNet dictionary. There are also cases where Sentimetrics fails to detect the actual sentiment, because it lacks training in that scope and there are no words with sentiment values; an example is the sentence “learning how to use it will not take very long”.
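The border effect described above can be sketched as a simple thresholding rule. The class ranges below follow the hypothetical example in the text (neutral between 0 and 1, positive between 1 and 2, and, by assumption, negative below 0); they are not the engine's actual calibration.

```python
def classify(score: float) -> str:
    """Map a combined sentiment score onto classes using the example ranges above."""
    if score >= 1.0:          # positive class declared between 1 and 2
        return "positive"
    if score >= 0.0:          # neutral class declared between 0 and 1
        return "neutral"
    return "negative"         # assumed: anything below 0 is negative

structural_score = 0.89       # deep-learning score sitting just below the border
lexicon_shift = 0.75          # e.g. a single strongly positive SentiWordNet word

print(classify(structural_score))                  # -> neutral
print(classify(structural_score + lexicon_shift))  # -> positive
```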
5 Conclusion
Most works in the field of sentiment mining emphasise either machine learning mechanisms or dictionary-based mechanisms to detect sentiment. One of the latest sentiment analysers in the field of machine learning is SSM, which computes the sentiment based on how meaning is composed, but it has to be retrained for each specific domain using a labelled SST. It also has a limitation in its scoring due to its five-state classification. We proposed an engine that uses SSM together with the SentiWordNet lexicon to extend its usage to domains it has never been trained on. We used two different datasets and showed that the Sentimetrics engine gives better results than SSM by incorporating a sentiment lexicon into the composed sentiment Treebank. We will consider online learning mechanisms and the ability to import various sentiment dictionaries in future work.
6 References
[1] B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis,” Found. Trends Inf. Retr., vol. 2, no. 1–2, pp. 1–135, 2008.
[2] Q. Ye, Z. Zhang, and R. Law, “Sentiment classification of online reviews to travel destinations by supervised machine learning approaches,” Expert Syst. Appl., vol. 36, no. 3, pp. 6527–6535, Apr. 2009.
[3] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proc. Conf. on Human Language Technology and Empirical Methods in Natural Language Processing (HLT ’05), 2005, pp. 347–354.
[4] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining,” in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC ’10), 2010, pp. 2200–2204.
[5] A. Montoyo, P. Martínez-Barco, and A. Balahur, “Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments,” Decis. Support Syst., vol. 53, no. 4, pp. 675–679, Nov. 2012.
[6] V. Hatzivassiloglou and K. R. McKeown, “Predicting the semantic orientation of adjectives,” in Proc. 35th Annual Meeting of the Association for Computational Linguistics, 1997, pp. 174–181.
[7] W. Wang, H. Xu, and W. Wan, “Implicit feature identification via hybrid association rule mining,” Expert Syst. Appl., vol. 40, no. 9, pp. 3518–3531, Jul. 2013.
[8] A. Balahur, R. Mihalcea, and A. Montoyo, “Computational approaches to subjectivity and sentiment analysis: Present and envisaged methods and applications,” Comput. Speech Lang., vol. 28, no. 1, pp. 1–6, Jan. 2014.
[9] M. Eirinaki, S. Pisal, and J. Singh, “Feature-based opinion mining and ranking,” J. Comput. Syst. Sci., vol. 78, no. 4, pp. 1175–1184, Jul. 2012.
[10] A. Moreo, M. Romero, J. L. Castro, and J. M. Zurita, “Lexicon-based Comments-oriented News Sentiment Analyzer system,” Expert Syst. Appl., vol. 39, no. 10, pp. 9166–9180, Aug. 2012.
[11] D. Ghazi, D. Inkpen, and S. Szpakowicz, “Prior and contextual emotion of words in sentential context,” Comput. Speech Lang., vol. 28, no. 1, pp. 76–92, Jan. 2014.
[12] F. H. Khan, S. Bashir, and U. Qamar, “TOM: Twitter opinion mining framework using hybrid classification scheme,” Decis. Support Syst., vol. 57, pp. 245–257, Jan. 2014.
[13] W. Zhang, H. Xu, and W. Wan, “Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis,” Expert Syst. Appl., vol. 39, no. 11, pp. 10283–10291, Sep. 2012.
[14] N. Li and D. D. Wu, “Using text mining and sentiment analysis for online forums hotspot detection and forecast,” Decis. Support Syst., vol. 48, no. 2, pp. 354–368, Jan. 2010.
[15] A. Balahur, J. M. Hermida, and A. Montoyo, “Detecting implicit expressions of emotion in text: A comparative analysis,” Decis. Support Syst., vol. 53, no. 4, pp. 742–753, Nov. 2012.
[16] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” 2013.
[17] D. Klein and C. D. Manning, “Accurate Unlexicalized Parsing,” in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003, pp. 423–430.
[18] D. Ghazi, D. Inkpen, and S. Szpakowicz, “Prior and contextual emotion of words in sentential context,” Comput. Speech Lang., vol. 28, no. 1, pp. 76–92, 2014.
[19] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international …, 2004.