An Improved Deep Learning-based Approach for Sentiment Mining

Nurfadhlina Mohd Sharef, Mohammad Yasser Shafazand

Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Malaysia
nurfadhlina@upm.edu.my, 79.zand@gmail.com

Abstract. Sentiment mining approaches can typically be divided into lexicon-based and machine learning approaches. Recently there has been an increasing number of approaches that combine the two to improve on the performance of either used separately. However, such combinations still lack contextual understanding, which has led to the introduction of deep learning approaches. This paper enhances a deep learning approach with the aid of a semantic lexicon for situations where training has not been done in the target domain. Scores are also computed instead of merely nominal classifications. Results suggest that the approach outperforms the original model in domains it was not trained on.

1 Introduction

Sentiment mining revolves around identifying the attitude of the writer and the contextual polarity of the written text. It utilizes natural language processing, text analysis and computational linguistics to identify and extract subjective information from text. Basic tasks include classifying polarity (positive, negative or neutral), classifying features/aspects (such as memory, battery, and screen size), and classifying emotions (such as angry, sad and happy). Existing works on sentiment mining can generally be classified into machine learning and dictionary approaches. The machine learning approach is divided into supervised and unsupervised approaches. The former achieves reasonable effectiveness but is usually domain specific, language dependent, and requires labelled data, which is often labour intensive to produce. The latter is in high demand because publicly available data is often unlabelled and requires robust solutions.
Dictionary approaches extract the polarity of an opinion based on lexicons and dictionaries (bags) of words such as the MPQA lexicon and SentiWordNet. Although dictionary-based approaches do not require any training, they cannot express the meaning of longer phrases in a principled way. Furthermore, compositionality is often lost when more than one aspect and matched lexicon entry are mentioned in the text. A recently introduced approach based on a recursive deep model, called Stanford Sentiment Mining (SSM), aims to preserve the semantic compositionality of sentiment detection by utilizing a sentiment treebank. SSM is built upon vector representations for phrases and computes the sentiment based on how meaning is composed from the sequence and structure of the words in the sentence. When trained on the Stanford Sentiment Treebank (SST), SSM outperforms all previous methods. SSM is also the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases. However, as a classifier the SSM model outputs only one of five discrete values for each sentiment (ranging from 0 to 4, where 0 means the sentiment is completely negative and 4 completely positive), so a fine-grained sentiment score cannot be obtained. SSM must also be trained in the specific domain of discussion; otherwise the classification results are poor, especially when the correct classification is neutral. Furthermore, scoring information is lost when more than one aspect is mentioned in the text. SSM also performs weakly when the text is short and its structure is small, usually defaulting to negative sentiments as a result of its training. Finally, it uses the Penn Treebank structure for its labeled training data, which is hard to obtain.
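To make the scoring limitation concrete, a five-class output like SSM's can in principle be collapsed into a continuous score by taking the expectation over the class probability distribution. The sketch below illustrates only this idea; the function name and the probability values are invented for illustration and are not taken from SSM.

```python
# Illustrative sketch (not SSM's actual API): collapse a distribution over
# the five sentiment classes 0..4 into one continuous score in [0, 4]
# by taking the probability-weighted average of the class indices.

def expected_sentiment(class_probs):
    """Map a probability distribution over classes 0..4 to a score."""
    assert abs(sum(class_probs) - 1.0) < 1e-6, "probabilities must sum to 1"
    return sum(i * p for i, p in enumerate(class_probs))

probs = [0.05, 0.10, 0.20, 0.40, 0.25]     # invented, skewed positive
score = expected_sentiment(probs)          # ≈ 2.7, i.e. mildly positive
```

Such a continuous score is what a purely nominal five-way classification cannot provide directly.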
Therefore, this paper introduces a hybrid integration of SSM and SentiWordNet, combining the strength of SSM in exploiting sentence structure with the scoring ability of SentiWordNet. This combination increases the likelihood of accurate results in domains the system has never been trained on, alongside learning the sentiments of sentence structures. The integration, called Sentimetrics, also enables sentiment intensity measurement. We further extend SSM with an online learning algorithm, so that our engine can incrementally learn and update the scoring representation of phrases. Summing positive and negative points based on the words in a sentence is the approach used in most sentiment prediction systems; our engine instead predicts the sentiment from both the sentence structure and a sentiment word lexicon. This combined model is not as easily fooled as previous models and gives competitive results compared to other existing approaches. The rest of the paper is structured as follows. Section two discusses related works. Section three presents the Sentimetrics engine, while section four evaluates the performance of the Sentimetrics engine against existing models. Section five concludes the paper.

2 Related Works

Works on sentiment mining are divided into several focuses, namely (i) the learning model, which centres on the identification of sentiment polarity, (ii) subjectivity understanding and strength processing, which include aspect/feature identification, negation and semantic orientation, and (iii) applications such as opinion mining, sentiment categorization and review summarization, which are useful for companies offering products or services to analyze published reviews and determine user affinity to their products instead of conducting costly market studies or customer satisfaction analyses.
2.1 Machine learning approaches

Bag-of-words classifiers [1] can work well on longer documents by relying on a few words with strong sentiment like 'awesome' or 'exhilarating'. However, the achievement of works based on this approach is constrained mainly because word order is ignored, so hard examples of negation cannot be classified accurately. Machine learning approaches fill this gap by modelling learners on extracted features such as syntactic, lexical, structural and label features, or a combination of these [2]. Semi-supervised learning, which uses a large amount of unlabeled data together with labeled data to build better learning models, has therefore attracted growing attention in sentiment classification. The most well-known machine learning methods for sentiment mining are the support vector machine (SVM), Naïve Bayes, and the N-gram model [2]. Naïve Bayes assumes a stochastic model of document generation; using Bayes' rule, the model is inverted in order to predict the most likely class for a new document. SVMs seek a hyperplane, represented by a vector, that separates the positive and negative training vectors of documents with maximum margin. Instead of taking words as the basic unit, the N-gram model takes characters (letters, spaces, or symbols) as the basic unit of the algorithm. Even though machine-learning approaches have made significant advances, applying them to new texts requires labeled training data sets. The compilation of such training data requires considerable time and effort, especially since the data should be current. In contrast, lexical-based approaches extract the polarity of each sentence in a text; the sense of the opinion words in each phrase is then analyzed in order to group polarities and thus classify its sentiment.

2.2 Lexical-based approaches

Typically, the lexicons are mapped to their semantic value, as in the MPQA lexicon [3] and SentiWordNet [4].
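The Naïve Bayes description above can be made concrete with a minimal sketch. This is a toy bag-of-words classifier with Laplace smoothing, written from the textbook formulation rather than from any of the surveyed systems; the training sentences are invented examples.

```python
import math
from collections import Counter

# Toy bag-of-words Naive Bayes with Laplace (add-one) smoothing.
# Training data is invented for illustration.
train = [
    ("an awesome exhilarating movie", "pos"),
    ("great battery and screen", "pos"),
    ("a dull and disappointing film", "neg"),
    ("terrible battery life", "neg"),
]

class_docs = Counter()                       # documents per class (prior)
word_counts = {"pos": Counter(), "neg": Counter()}
vocab = set()
for text, label in train:
    class_docs[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for label in word_counts:
        logp = math.log(class_docs[label] / sum(class_docs.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # smoothed likelihood: (count + 1) / (class total + |vocab|)
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

print(predict("awesome screen"))   # pos
```

Note that the bag-of-words representation discards word order entirely, which is exactly the weakness with negation discussed above.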
Some works utilize lexicons as part of the feature generation technique to be incorporated into machine learners [5]–[9], while some rely solely on the lexicon for content identification [10]–[14]. Esuli and Sebastiani developed a WordNet of sentimental words called SentiWordNet [4]. SentiWordNet is a popular lexicon-based sentiment analysis resource used by many researchers. Eight ternary classifiers were used to build the lexicon: every WordNet synset was classified as positive, negative or objective, resulting in a fine-grained and exhaustive lexicon. A small subset of the training data was manually tagged, and from this subset the remaining training data was iteratively labeled. Each ternary classifier was built by applying two classification algorithms (the Rocchio learner and SVM) to four subsets of the training data. As mentioned before, the three generated categories are positive, negative and objective, and positivity, negativity and objectivity are expressed on a graded scale. The main advantage of lexical-based approaches is that once they are built, no training data are necessary. Unfortunately, they also have certain drawbacks. First, most are designed as glossaries of general language words, often based on WordNet, and thus contain neither technical terms nor colloquial expressions. Second, since they are unable to consider context-dependent expressions, such systems achieve very limited accuracy in multi-domain scenarios, because the connotation of certain words can be either positive or negative. Works that combine both lexicon and machine learning approaches have achieved higher precision [8], [15]. More recently, however, works based on deep learning have been proposed which aim to mitigate the need to prepare huge amounts of labeled training data through greater automation of lexicon exploitation.
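The graded positive/negative scoring described above can be sketched as follows. The lexicon here is a tiny hand-made stand-in for SentiWordNet: real SentiWordNet entries attach positivity, negativity and objectivity scores to WordNet synsets rather than to raw (word, tag) pairs, so this is only a minimal illustration of the scoring scheme.

```python
# Minimal SentiWordNet-style scoring sketch. Entries map a (word, POS tag)
# pair to (positivity, negativity); objective words score (0, 0).
# The lexicon contents are invented stand-ins, not real SentiWordNet values.
LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("camera", "n"): (0.0, 0.0),   # objective word
}

def word_score(word, tag):
    """Signed score: positivity minus negativity; 0.0 for unknown words."""
    positive, negative = LEXICON.get((word, tag), (0.0, 0.0))
    return positive - negative

def sentence_score(tagged_words):
    """Sum the signed lexicon scores of (word, tag) pairs."""
    return sum(word_score(w, t) for w, t in tagged_words)

print(sentence_score([("good", "a"), ("camera", "n")]))   # 0.75
```

As the surrounding discussion notes, such scoring is context-independent: the same (word, tag) pair always contributes the same score, whatever the domain.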
3 A Hybrid of Deep Learning and Semantic Lexicon Approaches

The deep learning model is similar to a neural network in its use of layers, trained in a layer-wise way. It is also an approach for unsupervised learning of feature representations at successively higher levels. The model computes compositional vector representations for phrases of variable length and syntactic type, which are then used as features to classify each phrase. The Recursive Neural Tensor Network (RNTN) [16] labels every syntactically plausible phrase generated by the Stanford Parser [17] in order to train and evaluate these compositional vector representations, generated from the SST. SSM creates the SST in the Penn Treebank format, where each node carries a sentiment. However, the RNTN model can only produce a nominal classification and has no means of calculating a score for the text. Therefore, we propose a method called Sentimetrics which improves on this by giving texts a score reflecting the degree of positivity or negativity of the sentiment, allowing more precise mining. Our hypothesis is that the accuracy of the SSM approach can be increased by merging it with a lexical resource. SSM needs to be trained with a treebank in a given category to give the desired accuracy; using a lexical resource can increase the probability of accurate results in categories the system has never been trained on. Fig. 1 shows the flowchart of the Sentimetrics engine. The input to the Sentimetrics engine goes through two different sentiment analysis paths. In the first path, the same process used in SSM is applied to the sentence to extract the sentiment tree, but with no extra training involved: we use the trained system as is.
Then, the polarity probability matrices [18] are used to compute a score that is later used in conjunction with the results of the other path. In the second extraction path, the sentence is tokenized and POS tagged, and words appearing in the SentiWordNet scored dictionary are found and scored according to their tag. In the combination phase, the scores found in the second path replace the corresponding nodes in the sentiment tree created in the first path. This results in a sentiment tree that incorporates the sentiments found via SentiWordNet. In the final step, the overall sentiment is recalculated by traversing the tree and adding up the scores of all nodes, using both the SentiWordNet-replaced scores and the semi-scores created from the polarity probability matrices.

Fig. 1. Flowchart of the Sentimetrics engine

We have also proposed an online learning framework in which the SSM treebank is updated using the evaluation of the user, as shown in Fig. 2. The lexicon database is extended to accept a variety of scored sentiment dictionaries rather than only SentiWordNet.

Fig. 2. Online learning framework in the Sentimetrics engine

4 Results

We used two customer review datasets, namely the Canon3G and Apex datasets [19], to evaluate our proposed method, and compared our results with SSM and the benchmark sentiment classifications. The SSM has not been trained on these datasets; it uses its default training set, based on movie reviews. The SSM used inside our Sentimetrics engine has the same training set as the SSM used for comparison. Our experiment measures the improvement gained by combining the deep learning method with a scored dictionary like SentiWordNet. The confusion matrix shown in Table 1 gives the numbers of false and true positives and negatives. The datasets themselves contain no neutral value, as the sentences have been selected to be either positive or negative. Table 2 compares the performance of the SSM and Sentimetrics engines.
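The combination step just described can be sketched as a tree traversal: leaves of the sentiment tree are overridden with lexicon scores where the lexicon has an entry, and the final score is the sum over all nodes. The tree layout, score values and lexicon entries below are invented for illustration and do not reproduce the engine's actual data structures.

```python
# Rough sketch of the combination phase, under assumed data structures:
# a node is (score, word) for a leaf, or (score, [children]) otherwise.
# Leaf scores are replaced by lexicon scores where an entry exists,
# then all node scores are summed in one traversal.
LEXICON = {"mind-blowing": 0.9, "stunned": 0.6}   # invented signed scores

def combined_score(node):
    score, rest = node
    if isinstance(rest, str):                 # leaf: prefer the lexicon score
        return LEXICON.get(rest, score)
    return score + sum(combined_score(c) for c in rest)

tree = (0.0, [                 # neutral root, as an untrained SSM might give
    (0.0, "functionality"),
    (0.0, "is"),
    (0.0, "mind-blowing"),     # the lexicon lifts this leaf to 0.9
])
print(combined_score(tree))    # 0.9
```

This mirrors the boundary behaviour discussed in the results: a tree that the structural model alone scores as neutral can be pushed to positive by a single lexicon word.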
Table 1. Confusion table of the actual results for SSM and Sentimetrics on the Apex and Canon3G datasets.

                         Sentimetrics                    SSM
                  Positive Neutral Negative   Positive Neutral Negative
Apex     Positive     62      56      30         46      50      53
         Negative     30      93      74         23      73     101
Canon3G  Positive     84      66      17         67      52      47
         Negative      9      27      16          6      23      23

Table 2. Comparison between SSM and Sentimetrics performance on the Apex and Canon3G datasets.

                        Sensitivity  Accuracy  Precision  Recall  F-measure
Apex     SSM                0.667      0.425     0.309     0.324     0.316
         Sentimetrics       0.673      0.394     0.419     0.336     0.372
Canon3G  SSM                0.918      0.413     0.404     0.698     0.511
         Sentimetrics       0.903      0.457     0.503     0.700     0.585

For the Apex dataset, Sentimetrics shows better sensitivity, precision, recall and F-measure, while on the Canon3G dataset Sentimetrics outperforms SSM in terms of accuracy, precision, recall and F-measure. Note that the number of neutral classifications rises along with the numbers of false positives and false negatives, which indicates the weakness of SSM when applied to domains it was not trained on. Having a deep learning engine like SSM integrated inside Sentimetrics helps obtain the general sentiment score, while the dictionary helps in situations where the score is very close to the border of another class. For example, if the score is 0.89 and the neutral class is declared between 0 and 1, the classifier would label the sentence neutral, while a single word with a positive sentiment could shift the score above 1 and thus make it positive (the positive class being declared between 1 and 2). For example, the sentences "This camera is worth every penny" and "the g3 looks like a work of art!" are positive in structure, and this is detected by the deep learning mechanism integrated in Sentimetrics.
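The boundary example above can be sketched directly. The band edges follow the example in the text (neutral in [0, 1), positive from 1 up); the function itself is an illustration, not the engine's actual decision rule.

```python
# Threshold bands from the worked example in the text: a score below 0 is
# negative, [0, 1) neutral, and 1 or above positive.
def classify(score):
    if score < 0:
        return "negative"
    return "neutral" if score < 1 else "positive"

base = 0.89                   # structural score alone: still neutral
print(classify(base))         # neutral
print(classify(base + 0.2))   # positive: one lexicon word crosses the border
```

This is why the lexicon matters most near class borders: a small additive word score changes the label even when the structural score barely moves.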
There are sentences which the SSM is not trained to detect, like "the functionality on this camera is mind-blowing", "i am stunned and amazed at the quality of the raw images i am getting from this g3" or "it practically plays almost everything you give it". For these sentences SSM labels the sentence neutral, while Sentimetrics correctly declares it positive thanks to the supporting SentiWordNet dictionary. There are also scenarios in which Sentimetrics fails to detect the actual sentiment because it lacks training in that scope and there are no words with sentimental value. An example of this situation is the sentence "learning how to use it will not take very long".

5 Conclusion

Most works in the field of sentiment mining emphasise either machine learning mechanisms or dictionary-based mechanisms to detect sentiment. One of the latest sentiment analysers on the machine learning side is SSM, which computes the sentiment based on how meaning is composed; however, it has to be retrained for each specific domain using a labeled SST, and its scoring is limited by its five-state classification. We proposed an engine that uses SSM together with the SentiWordNet lexicon to extend its usage to domains it has never been trained on. We used two different datasets and showed that the Sentimetrics engine gives better results than SSM by incorporating a sentiment lexicon into the composed sentiment treebank. For future work we will consider online learning mechanisms and the ability to import various sentiment dictionaries.

6 References

[1] B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis,” Found. Trends Inf. Retr., vol. 2, no. 1–2, pp. 1–135, 2008.
[2] Q. Ye, Z. Zhang, and R. Law, “Sentiment classification of online reviews to travel destinations by supervised machine learning approaches,” Expert Syst. Appl., vol. 36, no. 3, pp. 6527–6535, Apr. 2009.
[3] T. Wilson, J. Wiebe, and P.
Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” Proc. Conf. Hum. Lang. Technol. Empir. Methods Nat. Lang. Process. (HLT ’05), pp. 347–354, 2005.
[4] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining,” in Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC ’10), 2010, pp. 2200–2204.
[5] A. Montoyo, P. Martínez-Barco, and A. Balahur, “Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments,” Decis. Support Syst., vol. 53, no. 4, pp. 675–679, Nov. 2012.
[6] V. Hatzivassiloglou and K. R. McKeown, “Predicting the semantic orientation of adjectives,” Proc. 35th Annu. Meet. Assoc. Comput. Linguist., pp. 174–181, 1997.
[7] W. Wang, H. Xu, and W. Wan, “Implicit feature identification via hybrid association rule mining,” Expert Syst. Appl., vol. 40, no. 9, pp. 3518–3531, Jul. 2013.
[8] A. Balahur, R. Mihalcea, and A. Montoyo, “Computational approaches to subjectivity and sentiment analysis: Present and envisaged methods and applications,” Comput. Speech Lang., vol. 28, no. 1, pp. 1–6, Jan. 2014.
[9] M. Eirinaki, S. Pisal, and J. Singh, “Feature-based opinion mining and ranking,” J. Comput. Syst. Sci., vol. 78, no. 4, pp. 1175–1184, Jul. 2012.
[10] A. Moreo, M. Romero, J. L. Castro, and J. M. Zurita, “Lexicon-based Comments-oriented News Sentiment Analyzer system,” Expert Syst. Appl., vol. 39, no. 10, pp. 9166–9180, Aug. 2012.
[11] D. Ghazi, D. Inkpen, and S. Szpakowicz, “Prior and contextual emotion of words in sentential context,” Comput. Speech Lang., vol. 28, no. 1, pp. 76–92, Jan. 2014.
[12] F. H. Khan, S. Bashir, and U. Qamar, “TOM: Twitter opinion mining framework using hybrid classification scheme,” Decis. Support Syst., vol. 57, pp. 245–257, Jan. 2014.
[13] W. Zhang, H. Xu, and W.
Wan, “Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis,” Expert Syst. Appl., vol. 39, no. 11, pp. 10283–10291, Sep. 2012.
[14] N. Li and D. D. Wu, “Using text mining and sentiment analysis for online forums hotspot detection and forecast,” Decis. Support Syst., vol. 48, no. 2, pp. 354–368, Jan. 2010.
[15] A. Balahur, J. M. Hermida, and A. Montoyo, “Detecting implicit expressions of emotion in text: A comparative analysis,” Decis. Support Syst., vol. 53, no. 4, pp. 742–753, Nov. 2012.
[16] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” 2013.
[17] D. Klein and C. D. Manning, “Accurate Unlexicalized Parsing,” in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, 2003, pp. 423–430.
[18] D. Ghazi, D. Inkpen, and S. Szpakowicz, “Prior and contextual emotion of words in sentential context,” Comput. Speech Lang., vol. 28, no. 1, pp. 76–92, 2014.
[19] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international …, 2004.