Sentiment Analysis: A Competent Tool in Data Mining

Dhruv Kanakia, Suneet Karkala, Hardik Parekh, Sanket Kulkarni
DJ Sanghvi COE, Vile Parle (West), Mumbai, India
dhruvkanakia7@gmail.com, karkala.suneet8@gmail.com, hardikparekh56@gmail.com, sanket26lee@gmail.com

ABSTRACT
As more and more devices gain access to the web, the amount of data produced has increased enormously. Of all the data ever produced, 90% was generated in the last two years; this statistic alone shows how the internet revolution is producing vast amounts of data which, if used effectively, can do wonders. People nowadays communicate and participate on many social websites, blogs, forums, etc., which offer a great opportunity to analyze data: to apply theories, algorithms and technologies that search and extract relevant data from the huge quantities available on various websites and mine them for opinions. Data analysis is a rapidly growing field, and sentiment analysis is an important part of it. Sentiment analysis determines the attitude, judgment, evaluation, emotional state or intended emotional communication of a speaker or writer using natural language processing, text analysis, computational linguistics and various algorithms. The main aim of this paper is to survey the sentiment analysis techniques widely used in data analysis, and the applications of sentiment analysis, which can be an important tool for many businesses, e-commerce websites and start-ups if used effectively.

KEYWORDS: Sentiment analysis, SVM, Naïve Bayes, lexicon.

INTRODUCTION
The World Wide Web is growing at a remarkable rate, not only in size but also in the types of services and content provided.
Users are participating more actively and generating vast amounts of new data. In this era of automated systems and digital information, every field is evolving rapidly and generating data, so huge amounts of data are produced in science, engineering, medicine, marketing, finance, etc. Automated systems need automated analysis and classification of data to support enterprise-level decisions. These analysis techniques include methods such as text analysis and sentiment analysis. Sentiment analysis is used to find opinions, identify the sentiments they express and classify their polarity, as shown in Fig 1.1.

There are three main classification levels in sentiment analysis:
1. Document-level classification.
2. Sentence-level classification.
3. Aspect-level classification.

Document-level classification aims to classify an opinion document as expressing a positive or negative opinion or sentiment; it treats the whole document as the basic information unit. Sentence-level classification aims to classify the sentiment expressed in each sentence. There is, however, little fundamental difference between the document level and the sentence level, because sentences are just short documents. Classifying text at the document or sentence level does not provide the detail needed for opinions on all aspects of an entity, which many applications require; to obtain these details, we need to go to the aspect level. Aspect-level classification aims to classify sentiment with respect to the specific aspects of entities. The first step is to identify the entities and their aspects; an opinion holder can give different opinions on different aspects of the same entity. The data sets used in sentiment analysis are an important issue in this field. The main sources of data are product reviews, which matter to business owners because they can take business decisions according to the analysis of users' opinions about their products.
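The contrast between document-level and aspect-level classification can be sketched with a toy example. The opinion lexicon and the review text below are invented for illustration, and the "aspect" is naively taken to be the noun following each opinion word; a real system would use the techniques surveyed in the rest of this paper.

```python
# Invented toy opinion lexicon: word -> orientation.
LEXICON = {"great": +1, "good": +1, "poor": -1, "terrible": -1}

review = "great screen , good camera , terrible battery"
tokens = review.split()

# Document level: a single polarity for the whole review.
doc_score = sum(LEXICON.get(t, 0) for t in tokens)
doc_polarity = "positive" if doc_score > 0 else "negative"

# Aspect level (naive heuristic): attach each opinion word to the word after it.
aspects = {}
for i, t in enumerate(tokens):
    if t in LEXICON and i + 1 < len(tokens):
        aspects[tokens[i + 1]] = "positive" if LEXICON[t] > 0 else "negative"

print(doc_polarity)  # positive (net score +1), even though one aspect is negative
print(aspects)       # {'screen': 'positive', 'camera': 'positive', 'battery': 'negative'}
```

The document-level verdict hides the negative opinion about the battery, which is exactly the detail the aspect level recovers.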
The various sentiment classification techniques and algorithms are shown in Fig 1.2 (classification of models).

3. DIFFERENT MODELS
The different models shown in the figure are explained below.

3.1.1. Point-wise Mutual Information (PMI)
The mutual information measure provides a formal way to model the mutual information between the features and the classes; it is derived from information theory. The point-wise mutual information Mi(w) between the word w and the class i is defined on the basis of the level of co-occurrence between class i and word w. The expected co-occurrence of class i and word w, under mutual independence, is given by Pi * F(w), and the true co-occurrence is given by F(w) * pi(w). The mutual information is defined in terms of the ratio between these two values:

Mi(w) = log[ F(w) * pi(w) / (F(w) * Pi) ] = log[ pi(w) / Pi ]

The word w is positively correlated with class i when Mi(w) is greater than 0, and negatively correlated with class i when Mi(w) is less than 0. PMI is used in many applications; one example is a contextual entropy model that expands a set of seed words generated from a small corpus of stock market news articles. The contextual entropy measures the similarity between two words by comparing their contextual distributions using an entropy measure, allowing the discovery of words similar to the seed words. Once the seed words have been expanded, the expanded words are used to classify the sentiments of new articles.

3.1.2. Chi-square (χ²)
Let n be the total number of documents in the collection, pi(w) be the conditional probability of class i for documents which contain w, Pi be the global fraction of documents belonging to class i, and F(w) be the global fraction of documents which contain the word w. The χ²-statistic between word w and class i is then defined as

χi²(w) = n * F(w)² * [ pi(w) - Pi ]² / [ F(w) * (1 - F(w)) * Pi * (1 - Pi) ]

χ² and PMI are two different ways of measuring the correlation between terms and categories. χ² is better than PMI because it is a normalized value, so its values are more comparable across terms in the same category. χ² is used in many applications, one example being contextual advertising.

3.1.3. Latent Semantic Indexing (LSI)
Feature selection models attempt to reduce the dimensionality of the data by picking a subset of the original attributes. Feature transformation methods instead create a smaller set of new features as functions of the original features. LSI is one of the best-known feature transformation models. The LSI method transforms the text space to a new axis system which is a linear combination of the original word features. Principal component analysis is used to achieve this goal: it determines the axis system that retains the greatest level of information about the variation in the underlying attribute values. The main disadvantage of LSI is that it is an unsupervised technique, blind to the underlying class distribution. Therefore, the features found by LSI are not necessarily the directions along which the class distribution of the underlying documents can best be separated.

4. Sentiment classification techniques
Sentiment classification techniques can be divided into the machine learning approach, the lexicon-based approach and the hybrid approach. The machine learning approach applies machine learning algorithms to linguistic features. The lexicon-based approach depends on a sentiment lexicon, a collection of known and precompiled sentiment terms. The hybrid approach combines both, with sentiment lexicons playing an important role in the majority of methods. A brief description of the algorithms and techniques follows in the subsections below.

4.1.
Lexicon-based approach
Opinion words are employed in many sentiment classification tasks. Positive opinion words are used to express desired states, while negative opinion words express undesired states. There are also opinion phrases and idioms, which together with opinion words are called the opinion lexicon. There are three main approaches to compiling or collecting the opinion word list. The manual approach is very time consuming and is not used alone; it is usually combined with the two automated approaches as a final check to avoid mistakes resulting from the automated methods. The two automated approaches are presented in the following subsections.

4.1.1. Dictionary-based approach
A small set of opinion words with known orientations is collected manually. This set is then grown by searching well-known corpora such as WordNet, or a thesaurus, for their synonyms and antonyms. The newly found words are added to the seed list and the next iteration starts; the iterative process stops when no new words are found. After the process is complete, manual inspection can be carried out to remove or correct errors. The dictionary-based approach has a major disadvantage: it cannot find opinion words with domain- and context-specific orientations. Qiu and He used a dictionary-based approach to identify sentiment sentences in contextual advertising. They proposed an advertising strategy to improve ad relevance and user experience, using syntactic parsing and a sentiment dictionary, and proposed a rule-based approach to tackle topic-word extraction and consumer-attitude identification in advertising keyword extraction. They worked on web forums from automotiveforums.com, and their results demonstrated the effectiveness of the proposed approach on advertising keyword extraction and ad selection.

4.1.2. Corpus-based approach
The corpus-based approach helps to solve the problem of finding opinion words with context-specific orientations.
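The iterative seed expansion of the dictionary-based approach in Section 4.1.1 can be sketched as follows. The synonym and antonym tables here are invented stand-ins for WordNet lookups; a real system would query WordNet and finish with manual inspection.

```python
# Invented stand-ins for WordNet synonym/antonym lookups.
SYNONYMS = {"good": ["fine", "great"], "fine": ["nice"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"]}

def expand_seeds(seeds):
    """Grow a {word: orientation} lexicon until no new words are found."""
    lexicon = dict(seeds)
    changed = True
    while changed:
        changed = False
        for word, polarity in list(lexicon.items()):
            for syn in SYNONYMS.get(word, []):   # synonyms keep the orientation
                if syn not in lexicon:
                    lexicon[syn] = polarity
                    changed = True
            for ant in ANTONYMS.get(word, []):   # antonyms flip the orientation
                if ant not in lexicon:
                    lexicon[ant] = -polarity
                    changed = True
    return lexicon

lexicon = expand_seeds({"good": +1})
print(lexicon)  # {'good': 1, 'fine': 1, 'great': 1, 'bad': -1, 'nice': 1, 'poor': -1}
```

Starting from the single seed "good", two iterations suffice on this toy table; the loop structure mirrors the "add, iterate, stop when nothing new" procedure described above.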
The methods of the corpus-based approach depend on syntactic patterns, or on patterns that occur together along with a seed list of opinion words, to find other opinion words in a large corpus. One of these methods was presented by Hatzivassiloglou and McKeown. They started with a list of seed opinion adjectives and used them, along with a set of linguistic constraints, to identify additional adjective opinion words and their orientations. The constraints apply to connectives such as AND, OR, BUT and EITHER-OR; the conjunction AND, for example, suggests that conjoined adjectives usually have the same orientation. This idea is called sentiment consistency, although it does not always hold in practice. There are also adversative expressions, such as "but" and "however", which indicate opinion changes. To determine whether two conjoined adjectives have the same or different orientations, learning is applied to a large corpus. The links between adjectives then form a graph, and clustering is performed on the graph to produce two sets of words: positive and negative.

4.1.2.1. Statistical approach. Finding co-occurrence patterns or seed opinion words can be done using statistical techniques. This can be done by deriving posterior polarities from the co-occurrence of adjectives in a corpus, as proposed by Fahrni and Klenner. It is possible to use the entire set of documents indexed on the web as the corpus for dictionary construction, which overcomes the problem of unavailable words when the corpus used is not large enough. The polarity of a word can be identified by studying its occurrence frequency in a large annotated corpus of texts. If the word occurs more frequently among positive texts, then its polarity is positive; if it occurs more frequently among negative texts, then its polarity is negative; if the frequencies are equal, then it is a neutral word. Similar opinion words frequently appear together in a corpus, and this is the main observation on which the state-of-the-art methods are based.
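The frequency test just described (a word is positive if it occurs more often in positive texts, negative if the reverse, neutral on a tie) can be sketched directly. The labeled corpus below is invented and far smaller than any real annotated corpus.

```python
# Invented labeled texts; a real annotated corpus would be much larger.
corpus = [
    ("the plot was superb and gripping", "pos"),
    ("a superb cast saves the film", "pos"),
    ("the pacing was sluggish", "neg"),
    ("sluggish and forgettable", "neg"),
    ("an average effort", "pos"),
    ("an average effort", "neg"),
]

def polarity(word):
    """Compare occurrence counts of `word` in positive vs negative texts."""
    pos = sum(text.split().count(word) for text, c in corpus if c == "pos")
    neg = sum(text.split().count(word) for text, c in corpus if c == "neg")
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

print(polarity("superb"))    # positive
print(polarity("sluggish"))  # negative
print(polarity("average"))   # neutral (equal frequencies in both classes)
```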
Therefore, if two words appear together frequently within the same context, they are likely to have the same polarity; the polarity of an unknown word can thus be determined by calculating the relative frequency of its co-occurrence with another word. This can be done using PMI.

4.1.2.2. Semantic approach. The semantic approach gives sentiment values directly and relies on different principles for computing the similarity between words; the principle is that semantically close words receive similar sentiment values. WordNet, for example, provides different kinds of semantic relationships between words that can be used to calculate sentiment polarities. WordNet can also be used to obtain a list of sentiment words by iteratively expanding the initial set with synonyms and antonyms, and then determining the sentiment polarity of an unknown word by the relative count of its positive and negative synonyms.

4.1.3. Lexicon-based and natural language processing techniques
Natural Language Processing (NLP) techniques are sometimes used with the lexicon-based approach to find the syntactic structure and help in finding semantic relations. Moreo and Romero used NLP techniques as a preprocessing stage before applying their proposed lexicon-based SA algorithm. Their system consists of an automatic focus-detection module and a sentiment analysis module capable of assessing user opinions of topics in news items, using a taxonomy-lexicon specifically designed for news analysis. Their results were promising in scenarios where colloquial language predominates. The approach for SA presented by Caro and Grella was based on a deep NLP analysis of the sentences, using dependency parsing as a preprocessing step. Their SA algorithm relied on the concept of sentiment propagation, which assumes that each linguistic element (a noun, a verb, etc.) can have an intrinsic sentiment value that is propagated through the syntactic structure of the parsed sentence.
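Co-occurrence with seed words can turn the PMI measure of Section 3.1.1 into a polarity score, SO(w) = PMI(w, positive seed) - PMI(w, negative seed), in the spirit of Turney's SO-PMI. The sentences below are invented, each sentence is treated as one co-occurrence window, and a small smoothing constant (an assumption of this sketch) avoids log(0) for pairs that never co-occur.

```python
from math import log

# Invented sentences; each one counts as a co-occurrence window.
sentences = [
    "excellent and reliable service",
    "excellent food , good value",
    "poor and unreliable service",
    "unreliable shipping , poor packaging",
    "good food",
]

def p(*words):
    """Smoothed probability that all `words` occur in the same sentence."""
    hits = sum(all(w in s.split() for w in words) for s in sentences)
    return (hits + 0.01) / len(sentences)   # +0.01 smoothing avoids log(0)

def pmi(w1, w2):
    return log(p(w1, w2) / (p(w1) * p(w2)))

def so(word, pos_seed="good", neg_seed="poor"):
    """Semantic orientation: positive when the word leans toward the positive seed."""
    return pmi(word, pos_seed) - pmi(word, neg_seed)

print(so("excellent") > 0)   # True: co-occurs with "good", never with "poor"
print(so("unreliable") < 0)  # True: co-occurs with "poor", never with "good"
```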
Caro and Grella presented a set of syntax-based rules that aimed to cover a significant part of the sentiment salience expressed by a text. They also proposed a data visualization system in which they needed to filter out some data objects, or to contextualize the data, so that only the information relevant to a user query is shown. To accomplish that, they presented a context-based method to visualize opinions by measuring the distance, in the textual appraisals, between the query and the polarity of the words contained in the texts themselves. They extended their algorithm by computing context-based polarity scores. Their approach proved highly effective when applied to a manually annotated corpus of 100 restaurant reviews.

4.2. Machine learning approach
The machine learning approach relies on well-known machine learning algorithms to solve SA as a regular text classification problem that makes use of syntactic and/or linguistic features. Text classification problem definition: we have a set of training records D = {X1, X2, . . ., Xn}, where each record is labeled with a class. The classification model relates the features in each record to one of the class labels; for a given instance of unknown class, the model is then used to predict a class label. The hard classification problem is when only one label is assigned to an instance; the soft classification problem is when a probabilistic value over labels is assigned to an instance.

4.2.1. Supervised learning
Supervised learning methods depend on the existence of labeled training documents. There are many kinds of supervised classifiers in the literature; the next subsections present, in brief, some of the classifiers most frequently used in sentiment analysis.

4.2.1.1. Decision tree classifiers.
A decision tree classifier provides a hierarchical decomposition of the training data space, in which a condition on an attribute value is used to divide the data. The condition or predicate is the presence or absence of one or more words. The division of the data space is done recursively until the leaf nodes contain a certain minimum number of records, which are used for classification. There are other kinds of predicates, which depend on the similarity of documents to correlated sets of terms, that may be used to further partition the documents. The different kinds of splits are: single-attribute splits, which use the presence or absence of particular words or phrases at a particular node in the tree; similarity-based multi-attribute splits, which use clusters of documents or frequent words and the similarity of documents to these clusters; and discriminant-based multi-attribute splits, which use discriminants such as the Fisher discriminant to perform the split.

4.2.1.2. Linear classifiers. Given Xi = {x1 . . . xn}, the normalized document word-frequency vector, Ai = {a1 . . . an}, a vector of linear coefficients with the same dimensionality as the feature space, and b, a scalar, the output of the linear classifier is defined as the linear predictor p = Ai · Xi + b. The predictor p defines a separating hyperplane between the different classes. There are many kinds of linear classifiers; among them are Support Vector Machines (SVM), classifiers that attempt to determine good linear separators between different classes. Two of the most famous linear classifiers are discussed in the following subsections.

4.2.1.2.1. Support Vector Machine classifiers (SVM). The main principle of SVMs is to determine linear separators in the search space which can best separate the different classes. Consider two classes, x and o, and three candidate hyperplanes A, B and C.
Hyperplane A provides the best separation between the classes, because its normal distance to any of the data points is the largest; it represents the maximum margin of separation. Text data are ideally suited for SVM classification because of the sparse nature of text, in which few features are irrelevant, but the features tend to be correlated with one another and generally organized into linearly separable categories. SVM can also construct a nonlinear decision surface in the original feature space by mapping the data instances nonlinearly into an inner-product space where the classes can be separated linearly with a hyperplane. This discriminative classifier is considered among the best text classification methods (Rui Xia, 2011; Ziqiong, 2011). M. Rushdi Saleh et al. (2011) applied Support Vector Machines to data sets from different domains using several weighting schemes. They ran experiments with different features on three corpora, two of which had already been used in several works; the SINAI corpus was built from Amazon.com specifically to prove the feasibility of SVM across different domains.

4.2.1.2.2. Neural Networks (NN). A neural network consists of many neurons, the neuron being its basic unit. The inputs to a neuron are denoted by the vector Xi, the word frequencies of the ith document. A set of weights A is associated with each neuron and used to compute a function f(·) of its inputs. The linear function of the neural network is pi = A · Xi. In a binary classification problem, the class label of Xi is denoted yi, and the sign of the predicted function pi yields the class label. Multilayer neural networks are used for non-linear boundaries: the multiple layers induce multiple piecewise-linear boundaries, which approximate enclosed regions belonging to a particular class.

4.2.1.3. Rule-based classifiers.
In rule-based classifiers, the data space is modeled with a set of rules. The left-hand side of a rule is a condition on the feature set, expressed in disjunctive normal form, while the right-hand side is the class label. The conditions are on term presence; term absence is rarely used because it is not informative for sparse data. There are a number of criteria for generating rules, and the training phase constructs all the rules based on these criteria. The two most common criteria are support and confidence. The support is the absolute number of instances in the training data set that are relevant to the rule. The confidence refers to the conditional probability that the right-hand side of the rule is satisfied if the left-hand side is satisfied.

4.2.1.4. Probabilistic classifiers. Probabilistic classifiers use mixture models for classification. The mixture model assumes that each class is a component of the mixture, and each mixture component is a generative model that provides the probability of sampling a particular term for that component. These kinds of classifiers are also called generative classifiers. Three of the most famous probabilistic classifiers are discussed in the next subsections.

4.2.1.4.1. Naïve Bayes classifier (NB). The Naïve Bayes classifier is the simplest and most commonly used classifier. The Naïve Bayes classification model computes the posterior probability of a class based on the distribution of the words in the document. The model works with bag-of-words (BOW) feature extraction, which ignores the position of words in the document, and uses Bayes' theorem to predict the probability that a given feature set belongs to a particular label.
1. Consider a training set of samples, each with a class label. There are k classes, C1, C2, . . ., Ck. Every sample consists of an n-dimensional vector X = {x1, x2, . . ., xn}, representing n measured values of the n attributes A1, A2, . . ., An, respectively.
2.
The classifier classifies a given sample X as belonging to the class having the highest posterior probability: X is predicted to belong to class Ci if and only if P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ k, j ≠ i. We thus find the class that maximizes P(Ci | X); the maximizing class Ci is called the maximum posterior hypothesis. By Bayes' theorem:

P(Ci | X) = P(X | Ci) P(Ci) / P(X)

The simplicity of Naïve Bayes is very useful when it comes to document classification (Hanhoon Kang et al., 2012; Melville et al., 2009; Rui Xia, 2011; Ziqiong, 2011). The main idea is to estimate the probabilities of categories given a test document using the joint probabilities of words and categories, and the simplicity of the Naïve Bayes algorithm makes this process efficient. Hanhoon Kang et al. (2012) proposed an improved version of the Naïve Bayes algorithm using unigrams + bigrams as features; the gap between the positive accuracy and the negative accuracy was narrowed to 3.6% compared with the original Naïve Bayes, and a 28.5% gap was narrowed compared with SVM.

4.2.1.4.2. Bayesian Networks (BN). The main assumption of the NB classifier is the independence of the features. The opposite extreme is to assume that all the features are fully dependent, which leads to the Bayesian Network model: a directed acyclic graph whose nodes represent random variables and whose edges represent conditional dependencies. BN is considered a complete model of the variables and their relationships, so a complete joint probability distribution (JPD) over all the variables is specified for a model. In text mining, the computational complexity of BN is very high, which is why it is not frequently used.

4.2.1.4.3. Maximum Entropy classifier. The Maxent classifier (also known as a conditional exponential classifier) converts labeled feature sets to vectors using an encoding.
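Looking back at the Naïve Bayes steps of Section 4.2.1.4.1, the posterior maximization can be sketched from scratch with bag-of-words counts. The tiny training set is invented, and Laplace (add-one) smoothing is an assumption added here so that unseen words do not zero out the posterior.

```python
from collections import Counter, defaultdict
from math import log

# Invented training documents with class labels.
train = [
    ("good great fine", "pos"),
    ("good nice", "pos"),
    ("bad poor", "neg"),
    ("bad awful poor", "neg"),
]

# Bag of words: word position is ignored, only per-class counts matter.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, cls in train:
    class_counts[cls] += 1
    word_counts[cls].update(text.split())

vocab = {w for c in word_counts for w in word_counts[c]}

def classify(text):
    """Pick the class Ci maximizing log P(Ci) + sum_j log P(xj | Ci)."""
    best_cls, best_score = None, float("-inf")
    for cls in class_counts:
        score = log(class_counts[cls] / sum(class_counts.values()))  # prior
        total = sum(word_counts[cls].values())
        for w in text.split():
            # Laplace smoothing: +1 on every count, +|vocab| on the denominator.
            score += log((word_counts[cls][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

print(classify("good fine plot"))  # pos
print(classify("awful plot"))      # neg
```

Working in log space avoids underflow from multiplying many small word probabilities.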
The encoded vector is then used to calculate weights for each feature, which can be combined to determine the most likely label for a feature set. The classifier is parameterized by a set of weights, which are used to combine the joint features generated from a feature set by an encoding; in particular, the encoding maps each (featureset, label) pair to a vector. The probability of each label is then computed using the following equation:

P(label | fs) = dotprod(weights, encode(fs, label)) / Σ l∈labels dotprod(weights, encode(fs, l))

4.3. Weakly supervised, semi-supervised and unsupervised learning
The main purpose of text classification is to classify documents into a certain number of predefined categories. To accomplish that, large numbers of labeled training documents are used for supervised learning, as illustrated above. In text classification it is sometimes difficult to create these labeled training documents, but it is easy to collect unlabeled documents; unsupervised learning methods overcome this difficulty. Many research works have been presented in this field, including the work of Ko and Seo, who proposed a method that divides documents into sentences and categorizes each sentence using keyword lists for each category and a sentence-similarity measure. The concept of weak and semi-supervision is used in many applications. He and Zhou proposed a strategy that provides weak supervision at the level of features rather than instances. They obtained an initial classifier by incorporating prior information extracted from an existing sentiment lexicon into sentiment classifier model learning; they refer to this prior information as labeled features and use it directly to constrain the model's predictions on unlabeled instances using generalized expectation criteria.

4.4. Meta classifiers
In many cases, researchers use one or more kinds of classifier to test their work. One such article is the work proposed by Lane and Clarke.
Lane and Clarke presented a machine learning approach to the problem of locating documents carrying positive or negative favorability within media analysis. The challenges they faced were the imbalance in the distribution of positive and negative samples, changes in the documents over time, and effective training and evaluation procedures for the models. They worked on three data sets generated by a media-analysis company and classified documents in two ways: detecting the presence of favorability, and assessing negative vs. positive favorability. They used five different types of features to create the data sets from the raw text, and tested many classifiers (SVM, K-nearest neighbor, NB, BN, DT, a rule learner and others) to find the best one. They showed that balancing the class distribution in the training data can improve performance, but that NB can be adversely affected.

5. Applications of sentiment analysis
Each algorithm has its own particular way of analyzing data. Using the various algorithms, sentiment analysis can be performed on social media websites, blogs and other websites, and the results can be used widely. The vast amount of otherwise unused data can thus be turned into something valuable for the applications mentioned below.

5.1 Stock market prediction
Stock market prediction is one of the important applications of sentiment analysis. Stock market events are easily quantifiable using returns from indices and/or individual stocks, which provide meaningful, automated labels. Machine learning algorithms can be used to extract significant stock movements and to collect appropriate pre-, post- and contemporaneous text from social media sources. Each extracted sentence related to a particular share can be labeled as positive or negative.
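The label-and-aggregate pipeline described here might be sketched as follows. The dates, labels and the buy/sell rule are all invented for illustration; a real system would learn the relation between net sentiment and returns rather than hard-code a threshold.

```python
from collections import defaultdict

# Invented (date, sentence label) pairs for one stock; +1 positive, -1 negative.
labeled = [
    ("2015-03-02", +1), ("2015-03-02", +1), ("2015-03-02", -1),
    ("2015-03-03", -1), ("2015-03-03", -1),
]

def daily_net_sentiment(labeled):
    """Sum the sentence labels per day to obtain a net sentiment score."""
    net = defaultdict(int)
    for day, label in labeled:
        net[day] += label
    return dict(net)

def signal(net, day):
    """A naive trading signal: buy on net-positive days, sell on net-negative."""
    score = net.get(day, 0)
    return "buy" if score > 0 else "sell" if score < 0 else "hold"

net = daily_net_sentiment(labeled)
print(net)                        # {'2015-03-02': 1, '2015-03-03': -2}
print(signal(net, "2015-03-02"))  # buy
```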
A model can then be trained to predict the labels of future sentences by taking into consideration the net sentiment of each day, and such net sentiment holds significant predictive power for subsequent stock market movements. In this way sentiment analysis can support trading strategies built on the system described above, finding significant returns over baseline methods.

5.2 Politics
Sentiment analysis can be used to track public opinion from public forums and political blogs. Political organizations can use it to track the issues that are closest to voters' hearts and include them in their rallies, which can have a positive effect on people. It can also be used to identify whether people are happy with a new scheme the government wants to finalize, and to predict poll results around election days by measuring public sentiment toward a particular organization.

5.3 Recommender systems
Recommender systems can benefit from extracting user ratings from text. Sentiment analysis can be used as a sub-component technology for recommender systems: objects that receive negative feedback can be classified as not recommended and excluded from recommendations.

5.4 Rank finding
Sentiment analysis can be used to track literary reputation. Analytics can be performed on a group of blogs in the same field; by analyzing the number of users, comments and reviews, one can predict who is more famous or adroit among them. This makes it possible to rank blogs, identify experts' work and rate it with the highest blog rank.

5.5 Business
Sentiment analysis has been adopted by many businesses that deal with markets. Companies can collect product reviews using sentiment analysis and also track their brand value as a whole.
Using sentiment analysis they can shape their marketing strategy accordingly and also analyze financial news. Other applications include: automatic tracking of user feedback and opinions on brands and launched products from review sites; gauging reactions to company-related events and incidents, for example instant feedback on the reception of a new product during its launch; monitoring crucial issues to avert harmful viral effects, such as handling customer complaints that appear on social media and routing them to the department that can deal with them before they spread; and analyzing purchaser inclinations, competitors and markets. Key challenges identified by researchers for this application include identifying aspects of a product, associating opinions with those aspects, identifying fake reviews, and processing reviews with no canonical form.

5.6 Summarization
Summarization includes analysis of comments related to the features of a product, of review sentences that give an opinion on each feature, and the generation of a summary of all the extracted information. Summarization of single and multiple documents is also a capability that sentiment analysis can augment.

5.7 Government intelligence
Government intelligence is another application of sentiment analysis. It has been proposed that, by monitoring sources, increases in hostile or suspicious communications can be tracked. It can also support efficient rule making, by automatically analyzing opinions about pending policies or government-regulation proposals. Other applications include tracking citizens' opinions about a new scheme and predicting the likelihood of success of a proposed legislative reform.
5.8 Geographical uses
Sentiment analysis can be an effective tool at the time of a disaster, when people's sentiments change according not only to their location but also to their distance from the disaster. People who are far from the disaster site usually have many doubts, so a model can be built and integrated into a system that gives response organizations a real-time map displaying both the physical disaster and the spikes of intense emotional activity during its course. Future iterations could provide real-time alerts on the emotional status of the affected population.

Conclusion and future scope
Sentiment mining research is of utmost importance not only for commercial establishments but also for the common man. With the World Wide Web offering all kinds of ideas and opinions, it is very important to be aware of malicious opinions as well. Based on our comprehensive literature review and discussion, we argue that we are initiating new research questions about analyzing online product reviews and other valuable online information from a domain user's point of view, and about exploring how such online reviews can really benefit ordinary users. In the case of product reviews there is a visible gap between the designer's perspective and the domain user's perspective. Moreover, no single classifier can be called completely efficient, as the results depend on a number of factors. The data used in SA are mostly product reviews. Naïve Bayes and Support Vector Machines are the most frequently used ML algorithms for solving the sentiment classification problem; they are considered reference models against which many proposed algorithms are compared. Other kinds of data have been used more frequently in recent years, especially social media; these include news articles and news feeds, web blogs, social media posts, and others.
SA in non-English languages has also attracted researchers. Information from micro-blogs, blogs and forums, as well as news sources, is now widely used in SA; this media information plays a great role in expressing people's feelings and opinions about a certain topic or product. Using social network sites and micro-blogging sites as data sources still needs deeper analysis. There are benchmark data sets, especially for reviews (such as IMDB), which are used for algorithm evaluation. In many applications it is important to consider the context of the text and the user's preferences, which is why more research on context-based SA is needed. Using transfer learning (TL) techniques, data related to the domain in question can be used as training data. Using NLP tools to reinforce the SA process has attracted researchers recently and needs further advances. The non-English languages studied include other Latin languages (Spanish, Italian), Germanic languages (German, Dutch), Far East languages (Chinese, Japanese, Taiwanese) and Middle East languages (Arabic). Still, English is the most frequently used language, due to the availability of resources including lexica, corpora and dictionaries. This opens a challenge to researchers to build lexica, corpora and dictionary resources for other languages.

References:
1. Wilson T, Wiebe J, Hoffmann P. Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of HLT/EMNLP; 2005.
2. Hagenau M, Liebmann M, Neumann D. Automated news reading: stock price prediction based on financial news using context-capturing features. Decision Support Systems; 2013.
3. Lambov D, Pais S, Dias G. Merged agreement algorithms for domain independent sentiment analysis. In: Pacific Association for Computational Linguistics (PACLING'11); 2011.
4. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. London: Prentice Hall; 2003.
5. Quinlan JR.
"Improved use of continuous attributes in C4.5". Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 77-90.
6. Langseth H, Nielsen T. "Classification using Hierarchical Naïve Bayes models". Machine Learning, Vol. 63, No. 2, 2006, pp. 135-159.
7. Web Intelligence and Intelligent Agent Technology, 2008 (WI-IAT '08). IEEE/WIC/ACM International Conference on, Volume 1.
8. Das S, Chen M. Yahoo! for Amazon: extracting market sentiment from stock message boards. In: Proceedings of the Asia Pacific Finance Association Annual Conference (APFA); 2001.
9. Ferguson P, O'Hare N, Davy M, Bermingham A, Tattersall S, Sheridan P, Gurrin C, Smeaton AF. Exploring the use of paragraph-level annotations for sentiment analysis in financial blogs. In: 1st Workshop on Opinion Mining and Sentiment Analysis (WOMSA); 2009.
10. Denecke K. Using SentiWordNet for multilingual sentiment analysis. In: Proceedings of the IEEE 24th International Conference on Data Engineering Workshop (ICDEW 2008), IEEE Press: 507-512.