Virtual Knowledge Studio (VKS) Information Studies SentiStrength: Sentiment Strength Detection in MySpace and Twitter Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK SentiStrength Objective 1. Detect positive and negative sentiment strength in short informal text 1. 2. 3. Develop workarounds for lack of standard grammar and spelling Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!) Classify simultaneously as positive 1-5 AND negative 1-5 sentiment 2. Apply to MySpace comments and social issues SentiStrength Algorithm - Core List of 890 positive and negative sentiment terms and strengths (1 to 5), e.g. ache = -2, dislike = -3, hate=-4, excruciating -5 encourage = 2, coolest = 3, lover = 4 Sentiment strength is highest in sentence; or highest sentence if multiple sentences Examples positive, negative -2 1, -2 My legs ache. 3 You are the coolest. 3, -1 I hate Paul but encourage him. 2, -4 -4 2 Term Strength Optimisation Term strengths (e.g., ache = -2) initially fixed by human coder Term strengths optimised on training set with 10-fold cross-validation Adjust term strengths to give best training set results then evaluate on test set E.g., training set: “My legs ache”: coder sentiment = 1,-3 => adjust sentiment of “ache” from -2 to -3. SentiStrength Algorithm -Extra Spelling correction for repeated letters Helllllo -> Hello (emphasis: llll) Tagging approach used (see next slide) Extra heuristics Emphasis acts to enhance + or – emotion Emotion words ignored in questions Take strongest positive or negative expression in whole comment Booster words (e.g., very, some) Tagging HIIIIII MY MATE!!!!!!!! <w equiv="HI" em="IIIII">HIIIIII</w> <w>MY</w> <w>MATE</w> <p equiv="!" em="!!!!!!!">!!!!!!!!</p> HI MY MATE! 2 Overall 3, -1 3 mate = 2 Experiments Development data = 2600 MySpace comments coded by 1 coder Test data = 1041 MySpace comments coded by 3 independent coders Comparison against a range of standard machine learning algorithms Inter-coder agreement Krippendorff’s inter-coder weighted alpha = 0.5743 for positive and 0.5634 for negative sentiment Only moderate agreement between coders but it is a hard 5-category task Comparison +ve -ve agree- agreement ment Coder 1 vs. 2 51.0% 67.3% Coder 1 vs. 3 55.7% 76.3% Coder 2 vs. 3 61.4% 68.2% Machine learning methods +ve Machine learning methods -ve Results:+ve sentiment strength Algorithm Opt. AccuFeat. racy Acc. +/- 1 class Corr. Mean % abs. error SentiStrength - 60.6% 96.9% .599 22.0% Simple logistic regression 700 58.5% 96.1% .557 23.2% SVM (SMO) 800 57.6% 95.4% .538 24.4% J48 classification tree 700 55.2% 95.9% .548 24.7% JRip rule-based classifier 700 54.3% 96.4% .476 28.2% SVM regression (SMO) 100 54.1% 97.3% .469 28.2% AdaBoost 100 53.3% 97.5% .464 28.5% Decision table 200 53.3% 96.7% .431 28.2% Multilayer Perceptron 100 50.0% 94.1% .422 30.2% Naïve Bayes 100 49.1% 91.4% .567 27.5% Baseline - 47.3% 94.0% - 31.2% Random - 19.8% 56.9% .016 82.5% Results:-ve sentiment strength Algorithm Opt. feat. Accuracy Acc. +/- 1 class Corr. Mean % absolute error SVM (SMO) 100 73.5% 92.7% .421 16.5% SVM regression (SMO) 300 73.2% 91.9% .363 17.6% Simple logistic regression 800 72.9% 92.2% .364 17.3% SentiStrength - 72.8% 95.1% .564 18.3% Decision table 100 72.7% 92.1% .346 17.0% JRip rule-based classifier 500 72.2% 91.5% .309 17.3% J48 classification tree 400 71.1% 91.6% .235 18.8% Multilayer Perceptron 100 70.1% 92.5% .346 20.0% AdaBoost 100 69.9% 90.6% - 16.8% Baseline - 69.9% 90.6% - 16.8% Naïve Bayes 200 68.0% 89.8% .311 27.3% Random - 20.5% 46.0% .010 157.7% SentiStrength Components Type Consecutive +ve words not used as boosters % 61.2 Emoticons ignored Negating words not switch (e.g., not happy) SentiStrength standard configuration 61.2 61.0 60.9 Booster words ignored (e.g., very) Automatic spelling correction disabled Exclamation marks not given a strength of 2 Extra multiple letters not used as boosters 60.7 60.6 60.6 60.4 Neutral words with emphasis not counted as +ve SentiStrength with all the above changes 60.1 57.5 Example differences/errors THINK 4 THE ADD Computer (1,-1), Human (2,-1) 0MG 0MG 0MG 0MG 0MG 0MG 0MG 0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!! Computer (2,-1), Human (5,-1) Selected variations tested Accuracy +/- 1 class corr. Modification (for positive sentiment) Mean Abs. % err. Negating words not used to switch following sentiment (e.g., not happy) 60.87% 97.50% .6206 21.28% SentiStrength standard algorithm 60.64% 96.90% .5986 21.96% Exclamation marks not given a strength of 2 60.51% 96.62% .6035 21.47% Automatic spelling correction disabled 60.39% 96.88% .5961 22.05% Extra multiple letters not used as emotion boosters 60.21% 96.81% .5952 22.16% Neutral words with emphasis not counted as positive emotion 60.13% 96.79% .5966 21.90% SentiStrength with no extras 57.44% 96.07% .6073 21.91% Application - Evidence of emotion homophily in MySpace Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends Correlation of 0.227 for +ve emotion strength and 0.254 for –ve People tend to use similar but not identical levels of emotion to their friends in messages Sentistrength Collective Emotions in Cyberspace CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs Application – sentiment in Twitter events Analysis of a corpus of 1 month of English Twitter posts Automatic detection of spikes (events) Sentiment strength classification of all posts Assessment of whether sentiment strength increases during important events Result – negative sentiment normally increases, positive sentiment might tend to increase Automatically-identified Twitter spikes Chile Hawaii #oscars Tiger Woods Conclusion Automatic classification of emotion on a 5 point positive and negative scale seems possible for MySpace …And other similar short computer text messages? Hard to get accuracy much over 60%? Next = analyse emotion in online debates Publication Thelwall, M., Buckley, K., Paltoglou, G. Cai, D., & Kappas, A. (in press). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology. Thelwall, M., Wilkinson, D. & Uppal, S.(2010). Data mining emotion in social network communication: Gender differences in MySpace, Journal of the American Society for Information Science and Technology, 61(1), 190-199.