slides - gplsi

Virtual Knowledge Studio (VKS)
Information Studies
SentiStrength: Sentiment Strength Detection in MySpace and Twitter
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
SentiStrength Objective
1. Detect positive and negative sentiment strength in short informal text
   1. Develop workarounds for lack of standard grammar and spelling
   2. Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
   3. Classify simultaneously as positive 1-5 AND negative 1-5 sentiment
2. Apply to MySpace comments and social issues
SentiStrength Algorithm - Core
List of 890 positive and negative sentiment terms and strengths (1 to 5), e.g.
   ache = -2, dislike = -3, hate = -4, excruciating = -5
   encourage = 2, coolest = 3, lover = 4
The sentiment strength of a sentence is that of its strongest term; with multiple sentences, it is that of the strongest sentence
Examples
(positive, negative)
My legs ache. [ache = -2] -> 1, -2
You are the coolest. [coolest = 3] -> 3, -1
I hate Paul but encourage him. [hate = -4, encourage = 2] -> 2, -4
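The core lookup can be illustrated with a minimal sketch. It assumes a tiny term list containing only the strengths quoted on the slides, a simple regular-expression tokeniser, and sentence splitting on ".", "!" and "?"; the names TERM_STRENGTHS, sentence_score and text_score are hypothetical and this is not the SentiStrength code itself.

```python
import re

# Excerpt of the term list: only the strengths quoted on the slides.
TERM_STRENGTHS = {
    "ache": -2, "dislike": -3, "hate": -4, "excruciating": -5,
    "encourage": 2, "coolest": 3, "lover": 4,
}

def sentence_score(sentence):
    """Return (positive, negative) for one sentence; 1 and -1 mean no sentiment."""
    positive, negative = 1, -1
    for word in re.findall(r"[a-z']+", sentence.lower()):
        strength = TERM_STRENGTHS.get(word, 0)
        positive = max(positive, strength)   # strongest positive term
        negative = min(negative, strength)   # strongest negative term
    return positive, negative

def text_score(text):
    """Take the strongest sentence scores when there are several sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_score(s) for s in sentences] or [(1, -1)]
    return max(p for p, _ in scores), min(n for _, n in scores)

print(text_score("My legs ache."))
print(text_score("You are the coolest."))
print(text_score("I hate Paul but encourage him."))
```

Each text gets both a positive and a negative score, so a mixed sentence such as the third example keeps both signals rather than averaging them away.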
Term Strength Optimisation
Term strengths (e.g., ache = -2) initially fixed by human coder
Term strengths optimised on training set with 10-fold cross-validation
   Adjust term strengths to give best training set results, then evaluate on test set
   E.g., training set: “My legs ache”: coder sentiment = 1,-3 => adjust sentiment of “ache” from -2 to -3.
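One way to picture the adjustment step is a simple hill climb over term strengths. The sketch below is an assumption, not the published procedure: it nudges each term by one point and keeps the change only if exact-match accuracy on the training set improves. The 10-fold cross-validation loop is omitted, and the function names are hypothetical.

```python
import re

def score(text, strengths):
    """Strongest positive and negative term strengths in the text."""
    pos, neg = 1, -1
    for word in re.findall(r"[a-z']+", text.lower()):
        s = strengths.get(word, 0)
        pos, neg = max(pos, s), min(neg, s)
    return pos, neg

def accuracy(strengths, training_set):
    """Share of training texts whose score equals the coder's score."""
    hits = sum(score(text, strengths) == coded for text, coded in training_set)
    return hits / len(training_set)

def optimise(strengths, training_set):
    """Nudge each term by +/-1 while training accuracy strictly improves."""
    strengths = dict(strengths)
    improved = True
    while improved:
        improved = False
        for term in list(strengths):
            for step in (1, -1):
                candidate = strengths[term] + step
                if not 1 <= abs(candidate) <= 5:
                    continue  # stay on the 1 to 5 scale, keep the polarity
                trial = {**strengths, term: candidate}
                if accuracy(trial, training_set) > accuracy(strengths, training_set):
                    strengths = trial
                    improved = True
    return strengths

training = [("My legs ache", (1, -3))]
print(optimise({"ache": -2}, training))
```

With the single training comment from the slide, the sketch moves "ache" from -2 to -3, matching the example.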
SentiStrength Algorithm - Extra
Spelling correction for repeated letters
   Helllllo -> Hello (emphasis: llll)
Tagging approach used (see next slide)
Extra heuristics
   Emphasis acts to enhance + or - emotion
   Emotion words ignored in questions
   Take strongest positive or negative expression in whole comment
   Booster words (e.g., very, some)
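Two of the heuristics above, booster words and ignoring emotion words in questions, can be sketched as follows. The booster values (+1 for "very", -1 for "some") and the rule that a booster affects only the next word are assumptions made for illustration; the term strengths are those quoted earlier in the slides, and the names are hypothetical.

```python
import re

TERM_STRENGTHS = {"ache": -2, "hate": -4, "coolest": 3}  # strengths from the slides
BOOSTERS = {"very": 1, "some": -1}                       # assumed booster values

def comment_score(comment):
    """Strongest (positive, negative) expression in the whole comment."""
    pos, neg = 1, -1
    for sentence in re.findall(r"[^.!?]+[.!?]*", comment):
        if sentence.rstrip().endswith("?"):
            continue  # emotion words are ignored in questions
        boost = 0
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in BOOSTERS:
                boost = BOOSTERS[word]  # applies to the next word only
                continue
            strength = TERM_STRENGTHS.get(word, 0)
            if strength > 0:
                strength = min(5, strength + boost)
            elif strength < 0:
                strength = max(-5, strength - boost)
            boost = 0
            pos, neg = max(pos, strength), min(neg, strength)
    return pos, neg

print(comment_score("You are the very coolest."))
print(comment_score("Do you hate me?"))
```

The question returns the neutral pair because its sentence is skipped entirely, which is the simplest reading of "emotion words ignored in questions".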
Tagging
HIIIIII MY MATE!!!!!!!!
<w equiv="HI" em="IIIII">HIIIIII</w>
<w>MY</w>
<w>MATE</w>
<p equiv="!" em="!!!!!!!">!!!!!!!!</p>
HI MY MATE! (after spelling correction)
mate = 2
Strengths marked on the slide: 2, 3
Overall 3, -1
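The tagging step can be approximated with a short sketch. It assumes a run of three or more identical characters is first cut to two and, if the result is not a known word, to one; KNOWN_WORDS is a hypothetical stand-in for a real dictionary, and attribute values are not XML-escaped. This illustrates the idea and is not the SentiStrength implementation.

```python
import re

KNOWN_WORDS = {"hello", "hi", "my", "mate"}  # hypothetical mini-dictionary
RUN = re.compile(r"(.)\1{2,}")                # a character repeated 3+ times

def collapse(token, keep):
    """Cut every long run down to `keep` characters; return (text, removed)."""
    removed = []
    def repl(match):
        run = match.group(0)
        removed.append(run[keep:])
        return run[:keep]
    return RUN.sub(repl, token), "".join(removed)

def normalise(token):
    """Return (equivalent form, emphasis characters) for a token."""
    if not any(ch.isalpha() for ch in token):
        return collapse(token, 1)             # punctuation: keep one mark
    two, emphasis = collapse(token, 2)
    if two.lower() in KNOWN_WORDS:
        return two, emphasis                  # e.g. a legitimate double letter
    return collapse(token, 1)

def tag(token):
    """Wrap a token in <w> or <p>, recording equivalent form and emphasis."""
    equiv, emphasis = normalise(token)
    name = "w" if any(ch.isalnum() for ch in token) else "p"
    if emphasis:
        return f'<{name} equiv="{equiv}" em="{emphasis}">{token}</{name}>'
    return f"<{name}>{token}</{name}>"

for token in ["HIIIIII", "MY", "MATE", "!!!!!!!!"]:
    print(tag(token))
```

Keeping the removed characters in a separate attribute lets later steps treat them as emphasis without losing the original text.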
Experiments
Development data = 2600 MySpace comments coded by 1 coder
Test data = 1041 MySpace comments coded by 3 independent coders
Comparison against a range of standard machine learning algorithms
Inter-coder agreement
Krippendorff’s inter-coder weighted alpha = 0.5743 for positive and 0.5634 for negative sentiment
Only moderate agreement between coders, but it is a hard 5-category task
Comparison      +ve agreement   -ve agreement
Coder 1 vs. 2   51.0%           67.3%
Coder 1 vs. 3   55.7%           76.3%
Coder 2 vs. 3   61.4%           68.2%
Machine learning methods +ve
Machine learning methods -ve
Results: +ve sentiment strength
Algorithm                    Opt. feat.   Accuracy   Acc. +/- 1 class   Corr.   Mean % abs. error
SentiStrength                -            60.6%      96.9%              .599    22.0%
Simple logistic regression   700          58.5%      96.1%              .557    23.2%
SVM (SMO)                    800          57.6%      95.4%              .538    24.4%
J48 classification tree      700          55.2%      95.9%              .548    24.7%
JRip rule-based classifier   700          54.3%      96.4%              .476    28.2%
SVM regression (SMO)         100          54.1%      97.3%              .469    28.2%
AdaBoost                     100          53.3%      97.5%              .464    28.5%
Decision table               200          53.3%      96.7%              .431    28.2%
Multilayer Perceptron        100          50.0%      94.1%              .422    30.2%
Naïve Bayes                  100          49.1%      91.4%              .567    27.5%
Baseline                     -            47.3%      94.0%              -       31.2%
Random                       -            19.8%      56.9%              .016    82.5%
Results: -ve sentiment strength
Algorithm                    Opt. feat.   Accuracy   Acc. +/- 1 class   Corr.   Mean % abs. error
SVM (SMO)                    100          73.5%      92.7%              .421    16.5%
SVM regression (SMO)         300          73.2%      91.9%              .363    17.6%
Simple logistic regression   800          72.9%      92.2%              .364    17.3%
SentiStrength                -            72.8%      95.1%              .564    18.3%
Decision table               100          72.7%      92.1%              .346    17.0%
JRip rule-based classifier   500          72.2%      91.5%              .309    17.3%
J48 classification tree      400          71.1%      91.6%              .235    18.8%
Multilayer Perceptron        100          70.1%      92.5%              .346    20.0%
AdaBoost                     100          69.9%      90.6%              -       16.8%
Baseline                     -            69.9%      90.6%              -       16.8%
Naïve Bayes                  200          68.0%      89.8%              .311    27.3%
Random                       -            20.5%      46.0%              .010    157.7%
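The four result columns can be computed along the following lines. This is a sketch under stated assumptions: strengths are compared as magnitudes 1 to 5, "Corr." is taken to be Pearson correlation, and "Mean % abs. error" is taken to be the absolute error divided by the coder's value, averaged and expressed as a percentage. The slides do not define the measures, so these definitions and the function name evaluate are assumptions.

```python
from math import sqrt

def evaluate(predicted, actual):
    """Compare predicted and coder strengths (each on the 1 to 5 scale)."""
    n = len(actual)
    pairs = list(zip(predicted, actual))
    exact = sum(p == a for p, a in pairs) / n
    within_one = sum(abs(p - a) <= 1 for p, a in pairs) / n
    mean_pct_abs_error = sum(abs(p - a) / a for p, a in pairs) / n * 100

    # Pearson correlation; undefined when either series is constant.
    mean_p, mean_a = sum(predicted) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in pairs)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    corr = cov / sqrt(var_p * var_a) if var_p and var_a else None

    return {
        "accuracy": exact,
        "within_one": within_one,
        "correlation": corr,
        "mean_pct_abs_error": mean_pct_abs_error,
    }

print(evaluate([1, 2, 3, 5], [1, 3, 3, 2]))
```

Under these definitions a predictor that always outputs the same class has no defined correlation, so the sketch returns None in that case.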
SentiStrength Components
Type                                                   %
Consecutive +ve words not used as boosters             61.2
Emoticons ignored                                      61.2
Negating words not used to switch (e.g., not happy)    61.0
SentiStrength standard configuration                   60.9
Booster words ignored (e.g., very)                     60.7
Automatic spelling correction disabled                 60.6
Exclamation marks not given a strength of 2            60.6
Extra multiple letters not used as boosters            60.4
Neutral words with emphasis not counted as +ve         60.1
SentiStrength with all the above changes               57.5
Example differences/errors
THINK 4 THE ADD

Computer (1,-1), Human (2,-1)
0MG 0MG 0MG 0MG 0MG 0MG 0MG
0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!!

Computer (2,-1), Human (5,-1)
Selected variations tested
Modification (for positive sentiment)                                     Accuracy   +/- 1 class   Corr.   Mean abs. % err.
Negating words not used to switch following sentiment (e.g., not happy)   60.87%     97.50%        .6206   21.28%
SentiStrength standard algorithm                                          60.64%     96.90%        .5986   21.96%
Exclamation marks not given a strength of 2                               60.51%     96.62%        .6035   21.47%
Automatic spelling correction disabled                                    60.39%     96.88%        .5961   22.05%
Extra multiple letters not used as emotion boosters                       60.21%     96.81%        .5952   22.16%
Neutral words with emphasis not counted as positive emotion               60.13%     96.79%        .5966   21.90%
SentiStrength with no extras                                              57.44%     96.07%        .6073   21.91%
Application - Evidence of emotion homophily in MySpace
Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends
Correlation of 0.227 for +ve emotion strength and 0.254 for -ve
People tend to use similar but not identical levels of emotion to their friends in messages
Sentistrength
Collective Emotions
in Cyberspace
CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs
Application - sentiment in Twitter events
Analysis of a corpus of 1 month of English Twitter posts
Automatic detection of spikes (events)
Sentiment strength classification of all posts
Assessment of whether sentiment strength increases during important events
   Result: negative sentiment normally increases, positive sentiment might tend to increase
Automatically-identified Twitter spikes
Chile
Hawaii
#oscars
Tiger Woods
Conclusion
Automatic classification of emotion on a 5 point positive and negative scale seems possible for MySpace
…And other similar short computer text messages?
Hard to get accuracy much over 60%?
Next = analyse emotion in online debates
Publication
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (in press). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology.
Thelwall, M., Wilkinson, D., & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace. Journal of the American Society for Information Science and Technology, 61(1), 190-199.