Comparing and Combining Sentiment Analysis Methods

Pollyanna Gonçalves (UFMG, Brazil)
Fabrício Benevenuto (UFMG, Brazil)
Matheus Araújo (UFMG, Brazil)
Meeyoung Cha (KAIST, Korea)
Sentiment Analysis on Social Networks
- Key component of a new wave of applications that explore social network data
- Summary of public opinion about:
  - politics, products, services (e.g., a new car, a movie), etc.
- Monitoring of social network data (in real time)
- Commonly framed as polarity analysis (positive or negative)
Sentiment Analysis Methods
- Which method to use?
- Several methods have been proposed for different contexts
- Several of them are popular
- Validations rely on examples, comparisons with baselines, and limited datasets
- There is no thorough comparison among methods
  - Advantages? Disadvantages? Limitations?
This talk
- Compare 8 popular sentiment analysis methods
- Focus on the task of detecting polarity: positive vs. negative
- Combine methods
- Deploy the methods in a system: www.ifeel.dcc.ufmg.br
Outline
- Methods & Methodology
- Comparing & Combining
- iFeel System & Conclusions
Emoticons
- Extracted from instant messaging services
  - Skype, MSN, Yahoo! Messenger, etc.
- Grouped as positive and negative
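A minimal sketch of emoticon-based polarity detection, assuming short hypothetical emoticon lists (the lists actually extracted from messaging services are much longer). A message with no emoticon, or with as many positive as negative ones, is left unclassified, which illustrates why this method covers few messages.

```python
from typing import Optional

# Hypothetical, abbreviated emoticon lists; the real lists are longer.
POSITIVE_EMOTICONS = [":)", ":-)", ":D", ";)"]
NEGATIVE_EMOTICONS = [":(", ":-(", ":'("]


def _count(text: str, emoticons: list) -> int:
    # Number of non-overlapping occurrences of each emoticon in the text.
    return sum(text.count(e) for e in emoticons)


def emoticon_polarity(text: str) -> Optional[str]:
    """Return 'positive', 'negative', or None when the message is not covered."""
    pos = _count(text, POSITIVE_EMOTICONS)
    neg = _count(text, NEGATIVE_EMOTICONS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # no emoticon, or a tie: the method abstains
```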
Linguistic Inquiry and Word Count (LIWC)
- Lexical method (paid software)
- Allows the lexical dictionary to be customized; we used the default dictionary
- Measures various emotional, cognitive, and structural components
- We only consider sentiment-relevant categories, such as positive and negative emotion
SentiWordNet
- Lexical approach based on the WordNet dictionary
- Groups words into sets of synonyms (synsets)
- Detects positivity, negativity, and neutrality of texts
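In SentiWordNet each synset carries a positivity, a negativity, and an objectivity score that sum to 1. A minimal sketch of lexicon-based polarity on top of such scores, assuming a hypothetical five-word lexicon that maps each word directly to one synset's scores (real use needs word-sense disambiguation or averaging over a word's synsets):

```python
from typing import Optional

# Hypothetical (positivity, negativity) scores; objectivity = 1 - pos - neg.
LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.125, 0.75),
    "table": (0.0, 0.0),  # purely objective word
}


def sentiwordnet_polarity(text: str) -> Optional[str]:
    """Sum positivity and negativity over known words and compare the totals."""
    pos_total = neg_total = 0.0
    for token in text.lower().split():
        pos, neg = LEXICON.get(token.strip(".,!?"), (0.0, 0.0))
        pos_total += pos
        neg_total += neg
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return None  # neutral, or no sentiment-bearing word found
```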
PANAS-t
- Lexical method adapted from a psychometric scale
- Consists of a dictionary of adjectives associated with sentiments
  - Positive: joviality, assurance, serenity, and surprise
  - Negative: fear, sadness, guilt, hostility, shyness, and fatigue
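The adjective-to-category structure can be sketched as below. The adjectives are a hypothetical one-per-category excerpt, and turning category counts into a polarity by simple majority is an assumption made for illustration, not necessarily how PANAS-t scores a message.

```python
from collections import Counter
from typing import Optional

POSITIVE_CATEGORIES = {"joviality", "assurance", "serenity", "surprise"}
NEGATIVE_CATEGORIES = {"fear", "sadness", "guilt", "hostility", "shyness", "fatigue"}

# Hypothetical excerpt of the adjective -> category dictionary.
ADJECTIVE_CATEGORY = {
    "cheerful": "joviality", "confident": "assurance", "calm": "serenity",
    "amazed": "surprise", "scared": "fear", "lonely": "sadness",
    "ashamed": "guilt", "angry": "hostility", "timid": "shyness",
    "tired": "fatigue",
}


def panas_t_categories(text: str) -> Counter:
    """Count the dictionary adjectives of each category found in the text."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return Counter(ADJECTIVE_CATEGORY[w] for w in words if w in ADJECTIVE_CATEGORY)


def panas_t_polarity(text: str) -> Optional[str]:
    """Assumed majority rule: more positive-category hits -> positive, and vice versa."""
    counts = panas_t_categories(text)
    pos = sum(n for cat, n in counts.items() if cat in POSITIVE_CATEGORIES)
    neg = sum(n for cat, n in counts.items() if cat in NEGATIVE_CATEGORIES)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None
```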
Happiness Index
- Uses a well-known lexical dictionary, the Affective Norms for English Words (ANEW)
- Produces a happiness score on a scale
  - from 1 (extremely unhappy) to 9 (extremely happy)
- We consider scores in [1, 5) negative and scores in [5, 9] positive
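The Happiness Index rates a text by the average ANEW valence of the words it contains. A minimal sketch, assuming a tiny hypothetical valence table (illustrative numbers, not the real ANEW ratings); every word occurrence counts once, so frequent words weigh more, and the 5.0 threshold follows the split into [1, 5) and [5, 9]:

```python
from typing import Optional

# Hypothetical ANEW-style valences: 1 = extremely unhappy, 9 = extremely happy.
VALENCE = {"love": 8.7, "happy": 8.2, "today": 6.0, "sad": 1.6, "death": 1.6}


def happiness_score(text: str) -> Optional[float]:
    """Mean valence over every lexicon word occurrence in the text."""
    words = (w.strip(".,!?") for w in text.lower().split())
    scores = [VALENCE[w] for w in words if w in VALENCE]
    if not scores:
        return None  # no lexicon word found: message not covered
    return sum(scores) / len(scores)


def happiness_polarity(text: str) -> Optional[str]:
    """Scores in [1, 5) are negative; scores in [5, 9] are positive."""
    score = happiness_score(text)
    if score is None:
        return None
    return "negative" if score < 5 else "positive"
```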
SentiStrength
- Combines 9 supervised machine learning methods
- Estimates the strength of positive and negative sentiment in a text
- We used the trained model provided by the authors
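SentiStrength reports two strengths per text: a positive one from 1 (not positive) to 5 (extremely positive) and a negative one from -1 (not negative) to -5 (extremely negative). To reduce that pair to a single polarity, one simple rule (an assumption here, not necessarily the rule used in the study) compares the two magnitudes:

```python
from typing import Optional


def strengths_to_polarity(positive: int, negative: int) -> Optional[str]:
    """Map a (positive, negative) strength pair to a polarity label.

    positive is expected in 1..5 and negative in -5..-1.
    Equal magnitudes yield None (no polarity); this tie rule is assumed.
    """
    if not (1 <= positive <= 5 and -5 <= negative <= -1):
        raise ValueError("strength out of range")
    if positive > abs(negative):
        return "positive"
    if abs(negative) > positive:
        return "negative"
    return None
```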
SAIL/AIL Sentiment Analyzer (SASA)
- Machine learning method trained with a Naïve Bayes model
- Trained model implemented as a Python library
- Classifies tweets in JSON format as positive, negative, neutral, or unsure
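SASA's own model and training data are not reproduced here. To show the kind of model involved, the sketch below is a generic multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, trained on four hypothetical messages; unlike SASA, it only outputs positive or negative.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Generic multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training messages.
nb = NaiveBayes().fit(
    ["love this movie", "great fun", "hate this movie", "awful boring"],
    ["positive", "positive", "negative", "negative"],
)
```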
SenticNet
- Extracts cognitive and affective information using natural language processing techniques
- Uses the affective categorization model Hourglass of Emotions
- Classifies messages as positive or negative
Methodology
- Comparison of coverage and prediction performance across different datasets
- Dataset 1: human-labeled
  - About 12,000 messages labeled with Amazon Mechanical Turk
  - Twitter, MySpace, YouTube, and Digg comments; BBC and Runners World forums
- Dataset 2: unlabeled
  - Complete Twitter snapshot collected in 2009 (~2 billion tweets)
  - Extracted tweets about tragedies, disasters, movie releases, and political events
  - Focus on English messages
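The two measures can be sketched as follows, assuming that coverage is the fraction of messages a method labels at all and that prediction performance is the macro-averaged F1 over the positive and negative classes, computed only on the covered messages. Both definitions are assumptions for illustration; the toy labels are hypothetical.

```python
from typing import Optional, Sequence

LABELS = ("positive", "negative")


def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Fraction of messages for which the method produced a polarity."""
    return sum(p is not None for p in predictions) / len(predictions)


def macro_f1(gold: Sequence[str], predictions: Sequence[Optional[str]]) -> float:
    """Macro-averaged F1 over positive/negative, on covered messages only."""
    pairs = [(g, p) for g, p in zip(gold, predictions) if p is not None]
    f1s = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy labels: one message is not covered (None).
gold = ["positive", "positive", "negative", "negative"]
predictions = ["positive", None, "positive", "negative"]
```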
What is the coverage of each method?
Coverage vs. Prediction Performance
- Emoticons: best prediction performance and worst coverage
- SentiStrength: second in prediction performance and third in coverage
Prediction Performance across Datasets
Method          | Twitter | MySpace | YouTube | BBC   | Digg  | Runners World
PANAS-t         | 0.643   | 0.958   | 0.737   | 0.396 | 0.476 | 0.698
Emoticons       | 0.929   | 0.952   | 0.948   | 0.359 | 0.939 | 0.947
SASA            | 0.750   | 0.710   | 0.754   | 0.346 | 0.502 | 0.744
SenticNet       | 0.757   | 0.884   | 0.810   | 0.251 | 0.424 | 0.826
SentiWordNet    | 0.721   | 0.837   | 0.789   | 0.384 | 0.456 | 0.780
SentiStrength   | 0.843   | 0.915   | 0.894   | 0.532 | 0.632 | 0.778
Happiness Index | 0.774   | 0.925   | 0.821   | 0.246 | 0.393 | 0.832
LIWC            | 0.690   | 0.862   | 0.731   | 0.377 | 0.585 | 0.895
- Strong variations across datasets
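To make the variation concrete, the prediction performance values from the table above can be loaded into a short script that finds the best method per dataset, each method's worst dataset, and each method's spread (best minus worst score). The values are copied from the table; the function names are illustrative.

```python
# Prediction performance values copied from the table above.
DATASETS = ["Twitter", "MySpace", "YouTube", "BBC", "Digg", "Runners World"]
SCORES = {
    "PANAS-t":         [0.643, 0.958, 0.737, 0.396, 0.476, 0.698],
    "Emoticons":       [0.929, 0.952, 0.948, 0.359, 0.939, 0.947],
    "SASA":            [0.750, 0.710, 0.754, 0.346, 0.502, 0.744],
    "SenticNet":       [0.757, 0.884, 0.810, 0.251, 0.424, 0.826],
    "SentiWordNet":    [0.721, 0.837, 0.789, 0.384, 0.456, 0.780],
    "SentiStrength":   [0.843, 0.915, 0.894, 0.532, 0.632, 0.778],
    "Happiness Index": [0.774, 0.925, 0.821, 0.246, 0.393, 0.832],
    "LIWC":            [0.690, 0.862, 0.731, 0.377, 0.585, 0.895],
}


def spread(method: str) -> float:
    """Difference between a method's best and worst dataset scores."""
    values = SCORES[method]
    return max(values) - min(values)


def best_method(dataset: str) -> str:
    """Method with the highest score on the given dataset."""
    i = DATASETS.index(dataset)
    return max(SCORES, key=lambda m: SCORES[m][i])


def worst_dataset(method: str) -> str:
    """Dataset on which the given method scores lowest."""
    values = SCORES[method]
    return DATASETS[values.index(min(values))]
```

On these numbers, every method scores lowest on BBC, Emoticons is best on four of the six datasets, and SentiStrength has the smallest spread.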
- Worst performance on datasets containing formal text
Polarity Analysis
Detected only positive sentiments!
Even disasters were classified predominantly as positive.
- Methods tend to detect more positive sentiments
- The fraction of positive messages classified as positive is usually greater than the fraction of negative messages classified as negative
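This bias can be quantified by comparing per-class recall: the share of positive messages labeled positive versus the share of negative messages labeled negative. A minimal sketch on hypothetical toy labels for a method that leans positive; abstentions (None) are counted as misses, which is an assumption.

```python
from typing import Optional, Sequence


def per_class_recall(gold: Sequence[str], predictions: Sequence[Optional[str]],
                     label: str) -> float:
    """Share of messages with gold label `label` that the method also labels `label`."""
    relevant = [p for g, p in zip(gold, predictions) if g == label]
    if not relevant:
        return 0.0
    return sum(p == label for p in relevant) / len(relevant)


# Hypothetical toy labels.
gold = ["positive"] * 4 + ["negative"] * 4
predictions = ["positive", "positive", "positive", "negative",
               "positive", "negative", "positive", "negative"]
positive_recall = per_class_recall(gold, predictions, "positive")  # 3 of 4
negative_recall = per_class_recall(gold, predictions, "negative")  # 2 of 4
```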
Combined Method
- Combines 7 of the 8 methods analyzed:
  - Emoticons, SentiStrength, Happiness Index, SenticNet, SentiWordNet, PANAS-t, SASA
- LIWC removed (paid method)
- Weights are assigned according to each method's rank in prediction performance:
  - Higher weight for the method with the highest F-measure
  - Emoticons received weight 7 and PANAS-t weight 1
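The slides specify rank-based weights but not the exact combination rule. A minimal sketch, assuming a weighted vote: each method adds its weight for a positive output, subtracts it for a negative output, and abstains (None) when it does not cover the message. With 7 methods, the best-ranked gets weight 7 and the worst gets 1. The F-measure values in the usage example are hypothetical.

```python
from typing import Mapping, Optional


def rank_weights(f_measures: Mapping[str, float]) -> dict:
    """Weight = rank by F-measure: the best of n methods gets n, the worst gets 1."""
    ordered = sorted(f_measures, key=f_measures.get)  # ascending F-measure
    return {method: rank for rank, method in enumerate(ordered, start=1)}


def combined_polarity(predictions: Mapping[str, Optional[str]],
                      weights: Mapping[str, int]) -> Optional[str]:
    """Weighted vote; abstaining methods do not contribute."""
    score = 0
    for method, label in predictions.items():
        if label == "positive":
            score += weights[method]
        elif label == "negative":
            score -= weights[method]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # every method abstained, or the vote is tied


# Hypothetical F-measures for three of the methods.
weights = rank_weights({"Emoticons": 0.9, "SentiStrength": 0.8, "PANAS-t": 0.6})
```

Because a message receives a label whenever at least one method covers it (barring ties), a vote of this kind tends to raise coverage over any single method.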
Combined Method
- Best coverage and second-best prediction performance
- Combining 4 methods is sufficient
iFeel (Beta version)
www.ifeel.dcc.ufmg.br
- Example input:
  - “Feeling too happy today :)”
- Deploys all methods except LIWC
- Allows evaluating an entire file
- Allows changing the methods' parameters
Conclusions
- We compare 8 popular sentiment analysis methods for detecting polarity
- No method achieved the best results in all analyses
- Prediction performance varies widely across datasets
- Most methods are biased towards positivity
- We propose a combined method
  - Achieves high coverage and high prediction performance
- iFeel: methods deployed and easily available
- Future work: compare other methods, such as POMS and EmoLex
Thank you!
Questions?
www.dcc.ufmg.br/~fabricio
www.ifeel.dcc.ufmg.br
fabricio@dcc.ufmg.br