Sentiment Analysis
Kristina Lerman, University of Southern California
CS 599: Social Media Analysis

How do people feel about movies?
• Estimate this… using only this!

Huge interest, but why?
• Help consumers and brands understand the opinions being expressed about
  – Events: court decisions, protests, acts of Congress
  – Products: movies, consumer electronics
  – People: political candidates, dictators
  – Locations: restaurants, hotels, vacation destinations

Mood and emotion
• Moods are physiological in origin
  – Influenced by levels of neurotransmitters, hormones, …
• Moods also depend on external factors
  – Daily routine, work, commuting, eating, …
  – Products used by a person
• Two dimensions of mood
  – Positive affect: enthusiasm, delight, activeness, alertness, happiness, …
  – Negative affect: distress, fear, anger, guilt, disgust, sadness, …
• Can we accurately measure mood from text?

Main ideas
• Text messages (tweets, blog posts) use distinctive words to convey emotions
  – Identify features (words, linguistic features) that are highly indicative of emotions
• Train a classifier to recognize emotion in text
  – Supervised machine learning
    • Need labeled data to train the classifier
    • Features are noisy: how can they be filtered to improve classifier performance?
    • Which classifier should be used?
  – Automatically classify the emotion of a new text message using only the features of the message

Recognizing blog moods
“ESSE: Exploring mood on the Web” by Sood and Vasserman
• Main idea
  – Current search engines find content on the web efficiently but do little to connect the user and the content with emotion.
    • Searching for a blog post by keywords simply returns the posts most associated with those words; the emotional nature of the posts is not considered.
  – Train a classifier to recognize the emotion in texts
• Contributions
  – Training data collected from LiveJournal, with user-labeled emotions
  – Handling challenges: noisy labels

Google Blog Search vs. ESSE: Emotional State Search Engine
• ESSE searches an index of blogs for key terms
• It allows the user to search based on one or more emotions, as well as to choose a source of indexed blogs

Classification
• Train a classifier to recognize the mood of LiveJournal posts
  – Training phase
    • More than 600,000 posts, each represented as a feature vector
    • Each post labeled by its author with one of 130 different moods
    • A Naïve Bayes classifier relates a post to its label (the user-indicated mood)
  – Test phase
    • The classifier labels a new post with a mood

Training data: LiveJournal mood distribution

Classification (cont.)
• But the classifier is prone to overfitting
  – Too many noisy labels, not enough generalization
  – Solution: use clustering to reduce the number of labels by collapsing related labels into one of 4 (happy, sad, angry, other)
• K-means clustering was used to cluster the posts into 3 groups (happy, sad, or angry), removing outliers and reducing the data set to 31 moods, or ~130,000 posts

K-means clustering
• Each mood is represented as a feature vector
  – Each component of the vector is the number of times that feature occurred across all posts tagged with that mood
  – The moods “happy”, “sad”, and “angry” are the initial cluster centroids
• Iterate until no further change
  – Moods closest to a cluster centroid are assigned to that cluster
  – Recalculate each cluster centroid
  – (Example: “hyper” and “giddy” join the “happy” cluster; “gloomy” joins “sad”; “irate” joins “angry”)

Reduced labels

Naïve Bayes classifier
• The probability of a post belonging to class c, given a set of features f, equals the prior P(c) times the product of the probabilities of all features, given class c.
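The Naïve Bayes formulation above can be sketched in a few lines of Python. The toy posts and mood labels below are illustrative stand-ins for the LiveJournal training data, not examples from the paper, and the add-one smoothing is an assumption (the slides do not specify a smoothing scheme):

```python
# Minimal Naive Bayes mood classifier: P(c|f) is proportional to the
# prior P(c) times the product of P(feature|c) over the post's words.
from collections import Counter, defaultdict

# Toy mood-labeled posts (hypothetical; stand-ins for LiveJournal data).
posts = [
    ("what a wonderful sunny day", "happy"),
    ("feeling great and cheerful", "happy"),
    ("so gloomy and tearful today", "sad"),
    ("everything feels hopeless", "sad"),
]

word_counts = defaultdict(Counter)   # per-mood word counts
class_counts = Counter()             # posts per mood, for the prior P(c)
for text, mood in posts:
    class_counts[mood] += 1
    word_counts[mood].update(text.split())

def classify(text):
    total = sum(class_counts.values())
    best, best_p = None, 0.0
    for mood in class_counts:
        n = sum(word_counts[mood].values())
        vocab = len(word_counts[mood])
        p = class_counts[mood] / total                    # prior P(c)
        for w in text.split():
            # Add-one (Laplace) smoothing so an unseen word does not
            # zero out the whole product.
            p *= (word_counts[mood][w] + 1) / (n + vocab)
        if p > best_p:
            best, best_p = mood, p
    return best

print(classify("such a sunny cheerful day"))  # prints happy
```

At test time the classifier labels a new post with the mood whose posterior is largest, exactly as the slide describes.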
• The post is classified as the most likely class, i.e., the class c with the highest conditional probability

ESSE query
• After training, the ESSE system can search different given indexes
• The index scores a word higher when it is used frequently within a document, but lower when it is used frequently across many documents (TF-IDF-style weighting)
• Mood classification and filtering are performed on the fly

Evaluation
• R = posts relevant to class C; P = posts labeled with class C
• Confusion matrix: TP = true positives, FP = false positives, TN = true negatives, FN = false negatives

Sentiment of Twitter posts
“Twitter as a Corpus for Sentiment Analysis and Opinion Mining” by Pak & Paroubek
• Main idea
  – People widely use microblogging platforms (e.g., Twitter) to express opinions. Understanding those opinions would be useful for marketing and the social sciences
  – But it is challenging to extract sentiment from microblog posts, because they are very short (e.g., 140 characters)
• Contributions
  – Automatically collect training data from Twitter
  – Use linguistic features to automatically recognize the sentiment of posts
    • Positive, negative, objective

Twitter sentiment
• Twitter posts often express opinions
  – Which posts express positive sentiment? Negative sentiment?
• Posts are short: there are few words to go by when recognizing an opinion

Sentiment classification
• Train a classifier to recognize positive and negative sentiment
• But this requires lots of training data: posts expressing positive and negative opinions, as well as objective posts not expressing an opinion

Training data collection
• Query Twitter for posts containing
  – Happy emoticons ( :-), :), =), :D, … ) → positive posts
  – Sad emoticons ( :-(, :(, =(, ;(, … ) → negative posts
  – Links to news articles → objective posts

Zipf law
• The distribution of word counts in the data set follows a power law

Do linguistic features help?
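The confusion-matrix quantities defined in the Evaluation slides (TP, FP, FN) combine into the precision, recall, and F-measure used to judge the classifiers. A short sketch with hypothetical counts:

```python
# Precision, recall, and F-measure from confusion-matrix counts.
def precision(tp, fp):
    # Fraction of posts labeled with class C that are actually relevant.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of relevant posts that were labeled with class C.
    return tp / (tp + fn)

def f_measure(p, r):
    # Harmonic mean of precision and recall (F1).
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts for one class:
tp, fp, fn = 80, 20, 40
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))  # prints 0.8 0.67 0.73
```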
Subjective vs. objective posts
• Relative prevalence of POS tags across subjective posts (positive or negative) and objective posts
• (Chart: prevalence P of POS tags. Subjective posts feature personal pronouns (“I”, “he”), interjections (“Wow”, “OMG”), superlatives (“most”, “best”), and simple past-tense verbs (“I found”, “you saw”); objective posts feature nouns naming a person, place, or thing)

Negative vs. positive posts
• Relative prevalence of POS tags across negative and positive posts
• The prevalence has less discriminative power than for objective vs. subjective posts
• (Chart: prevalence P of POS tags; example words include “missed” and “bored” on the negative side, and “most”, “best”, and “whose” on the positive side)

Supervised machine learning
• (Diagram: labeled messages train a sentiment classifier; input “Trojans Rule!” → output “Positive”)

Classifying the sentiment of tweets
• Train the classifier
  – Features
    • Remove stop words and URLs
    • An n-gram is a sequence of n consecutive words from a post
    • A binary (0/1) feature reflects the presence or absence of an n-gram
  – Filtering
    • Discard common n-grams that are uniformly distributed across the whole data set; these do not help discriminate between sentiments
    • Compute the entropy of an n-gram g across the different sentiments S: a high-entropy g is evenly distributed across all sentiments

Given a message M, what is its sentiment s?

Unigram (1-gram) counts in the training data:

                           trojans  rule  great  home  bad  news  Total count
Positive message count        6      22     40    10     2     3     5000
Negative message count        5       6      1    10    30     7     5000
Objective message count       5      25      2    10     2    44     5000

Example of calculating P(s|M)
• P(+|“trojans rule”) = P(+) × P(“trojans”|+) × P(“rule”|+) = 0.333 × 6/5000 × 22/5000
• Similarly for P(-|“trojans rule”) and P(obj|“trojans rule”)
• The sentiment with the largest probability wins.

Results
• (Plots: precision, recall, and F-measure as functions of the number of training samples)
• Classify the sentiment of 200 messages.
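The worked example above can be reproduced directly from the unigram table. This sketch follows the slide's arithmetic (equal class priors, unsmoothed count ratios; the slide writes P(+) = 0.333, the exact value is 1/3):

```python
# Worked example: classify "trojans rule" with Naive Bayes, using the
# unigram counts from the table above (5000 messages per class).
counts = {
    "positive":  {"trojans": 6, "rule": 22, "great": 40, "home": 10, "bad": 2,  "news": 3},
    "negative":  {"trojans": 5, "rule": 6,  "great": 1,  "home": 10, "bad": 30, "news": 7},
    "objective": {"trojans": 5, "rule": 25, "great": 2,  "home": 10, "bad": 2,  "news": 44},
}
TOTAL = 5000   # messages per class
PRIOR = 1 / 3  # equal class priors

def posterior(message, cls):
    # P(s|M) is proportional to P(s) * product of P(unigram|s),
    # with P(unigram|s) = count / total.
    p = PRIOR
    for word in message.split():
        p *= counts[cls][word] / TOTAL
    return p

scores = {cls: posterior("trojans rule", cls) for cls in counts}
best = max(scores, key=scores.get)  # the largest probability wins
print(best)                         # prints positive
```

Note how close the race is: P(obj|“trojans rule”) = (1/3) × 5/5000 × 25/5000 is only slightly smaller than the positive score, which is why noisy n-grams are filtered out first.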
• Ground truth: the messages were manually annotated for their sentiment

Summary
• Authors of Twitter messages use linguistic features to describe emotions (positive or negative sentiment messages) or state facts (objective messages)
  – Some part-of-speech tags may be strong indicators of emotional text
• Use examples of positive, negative, and objective messages collected from Twitter to train a classifier
  – Recognize the sentiment of a new message based on its words and POS tags

Global mood patterns
“Diurnal and seasonal moods vary with work, sleep and daylength across diverse cultures” by Golder and Macy
• Can automated sentiment analysis be applied to social media data to provide a global picture of human mood?
• Do moods have a time scale: diurnal, seasonal?

Corpus of Twitter tweets
• Up to 400 public messages from each user
• 2.4 million individuals worldwide
• 509 million messages between 2/08 and 1/10
• 84 identified countries
• English only
• Date, time, and country latitude

LIWC
• Linguistic Inquiry and Word Count
• James W. Pennebaker, U. Texas @ Austin
  – “Virtually no one in psychology has realized that low-level words can give clues to large-scale behaviors”
  – Recent book: The Secret Life of Pronouns (2011)
• 4,500 words and word stems
  – Each belongs to one or more word categories
    • e.g., “cried” is in sadness, negative emotion, overall affect, verb, and past-tense verb.
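The word-category scheme above can be sketched as a dictionary lookup: each word maps to one or more categories, and a message's score per category is the fraction of its words in that category. The two-entry lexicon here is purely illustrative, not LIWC's actual 4,500-word dictionary:

```python
# LIWC-style counting with a tiny illustrative lexicon.
LEXICON = {
    "cried": {"sadness", "negative_emotion", "overall_affect",
              "verb", "past_tense_verb"},
    "happy": {"positive_emotion", "overall_affect"},
}

def liwc_scores(text):
    # Fraction of the message's words falling into each category.
    words = text.lower().split()
    hits = {}
    for w in words:
        for cat in LEXICON.get(w, ()):
            hits[cat] = hits.get(cat, 0) + 1
    return {cat: n / len(words) for cat, n in hits.items()}

print(liwc_scores("she cried all day")["sadness"])  # prints 0.25
```

So a single word like “cried” raises several category scores at once, which is how a dictionary of word counts yields per-message affect measures.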
• 0.88 sensitivity and 0.97 specificity

Testing LIWC online: http://liwc.net/liwcresearch07.php

Methodology
• Examined within-individual Positive Affect (PA) and Negative Affect (NA) independently
  – E.g., the fraction of PA words appearing in an individual’s messages every hour
  – To eliminate between-individual variation, subtract the individual’s mean: PAu*(h) = PAu(h) - <PAu(h)>
• Additional analysis on 4 English-speaking regions: Africa, India, UK/Australia, US/Canada

Findings
• Two peaks in PA daily; PA is higher on weekends
• Mood is governed by diurnal cycles, not culture
• PA is higher when days are growing longer (a small correlation with the change in daylength; no significant correlation with daylength itself)

Digression: significance testing
• “PA is higher when change in daylength is positive … (r = 0.00121, p < 0.001)”
• Is there a trend? (Measure the correlation)
  – Calculate the correlation of y vs. x
• Is the trend significant, or could it be observed purely by chance?
  – Null hypothesis: there is no trend
  – Calculate the p-value

P-value
• “Reject the null hypothesis” when the p-value turns out to be less than a predetermined significance level
  – The significance level is often set to 0.05 or 0.01
• Correlation values are approximately normally distributed
  – The p-value is the probability of observing a correlation at least as extreme as the measured one when the null hypothesis is true; rejecting only below the significance level bounds the chance of falsely rejecting the null hypothesis

Summary
• Confirms findings from psychological studies
  – Psychology studies are small scale and use homogeneous populations, vs. the Twitter study on a large, heterogeneous population
  – Mood changes are associated with diurnal (sleep-wake) cycles
    • PA is highest in the morning and before midnight
    • PA is highest on weekends
    • Universal and independent of culture
  – Seasonal mood changes
    • PA decreases as days grow shorter (“winter blues”)
• It is possible to do psychology through text analysis of social media data
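The correlation step in the significance-testing digression can be computed by hand with Pearson's r; the x and y series below are hypothetical, not the Golder and Macy data:

```python
# Pearson correlation: measure the trend of y vs. x.
import math

def pearson_r(xs, ys):
    # r = covariance(x, y) / (std(x) * std(y)), computed from sums.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical series: change in daylength (x) vs. mean PA (y).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0.10, 0.12, 0.11, 0.14, 0.15]
r = pearson_r(x, y)
print(round(r, 3))  # prints 0.915

# The p-value would then ask how likely an |r| at least this large is
# under the null hypothesis of no trend (e.g., via a t-test); with only
# 5 points even a large r is weak evidence, which is why the paper's
# tiny r = 0.00121 can still be significant at n in the millions.
```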