Sentiment Lexicons
Instructor: Smaranda Muresan
Columbia University
smara@ccls.columbia.edu

Announcements
• Class setup on Courseworks too. Class website linked to Courseworks (“Class website” tab)
• TA’s (Arpit Gupta) office hours
  – Monday 4:15-5:15pm in TA room in Mudd
• TA’s email:
  – ta.cmsm@gmail.com

Class Today
• Word level sentiment analysis (Sentiment Lexicons)
• Discussion of the two papers
• Introduction to Sentiment Analysis beyond words (sentence level, text level) (to facilitate discussion of articles next week)

What is sentiment analysis?
• Attempts to identify the sentiment/opinion that a person may hold towards an object/person/topic etc
• It is a finer grain analysis compared to subjectivity analysis
  – Sentiment analysis: Positive, Negative, Neutral
  – Subjectivity analysis: Subjective, Objective
• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”

Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment

Goal of today’s lecture
• Gain insights into how sentiment is expressed lexically
• Begin developing resources that are useful in higher level classification (phrase level, sentence level, document level)
• Explore different philosophies on how to build such large scale sentiment lexicons

What are we classifying
• gross
• (gross,adj) (gross,noun) (gross,verb)
• gross out
• GROSS!!!
• The soup was gross – 1 star
• The horror movie was gross – 5 stars

Words
• Adjectives
  – positive: honest important mature large patient
    • Ron Paul is the only honest man in Washington.
    • Kitchell’s writing is unbelievably mature and is only likely to get better.
    • To humour me my patient father agrees yet again to my choice of film

Words
• Adjectives
  – negative: harmful hypocritical inefficient insecure
    • It was a macabre and hypocritical circus.
    • Why are they being so inefficient?
Slide from Janyce Wiebe

Other parts of speech
• Verbs
  – positive: praise, love
  – negative: blame, criticize
• Nouns
  – positive: pleasure, enjoyment
  – negative: pain, criticism

How to build sentiment lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches

General Inquirer (GI)
• Harvard General Inquirer Database (Stone, 1966)
  – Total of 11,788 terms
  – http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm
  – http://www.wjh.harvard.edu/~inquirer/homecat.htm
  – Positive (1915 words) vs Negative (2291 words)
    • (the remaining 7582 could be considered Neutral)
  – Strong vs Weak
  – Active vs Passive
  – Overstated vs Understated
  – Pleasure, Pain, Virtue, Vice
  – Motivation, Cognitive Orientation, etc.

WordNet (Miller, 1995; Fellbaum, 1998)
• Semantic lexical resource
• http://wordnetweb.princeton.edu/perl/webwn
• www.globalwordnet.org (multilingual)
• Synsets (denote different senses of a word)

Micro-WNOp (Cerini et al. 2007)
• 1105 WordNet synsets related to the opinion topic (initial words were selected from the GI)
• http://www-3.unipv.it/wnop/

Micro-WNOp (Cerini et al. 2007): statistics
• Reduced to the 702 synsets on which all annotators agreed

Issues with hand-built lexicons such as GI and Micro-WNOp?

How to build sentiment lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches

Simple sense/sentiment propagation
• Hypothesis: Sentiment is constant throughout regions of lexically related items. Thus, sentiment properties of hand-built seed sets will be preserved as we follow WordNet relations out from them.
• SentiWordNet (Esuli and Sebastiani, 2006)
  – Approx 1.7 Million words
  – Using WordNet and Machine Learning (Classifiers).
  – Each synset is assigned three scores
    • Positive
    • Negative
    • Objective
  – The three scores sum to 1. Ex: P=0.75, N=0, O=0.25

Building SentiWordNet
• Lp, Ln, Lo are the three seed sets
• Iteratively expand the seed sets through K steps
• Train the classifier on the expanded sets

Expansion of seed sets
[Figure: expansion of the seed sets Lp and Ln]
• The sets at the end of the kth step are called Tr(k,p) and Tr(k,n)
• Tr(k,o) is the set of synsets present in neither Tr(k,p) nor Tr(k,n)

Committee of classifiers
• Train a committee of classifiers of different types and different K-values for the given data
• Observations:
  – Low values of K give high precision and low recall
  – Accuracy in determining positivity or negativity, however, remains almost constant

Useful Sentiment Tutorial
• http://sentiment.christopherpotts.net/
• Has code related to WordNet propagation methods (used in SentiWordNet)
• Many other pointers!
• Issues with the WordNet-based propagation lexicons?

Other Sentiment Lexicons

MPQA Subjectivity Cues Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
  – 2718 positive
  – 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL

Bing Liu Opinion Lexicon
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinionlexicon-English.rar
• 6786 words
  – 2006 positive
  – 4783 negative

Disagreements between polarity lexicons
Christopher Potts, Sentiment Tutorial, 2011

                   Opinion Lexicon    General Inquirer   SentiWordNet
MPQA               33/5402 (0.6%)     49/2867 (2%)       1127/4214 (27%)
Opinion Lexicon                       32/2411 (1%)       1004/3994 (25%)
General Inquirer                                         520/2306 (23%)

How to build sentiment lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches
  – 2 papers for discussion today

Predicting the semantic orientation of adjectives
Hatzivassiloglou & McKeown 1997
Presenter: Smaranda Muresan

Goal
• Predicting polarity of adjectives from a large corpus
• Test the hypothesis: the morphosyntactic properties of coordination provide reliable information about adjectival oppositions and lexical polarities
• Adjectives conjoined by “and” have the same polarity
  – fair and legitimate, corrupt and brutal
  – *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  – fair but brutal

Approach
• Extract conjunctions of adjectives from a large corpus, along with relevant morphological relations
• Use a log-linear regression model to predict whether two conjoined adjectives have the same or different orientation
• Use a clustering algorithm to separate the adjectives into two subsets of different orientation
• Use average frequencies in each group to assign the label (the group with the highest frequency is labeled positive)

Seed data
• Label a seed set of 1336 adjectives (all occurring more than 20 times in a 21 million word Wall Street Journal corpus)
  – 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
  – 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…
• Further validation: ask 4 human judges to label a subset of 500 adjectives: 96.97% average inter-judge agreement

Validating the Hypothesis
• Run a parser on 21 million words
dataset to get 15,048 conjunction tokens involving 9,296 distinct adjective pairs.
• Each conjunction was classified by:
  – 1) conjunction used (and, or, but, …)
  – 2) type of modification (attributive, predicative)
  – 3) number of the modified noun (singular or plural)
• Considered conjunctions where both members were in the seed set (e.g. clever and sensitive)
• Count the percentage of conjunctions in each category with adjectives of same or different orientation

Validating Hypothesis
• For almost all the cases p-values are low. Hence the statistics are significant.
• ‘and’ usually joins adjectives of the same orientation
• ‘but’ does the opposite and usually joins adjectives of different orientation

Link Prediction
[Figure: graph over the adjectives brutal, helpful, corrupt, nice, fair, irrational, classy]
• Baseline: always predict same orientation – 77.84%
• The “but” rule
• Morphological rules (adequate-inadequate)
• Better idea: supervised learning using log-linear regression

Result of Prediction
• The log-linear regression model performs slightly better than the baseline

Clustering for partitioning the graph into two groups
• The log-linear model generates a dissimilarity score between 0 and 1 for each pair of adjectives
[Figure: graph over the adjectives brutal, helpful, corrupt, nice, fair, irrational, classy]

Labeling the clusters
• Two key insights about pairs of words of opposite orientations:
  – the semantically unmarked member has positive orientation (e.g. honest (unmarked) vs dishonest (marked))
  – the semantically unmarked member is the more frequent one
[Figure: graph over the adjectives brutal, helpful, corrupt, nice, fair, irrational, classy]

Output polarity lexicon
• Positive
  – bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative
  – ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

Output polarity lexicon
• Positive
  – bold decisive disturbing generous good honest
important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative
  – ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

Evaluating Clustering of Adjectives
• Tried to account for graph connectivity
• Used the adjectives from the seed set (A) and links given by conjunctions and morphological rules
• Separate into training/testing sets using a parameter α
  – a higher α creates a subset of A in which more adjectives are connected to each other

Clustering Results
• Highest accuracy was obtained when the highest number of links was present.
• The ratio of group frequencies correctly identified the positive subgroup

Graph Connectivity and Performance
• Parameter P measures how well each link is predicted independently
  – Precision
• Parameter k: average number of links for each adjective
• Goal: even if P is low, given enough data (high k) a high performance for group prediction is achieved

Results

Discussion points
What do you see as the major contribution of this paper?
  – Helps to highlight in a quantitative way the relationship between sentiment and particular words and constructions (coordination): a useful linguistic insight
  – Corpus-based method (thus avoiding the limitations of human-built resources such as WordNet)
  – Can be extended to nouns and verbs.
• Classic paper, cited 1127 times

Discussion points
• Does it have all the information for anyone to be able to replicate the results?
  – How is the dissimilarity value computed? (multiple values are delivered for an adjective pair in different environments)
• What are the limitations of the approach?
  – Method is limited by human cleverness in coming up with useful constructions

Velikovich et al

Class Today
• Word level sentiment analysis (Sentiment Lexicons)
• Discussion of the two papers
• Introduction to Sentiment Analysis beyond words (phrase level, text level) (to facilitate discussion of articles next week)

What is sentiment analysis?
• Attempts to identify the sentiment/opinion/attitude that a person may hold towards an object/person/topic etc

Components
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
  • From a set of types
    – Like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity:
    – positive, negative, neutral, together with strength
4. Text containing the attitude
  • Sentence or entire document

“This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”

Sentiment Analysis
• Simplest task:
  – Is the attitude of this text positive or negative?
• More complex:
  – Rank the attitude of this text from 1 to 5
• Advanced:
  – Detect the target, source, or complex attitude types

Sentiment Analysis
A Baseline Algorithm

Sentiment Classification in Movie Reviews
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278
• Polarity detection:
  – Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  – http://www.cs.cornell.edu/people/pabo/movie-review-data

Text Classification: definition
• The classifier (test phase):
  – Input: a document d (e.g., a movie review)
  – Output: a predicted class c from some fixed set of labels c1,...,cK (e.g., pos, neg)
• The learner (training phase):
  – Input: a set of m hand-labeled documents (d1,c1),...,(dm,cm)
  – Output: a learned classifier f: d → c

IMDB data in the Pang and Lee database
✓ when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . […] when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point . cool . _october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . . ]
✗ “ snake eyes ” is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it’s not just because this is a brian depalma film , and since he’s a great director and one who’s films are always greeted with at least some fanfare . and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .

Baseline Algorithm (adapted from Pang and Lee)
• Tokenization
• Feature Extraction
• Classification using different classifiers
  – Naïve Bayes
  – MaxEnt
  – Support Vector Machines (SVM)

Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  – Christopher Potts sentiment tokenizer
  – Brendan O’Connor twitter tokenizer

Extracting Features for Sentiment Classification
• How to handle negation
  – I didn’t like this movie
    vs
  – I really like this movie
• Which words to use?
  – Only adjectives
  – All words
    • All words turns out to work better, at least on this data

Negation
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
• Add NOT_ to every word between negation and following punctuation:
  – Before: didn’t like this movie , but I
  – After: didn’t NOT_like NOT_this NOT_movie , but I

Classification methods
• Naïve Bayes
• MaxEnt
• SVM

Evaluating Classification
• Evaluation must be done on test data that are independent of the training data
  – usually a disjoint set of instances
• Classification accuracy: c/n where n is the total number of test instances and c is the number of test instances correctly classified by the system.
  – Adequate if one class per document
• Results can vary based on sampling error due to different training and test sets.
  – Average results over multiple training and test sets (splits of the overall data) for the best results.
Slide from Chris Manning

Cross-Validation
• Break up data into 10 folds
  – (Equal positive and negative inside each fold?)
• For each fold
  – Choose the fold as a temporary test set
  – Train on 9 folds, compute performance on the test fold
• Report average performance of the 10 runs
[Figure: iterations 1 to 5, each holding out a different fold as the test set and training on the remaining folds]

Other issues in Classification
• MaxEnt and SVM tend to do better than Naïve Bayes

Problems: What makes reviews hard to classify?
• Subtlety:
  – Perfume review in Perfumes: the Guide:
    • “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”

Thwarted Expectations and Ordering Effects
• “This film should be brilliant.
It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

Due Next Class
• Readings
  – Chapter 4 from Pang and Lee “Opinion Mining and Sentiment Analysis” book
  – 2 papers for discussion
• A short data analysis assignment
  – Description on Courseworks under Assignments
  – Goal is to get a better understanding of data and the problems discussed in class
  – Grade: Excellent/Good/Insufficient
  – Due before class. No late submissions

Next class
• Discussion of 2 papers (50 minutes)
  – 25 minutes per paper
  – Prepare a 15-minute presentation and lead discussion for 10 minutes
• 5 min break
• More in-depth lecture on sentiment analysis & open questions (can lead to ideas for projects) – 30 minutes
• Introduction to Emotion/Mood (25 minutes)

Announcements
• Papers will be assigned for discussion by Saturday, Feb 1, 5pm.
• TA office hours
  – 4:15-5:15pm Mondays in the TA room in Mudd
• TA email: ta.cmsm@gmail.com
• Email the TA if you’d like a tutorial on Text Classification and existing toolkits

Announcements
• Grading policy slightly updated to include data analysis assignments
  – 10% data analysis assignments (3 assignments, grading Excellent/Good/Insufficient). No late submissions! See class website for details
  – 30% discussion of papers
  – 60% project
    • 10% literature review part
    • 5% class presentation
    • 45% final paper and project
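
Supplement: NOT_ negation marking (sketch)
The rule on the Negation slide ("Add NOT_ to every word between negation and following punctuation") can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by Das and Chen or by Pang et al.: the function name mark_negation, the small set of negation words, and the punctuation set are assumptions made for this example, and the input is assumed to be already tokenized.

```python
import re

# Assumption: a token counts as a negation if it is "not", "no", "never",
# or ends in n't (straight or curly apostrophe). Real systems use longer lists.
NEGATION = re.compile(r"^(?:not|no|never|\w+n['’]t)$", re.IGNORECASE)

# Assumption: these punctuation tokens close the scope of a negation.
PUNCTUATION = re.compile(r"^[.,:;!?]+$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the
    next punctuation token. `tokens` is a list of strings."""
    marked = []
    in_scope = False
    for token in tokens:
        if PUNCTUATION.match(token):
            # Punctuation ends the negation scope and is kept unchanged.
            in_scope = False
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
            if NEGATION.match(token):
                in_scope = True
    return marked


if __name__ == "__main__":
    print(" ".join(mark_negation("didn't like this movie , but I".split())))
```

The marked tokens can then be used as ordinary unigram features, so that "like" and "NOT_like" are counted separately by the classifier.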