A cognitive study of subjectivity extraction in sentiment annotation

Abhijit Mishra1, Aditya Joshi1,2,3, Pushpak Bhattacharyya1
1 IIT Bombay, India
2 Monash University, Australia
3 IITB-Monash Research Academy

At the 5th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, ACL 2014, Baltimore

Subjectivity Extraction
• Goal: To identify subjective portions of text

Motivation
• Strong AI suggests that a machine must perform sentiment analysis in a manner and with an accuracy similar to those of human beings
• Do humans perform subjectivity extraction as well?

A “cognitive study” of subjectivity extraction in sentiment annotation

Outline
• Sentiment Oscillations & Subjectivity Extraction
• Experiment Setup
• Anticipation & Homing
• Conclusion & Future Work

Sentiment Oscillations & Subjectivity Extraction
• Subjective documents may be:
  – Linear: The story was captivating. The actors did a great job. I absolutely loved the movie!
  – Oscillating: The story was captivating. Only if they had better actors. But then I enjoyed the movie, on the whole.
• Humans perform subjectivity extraction either as a result of “anticipation” or as “homing”.
• Which of the two methods is adopted depends on the linear/oscillating nature of the subjective document.

Experiment Setup (1/2)
• A human annotator reads a document and predicts its sentiment
• A Tobii T120 eye-tracker records eye movements while he/she reads the document
* No time restriction, no user input required: to minimize errors.

Experiment Setup (2/2)
• Dataset
  – 3 movie reviews in English from IMDb
  – One linear, one oscillating, one between the two extremes (D0, D1, D2 respectively)
• Three documents? Really?!
  – To eliminate predictability
  – To reduce errors due to fatigue
• 12 human annotators (P0, ..., P11)

Observations: Anticipation (1/2)
• In the case of linear subjective documents, an annotator reads some sentences and then begins to skip sentences.
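The skipping behaviour described above can be made concrete with a small sketch. It assumes, hypothetically, that the eye-tracker output has already been mapped to a sequence of sentence indices in the order they were fixated. The function name `reading_summary`, the input format, and the example traces are illustrative assumptions, not part of the study's tooling or data.

```python
from collections import Counter


def reading_summary(fixated_sentences, n_sentences):
    """Summarize a reading trace (hypothetical format).

    fixated_sentences: sentence indices in the order they were fixated.
    n_sentences: number of sentences in the document.
    """
    # Collapse consecutive fixations on the same sentence into one read.
    reads = [
        s for i, s in enumerate(fixated_sentences)
        if i == 0 or s != fixated_sentences[i - 1]
    ]
    counts = Counter(reads)
    # Sentences that were never fixated at all.
    skipped = [s for s in range(n_sentences) if s not in counts]
    # Sentences that were returned to after reading something else.
    reread = sorted(s for s, c in counts.items() if c > 1)
    return {
        "total_reads": len(reads),  # reads counted with repetition
        "skipped": skipped,
        "reread": reread,
    }


# Made-up trace resembling anticipation: early sentences read, later ones skipped.
anticipation_like = reading_summary([0, 0, 1, 2, 2, 5, 9], n_sentences=10)

# Made-up trace resembling a full pass followed by revisits.
revisit_like = reading_summary([0, 1, 2, 3, 1, 3], n_sentences=4)
```

Under these assumptions, a trace with many skipped sentences and no rereads is what skipping after a few sentences would look like, while a trace with no skipped sentences and several rereads corresponds to a full pass followed by revisits.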
Observations: Anticipation (2/2)

Document   Length   Average number of non-unique sentences read by participants
D0         10       21
D1         9        33.83
D2         13       50.42

Observations: Homing (1/3)
• In the case of oscillating subjective documents, an annotator (a) first reads all sentences, (b) then revisits some sentences

Observations: Homing (2/3)
• Considerable overlap between sentences that are read in the second pass
• All of them are subjective.

Participant   TFD-SE   PTFD   TFC-SE
P5            7.3      8      21
P7            3.1      5      11
P9            51.94    10     26
P11           116.6    16     56

Reading statistics for D1
TFD-SE: Total fixation duration for subjective extract; PTFD: Proportion of total fixation duration = (TFD)/(Total duration); TFC-SE: Total fixation count for subjective extract

Observations: Homing (3/3)
• Homing at a sub-sentence level
  – Sarcasm
    • Multiple regressions around the sarcasm portion for participant P1, document D1
    • Participant P1 does not correctly detect the sentiment of the document
  – Thwarting

Conclusion & Future Work
• Based on how sentiment changes through a document, humans may perform subjectivity extraction as a result of anticipation or homing
• Applications:
  – Pricing models for crowd-sourced annotation
  – Sentiment classifiers that incorporate “sentiment runlengths”

References
• Subhabrata Mukherjee and Pushpak Bhattacharyya. WikiSent: Weakly Supervised Sentiment Analysis Through Extractive Summarization With Wikipedia. ECML PKDD 2012.
• Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL 2004.