Virtual Knowledge Studio (VKS), Information Studies

detecting and analysing emotion in social network sites: MySpace comments case study
Mike Thelwall
Statistical Cybermetrics Research Group, University of Wolverhampton, UK

research motivation
sentiment is a frequently overlooked key factor in communication and relationships
it needs to be investigated in order to:
  understand the role of sentiment in new online environments
  identify people “at risk” of suicide
  discover the emotional factors necessary for sustained online environments
  modify bots to detect and react appropriately to emotional communication

talk structure
part 1: background information about MySpace comment communication
part 2: automatically detecting sentiment in MySpace comments

MySpace comments
comments are public or semi-public short messages exchanged by Friends
but what is their purpose and what do they look like?

comments
displayed in public on the member's home page: public personal messages

purpose 1: gossip (53% of dialogs)
examples of gossip comments:
  I moved to Houston, Tx.
  I come home at the beginning of July
  well i just diyed my hair nearly black!!
  i regret not going to UMSX bc MZU is so much harder i sooo messed up :((
  for a white guy tim knows a lot of rap song
  Tina talks about you all the time.
  Nigel said you were feeling bad

purpose 2: coordination of offline activities (18% of dialogs)
  CALL ME WHEN YOU GET A CHANCE
  hey text me sometime.. [number]
  i hope to see you toniiite <3
  I'm gonna be in ABD in Jan. for like a week, we gotta hang out
  Hey I can call you 2day?!!

purpose 3: keeping in contact

emotion in MySpace
how important is emotion expression in social network communication?
who uses emotion, and what type of emotion?
emotion in Friend comments
most comments contain positive emotion (including formal expressions, such as “Love, Sue” or “raj x”)
few contain negative emotion

Emotion strength in 819 random comments
Strength      Positive   Negative
1 (none)      34%        80%
2             28%        11%
3             35%        6%
4             3%         2%
5 (strong)    0%         1%

emotion in Friend comments
positive emotion is mainly used by females and mainly directed at females
there is no gender difference in negative emotion

Average emotion strength in 819 random comments
              To female          To male
From female   2.4 (+), 1.3 (-)   2.0 (+), 1.3 (-)
From male     2.2 (+), 1.3 (-)   1.7 (+), 1.5 (-)

SentiStrength
CYBEREMOTIONS: to identify and analyse Collective Emotions in Cyberspace
CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs

problem 1: non-standard English in MySpace comments
Aspect of non-standard English                                    Comments
Typographic slang or abbreviations (e.g., omg, lol, hugz, @)      41%
Slang, including dialect, swearing, and idiomatic slang sayings   51%
Non-standard spelling other than the above                        33%
Non-standard punctuation                                          81%
Pictograms                                                        16%
Interjections (e.g., haha, muahh, huh, but not oh)                13%
Non-standard capitalisation                                       75%
Other non-standard English grammar                                56%
Not standard formal written English (i.e., any of the above)      97%

common words in comments
Rank     Words
1-10     i, you, to, the, and, a, u, me, hey, my
11-20    it, for, in, love, is, that, so, up, your, on
21-30    have, of, are, just, lol, but, we, how, be, ya
31-40    at, was, well, what, get, like, good, im, know, out
41-50    been, this, with, see, hope, all, do, not, if, happy
51-60    miss, going, go, time, i'm, ur, back, some, got, there
61-70    when, can, will, thanks, its, or, by, from, now, whats
71-80    say, day, new, hi, much, one, no, about, haha, call
81-90    come, :), soon, too, need, birthday, 2, am, had, here
91-100   dont, doing, as, think, man, page, great, did, weekend, work
Bold words are not in the top 100 for general British English, and italic words are not in the top 100 for general American English.
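A ranked list like the one above comes from tokenising the comments and counting tokens. The talk does not describe its tokeniser, so the Python sketch below is an assumption: the names tokenise, rank_words and TOKEN_RE are hypothetical, and the regular expression keeps apostrophes inside words and simple emoticons such as :) as single tokens, because items like i'm and :) appear in the top 100.

```python
import re
from collections import Counter

# Hypothetical tokeniser: keeps simple emoticons such as ":)" or ";-p" and
# words with an internal apostrophe ("i'm", "what's") as single tokens.
TOKEN_RE = re.compile(r"[:;]-?[()dp]|[a-z0-9]+(?:'[a-z]+)?")


def tokenise(comment: str) -> list[str]:
    """Lower-case a comment and split it into word and emoticon tokens."""
    return TOKEN_RE.findall(comment.lower())


def rank_words(comments: list[str], top_n: int = 100) -> list[tuple[int, str, int]]:
    """Return (rank, token, count) for the top_n most frequent tokens."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenise(comment))
    # most_common sorts by count; ties keep first-seen order.
    return [
        (rank, token, count)
        for rank, (token, count) in enumerate(counts.most_common(top_n), start=1)
    ]


if __name__ == "__main__":
    sample = ["hey how r u :)", "Hey! miss u", "love u hun :)"]
    for rank, token, count in rank_words(sample, top_n=3):
        print(rank, token, count)  # 1 u 3, then 2 hey 2, then 3 :) 2
```

Comparing the result with a general-English frequency list, as done for the British and American top 100, is then a set difference between the two top-100 lists.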
problem 2: swearing
swearing is rife in MySpace
it conveys both positive and negative emotions
it is ignored by existing sentiment analysis methods

emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)
  and we r guna go to town again n make a ryt fuckin nyt of it again lol
  see look i'm fucking commenting u back lol and stop fucking tickleing me!!
  Thanks for the party last night it was fucking good and you are great hosts.
  That 50's rock and roll weekender was fucking mint!
  yeah so me and sarah broke up and everythings fucking shit

personal insult referring to a defined entity (28% of swearing)
  tehe i am sorry.. i m such a sleep deprived twat alot of the time! lol
  Maxy is the soundest cunt in the world!!!!
  3rd? i thought i was your main man number one? Fucker write bak cunt xxx
  You evil cunt! Haha lucky fuck

idiomatic set phrase OR figurative extension of literal meaning (23% of swearing, mostly male)
  think am gonna get him an album or summet fuck nows
  got another copy of the reaction CD (will had fucked the last one lol)
  qu'est ce que fuck?
  what the fuck pubehead
  whos pete and why is this necicery mate Heh long story.. cant be fucked to explain :D

SentiStrength objectives
1. detect positive and negative emotion in MySpace comments
2. develop workarounds for the lack of grammar and spelling
3. harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
4. classify each MySpace comment as positive 1-5 AND negative 1-5
5. apply to social issues

SentiStrength algorithm
spelling correction for repeated letters: Helllllo -> Hello (emphasis: llll)
list of positive and negative words with strengths (partly from LIWC; includes swearing), e.g., hate = -4, love = 3
extra heuristics:
  emphasis acts to enhance positive or negative emotion
  emotion words are ignored in questions
  the strongest positive and negative expressions in the whole comment are taken
  booster words (e.g., very, some)
http://sentistrength.wlv.ac.uk/

sentiment strength estimation example
input: HEEEEEEEEY BUDDY!!!!!!!!
translation and extraction of emphasis: HEY (+1) BUDDY! (+1)
look up words in the sentiment strength dictionary:
  word    +ve
  hey     1
  buddy   2
HEY: 1 + 1 = 2
BUDDY: 2 + 1 = 3
overall: positive 3, negative 1

SentiStrength vs. standard classifiers
10-fold cross-validation on 1041 human-classified comments
Algorithm                       Positive   Negative
SentiStrength                   60.9%      73.0%
Support Vector Machines         56.2%      73.6%
Simple logistic regression      55.0%      72.8%
J48 classification tree         54.9%      72.6%
Naïve Bayes                     54.9%      67.3%
Decision table                  54.8%      73.8%
JRip rule-based classifier      54.1%      73.1%
Multilayer Perceptron           49.6%      71.4%
Baseline                        41.6%      71.2%
Random                          20.0%      20.0%

application: evidence of emotion homophily in MySpace
automatic analysis of sentiment in 2 million comments exchanged between MySpace friends
correlation of 0.227 for positive emotion strength and 0.254 for negative emotion strength
people tend to use similar, but not identical, levels of emotion to their friends in messages

conclusions
social network sites are a source of sentiment expressed in very informal language
positive and negative sentiment can be identified with reasonable accuracy
applications:
  identifying social trends
  identifying potential emotional “anomalies”

bibliography
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D. & Kappas, A. (under review). Sentiment strength detection in short informal text.
Thelwall, M., Wilkinson, D. & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace. Journal of the American Society for Information Science and Technology, 61(1), 190-199.
Thelwall, M. (2008). Fk yea I swear: Cursing and gender in a corpus of MySpace pages. Corpora, 3(1), 83-107.
Thelwall, M. (2009). Homophily in MySpace. Journal of the American Society for Information Science and Technology, 60(2), 219-231.
Thelwall, M. (2009). Social network sites: Users and uses. In: M. Zelkowitz (Ed.), Advances in Computers 76 (pp. 19-73). Amsterdam: Elsevier.
Thelwall, M. & Wilkinson, D. (2010). Public dialogs in social network sites: What is their purpose? Journal of the American Society for Information Science and Technology, 61(2), 392-404.
http://www.cyberemotions.eu/snic.ppt

references 2
Gobron, S., Ahn, J., Paltoglou, G., Thelwall, M. & Thalmann, D. (in press). From sentence to emotion: A real-time three-dimensional graphics metaphor of emotions extracted from text. The Visual Computer: International Journal of Computer Graphics.
Thelwall, M. (2009). MySpace comments. Online Information Review, 33(1), 58-76.
Thelwall, M. (2008). Social networks, gender and friending: An analysis of MySpace member profiles. Journal of the American Society for Information Science and Technology, 59(8), 1321-1330.
http://www.danah.org/researchBibs/sns.html
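The SentiStrength heuristics listed in the algorithm slide (repeated-letter correction, a signed word-strength lexicon, booster words, emphasis, ignoring emotion words in questions, and keeping the strongest positive and negative expression) can be illustrated with a minimal Python sketch. It is not the SentiStrength implementation from http://sentistrength.wlv.ac.uk/. Everything beyond the slide values hate = -4, love = 3, hey = 1 and buddy = 2 is an assumption: the extra lexicon entries, the booster weights, the small vocabulary used to restore "hello", the one-point emphasis bonus for repeated letters or a following "!!", skipping whole sentences that end in "?", and the hypothetical names normalise, score, LEXICON, BOOSTERS and VOCAB. Under these assumptions it reproduces the HEEEEEEEEY BUDDY!!!!!!!! example (positive 3, negative 1).

```python
import re

# Toy lexicon (hypothetical). Only hate=-4, love=3, hey=1 and buddy=2 come from
# the talk; the other entries are illustrative assumptions.
LEXICON = {"hey": 1, "buddy": 2, "love": 3, "good": 2, "hate": -4, "bad": -2}
# Assumed booster weights: "very" strengthens and "some" weakens the next word.
BOOSTERS = {"very": 1, "really": 1, "some": -1}
# Known words used to choose between e.g. "helo" and "hello" (assumption).
VOCAB = set(LEXICON) | set(BOOSTERS) | {"hello"}

REPEAT_RE = re.compile(r"(\w)\1{2,}")  # a character repeated 3 or more times


def normalise(word: str) -> tuple[str, bool]:
    """Undo repeated-letter emphasis, e.g. 'helllllo' -> ('hello', True)."""
    single = REPEAT_RE.sub(r"\1", word)
    double = REPEAT_RE.sub(r"\1\1", word)
    emphasised = single != word
    # Keep a double letter only when that spelling is a known word.
    if double in VOCAB and single not in VOCAB:
        return double, emphasised
    return single, emphasised


def score(comment: str) -> tuple[int, int]:
    """Return (positive, negative) strengths, each on a 1-5 scale."""
    pos, neg = 1, 1  # 1 means "no emotion"
    for sentence in re.split(r"(?<=[.!?])\s+", comment.strip()):
        if sentence.endswith("?"):
            continue  # assumption: emotion words in questions are ignored
        tokens = re.findall(r"[a-z']+|!+", sentence.lower())
        boost = 0
        for i, token in enumerate(tokens):
            if token.startswith("!"):
                continue
            word, emphasised = normalise(token)
            if word in BOOSTERS:
                boost = BOOSTERS[word]
                continue
            strength = LEXICON.get(word, 0)
            if strength:
                # Emphasis adds 1: repeated letters, or "!!" right after the word.
                shouted = i + 1 < len(tokens) and tokens[i + 1].startswith("!!")
                magnitude = abs(strength) + boost + int(emphasised) + int(shouted)
                magnitude = max(1, min(5, magnitude))  # clamp to the 1-5 scale
                if strength > 0:
                    pos = max(pos, magnitude)  # keep the strongest positive
                else:
                    neg = max(neg, magnitude)  # keep the strongest negative
            boost = 0  # a booster only affects the next word
    return pos, neg


if __name__ == "__main__":
    print(score("HEEEEEEEEY BUDDY!!!!!!!!"))  # (3, 1), as in the worked example
```

Taking the maximum rather than summing word strengths follows the "strongest expression in the whole comment" heuristic, so a long comment with many mild words is not scored as strongly emotional.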