Presentation - CyberEmotions

Virtual Knowledge Studio (VKS)
Information Studies
detecting and analysing
emotion in social network sites
MySpace comments
case study
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
research motivation
sentiment is a frequently overlooked key
factor in communication and relationships
needs to be investigated to understand the
role of sentiment in new online environments



identify people “at risk” of suicide
discover emotional factors necessary for sustained
online environments
modify bots to detect and react appropriately to
emotional communication
talk structure
part 1: background information about
MySpace comment communication
part 2: automatically detecting
sentiment in MySpace comments
MySpace comments are
public or semi-public
short messages
exchanged by Friends
but what is their purpose
and what do they look like?
comments
Displayed in public on home page –
public personal messages
purpose 1: gossip (53% of dialogs)
– examples of gossip comments
I moved to Houston, Tx.
I come home at the beginning of July
well i just diyed my hair nearly black!!
i regret not going to UMSX bc MZU is so much
harder
i sooo messed up :((
for a white guy tim knows a lot of rap song
Tina talks about you all the time.
Nigel said you were feeling bad
purpose 2: coordination of offline
activities (18% of dialogs)
CALL ME WHEN YOU GET A CHANCE
hey text me sometime.. [number]
i hope to see you toniiite <3
I'm gonna be in ABD in Jan. for like a
week, we gotta hang out
Hey I can call you 2day?!!
purpose 3: keeping in contact
emotion in MySpace
how important is emotion expression in
social network communication?
who uses emotion and what type of
emotion?
emotion in Friend comments
most comments
contain positive
emotion (including
formal expressions,
such as “Love, Sue”
or “raj x”)
few contain negative
emotion
Emotion strength in 819 random comments
Emotion       +ve    -ve
1 (none)      34%    80%
2             28%    11%
3             35%     6%
4              3%     2%
5 (strong)     0%     1%
emotion in Friend comments
positive emotion
mainly used by
females and mainly
directed at females
no gender difference
in negative emotions
From
female
To
female
To
male
From
male
2.4 (+)
2.0 (+)
1.3 (-)
1.3 (-)
2.2 (+)
1.7 (+)
1.3 (-)
1.5 (-)
Average emotion strength in 819 random comments
SentiStrength
To identify and analyse
Collective Emotions
in Cyberspace
CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs
problem 1: non-standard English
in MySpace comments
Aspect of non-standard English: Comments
Typographic slang or abbreviations (e.g., omg, lol, hugz, @): 41%
Slang, including dialect, swearing, and idiomatic slang sayings: 51%
Non-standard spelling other than the above: 33%
Non-standard punctuation: 81%
Pictograms: 16%
Interjections (e.g., haha, muahh, huh, but not oh): 13%
Non-standard capitalisation: 75%
Other non-standard English grammar: 56%
Not standard formal written English (i.e., any of the above): 97%
common words in comments
Rank: Word
1-10: i, you, to, the, and, a, u, me, hey, my
11-20: it, for, in, love, is, that, so, up, your, on
21-30: have, of, are, just, lol, but, we, how, be, ya
31-40: at, was, well, what, get, like, good, im, know, out
41-50: been, this, with, see, hope, all, do, not, if, happy
51-60: miss, going, go, time, i'm, ur, back, some, got, there
61-70: when, can, will, thanks, its, or, by, from, now, whats
71-80: say, day, new, hi, much, one, no, about, haha, call
81-90: come, :), soon, too, need, birthday, 2, am, had, here
91-100: dont, doing, as, think, man, page, great, did, weekend, work
Bold words are not in the top 100 for general British English, and italic words are not in the top 100 for general American English.
problem 2: swearing
rife in MySpace
conveys positive and negative emotions
ignored by existing sentiment analysis
methods
emphatic adverb/adjective OR adverbial booster
OR premodifying intensifying negative adjective
(36% of swearing)






and we r guna go to town again n make a
ryt fuckin nyt of it again lol
see look i'm fucking commenting u back
lol and stop fucking tickleing me!!
Thanks for the party last night it was fucking
good and you are great hosts.
That 50's rock and roll weekender was
fucking mint!
yeah so me and sarah broke up and
everythings fucking shit
personal insult referring to
defined entity (28% of swearing)
tehe i am sorry.. i m such a sleep
deprived twat alot of the time! lol
Maxy is the soundest cunt in the
world!!!!
3rd? i thought i was your main man
number one? Fucker
write bak cunt xxx
You evil cunt! Haha
lucky fuck
idiomatic set phrase OR figurative extension
of literal meaning (23%, mostly male)
think am gonna get him an album or
summet fuck nows
got another copy of the reaction CD
(will had fucked the last one lol)
qu'est ce que fuck?
what the fuck pubehead whos pete and
why is this necicery mate
Heh long story.. cant be fucked to
explain :D
SentiStrength objective
1. detect positive and negative emotion in MySpace comments
2. develop workarounds for lack of grammar and spelling
3. harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
4. classify each MySpace comment as positive 1-5 AND negative 1-5
5. apply to social issues
SentiStrength algorithm
spelling correction for repeated letters

Helllllo -> Hello (emphasis: llll)
list of +ve and -ve words with strengths (partly
from LIWC; includes swearing)

hate=-4, love=3
extra heuristics




emphasis acts to enhance + or – emotion
emotion words ignored in questions
take strongest +ve & -ve expression in whole
comment
booster words (e.g., very, some)
http://sentistrength.wlv.ac.uk/
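The repeated-letter correction step can be illustrated with a minimal sketch. This is not SentiStrength's actual code: the function name `correct_repeats`, the tiny `KNOWN_WORDS` vocabulary, and the rule "try one or two copies of every letter repeated three or more times, and accept the first candidate that is a known word" are all assumptions made for illustration.

```python
from itertools import groupby, product

# Hypothetical mini vocabulary; a real system would use a full word list.
KNOWN_WORDS = {"hello", "hey", "happy", "buddy", "good"}


def correct_repeats(word):
    """Return (corrected_word, emphasised) for a single word without punctuation.

    A letter repeated 3+ times is treated as emphasis. Each such run is
    reduced to one or two letters, whichever yields a known word.
    The result is lower-cased (the slide shows 'Hello'; case is ignored here).
    """
    runs = [(ch, len(list(grp))) for ch, grp in groupby(word.lower())]
    emphasised = any(length >= 3 for _, length in runs)
    if not emphasised:
        return word.lower(), False

    # For each long run offer a single and a double letter; keep short runs as-is.
    options = [[ch, ch * 2] if length >= 3 else [ch * length] for ch, length in runs]
    candidates = ["".join(parts) for parts in product(*options)]
    for candidate in candidates:
        if candidate in KNOWN_WORDS:
            return candidate, True
    # No known word found: fall back to single letters (the first candidate).
    return candidates[0], True


if __name__ == "__main__":
    for raw in ["Helllllo", "haaappppyyy", "hey"]:
        print(raw, "->", correct_repeats(raw))
```

The emphasis flag is returned separately so that a later scoring step can use it to strengthen the word's emotion, as the "emphasis acts to enhance" heuristic describes.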
sentiment strength estimation example
HEEEEEEEEY BUDDY!!!!!!!!
translation and extraction of emphasis:
HEY (+1) BUDDY! (+1)
look up words in sentiment strength dictionary:
hey: +ve 1
buddy: +ve 2
HEY: 1 + 1 = 2
BUDDY: 2 + 1 = 3
overall – positive: 3, negative: 1
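The worked example above can be reproduced with a short sketch. It is an illustration under stated assumptions, not the SentiStrength implementation: the names `LEXICON`, `lookup`, and `score` are hypothetical, the lexicon holds only the four strengths shown on the slides, and the rule that two or more exclamation marks boost the immediately preceding word is a guess at how the second "+1" arises.

```python
import re

# Word strengths taken from the slides; everything else about this lexicon is assumed.
LEXICON = {"hey": 1, "buddy": 2, "love": 3, "hate": -4}

TOKEN = re.compile(r"([A-Za-z']+)([!?]*)")   # a word plus any trailing ! or ?
LONG_RUN = re.compile(r"(.)\1{2,}")          # a character repeated 3+ times


def lookup(word):
    """Return (strength, emphasised); strength is 0 for unknown words."""
    lower = word.lower()
    emphasised = bool(LONG_RUN.search(lower))
    # Try collapsing long runs to one letter, then to two letters.
    for replacement in (r"\1", r"\1\1"):
        candidate = LONG_RUN.sub(replacement, lower)
        if candidate in LEXICON:
            return LEXICON[candidate], emphasised
    return 0, emphasised


def score(comment):
    """Return (positive, negative), each on a 1-5 scale where 1 means none."""
    positive, negative = 1, 1
    for word, punctuation in TOKEN.findall(comment):
        strength, emphasised = lookup(word)
        if strength == 0:
            continue
        # Assumed boosts: +1 for repeated letters, +1 for repeated exclamation marks.
        boost = int(emphasised) + int(punctuation.count("!") >= 2)
        magnitude = min(5, abs(strength) + boost)
        if strength > 0:
            positive = max(positive, magnitude)   # keep the strongest +ve expression
        else:
            negative = max(negative, magnitude)   # keep the strongest -ve expression
    return positive, negative


if __name__ == "__main__":
    print(score("HEEEEEEEEY BUDDY!!!!!!!!"))  # (3, 1)
```

Taking the maximum rather than a sum mirrors the "take strongest +ve & -ve expression in whole comment" heuristic; the question and booster-word heuristics are left out of this sketch.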
SentiStrength vs. std. classifiers
Algorithm                      Positive   Negative
SentiStrength                  60.9%      73.0%
Support Vector Machines        56.2%      73.6%
Simple logistic regression     55.0%      72.8%
J48 classification tree        54.9%      72.6%
Naïve Bayes                    54.9%      67.3%
Decision table                 54.8%      73.8%
JRip rule-based classifier     54.1%      73.1%
Multilayer Perceptron          49.6%      71.4%
Baseline                       41.6%      71.2%
Random                         20.0%      20.0%
10-fold cross-validation on 1041 human-classified comments
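To make the evaluation setup concrete, here is a minimal sketch of 10-fold cross-validation applied to a majority-class predictor. It assumes that "Baseline" means always predicting the most common class, which the slide does not state, and the labels in the usage example are invented. With five strength classes, a uniform random guess is right one time in five, which matches the 20.0% "Random" row.

```python
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; fold i tests every k-th item starting at i.

    No shuffling is done, so the input order is assumed to be arbitrary.
    """
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def majority_baseline_accuracy(labels, k=10):
    """Cross-validated accuracy of always predicting the training fold's majority class."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct += sum(labels[i] == majority for i in test)
    return correct / len(labels)


if __name__ == "__main__":
    # Invented labels on the 1-5 strength scale, for illustration only.
    toy_labels = [1] * 60 + [2] * 40
    print(majority_baseline_accuracy(toy_labels))
    print(1 / 5)  # expected accuracy of a uniform random guess over five classes
```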
application - evidence of emotion
homophily in MySpace
automatic analysis of sentiment in 2
million comments exchanged between
MySpace friends
correlation of 0.227 for +ve emotion
strength and 0.254 for –ve
people tend to use similar but not
identical levels of emotion to their
friends in messages
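A correlation like the ones quoted above could be computed along the following lines. The slide does not say which correlation coefficient was used; this sketch assumes Pearson's, and the paired strengths in the usage example are invented rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example: average +ve strength of a member vs. that of a Friend.
    member = [1, 2, 3, 4]
    friend = [1, 3, 2, 4]
    print(pearson(member, friend))
```

A value near 0.2 to 0.25, as reported on the slide, indicates a weak positive association: Friends' emotion levels move together somewhat but are far from identical.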
conclusions
social network sites are a source of
sentiment expressed in very informal
language
can identify positive and negative
sentiment with reasonable accuracy
applications:


identifying social trends
identifying potential emotional “anomalies”
bibliography
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D. & Kappas, A. (under
review). Sentiment strength detection in short informal text.
Thelwall, M., Wilkinson, D. & Uppal, S. (2010). Data mining
emotion in social network communication: Gender
differences in MySpace, Journal of the American Society for
Information Science and Technology, 61(1), 190-199.
Thelwall, M. (2008). Fk yea I swear: Cursing and gender in a
corpus of MySpace pages, Corpora, 3(1), 83-107.
Thelwall, M. (2009). Homophily in MySpace, Journal of the
American Society for Information Science and Technology. 60(2),
219-231.
Thelwall, M. (2009). Social network sites: Users and uses. In:
M. Zelkowitz (Ed.), Advances in Computers 76. Amsterdam: Elsevier
(pp. 19-73).
Thelwall, M. & Wilkinson, D. (2010). Public dialogs in social
network sites: What is their purpose?, Journal of the American
Society for Information Science and Technology, 61(2), 392-404.
http://www.cyberemotions.eu/snic.ppt
references 2
Gobron, S., Ahn, J., Paltoglou, G., Thelwall, M. &
Thalmann, D. (in press). From sentence to
emotion: A real-time three-dimensional
graphics metaphor of emotions extracted
from text. The Visual Computer: International
Journal of Computer Graphics.
Thelwall, M. (2009). MySpace comments. Online
Information Review, 33(1), 58-76.
Thelwall, M. (2008). Social networks, gender
and friending: An analysis of MySpace
member profiles, Journal of the American Society
for Information Science and Technology, 59(8),
1321-1330.
http://www.danah.org/researchBibs/sns.html