Hal MacFie - Qi Statistics

Open access and data processing of Social Media
(Twitter) data – a new and valuable
consumer research instrument
Thierry Worch, Anne Hasted & Hal MacFie
Overview
• Using Twitter for Research – Macro vs Micro
• The R-based twitteR package
• A food product application
• Possible use in Sensory and Consumer Science
© Qi Statistics Ltd
What is Twitter?
• Online social network and microblog.
• Open text-based messages of up to 140 characters also known as
“Tweets”.
• Tweets are open:
– personal information (what people are doing/feeling);
– discussions;
– sharing information...
• Tweets are grouped together according to their content (use of “#word”).
• People can “follow” friends, celebrities or brands to stay updated.
• Over 500 million registered users in 2012, generating over 340 million
tweets/day and handling over 1.6 billion search queries/day.
MACRO APPLICATION 1
Diurnal and Seasonal Mood Vary with Work,
Sleep, and Day length Across Diverse Cultures
• Study from Golder & Macy, Science, 30 September 2011: 1878-1881.
• Previous studies used small samples of American students.
• Students are exposed to varying academic schedules that constrain when and how much they sleep.
• Retrospective self-reports are vulnerable to memory error and experimenter demand effects.
• Researchers have acknowledged the limitations of this methodology but have had no practical means for in situ real-time hourly observation of individual behavior in large and culturally diverse populations over many weeks.
Methodology
• Twitter data access: 2.4 million individuals worldwide, 509 million messages, between February 2008 and January 2010.
• Linguistic Inquiry and Word Count (LIWC) analysis.
[Figure: positive and negative term frequencies by time of day]
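As a rough illustration of the idea behind the figure, the sketch below counts positive and negative words per hour of the day and divides by the total number of words posted in that hour. It is written in Python as a stand-in. The word lists are tiny placeholders, not the LIWC dictionaries, and the function name `hourly_affect` is hypothetical. The published study worked at the level of individuals, which this simplification ignores.

```python
from collections import defaultdict
from datetime import datetime

# Tiny illustrative word lists; placeholders, NOT the LIWC dictionaries.
POSITIVE = {"happy", "great", "love", "good"}
NEGATIVE = {"sad", "tired", "hate", "bad"}


def hourly_affect(tweets):
    """Return {hour: (positive_share, negative_share)} for (timestamp, text) pairs.

    Each share is the number of matching words divided by the total number
    of words posted in that hour.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # hour -> [positive, negative, all words]
    for timestamp, text in tweets:
        words = text.lower().split()
        bucket = totals[timestamp.hour]
        bucket[0] += sum(word in POSITIVE for word in words)
        bucket[1] += sum(word in NEGATIVE for word in words)
        bucket[2] += len(words)
    return {
        hour: (pos / n, neg / n)
        for hour, (pos, neg, n) in sorted(totals.items())
        if n > 0
    }


# Made-up messages, for illustration only.
sample = [
    (datetime(2009, 3, 2, 7, 15), "good morning love this day"),
    (datetime(2009, 3, 2, 7, 40), "great coffee"),
    (datetime(2009, 3, 2, 22, 5), "so tired bad day"),
]
print(hourly_affect(sample))
```

Plotting the two shares against the hour gives curves of the kind summarised in the figure.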
Results
• Individuals awaken in a good mood that deteriorates as the day
progresses—which is consistent with the effects of sleep and circadian
rhythm.
• Seasonal change in baseline positive affect varies with change in day
length.
• People are happier on weekends, but the morning peak in positive affect
is delayed by 2 hours, which suggests that people awaken later on
weekends.
MACRO APPLICATION 2
Effects of the Recession on Public Mood in the UK
• Lansdall-Welfare, Lampos, & Cristianini (University of Bristol, UK).
• 484 million tweets from 9.8 million UK users, July 2009 to January 2012.
Results – 4 emotion categories
Micro Application 1: Airline companies
• “R by example: mining Twitter for consumer attitudes towards airlines”, by
Jeffrey Breen (June 2011)
Airline satisfaction scores
• Retrieved from www.theacsi.org
• Airlines do not score very high compared to other sectors.
Example of Tweets
How can we access and summarize this data?
Searching tweets with twitteR
Game Plan for the Sentiment Analysis
Sentiment distributions
[Figure: distributions of positive and negative sentiment scores for Southwest and United Airlines]
• Southwest has far fewer negative tweets than United Airlines.
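The slides do not spell out how each tweet is scored. One simple approach, assumed here purely for illustration, is to count the positive words in a tweet and subtract the count of negative words, then tabulate how many tweets receive each score. The sketch is in Python as a stand-in. The word lists are short placeholders rather than a real opinion lexicon, and `sentiment_score` and `score_distribution` are hypothetical names.

```python
import re

# Placeholder word lists for illustration; a real study would use a full opinion lexicon.
POSITIVE_WORDS = {"great", "love", "friendly", "thanks"}
NEGATIVE_WORDS = {"delayed", "lost", "rude", "worst"}


def sentiment_score(text):
    """Score = (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def score_distribution(tweets):
    """Map each score to the number of tweets receiving it."""
    distribution = {}
    for text in tweets:
        score = sentiment_score(text)
        distribution[score] = distribution.get(score, 0) + 1
    return dict(sorted(distribution.items()))


# Made-up tweets, for illustration only.
example_tweets = [
    "Love the friendly crew, thanks!",
    "Flight delayed again and my bag is lost. Worst trip.",
    "Boarding now.",
]
print(score_distribution(example_tweets))
```

Comparing such distributions between brands is what the figure summarises. A word-counting score of this kind ignores negation and sarcasm, so it is best read as a coarse indicator.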
Micro Application 2: Chocolate Study
• 5 chocolate products/brands:
– Cadbury
– Twix
– Snickers
– Hershey
– KitKat
• Once a week for 8 weeks.
• 7000 tweets per brand.
• Circle around Manchester with a radius of 500 miles.
• English only.
• Duplicated tweets (and re-tweets) removed.
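Two of the collection rules above, the 500-mile circle and the removal of duplicates and re-tweets, can be sketched as a post-processing step. This is a Python stand-in under stated assumptions: each tweet is a (text, (latitude, longitude)) pair, re-tweets are recognised by a leading "RT", the Manchester coordinates are approximate, and `clean_tweets` is a hypothetical helper. In practice the geographic restriction may instead be applied when the search is issued, and the English-only filter is not covered here.

```python
import math
import re

MANCHESTER = (53.4808, -2.2426)  # approximate latitude/longitude (assumption)
EARTH_RADIUS_MILES = 3958.8


def distance_miles(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def clean_tweets(tweets, centre=MANCHESTER, radius_miles=500):
    """Keep tweets inside the circle, dropping re-tweets and repeated texts."""
    seen = set()
    kept = []
    for text, location in tweets:
        if distance_miles(centre, location) > radius_miles:
            continue
        if re.match(r"\s*RT\b", text):  # conventional re-tweet prefix
            continue
        key = " ".join(text.lower().split())  # ignore case and spacing differences
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


# Made-up tweets, for illustration only.
sample = [
    ("Just had a KitKat", (51.5074, -0.1278)),               # London
    ("just had a  kitkat", (53.4, -2.9)),                    # same text, different spacing
    ("RT @someone: Just had a KitKat", (53.4808, -2.2426)),  # re-tweet
    ("KitKat time", (40.7128, -74.0060)),                    # New York, outside the circle
]
print(clean_tweets(sample))
```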
Sentiment Analysis
[Figure: distributions of positive and negative sentiment scores for Cadbury and KitKat]
Classification of the terms tweeted
after clean-up using the R text mining package tm

9 sensory descriptors in the top 25 of each product

Term        Cadbury  Snickers  Twix  KitKat  Hershey
chocolate      2077       330   286     363      555
eat             381      1021   537     544      135
cream           279       485   132     110      198
bars            308       324   207      74       75
ice              60       494   118      54       39
milk            371        66    52      37       51
taste            98        60    41      52      110
cake             56        89    51      94       26
food            115        46    39      57       43

5 sensory descriptors specific to 2 or fewer products

• Cadbury: dairy (300), jelly (184), eggs (176), strawberries (155), dairymilk (117)
• Snickers: flake (79), icecream (68), brownies (36), bites (35), flavour (35)
• Twix: crisps (63), coffee (33), bites (30), dairy (28), sweet (27)
• KitKat: chunky (795), mint (114), crisps (59), chunkies (57), coffee (51)
• Hershey: sweet (71), dark (26), sweets (18), sauce (17), cupcake (14)
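The clean-up and counting step behind term tables of this kind can be sketched as follows. This is a Python stand-in, not the R routine used in the study. The stop-word list is a short placeholder, the function name `term_frequencies` is hypothetical, and counting each term once per tweet is an assumption, since the slides do not say whether the figures are tweet counts or total occurrences.

```python
import re
from collections import Counter

# Short illustrative stop-word list; an assumption, not the list used in the study.
STOP_WORDS = {"a", "an", "and", "the", "i", "my", "is", "of", "to", "for", "with", "some"}


def term_frequencies(tweets, brand_terms=()):
    """Count how many tweets mention each term after a basic clean-up."""
    counts = Counter()
    ignored = STOP_WORDS | set(brand_terms)
    for text in tweets:
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())  # drop links and mentions
        words = set(re.findall(r"[a-z]+", text))  # count each term once per tweet
        counts.update(words - ignored)
    return counts


# Made-up tweets, for illustration only.
sample = [
    "I love Cadbury chocolate with milk",
    "Cadbury milk chocolate is the best http://example.com",
    "@friend chocolate cake for my birthday",
]
counts = term_frequencies(sample, brand_terms={"cadbury"})
print(counts.most_common(2))
```

Running this once per brand and keeping the most frequent terms gives one list per product, which can then be compared across products.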
Results (chocolate occasion)

Category Terms – 9 descriptors in the top 15 of each product

Term        Cadbury  Snickers  Twix  KitKat  Hershey
today           191       149   123     186      104
always           32        68    70      63       57
tomorrow         94        31    34      49       49
home             58        45    56      42       34
tonight          28        33    47      57       24
craving          30        55    26      24       20
birthday         32        25    26      69       20
breakfast        27        63    64      56       13
diet             29        40    37      38        6

Unique Terms – 2 descriptors specific to 2 or fewer products

• Cadbury: picnic (78), college (48)
• Snickers: hungry (407), earlier (16)
• Twix: japan (75), hungry (24)
• KitKat: hungry (26), japan (24)
• Hershey: july (94), concert (44)
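The distinction between category terms (present in every product's top list) and product-specific terms (present in only a few) can be made mechanical. The sketch below, in Python as a stand-in, assumes each product's most frequent terms are already available as a list. The function name `split_terms` and the three-product toy data are invented for illustration and are not the study's lists.

```python
from collections import Counter


def split_terms(top_terms, max_products=2):
    """Separate terms shared by every product from terms found in only a few.

    top_terms maps each product name to its list of most frequent terms.
    Returns (category_terms, specific_terms), where specific_terms maps a
    term to the sorted list of products whose top list contains it.
    """
    appearances = Counter(term for terms in top_terms.values() for term in set(terms))
    category = sorted(t for t, n in appearances.items() if n == len(top_terms))
    specific = {
        term: sorted(p for p, terms in top_terms.items() if term in terms)
        for term, n in sorted(appearances.items())
        if n <= max_products
    }
    return category, specific


# Invented toy lists for three hypothetical products, for illustration only.
toy = {
    "A": ["today", "home", "picnic"],
    "B": ["today", "home", "hungry"],
    "C": ["today", "home", "hungry", "concert"],
}
category, specific = split_terms(toy)
print(category)
print(specific)
```

The threshold `max_products` corresponds to the "2 or fewer products" rule. Terms that fall between the two groups (in more than two lists but not in all) are left unclassified by this sketch.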
Results (chocolate)
• Cadbury had been running a competition, and this is reflected in high-frequency responses.
• We can see descriptors that appear to define the category.
• We can observe product-specific descriptors for sensory and occasion.
Comments
• Usage
– twitteR package "easy" to use (once you know how)
– Large number of texts required – even for micro studies
– Linguistic/text processing software essential
• Micro Applications - Sensory research
– Vocabulary development to define a category
– Brand-specific attributes
– Change in sentiment over time and place
• Research – Macro
– Find a strong hypothesis and the numbers will do the rest
Conclusion
• Useful open access research source
• Methodological research needed
• Specialised sensory algorithms needed