Using Social Media Data to Construct a Consumer

Consumer sentiment
analysis with Twitter
Reetta Suonperä
August 2013
My dataset
• Two months, one csv.gz file per day
• In total about 1.2 billion tweets
• Sample record (pipe-delimited):
It's always easy for a person to say get over, but you don't feel what heart feels to make that statment|PrettynPinkC215|2011-02-01T04:01:16Z|2011-02-01T04:00:48Z|1296532876139018784|
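A minimal sketch of how daily files like these might be streamed, assuming each line is one pipe-delimited record shaped like the sample above. The field names (`text`, `user`, `timestamps`, `tweet_id`) and the function `iter_tweets` are hypothetical labels chosen for illustration; the slides do not say what each column means, in particular the two timestamps.

```python
import csv
import gzip
from pathlib import Path


def iter_tweets(data_dir):
    """Yield one dict per tweet from every *.csv.gz file in data_dir."""
    for path in sorted(Path(data_dir).glob("*.csv.gz")):
        # "rt" decompresses on the fly and decodes to text.
        with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
            # QUOTE_NONE: treat quote characters in tweets as ordinary text.
            reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
            for row in reader:
                if len(row) < 5:
                    continue  # skip malformed or truncated lines
                yield {
                    "text": row[0],              # assumed: tweet text
                    "user": row[1],              # assumed: username
                    "timestamps": (row[2], row[3]),  # meaning not stated in the slides
                    "tweet_id": row[4],          # assumed: numeric tweet id
                }
```

Streaming with a generator keeps memory use flat, which matters at roughly 1.2 billion tweets. A tweet whose text itself contains a `|` would be split incorrectly by this sketch and would need separate handling.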
The tools I use
• General approach: natural language processing (NLP)
• The Natural Language Toolkit (NLTK)
Introduction: the consumer sentiment index
• A survey-based indicator of consumer confidence or
sentiment
• History goes back to 1946 at the University of Michigan
• Ireland’s consumer sentiment index by the ESRI since 1996
ESRI survey questions
• Q1: Economic situation in the country (next 12 months)
• Q2: Unemployment in the country (next 12 months)
• Q3: Household financial situation (12 months ago)
• Q4: Household financial situation (next 12 months)
• Q5: Good/bad time to buy large household items
Answers: positive/neutral/negative
This is what it looks like:
The KBC/ESRI consumer sentiment index
We can speculate on what drives sentiment –
but we can’t really know
On the June 2013 improvement in households’ assessment of
their personal finances:
“We think that the ECB rate cut in May played some role …
a combination of low inflation, early summer sales and
increasing signs of improvement in the residential property
market could have contributed…”
On the decline in the July 2013 index:
“We think reports that the Irish economy had fallen back
into recession and a couple of high profile job loss
announcements unnerved consumers last month.”
Motivation: why using Twitter could help
• More timely
• Continuous information
• Save money
• Insight into what drives sentiment
Previous research
• O’Connor et al (2010): From Tweets to Polls: Linking Text
Sentiment to Public Opinion Time Series
• An index based on tweets containing the word “jobs”
correlates with the Michigan index and Gallup’s daily poll
• Indices based on the words “economy” or “job” correlate poorly!
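To make the idea of a keyword-based index concrete, here is a rough sketch of a daily sentiment ratio: for tweets containing a topic word, count those with at least one positive word and those with at least one negative word, then divide. This is only loosely in the spirit of the approach described above, not a reproduction of it. The tiny `POSITIVE` and `NEGATIVE` word sets and the function name are hypothetical placeholders; a real lexicon would be far larger.

```python
from collections import defaultdict

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "hiring"}
NEGATIVE = {"bad", "lost", "cut"}


def daily_sentiment_ratio(tweets, keyword):
    """Map each day to (# positive tweets) / (# negative tweets).

    `tweets` is an iterable of (day, tokens) pairs. Only tweets whose
    tokens include `keyword` are counted. Days with no negative tweets
    are left out to avoid dividing by zero.
    """
    positive_counts = defaultdict(int)
    negative_counts = defaultdict(int)
    for day, tokens in tweets:
        words = set(tokens)
        if keyword not in words:
            continue
        if words & POSITIVE:
            positive_counts[day] += 1
        if words & NEGATIVE:
            negative_counts[day] += 1
    return {
        day: positive_counts[day] / negative_counts[day]
        for day in sorted(negative_counts)
        if negative_counts[day] > 0
    }
```

The resulting daily series could then be smoothed and compared with a survey-based index; those steps are not shown here.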
The process (simplified)
Initial wordlist topics
• General economic situation: General economy, Good times, Bad times, Econ policy
• Unemployment/employment: Job losses, Job gains
• Household financial situation: General, Income, Credit, Feeling broke, Feeling flush
• Major household items: Acquire/buy, Cost, Pricey, Bargain, Buying climate
Using WordNet to expand seed wordlist
• Use WordNet to find synonyms for initial keyword list:
• Words have many different meanings
• Include part-of-speech tag
• Word doesn’t exist in WordNet?
• Output does not include tenses or plurals
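The points above can be sketched roughly as follows, assuming NLTK 3 with the WordNet corpus installed (`nltk.download("wordnet")`). The function names `wordnet_lemmas` and `expand_seeds` are made up for this illustration. Passing a part-of-speech tag narrows the senses returned, and a seed that is missing from WordNet simply yields no synonyms, so the seed itself is always kept.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def wordnet_lemmas(word: str, pos: str) -> Set[str]:
    """Return WordNet lemma names for `word`, restricted to one POS.

    `pos` is a WordNet tag: "n" (noun), "v" (verb), "a" (adjective)
    or "r" (adverb).
    """
    # Imported lazily so the rest of the sketch works without NLTK.
    from nltk.corpus import wordnet as wn

    return {
        lemma.name().replace("_", " ").lower()
        for synset in wn.synsets(word, pos=pos)
        for lemma in synset.lemmas()
    }


def expand_seeds(
    seeds: Iterable[Tuple[str, str]],
    lookup: Callable[[str, str], Set[str]] = wordnet_lemmas,
) -> Dict[Tuple[str, str], Set[str]]:
    """Expand (word, pos) seeds into sets of candidate keywords."""
    expanded = {}
    for word, pos in seeds:
        # An unknown word returns an empty set; keep the seed regardless.
        expanded[(word, pos)] = lookup(word, pos) | {word}
    return expanded
```

For example, `expand_seeds([("unemployed", "a"), ("bargain", "n")])` would return the lemmas WordNet lists for each seed. Because WordNet returns base forms only, plurals and tenses would still have to be added separately, for instance by stemming both the wordlist and the tweets.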
Pre-processing tasks
• Regular expressions for more basic tasks:
• Cleaning, tokenising URLs, usernames
• NLTK functionality for more complex tasks
• Stopword removal, stemming, POS-tagging
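As an illustration of the regular-expression stage, the sketch below lowercases a tweet, replaces URLs and usernames with placeholder tokens, and splits the rest into word tokens. The patterns and the placeholder names `URL` and `USER` are assumptions made for this example, not the patterns actually used. The NLTK steps (stopword removal, stemming, POS-tagging) are not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def preprocess(tweet):
    """Return a list of tokens with URLs and usernames normalised."""
    text = tweet.lower()
    # Uppercase placeholders stay distinguishable from lowercased words.
    text = URL_RE.sub(" URL ", text)
    text = USER_RE.sub(" USER ", text)
    return TOKEN_RE.findall(text)


print(preprocess("Got a NEW job!! http://t.co/abc thanks @boss"))
# ['got', 'a', 'new', 'job', 'URL', 'thanks', 'USER']
```

Collapsing every URL and username into a single token keeps the vocabulary small while preserving the fact that a link or mention was present.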
Fine selection – not there yet…
• Do more filtering using bigrams?
• “I broke”
• “pay cut”
• “new job”
• Use POS tags?
• Classification?
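One way the bigram idea might look in code, assuming tweets have already been tokenised and lowercased. The target set reuses the three example bigrams from the list above. Whether a match should keep or discard a tweet is left open here, since that decision is still to be made.

```python
# Example bigrams from the slide, as lowercased token pairs.
TARGET_BIGRAMS = {("i", "broke"), ("pay", "cut"), ("new", "job")}


def bigrams(tokens):
    """Return adjacent token pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def matching_bigrams(tokens, targets=TARGET_BIGRAMS):
    """Return the bigrams in `tokens` that appear in `targets`."""
    return [pair for pair in bigrams(tokens) if pair in targets]


print(matching_bigrams(["finally", "got", "a", "new", "job"]))
# [('new', 'job')]
```

Bigrams give a little context that single keywords lack: "broke" alone cannot separate "I broke my phone" from "I'm broke", while the surrounding word often can.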
The to-do list
• Finalise fine selection
• Sentiment classification
• Visualisation
Resources
• www.nltk.org
• Natural Language Processing with Python: http://nltk.org/book/
• Python Text Processing with NLTK 2.0 Cookbook
Resources
• O’Connor et al (2010): From Tweets to Polls: Linking Text
Sentiment to Public Opinion Time Series
• Bollen et al (2011): Twitter mood predicts the stock
market
• Bollen et al (2011): Modeling public mood and emotion:
Twitter sentiment and socio-economic phenomena
• Go et al (2009): Twitter sentiment classification using
distant supervision
• Jiang et al (2011): Target-dependent Twitter Sentiment
Classification
Questions?