Consumer sentiment analysis with Twitter
Reetta Suonperä
August 2013

My dataset
• Two months, one csv.gz file per day
• In total about 1.2 billion tweets
• It's always easy for a person to say get over, but you don't feel what heart feels to make that statment|PrettynPinkC215|2011-02-01T04:01:16Z|2011-02-01T04:00:48Z|1296532876139018784|

The tools I use
• General approach: natural language processing (NLP)
• The Natural Language Toolkit (NLTK)

Introduction: the consumer sentiment index
• A survey-based indicator of consumer confidence or sentiment
• History goes back to 1946 at University of Michigan
• Ireland’s consumer sentiment index by the ESRI since 1996

ESRI survey questions
• Q1: Economic situation in the country (next 12 months)
• Q2: Unemployment in the country (next 12 months)
• Q3: Household financial situation (12 months ago)
• Q4: Household financial situation (next 12 months)
• Q5: Good/bad time to buy large household items
Answers: positive/neutral/negative

This is what it looks like:
The KBC/ESRI consumer sentiment index

We can speculate on what drives sentiment – but we can’t really know
On the June 2013 improvement in households’ assessment of their personal finances:
“We think that the ECB rate cut in May played some role … a combination of low inflation, early summer sales and increasing signs of improvement in the residential property market could have contributed…”
On the decline in the July 2013 index:
“We think reports that the Irish economy had fallen back into recession and a couple of high profile job loss announcements unnerved consumers last month.”

Motivation: why using Twitter could help
• More timely
• Continuous information
• Save money
• What drives sentiment

Previous research
• O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
• An index based on tweets containing the word “jobs” correlates with the Michigan index and Gallup’s daily poll
• Indices with “economy” or “job” correlate poorly!
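
A minimal sketch of how the daily csv.gz files described under "My dataset" might be read and filtered by keyword, in the spirit of the “jobs” index from previous research. The field layout (text, username, two timestamps, tweet id, trailing pipe) is an assumption inferred from the single sample record. The slides do not say what the two timestamps mean, and the names `Tweet`, `parse_line`, `read_tweets` and `daily_keyword_counts` are hypothetical. This only counts matching tweets per day; it does not score sentiment.

```python
import gzip
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Sample record in the assumed layout: text|user|timestamp|timestamp|id|
SAMPLE_LINE = (
    "It's always easy for a person to say get over, but you don't feel what "
    "heart feels to make that statment|PrettynPinkC215|2011-02-01T04:01:16Z|"
    "2011-02-01T04:00:48Z|1296532876139018784|"
)


class Tweet(NamedTuple):
    text: str
    user: str
    first_timestamp: str   # meaning of each timestamp is an assumption
    second_timestamp: str
    tweet_id: str


def parse_line(line: str) -> Tweet:
    """Split one pipe-delimited record, working from the right so that
    pipes inside the tweet text do not shift the other fields."""
    body = line.rstrip("\r\n")
    if body.endswith("|"):
        body = body[:-1]  # drop the single trailing delimiter
    text, user, ts1, ts2, tweet_id = body.rsplit("|", 4)
    return Tweet(text, user, ts1, ts2, tweet_id)


def read_tweets(path: str) -> Iterator[Tweet]:
    """Stream tweets from one gzipped daily file, skipping malformed lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            try:
                yield parse_line(line)
            except ValueError:  # fewer than five fields
                continue


JOBS = re.compile(r"\bjobs\b", re.IGNORECASE)


def daily_keyword_counts(tweets: Iterable[Tweet], pattern=JOBS) -> Counter:
    """Count tweets per calendar day whose text matches the keyword pattern."""
    counts: Counter = Counter()
    for tweet in tweets:
        if pattern.search(tweet.text):
            counts[tweet.first_timestamp[:10]] += 1  # YYYY-MM-DD prefix
    return counts
```

Streaming line by line keeps memory use flat, which matters at roughly 1.2 billion tweets. Calling `daily_keyword_counts(read_tweets(path))` on each daily file would give the raw volume series that a sentiment score could later be layered on.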

The process (simplified)

Initial wordlist topics
• General economic situation: General economy, Good times, Bad times, Econ policy
• Unemployment/employment: Job losses, Job gains
• Household financial situation: General, Income, Credit, Feeling broke, Feeling flush
• Major hh items: Acquire/buy, Cost, Pricey, Bargain, Buying climate

Using WordNet to expand seed wordlist
• Use WordNet to find synonyms for initial keyword list:
  • Words have many different meanings
  • Include part-of-speech tag
  • Word doesn’t exist in WordNet?
  • Output does not include tenses or plurals

Pre-processing tasks
• Regular expressions for more basic tasks:
  • Cleaning, tokenising URLs, usernames
• NLTK functionality for more complex tasks
  • Stopword removal, stemming, POS-tagging

Fine selection – not there yet…
• Do more filtering using bigrams?
  • “I broke”
  • “pay cut”
  • “new job”
• Use POS tags?
• Classification?

The to-do list
• Finalise fine selection
• Sentiment classification
• Visualisation

Resources
• www.nltk.org
• Natural Language Processing with Python: http://nltk.org/book/
• Python Text Processing with NLTK 2.0 Cookbook

Resources
• O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
• Bollen et al (2011): Twitter mood predicts the stock market
• Bollen et al (2011): Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena
• Go et al (2009): Twitter sentiment classification using distant supervision
• Jiang et al (2011): Target-dependent Twitter Sentiment Classification

Questions?
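
Backup sketch for the pre-processing and fine-selection slides: one way the regular-expression cleaning and the bigram filter might look using only the Python standard library. The patterns, the `<url>` and `<user>` placeholder tokens, and the function names are all assumptions made for illustration, not the pipeline actually used. NLTK offers its own tokenisers and bigram helpers that could replace these.

```python
import re

# Assumed patterns: a rough URL matcher and Twitter-style @usernames.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
USER_RE = re.compile(r"@\w+")
# Tokens are the two placeholders, or lowercase words with an optional
# ASCII-apostrophe suffix (e.g. "don't").
TOKEN_RE = re.compile(r"<url>|<user>|[a-z0-9]+(?:'[a-z]+)?")

# Bigrams taken from the "Fine selection" slide.
TARGET_BIGRAMS = {("i", "broke"), ("pay", "cut"), ("new", "job")}


def clean(text):
    """Replace URLs and usernames with placeholders, then lowercase."""
    text = URL_RE.sub(" <url> ", text)
    text = USER_RE.sub(" <user> ", text)
    return text.lower()


def tokenise(text):
    """Return the cleaned text as a list of word and placeholder tokens."""
    return TOKEN_RE.findall(clean(text))


def bigrams(tokens):
    """Return adjacent token pairs in order."""
    return list(zip(tokens, tokens[1:]))


def matching_bigrams(text, targets=TARGET_BIGRAMS):
    """Return the target bigrams that occur in the text, in order."""
    return [pair for pair in bigrams(tokenise(text)) if pair in targets]
```

For instance, `matching_bigrams("They announced a pay cut, so I broke my lease")` matches both “pay cut” and “I broke”, even though “broke” here is a verb rather than a financial state. That kind of false positive is one reason POS tags or a classifier may still be needed after bigram filtering.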