Real-World Behavior Analysis through a Social Media Lens
Mohammad-Ali Abbasi, Huan Liu
Computer Science and Engineering, Arizona State University
Sun-Ki Chai, Kiran Sagoo
Department of Sociology, University of Hawaiʻi
ali2@asu.edu
Data Mining and Machine Learning Lab

[Figure: real-world events/behavior]

Any correlation between social media numbers and election results?

Candidate        Social media count (1)   Social media count (2)
Mitt Romney      1,520,000                370,000
Ron Paul         900,000                  260,000
Newt Gingrich    295,000                  1,447,000
Rick Santorum    173,000                  160,000
Barack Obama     25,500,000               12,920,000

Do we observe the same difference in the votes? In the number of states carried?
Source: http://en.wikipedia.org/wiki/Republican_Party_presidential_primaries,_2012

Objectives of the research
• Studying the correlation between real-world collective behavior and social media data
• Determining the relative effectiveness of a social media lens in analyzing and predicting real-world collective behavior
• Exploring the domains and situations in which social media can predict real-world behavior

Data collection
Active methods: experiments, surveys, field studies
• Expensive
• Time-consuming
• May be dangerous
Passive methods (social media)
• People leave many clues about themselves
• Their interactions reveal much about people
• We can passively observe people's activities by analyzing their:
  – Behavior
  – Belongings
  – Documents, …

Snooping
Experimental psychology suggests that a person may be understood by what happens around them.
• Does what's on your desk reveal what's
  on your mind?
• Do those pictures on your walls tell true tales about your character?

Using online data for opinion polling
• From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
• O'Connor et al. analyzed the sentiment polarity of tweets and found a correlation of 80% (r ≈ 0.8) with the results of public opinion polls

Some existing work
• Stock market prediction using data collected from Twitter
• Box-office revenue prediction for movies
• Analyzing the Arab Spring using social media
Most of the work in the field falls into two categories:
• Behavior analysis and finding correlations
• Behavior prediction

Our approach: a four-step model
1. Find equivalent groups in the real world and in social media
2. Collect related online data from social media
3. Analyze online data (behavior)
4. Analyze the real-world behavior and find correlations

Experimental settings
1. Find a group in the real world and in social media
  • Select based on more stable characteristics: race, religion, primary language, and country/region of origin
2. Collect related online data from social media
  • Use Twitter to collect 35 million tweets related to the Arab Spring
  • Collect more than 1 million blog posts about the Arab Spring movement
  • Use 135,000 popular Facebook pages to collect data on posts, comments, and like behavior
3. Analyze online data (behavior)
  • Information retrieval techniques
  • Sentiment polarity analysis
  • Statistical methods
4. Analyze the real-world behavior
  • Data on real-world events collected from Reuters.com
  • Correlational analysis
  • Multivariate regression analysis

Correlation between online and real events
[Figure: online activity over time; the time the event happened in the real world is marked]

Observations
[Figure: online activity over time; the time the event happened in the real world is marked]

Observations
• There may be correlations between real-world events and online discussions. However:
  – Correlation does not amount to prediction
  – Results are poor for small events
• Many real-world events are left uncovered
  – Influence and cascade effects cause too much irrelevant discussion in social media
• What we have experimented with:
  – Finding influential people
  – Analyzing mood over the network

What are people concerned about?
[Figure]

Challenges
• Finding relevant communities
  – An analysis of Arab Spring tweets shows that 75 percent of the 1 million clicks on Libya-related tweets and 89 percent of the 3 million clicks on Egypt-related tweets came from outside the Arab world (1)
  – The fallacy of millions of followers
1- http://www.stripes.com/blogs/stripes-central/stripes-central-1.8040/researchersskeptical-dod-can-use-social-media-to-predict-future-conflict-1.15529

Challenges
• Data collection
  – Sufficient coverage of the data
  – Unknown source of the data
  – Spam
  – Paid social media content
• Online behavior analysis
  – Unstructured, noisy text data
  – Language ambiguity

Observations
• Real-world behavior prediction
  – There is a stark difference between clicking and taking real risks in the street

Conclusions
• Social media helps us understand real-world events, but it is not the sole source
• More research and development are needed to make social media a reliable source for behavior
  analysis
• Social event prediction using social media remains an open problem; more interdisciplinary research should be promoted

Thanks!
Acknowledgments: This work is, in part, sponsored by ONR and AFOSR grants. We are grateful for the comments from anonymous reviewers and members of the DMML lab at ASU.
Mohammad-Ali Abbasi
ali2@asu.edu
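The last step of the four-step model (analyze the real-world behavior and find correlations) can be illustrated with a short sketch. This is not the authors' implementation. It is a minimal Python example, assuming that daily counts of positive and negative posts are already available from sentiment polarity analysis and that a real-world indicator is sampled on the same days. The daily score is loosely modeled on a positive-to-negative ratio in the spirit of the O'Connor et al. study; the function names, the add-one smoothing, and the lag check are hypothetical choices made only for illustration.

```python
from math import sqrt
from typing import List, Sequence

# Hypothetical helpers for the correlational step; not the authors' code.


def sentiment_ratio(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> List[float]:
    """Daily positive-to-negative message ratio (an assumed sentiment score)."""
    # Add-one smoothing avoids division by zero on days with no negative posts.
    return [(p + 1) / (n + 1) for p, n in zip(pos_counts, neg_counts)]


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first window - 1 days are dropped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlation(online: Sequence[float], offline: Sequence[float], lag: int) -> float:
    """Correlate online[t] with offline[t + lag]; lag > 0 means online activity leads."""
    if len(online) != len(offline):
        raise ValueError("series must cover the same days")
    n = len(online)
    if lag >= 0:
        return pearson(online[: n - lag], offline[lag:])
    return pearson(online[-lag:], offline[: n + lag])


if __name__ == "__main__":
    # Toy, made-up numbers used only to show the call sequence.
    pos = [10, 12, 15, 20, 18, 25, 30]
    neg = [20, 18, 15, 12, 14, 10, 8]
    real = [1.0, 1.1, 1.3, 1.6, 1.5, 2.0, 2.4]

    window = 3
    smoothed = moving_average(sentiment_ratio(pos, neg), window)
    # Trim the real-world series so both series cover the same days.
    aligned_real = real[window - 1 :]
    print("r at lag 0:", pearson(smoothed, aligned_real))
    print("r at lag 1:", lagged_correlation(smoothed, aligned_real, 1))
```

Smoothing with moving_average(ratio, k) drops the first k - 1 days, so the real-world series has to be trimmed by the same amount before calling pearson. Checking lagged_correlation at positive lags is one way to probe the observation that correlation does not amount to prediction: a strong correlation at lag 0 alone says nothing about whether online activity precedes the real-world event.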