PPT - Arizona State University

Real-World Behavior Analysis
through a Social Media Lens
Mohammad-Ali Abbasi, Huan Liu
Computer Science and Engineering, Arizona State University
Sun-Ki Chai, Kiran Sagoo
Department of Sociology, University of Hawai`i
Ali2@asu.edu
Data Mining and
Machine Learning Lab
Real world Events/Behavior
Any correlation between social media numbers and election results?
Mitt Romney      1,520,000      370,000
Ron Paul           900,000      260,000
Newt Gingrich      295,000    1,447,000
Rick Santorum      173,000      160,000
Barack Obama    25,500,000   12,920,000
Do we observe the same difference in the votes?
Number of states carried?
http://en.wikipedia.org/wiki/Republican_Party_presidential_primaries,_2012
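One way to make the question concrete is a rank correlation. The sketch below is a minimal illustration, not the authors' method: it compares how the four Republican candidates rank under the two social media counts on the slide. It assumes the counts are listed in the same order as the candidate names, and the helper names `ranks` and `spearman` are hypothetical. The same function could be applied to votes or states carried once those figures are at hand.

```python
# Counts copied from the slide; pairing them with candidates in listed order is an assumption.
first_count = {"Romney": 1_520_000, "Paul": 900_000, "Gingrich": 295_000, "Santorum": 173_000}
second_count = {"Romney": 370_000, "Paul": 260_000, "Gingrich": 1_447_000, "Santorum": 160_000}

def ranks(values):
    """Map each key to its rank, where rank 1 is the largest value (assumes no ties)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def spearman(a, b):
    """Spearman rank correlation for two dicts over the same keys (assumes no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in a)
    return 1 - 6 * d2 / (n * (n * n - 1))

rho = spearman(first_count, second_count)
print(round(rho, 2))
```

Under that pairing assumption the two counts agree only weakly with each other, because Gingrich ranks third on the first count and first on the second.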
Objectives of the research
• Studying the correlation between real-world
collective behavior and social media data
• Determining the relative effectiveness of a social
media lens in analyzing and predicting real-world
collective behavior
• Exploring the domains and situations in which
social media can be a predictor of real-world
behavior
Data collection
Active methods
• Experiments
• Surveys
• Field study
• Expensive
• Time consuming
• May be dangerous
Passive methods (by observing and analyzing)
• Behavior
• Belongings
• Documents, …
Social media
• People leave many clues about themselves
• Their interactions reveal much about people
• We can passively observe people's activities
Snooping
Experimental psychology suggests that a person
may be understood by what happens around him
• Does what's on your desk reveal what's on your
mind?
• Do those pictures on your walls tell true tales
about your character?
Using online data for opinion polling
• From Tweets to Polls: Linking Text
Sentiment to Public Opinion Time Series
• O'Connor et al. analyzed the sentiment polarity
of tweets and found correlations as high as 80% with
results from public opinion polls
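A comparison of this kind can be sketched as follows. This is a minimal illustration, not O'Connor et al.'s pipeline: the daily sentiment scores and poll values are made-up numbers, and `moving_average` and `pearson` are hypothetical helper names. The sketch smooths a daily sentiment series and computes its Pearson correlation with a poll series of the same length.

```python
from math import sqrt

def moving_average(series, window):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: a daily sentiment score and a poll percentage for the same days.
daily_sentiment = [1.1, 0.9, 1.3, 1.2, 1.6, 1.4, 1.8, 1.7]
poll = [44.0, 44.5, 45.0, 45.2, 46.1, 46.0, 47.2, 47.5]

smoothed = moving_average(daily_sentiment, window=3)
r = pearson(smoothed, poll)
print(f"r = {r:.2f}")
```

Smoothing matters here because single-day sentiment scores are noisy, while polls move slowly; the window length is a tuning choice, not a value taken from the slide.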
Some Existing Work
• Stock market prediction using data collected
from Twitter
• Box-office revenue prediction for movies
• Analyzing the Arab Spring using social media
Most of the work in the field can be classified into two categories:
• Behavior Analysis and finding a correlation
• Behavior prediction
Our approach: A four-step model
1. Find equivalent groups in the real world and social media
2. Collect related online data from social media
3. Analyze the online data (behavior)
4. Analyze the real-world behavior and find correlations
Experimental settings
Find a group in the real world and social media
• Select based on more stable characteristics
✓ Race, religion, primary language, and country/region of origin
Collect related online data from social media
• Twitter: 35 million tweets related to the Arab Spring
• More than 1 million blog posts on the Arab Spring movement
• 135,000 popular Facebook pages: data on posts, comments, and like behavior
Analyze online data (behavior)
• Information retrieval techniques
• Sentiment polarity analysis
• Statistical methods
Analyze the real-world behavior
• Data on real-world events collected from Reuters.com
• Correlational analysis
• Multivariate regression analysis
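The sentiment polarity step could look roughly like the sketch below. It is an illustration under stated assumptions, not the study's implementation: the tiny word lists, the sample posts, and the function names `polarity` and `daily_polarity` are all hypothetical, and a real system would use a full sentiment lexicon or a trained classifier.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real analysis would use a published sentiment lexicon.
POSITIVE = {"hope", "freedom", "victory", "peace"}
NEGATIVE = {"fear", "violence", "arrest", "attack"}

def polarity(text):
    """Return (pos - neg) / (pos + neg) over lexicon hits, or 0.0 if there are none."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def daily_polarity(posts):
    """Average polarity per day for an iterable of (date_string, text) pairs."""
    by_day = defaultdict(list)
    for day, text in posts:
        by_day[day].append(polarity(text))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}

# Made-up sample posts, for illustration only.
posts = [
    ("2011-01-25", "Hope and freedom in the square!"),
    ("2011-01-25", "Reports of violence near the bridge"),
    ("2011-01-26", "Another arrest, another attack. Fear everywhere."),
]
scores = daily_polarity(posts)
print(scores)
```

The per-day scores form a time series that can then be compared with the real-world event data in the final step.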
Correlation between online and real events
[Figure annotation: time at which the real-world event happened]
Observations
[Figure annotation: time at which the real-world event happened]
Observations
• There could be correlations between real-world events
and online discussions. However,
– Correlation does not amount to prediction
– Poor results for small events
• Many real-world events are left uncovered
– Influence and cascade effects cause too much irrelevant
discussion in social media
• What we have experimented with
– Finding influential people
– Analyzing mood over the network
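The gap between correlation and prediction can be illustrated with a lagged correlation. The sketch below uses made-up daily counts and hypothetical helper names (`pearson`, `lagged_correlation`); it is not the analysis behind the slides. Online activity is shifted against the real-world event series: if the strongest correlation appears only when online activity trails the events, the online signal reflects reaction rather than anticipation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def lagged_correlation(events, online, lag):
    """Correlate events[t] with online[t + lag].

    lag > 0: online activity follows the events (reaction).
    lag < 0: online activity precedes the events (possible predictor).
    """
    if lag >= 0:
        e, o = events[:len(events) - lag], online[lag:]
    else:
        e, o = events[-lag:], online[:lag]
    return pearson(e, o)

# Hypothetical daily series: online volume echoes each event one day later.
events = [0, 5, 0, 0, 8, 0, 0, 3, 0, 0]
online = [1, 1, 6, 1, 1, 9, 1, 1, 4, 1]

by_lag = {lag: lagged_correlation(events, online, lag) for lag in range(-2, 3)}
best = max(by_lag, key=by_lag.get)
print(best, round(by_lag[best], 2))
```

In this toy data the best lag is positive, so the two series are strongly correlated yet the online series carries no advance warning of the events.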
What are people concerned about
Challenges
• Finding Relevant Communities
– An analysis of Arab Spring tweets showed that 75 percent
of the 1 million clicks on Libya-related tweets and 89
percent of the 3 million clicks on Egypt-related
tweets came from outside the Arab world [1]
– The fallacy of millions of followers
1- http://www.stripes.com/blogs/stripes-central/stripes-central-1.8040/researchersskeptical-dod-can-use-social-media-to-predict-future-conflict-1.15529
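In absolute terms, the percentages above translate into the click counts computed below. The short calculation uses only the figures quoted on the slide, treating "1 million" and "3 million" as exact for illustration; the variable names are hypothetical.

```python
# Figures as quoted on the slide (rounded totals treated as exact for illustration).
clicks = {
    "Libya": {"total": 1_000_000, "outside_pct": 75},
    "Egypt": {"total": 3_000_000, "outside_pct": 89},
}

for country, c in clicks.items():
    # Integer arithmetic keeps the counts exact.
    c["outside"] = c["total"] * c["outside_pct"] // 100
    c["inside"] = c["total"] - c["outside"]
    print(f"{country}: {c['outside']:,} clicks from outside the Arab world, "
          f"{c['inside']:,} from inside")
```

On these figures, only about 250,000 of the Libya-related clicks and 330,000 of the Egypt-related clicks came from within the region.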
Challenges
• Data Collection
– Sufficient coverage of the data
– Source of data is unknown
– Spam
– Paid social media content
• Online behavior Analysis
– Unstructured, noisy text data
– Language ambiguity
Observations
• Real-world behavior prediction
– Stark difference between clicking online and taking
real risks in the street
Conclusions
• Social media helps us understand real-world events, but it is not the sole source
• More research and development is needed to make social
media a reliable source for behavior analysis
• Social event prediction using social media remains
an open problem. More interdisciplinary research
should be promoted.
Thanks!
Acknowledgments:
This work is, in part, sponsored by ONR and AFOSR
grants. We are grateful for the comments from anonymous
reviewers and members of DMML lab at ASU
Mohammad-Ali Abbasi
ali2@asu.edu