Using GIS to Understand Behavior Patterns of Twitter Users

Yue Li
M.S. Civil/Geomatics Engineering
Purdue University
Committee: Dr. Jie Shan (Chair), Dr. Nicole Kong, Dr. James Bethel
Introduction
• Volunteered Geographic Information (VGI) [1]
− Emergency management, event detection,
tourist behavior, knowledge discovery…
• Twitter
− The most popular micro-blogging site
− Tweets with longitude and latitude
− A gold mine for scholars in geography,
linguistics, sociology, economics, health, and
psychology [2]
− Marketing, advertising, regulation,…
Research Goal
• To discover the spatio-temporal pattern of tweets
• To infer the human mobility patterns behind the
tweets
• To understand the lifestyle of college students
Study Area
• College town/city, Big Ten Universities
• West Lafayette, IN
− Most densely populated city in IN
− Home of Purdue University
• Ann Arbor, MI
− University of Michigan
• Bloomington, IN
− Indiana University, Bloomington
• Columbus, OH
− Ohio State University
Data
• Geo-tagged tweets downloaded with the Twitter
Streaming API
• With longitude and latitude at time of posting
• Nov 18, 2013 to April 2, 2014
− West Lafayette: 59,238
− Ann Arbor: 220,117
− Bloomington: 247,202
− Columbus: 1,936,238
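As a rough illustration of the data preparation step, the sketch below keeps only tweets that carry an exact point location inside a study-area bounding box. It assumes the stream was saved as one JSON tweet per line with the standard `coordinates` GeoJSON field; the bounding box values and the function name `geotagged_points` are hypothetical and are not taken from the slides.

```python
import json

# Hypothetical bounding box (west, south, east, north), for illustration only;
# the slides do not give the extents used for any study area.
BBOX = (-87.0, 40.4, -86.8, 40.5)

def geotagged_points(lines, bbox):
    """Yield (lon, lat, created_at) for tweets with an exact point inside bbox."""
    west, south, east, north = bbox
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines in a saved stream
        tweet = json.loads(line)
        coords = tweet.get("coordinates")  # GeoJSON point, or None if not geo-tagged
        if not coords:
            continue
        lon, lat = coords["coordinates"]  # GeoJSON order is [longitude, latitude]
        if west <= lon <= east and south <= lat <= north:
            yield lon, lat, tweet.get("created_at")

# Made-up sample records: one geo-tagged tweet, one without coordinates, one blank line.
sample = [
    '{"created_at": "Tue Jan 21 17:30:00 +0000 2014", '
    '"coordinates": {"type": "Point", "coordinates": [-86.91, 40.43]}}',
    '{"created_at": "Tue Jan 21 17:31:00 +0000 2014", "coordinates": null}',
    '',
]
points = list(geotagged_points(sample, BBOX))
```

Only the first sample record survives the filter, since the second has no coordinates and the third is an empty line.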
Methods
• Pure Spatial
− Point density analysis
• Pure Temporal
• Spatio-Temporal
− Tweets in Land Use
− Event/Anomaly detection
− Individual Twitter user patterns
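The point density step can be pictured with a simplified fixed-grid version: count the tweets in each square cell and divide by the cell area. GIS point density tools usually use a moving neighborhood instead, so this is only a sketch of the idea; the coordinates, the cell size, and the function name `grid_density` are made up for illustration.

```python
from collections import Counter
import math

def grid_density(points, cell_size):
    """Return {(col, row): points per unit area} for square cells of side cell_size."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}

# Made-up projected coordinates in meters.
points = [(10, 10), (20, 30), (90, 95), (150, 20)]
density = grid_density(points, cell_size=100)
```

Here three points fall in cell (0, 0) and one in cell (1, 0), so the first cell has three times the density of the second.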
Tweets in West Lafayette
Tweets in Ann Arbor
Tweets in Bloomington
Tweets in Columbus
Tweets by Hour
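A minimal sketch of how an hourly profile could be tabulated from tweet timestamps follows. It assumes the `created_at` string format used in tweet JSON and a fixed UTC-5 offset for local time; the offset and the function name `tweets_by_hour` are assumptions, and a real analysis would need proper time zone and daylight saving handling for the study period.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset (Eastern Standard Time), ignoring daylight saving.
LOCAL = timezone(timedelta(hours=-5))

def tweets_by_hour(created_at_values, tz=LOCAL):
    """Return a 24-element list with the number of tweets per local hour."""
    hours = Counter()
    for value in created_at_values:
        # Example value: "Tue Jan 21 17:30:00 +0000 2014"
        utc_time = datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")
        hours[utc_time.astimezone(tz).hour] += 1
    return [hours.get(h, 0) for h in range(24)]

# Made-up timestamps for illustration.
sample = [
    "Tue Jan 21 17:30:00 +0000 2014",  # 12:30 local
    "Tue Jan 21 18:05:00 +0000 2014",  # 13:05 local
    "Wed Jan 22 03:10:00 +0000 2014",  # 22:10 local on Jan 21
]
profile = tweets_by_hour(sample)
```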
Tweets and Land Use
• Land use in Ann Arbor, MI
− Industrial
− Mixed Use
− Office
− Public/Education
− Recreation
− Residential
− Transportation
− Vacant
• Spatially join the tweets
with land use
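The spatial join can be sketched as a point-in-polygon lookup: each tweet is assigned the land use class of the parcel that contains it, and the classes are then tallied. The example below uses a plain ray-casting test on made-up square parcels; a GIS package would normally do this with a spatial index, and the parcel geometry, the "Unmatched" label, and the function names here are hypothetical.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def join_land_use(points, parcels):
    """Count points per land use class; parcels is a list of (class, polygon)."""
    counts = Counter()
    for x, y in points:
        label = next(
            (name for name, poly in parcels if point_in_polygon(x, y, poly)),
            "Unmatched",  # hypothetical label for points outside every parcel
        )
        counts[label] += 1
    return counts

# Made-up parcels and points for illustration.
parcels = [
    ("Residential", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("Office", [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
points = [(2, 3), (5, 5), (15, 5), (30, 30)]
counts = join_land_use(points, parcels)
```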
Tweets and Land Use
1 - Commercial; 2 - Industrial; 3 - Mixed Use; 4 - Office; 5 - Public/Education;
6 - Recreation; 7 - Residence; 8 - Transportation; 9 - Vacant/River
Event Detection
• Spatially and temporally aggregated
− Football game, concert, festival,…
• Use the Purdue shooting on Jan 21, 2014 as an example
− Lockdown from around 12:00 to 14:00
• Temporally
− 710 tweets between 12:00 and 14:00 on Jan 21, from 231 unique users
− 7,443 tweets between 12:00 and 14:00 in the whole dataset, from 1,080
unique users
• Spatially
− How to measure spatial anomaly?
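To put the temporal counts in perspective, the arithmetic below compares the lockdown window with an average day. It assumes that "the whole dataset" covers every day of the collection period (Nov 18, 2013 to April 2, 2014) for the same study area, which the slides do not state explicitly; the variable names are illustrative.

```python
from datetime import date

# Collection period as given on the Data slide (inclusive of both end dates).
first_day, last_day = date(2013, 11, 18), date(2014, 4, 2)
n_days = (last_day - first_day).days + 1

lockdown_tweets = 710        # 12:00 to 14:00 on Jan 21, 2014
all_window_tweets = 7443     # 12:00 to 14:00 over the whole dataset

# Assumption: every day contributes data, so the baseline is the mean of the other days.
baseline_per_day = (all_window_tweets - lockdown_tweets) / (n_days - 1)
ratio = lockdown_tweets / baseline_per_day
```

Under these assumptions the period spans 136 days, the other days average about 50 tweets in that two-hour window, and the lockdown window holds roughly 14 times that amount.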
Hypotheses
• Challenge: Inhomogeneous/clustered process even
outside lockdown period
− Were tweets significantly more clustered during the
lockdown than on average?
• Intensity of tweets is correlated with distance to
campus buildings
• Extent of clustering is positively correlated with chi-square value
Covariate: Purdue Buildings
Purdue Building Shapefile converted to tessellation
R libraries: maptools, sp, spatstat
Functions: as.mask → im → tess
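Once the study area is split into tiles, a quadrat-style chi-square statistic compares the observed tweet count in each tile with the count expected if tweets were spread evenly by area. The Python sketch below is a simplified stand-in for that calculation, not the spatstat code used in the slides; the two-tile setup (inside buildings, everywhere else), the counts, and the areas are made up.

```python
def quadrat_chi_square(counts, areas):
    """Pearson chi-square of tile counts against area-proportional expectations."""
    n = sum(counts)
    total_area = sum(areas)
    stat = 0.0
    for observed, area in zip(counts, areas):
        expected = n * area / total_area  # expected count under a uniform pattern
        stat += (observed - expected) ** 2 / expected
    return stat

# Made-up example: tile 0 = inside buildings, tile 1 = everywhere else.
counts = [60, 40]
areas = [20.0, 80.0]
stat = quadrat_chi_square(counts, areas)
```

With these numbers the expected counts are 20 and 80, so the statistic is (60 - 20)^2 / 20 + (40 - 80)^2 / 80 = 100; the more the tweets pile into the small building tile, the larger the value.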
Randomization Test
Algorithm (by Ken Kellner):
1. Select 710 random tweets, without replacement, from dates 1/16/14 - 1/26/14 and hours 12:00 - 14:00
2. Call quadratcount() and quadrat.test() on new random dataset
3. Save chi-square value
4. Repeat 1000 times to obtain distribution of chi-square values
5. Compare actual chi-square value obtained on 1/21/14 with distribution
6. Quasi-p value: proportion of values more extreme than obtained value
Assumption: greater chi-square value = more inhomogeneous/clustered
Tested with simulation
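The steps above can be sketched in Python as follows. This is not the R implementation referred to in the slides: each tweet is reduced to the index of the tile it falls in, the chi-square is the simplified area-based version, "more extreme" is taken to mean greater than or equal to the observed value, and the pool, areas, sample size, and function names are all made up for illustration.

```python
import random

def chi_square(counts, areas):
    """Pearson chi-square of tile counts against area-proportional expectations."""
    n, total_area = sum(counts), sum(areas)
    return sum(
        (obs - n * a / total_area) ** 2 / (n * a / total_area)
        for obs, a in zip(counts, areas)
    )

def tile_counts(tile_labels, n_tiles):
    """Count how many tweets fall in each tile."""
    counts = [0] * n_tiles
    for t in tile_labels:
        counts[t] += 1
    return counts

def quasi_p_value(observed_stat, pool, areas, sample_size, n_rep=1000, seed=0):
    """Share of random subsamples whose chi-square is at least observed_stat."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_rep):
        sample = rng.sample(pool, sample_size)  # sampling without replacement
        if chi_square(tile_counts(sample, len(areas)), areas) >= observed_stat:
            extreme += 1
    return extreme / n_rep

# Made-up pool: tile index per tweet (0 = inside buildings, 1 = elsewhere).
pool = [0] * 300 + [1] * 700
areas = [20.0, 80.0]
observed = chi_square([60, 40], areas)  # a strongly building-heavy pattern
p = quasi_p_value(observed, pool, areas, sample_size=100)
```

A small quasi-p value means that few random subsamples from the comparison period are as concentrated as the observed pattern.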
Randomization Test Result
Chi-square: 85162.85
Quasi-p value: 0.038
• We were able to detect a change in the pattern of tweets during the
lockdown, when presumably more people were stuck in Purdue
buildings than average.
Event Detection
• We can see the anomaly in Twitter data both temporally
and spatially
• However, we are still looking for a complete and
integrated algorithm that can be applied to other events
• To be continued
Frequent Twitter Users
• Top 10 Twitter users with the most tweets in Ann Arbor
• Plot the tweets of individual Twitter user
• Four typical patterns
− Work-Home
− Work-Road-Home
− Work-Home-Short Visit
− Multiple Clusters
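One way to approach the per-user analysis is to rank users by tweet count and then look at where each frequent user tweets most often. The sketch below rounds coordinates to three decimal places (on the order of a hundred meters) as a crude stand-in for clustering; the rounding precision, the sample records, and the function names are assumptions, and the slides do not say how the patterns were identified beyond plotting.

```python
from collections import Counter

def top_users(tweets, n=10):
    """Return the n users with the most tweets as (user, count) pairs."""
    return Counter(user for user, _, _ in tweets).most_common(n)

def frequent_places(tweets, user, decimals=3):
    """Return one user's rounded (lon, lat) cells, most visited first."""
    cells = Counter(
        (round(lon, decimals), round(lat, decimals))
        for u, lon, lat in tweets
        if u == user
    )
    return cells.most_common()

# Made-up (user, lon, lat) records for illustration.
tweets = [
    ("a", -83.7401, 42.2801),
    ("a", -83.7402, 42.2803),
    ("a", -83.7300, 42.2900),
    ("b", -83.7500, 42.2700),
]
ranking = top_users(tweets, n=10)
places = frequent_places(tweets, "a")
```

A user whose tweets fall into two dominant cells would be a candidate for the Work-Home pattern, while many cells of similar weight would suggest Multiple Clusters.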
Future Work
• On-going research
• Complete analysis in all 4 study areas, and
compare the patterns
• Develop/Find an algorithm for event detection
• …
• Any suggestions are welcome!
References
• 1. Goodchild, M. F., 2007. Citizens as sensors: The
world of volunteered geography, GeoJournal, 69, 211-221.
• 2. Ghosh, D., and R. Guha, 2013. What are we ‘tweeting’
about obesity? Mapping tweets with topic modeling
and Geographic Information System, Cartography and
Geographic Information Science, 40(2), 90-102.
QUESTIONS?