Using GIS to Understand Behavior Patterns of Twitter Users
Yue Li
M.S. Civil/Geomatics Engineering, Purdue University
Committee: Dr. Jie Shan (Chair), Dr. Nicole Kong, Dr. James Bethel

Introduction
• Volunteered Geographic Information (VGI) [1]
  − Emergency management, event detection, tourist behavior, knowledge discovery, …
• Twitter
  − The most popular micro-blogging site
  − Tweets with longitude and latitude
  − A gold mine for scholars in geography, linguistics, sociology, economics, health, and psychology [2]
  − Marketing, advertising, regulation, …

Research Goal
• To discover the spatio-temporal pattern of tweets
• To infer the human mobility patterns behind the tweets
• To understand the lifestyle of college students

Study Area
• College town/city, Big Ten Universities
• West Lafayette, IN
  − Most densely populated city in IN
  − Home of Purdue University
• Ann Arbor, MI
  − University of Michigan
• Bloomington, IN
  − Indiana University, Bloomington
• Columbus, OH
  − Ohio State University

Data
• Geo-tagged tweets downloaded with the Twitter Streaming API
• With longitude and latitude at time of posting
• Nov 18, 2013 to April 2, 2014
  − West Lafayette: 59,238
  − Ann Arbor: 220,117
  − Bloomington: 247,202
  − Columbus: 1,936,238

Methods
• Pure Spatial
  − Point density analysis
• Pure Temporal
• Spatio-Temporal
  − Tweets in Land Use
  − Event/Anomaly detection
  − Individual Twitter user patterns

Tweets in West Lafayette [figure]

Tweets in Ann Arbor [figure]

Tweets in Bloomington [figure]

Tweets in Columbus [figure]

Tweets by Hour [figures]

Tweets and Land Use
• Land use in Ann Arbor, MI
  − Industrial
  − Mixed Use
  − Office
  − Public/Education
  − Recreation
  − Residential
  − Transportation
  − Vacant
• Spatially join the tweets with land use

Tweets and Land Use [figures]
Legend: 1 - Commercial; 2 - Industrial; 3 - Mixed Use; 4 - Office; 5 - Public/Education; 6 - Recreation; 7 - Residence; 8 - Transportation; 9 - Vacant/River

Event Detection
• Spatially and temporally aggregated
  − Football game, concert,
festival, …
• Use the Purdue shooting on Jan 21, 2014 as an example
  − Lockdown from around 12:00 to 14:00
• Temporally
  − 710 tweets between 12:00 and 14:00 on Jan 21, from 231 unique users
  − 7,443 tweets between 12:00 and 14:00 across the whole dataset, from 1,080 unique users
• Spatially
  − How to measure spatial anomaly?

Hypotheses
• Challenge: the point process is inhomogeneous/clustered even outside the lockdown period
  − Were tweets significantly more clustered during the lockdown than on average?
• Intensity of tweets is correlated with distance to campus buildings
• Extent of clustering is positively correlated with the chi-square value

Covariate: Purdue Buildings
Purdue building shapefile converted to a tessellation
R libraries: maptools, sp, spatstat
Functions: as.mask → im → tess

Randomization Test
Algorithm (by Ken Kellner):
1. Select 710 random tweets, without replacement, from the dates 1/16/14 to 1/26/14 and the hours 12:00 to 14:00
2. Call quadratcount() and quadrat.test() on the new random dataset
3. Save the chi-square value
4. Repeat 1000 times to obtain a distribution of chi-square values
5. Compare the actual chi-square value obtained on 1/21/14 with the distribution
6. Quasi-p value: proportion of values more extreme than the obtained value
Assumption: greater chi-square value = more inhomogeneous/clustered
Tested with simulation

Randomization Test Result
Chi-square: 85162.85
Quasi-p value: 0.038
• We were able to detect a change in the pattern of tweets during the lockdown, when presumably more people than average were stuck in Purdue buildings.
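
The randomization test above can be sketched in a few lines. This is only an illustration under stated assumptions, not the R/spatstat code used in the study: it replaces quadratcount() and quadrat.test() with a hand-written Pearson chi-square on a regular grid (the slides use a tessellation derived from Purdue buildings), it counts ties as extreme when computing the quasi-p value, and it runs on synthetic points rather than the tweet data. The function names quadrat_chi_square and randomization_test are hypothetical.

```python
import random


def quadrat_chi_square(points, bounds, nx=5, ny=5):
    """Pearson chi-square of quadrat counts against equal expected counts."""
    xmin, ymin, xmax, ymax = bounds
    counts = [0] * (nx * ny)
    for x, y in points:
        # Map each point to a grid cell, clamping points on or past the edges.
        ix = min(max(int((x - xmin) / (xmax - xmin) * nx), 0), nx - 1)
        iy = min(max(int((y - ymin) / (ymax - ymin) * ny), 0), ny - 1)
        counts[iy * nx + ix] += 1
    expected = len(points) / (nx * ny)
    return sum((c - expected) ** 2 / expected for c in counts)


def randomization_test(event_points, pool_points, bounds, n_iter=1000, seed=0):
    """Compare the event chi-square with chi-squares of random subsamples."""
    rng = random.Random(seed)
    observed = quadrat_chi_square(event_points, bounds)
    n = len(event_points)
    # Steps 1-4: draw n points without replacement, save the statistic, repeat.
    null = [
        quadrat_chi_square(rng.sample(pool_points, n), bounds)
        for _ in range(n_iter)
    ]
    # Steps 5-6: share of random draws at least as extreme as the observed value.
    quasi_p = sum(v >= observed for v in null) / n_iter
    return observed, quasi_p


# Synthetic stand-in data (assumption: NOT the tweets from the study).
rng = random.Random(42)
bounds = (0.0, 0.0, 1.0, 1.0)
pool = [(rng.random(), rng.random()) for _ in range(5000)]  # background tweets
event = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(200)]  # clustered

observed, quasi_p = randomization_test(event, pool, bounds)
print(f"chi-square = {observed:.1f}, quasi-p = {quasi_p:.3f}")
```

A small quasi-p value here means that few random subsamples are as unevenly spread over the quadrats as the event-period points, which is the sense in which the slides read a larger chi-square as "more clustered".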

Event Detection
• We can see anomalies in Twitter data both temporally and spatially
• However, we are still looking for a complete, integrated algorithm that can also be applied to other events
• To be continued

Frequent Twitter Users
• Top 10 Twitter users with the most tweets in Ann Arbor
• Plot the tweets of each individual Twitter user
• Four typical patterns
  − Work-Home
  − Work-Road-Home
  − Work-Home-Short Visit
  − Multiple Clusters

Frequent Twitter Users [figures]

Future Work
• Ongoing research
• Complete the analysis in all 4 study areas and compare the patterns
• Develop/find an algorithm for event detection
• …
• Any suggestions are welcome!

References
• 1. Goodchild, M. F., 2007. Citizens as sensors: The world of volunteered geography, GeoJournal, 69, 211-221.
• 2. Ghosh, D., and R. Guha, 2013. What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and Geographic Information System, Cartography and Geographic Information Science, 40(2), 90-102.

QUESTIONS?